Large Language Model
Definition
A neural network with billions of parameters trained on massive text datasets, capable of understanding and generating human language with remarkable fluency.
Large Language Models (LLMs) are the driving force behind the current AI revolution. Models like GPT-4, Claude, Gemini, and Llama are trained on trillions of tokens of text using the transformer architecture. They predict the next token in a sequence, and through this simple objective, they develop sophisticated capabilities including reasoning, coding, translation, and creative writing. LLMs are typically pre-trained on general text and then fine-tuned with instruction tuning and RLHF to follow user instructions safely. The scale of these models ranges from billions to potentially trillions of parameters, requiring massive GPU clusters for training. LLMs have become the foundation for chatbots, coding assistants, search engines, and AI agents.
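The next-token objective described above can be sketched as a simple generation loop. The "model" below is a hypothetical toy lookup table standing in for a real transformer, which would instead score every vocabulary token from the full context; the loop structure (predict, append, repeat) is the part that carries over.

```python
# Minimal sketch of autoregressive next-token generation.
# BIGRAM_SCORES is a made-up toy "model": it maps the last token
# to scores for possible next tokens. A real LLM computes these
# scores with a transformer over the entire context.
BIGRAM_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def predict_next(tokens):
    """Greedy decoding: return the highest-scoring next token, or None."""
    scores = BIGRAM_SCORES.get(tokens[-1], {})
    if not scores:
        return None  # no known continuation -> stop generating
    return max(scores, key=scores.get)

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        if nxt is None:
            break
        tokens.append(nxt)  # feed the prediction back in (autoregression)
    return tokens

print(generate(["the"]))  # → ['the', 'cat', 'sat', 'down']
```

Sampling from the score distribution instead of always taking the maximum is what gives real LLMs their variety; greedy decoding is used here only to keep the sketch deterministic.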
Related Terms
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks.
GPT (Generative Pre-trained Transformer)
A family of autoregressive language models developed by OpenAI that generate text by predicting the next token in a sequence.
Tokenizer
A component that splits text into smaller units called tokens (words, subwords, or characters) that a model can process.
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequences in parallel.
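The Tokenizer entry above can be illustrated with a toy greedy longest-match subword tokenizer. The vocabulary here is invented for the example; real tokenizers (e.g. BPE) learn their subword vocabulary from a training corpus.

```python
# Toy greedy longest-match subword tokenizer (illustrative only).
# VOCAB is a hypothetical hand-picked vocabulary, not a learned one.
VOCAB = {"un", "break", "able", "ing", "token", "izer"}

def tokenize(word, vocab=VOCAB):
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible vocabulary match starting at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("unbreakable"))  # → ['un', 'break', 'able']
```

This shows why LLM context sizes are measured in tokens rather than words: one word may become several subword tokens, and unknown strings degrade gracefully to characters instead of failing.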