GPT (Generative Pre-trained Transformer)
Definition
A family of autoregressive language models developed by OpenAI that generate text by predicting the next token in a sequence, pre-trained on large text corpora.
GPT models use a decoder-only transformer architecture trained to predict the next token given all previous tokens. GPT-1 (2018) demonstrated the power of unsupervised pre-training followed by supervised fine-tuning. GPT-2 (2019) showed strong zero-shot performance at 1.5 billion parameters. GPT-3 (2020) at 175 billion parameters demonstrated few-shot learning and sparked the LLM revolution. GPT-4 (2023) introduced multimodal capabilities and significantly improved reasoning. The GPT approach of scaling up decoder-only transformers with more data and parameters has been adopted across the industry, inspiring models like Claude, Gemini, and Llama.
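The core decoding loop can be illustrated with a minimal sketch. Here a toy bigram score table stands in for the transformer (the vocabulary and scores are invented for illustration); a real GPT conditions on the entire prefix via self-attention rather than only the last token, but the autoregressive pattern of scoring candidates and appending the chosen token is the same:

```python
# Toy next-token scores conditioned only on the previous token.
# These values are made up for illustration; a decoder-only transformer
# would instead compute a probability over the vocabulary from the full prefix.
BIGRAM_SCORES = {
    "<s>": {"the": 0.9, "cat": 0.1},
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 1.0},
    "sat": {"on": 1.0},
    "on":  {"mat": 0.7, "the": 0.3},
    "mat": {".": 1.0},
    ".":   {},
}

def generate(prompt, max_new_tokens=10):
    """Autoregressive greedy decoding: repeatedly predict and append one token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = BIGRAM_SCORES.get(tokens[-1], {})
        if not scores:                       # no known continuation: stop
            break
        next_tok = max(scores, key=scores.get)  # greedy: pick the top-scoring token
        tokens.append(next_tok)
        if next_tok == ".":                  # end-of-sentence token: stop
            break
    return tokens

print(generate(["<s>", "the"]))
# → ['<s>', 'the', 'cat', 'sat', 'on', 'mat', '.']
```

In practice, models sample from the predicted distribution (with temperature, top-k, or nucleus sampling) rather than always taking the greedy maximum, which trades determinism for more varied output.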
Related Terms
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks.
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understanding and generating natural language.
Pre-Training
The initial phase of training a foundation model on a large, general-purpose dataset before it is fine-tuned for specific tasks.
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequences.