
GPT (Generative Pre-trained Transformer)

Definition

A family of autoregressive language models developed by OpenAI that generate text by predicting the next token in a sequence, pre-trained on large text corpora.

GPT models use a decoder-only transformer architecture: masked self-attention lets each position attend only to earlier tokens, and training maximizes the likelihood of the next token given all previous ones.

GPT-1 (2018) demonstrated the power of unsupervised pre-training followed by supervised fine-tuning. GPT-2 (2019), at 1.5 billion parameters, showed strong zero-shot performance. GPT-3 (2020), at 175 billion parameters, demonstrated few-shot in-context learning and sparked the LLM revolution. GPT-4 (2023) added multimodal input and significantly improved reasoning. The GPT recipe of scaling decoder-only transformers with more data and parameters has been adopted across the industry, inspiring models such as Claude, Gemini, and Llama.
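To make the autoregressive mechanics concrete, here is a minimal sketch in Python (assuming PyTorch is available). It builds a toy decoder-only model in which a causal attention mask restricts every position to earlier tokens, then generates greedily one token at a time. All names and sizes here (TinyDecoder, d_model=64, vocab_size=100, and so on) are illustrative placeholders, not GPT's actual architecture or configuration.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy decoder-only language model. The causal mask is what makes it
    autoregressive: position i may only attend to positions <= i."""

    def __init__(self, vocab_size=100, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        T = ids.size(1)
        x = self.tok_emb(ids) + self.pos_emb(torch.arange(T, device=ids.device))
        # Upper-triangular -inf mask: blocks attention to future positions.
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(ids.device)
        h = self.blocks(x, mask=causal)
        return self.lm_head(h)  # logits over the next token, at every position

@torch.no_grad()
def generate(model, ids, n_new):
    """Greedy autoregressive decoding: predict one token, append it,
    and feed the extended sequence back in."""
    for _ in range(n_new):
        logits = model(ids)[:, -1, :]            # logits for the next token only
        next_id = logits.argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

model = TinyDecoder()                 # untrained, so output tokens are arbitrary
prompt = torch.tensor([[1, 5, 7]])    # toy prompt as token ids
print(generate(model, prompt, n_new=5))
```

The loop in generate is the essential GPT inference pattern: because the mask already hides future tokens during training, the same forward pass can be reused at inference, with each sampled token appended to the context before the next prediction.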
