Context Window
Definition
The maximum number of tokens (input plus output) that a language model can process in a single interaction, determining how much information the model can consider at once.
The context window defines the model's "working memory" — everything the model can see and reference when generating a response. Early models had small context windows (2K-4K tokens), severely limiting their ability to process long documents. Modern models have expanded dramatically: GPT-4 Turbo offers 128K tokens, Claude supports 200K tokens, and Gemini 2.5 Pro provides up to 1M tokens. Larger context windows enable processing entire books, long codebases, and extensive conversation histories.

However, longer contexts increase inference cost and latency, and models may not use the information in them uniformly: the "lost in the middle" phenomenon describes reduced attention to information placed in the middle of a long context. Context window management is therefore a key application design consideration, often involving truncation, summarization, or RAG to stay within the limit.
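One common management strategy mentioned above is truncation: dropping the oldest conversation turns until the history fits a token budget. The sketch below illustrates the idea; it uses a rough characters-per-token heuristic (~4 chars per token) rather than a real tokenizer, and the `trim_history` helper and message format are illustrative assumptions, not any particular provider's API.

```python
# Sketch: keep a chat history within a token budget by dropping the
# oldest turns first. Token counts are approximated as ~4 characters
# per token (a heuristic); a real application would use the model's
# own tokenizer to count tokens exactly.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop oldest messages until the remaining history fits the budget."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break  # this turn (and everything older) no longer fits
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens (oldest)
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 40},        # ~10 tokens (newest)
]
print(len(trim_history(history, max_tokens=120)))  # oldest turn is dropped
```

Dropping from the oldest end preserves the most recent turns, which are usually the most relevant; summarization-based approaches instead compress the dropped turns into a short summary so their content is not lost entirely.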
Related Terms
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understand...
Tokenizer
A component that splits text into smaller units called tokens (words, subwords, or characters) that ...
RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by first retrieving relevant documents from an external know...
Token
The basic unit of text that language models process, typically representing a word, subword, or char...