
Context Window

Definition

The maximum number of tokens (input plus output) that a language model can process in a single interaction, determining how much information the model can consider at once.

The context window defines the model's "working memory" — everything the model can see and reference when generating a response. Early models had small context windows (2K-4K tokens), severely limiting their ability to process long documents. Modern models have expanded dramatically: GPT-4 Turbo offers 128K tokens, Claude supports 200K tokens, and Gemini 2.5 Pro provides up to 1M tokens.

Larger context windows enable processing entire books, long codebases, and extensive conversation histories. However, longer contexts increase inference cost and latency. Research has also shown that models may not utilize information uniformly across the context — the "lost in the middle" phenomenon describes reduced attention to information in the middle of long contexts.

Context window management is a key application design consideration, often involving summarization or RAG to work within limits.
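One common management strategy is trimming conversation history to fit a token budget, dropping the oldest turns first. The sketch below illustrates the idea; the whitespace-based `count_tokens` is a stand-in assumption (production systems would use the model's actual tokenizer, e.g. `tiktoken` for OpenAI models), and the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real systems count tokens with the model's own
    # tokenizer; whitespace splitting here is only for illustration.
    return len(text.split())

def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits
    within `budget`, dropping the oldest messages first."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = count_tokens(msg)
        if total + cost > budget:
            break                   # oldest remaining messages are dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))     # restore chronological order

history = [
    "user: summarize chapter one",
    "assistant: chapter one introduces the protagonist",
    "user: now compare it with chapter two",
]
print(trim_history(history, budget=12))
```

Summarization-based approaches replace the dropped turns with a generated summary instead of discarding them outright, trading some fidelity for a much smaller footprint.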
