AI Fundamentals

Context Window

Last updated: February 16, 2026

The context window is the maximum number of tokens a large language model can process in a single interaction, encompassing both the input prompt and the generated output. It defines the boundary of what the model can "see" and reason about at any given moment.
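Because both the prompt and the reply draw from the same budget, it can help to count tokens before sending a request. Below is a minimal Python sketch using the tiktoken library; the "cl100k_base" encoding, the 128,000-token window, and the reserved reply budget are illustrative assumptions, not tied to any specific model.

```python
# A minimal sketch of token accounting, assuming the tiktoken library
# and the "cl100k_base" encoding; window size and reply budget are illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

context_window = 128_000          # assumed model limit (tokens)
reply_budget = 500                # rough budget reserved for the generated output
prompt = "Summarize the attached report in three bullet points."

prompt_tokens = len(enc.encode(prompt))
remaining = context_window - prompt_tokens - reply_budget

print(f"Prompt uses {prompt_tokens} tokens; {remaining} tokens remain after reserving the reply.")
```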

How It Works

Every LLM has a fixed context window size determined by its architecture and training. When you send a request, the system prompt, conversation history, user message, and any injected context (such as retrieved documents) all consume tokens from this window. The model's response also counts against it. If the total exceeds the limit, older content must be truncated or summarized to fit.
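The sketch below illustrates this budgeting: the system prompt, retrieved documents, and conversation history are fitted into a fixed window, with the oldest turns dropped when the budget runs out. The whitespace-based token estimate, the 8,000-token limit, and the message contents are assumptions for the example; a real application would use the provider's tokenizer.

```python
# A sketch of fitting a request into a fixed context window.
# Token counts use a naive whitespace split; limits and contents are hypothetical.

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())

CONTEXT_WINDOW = 8_000
RESPONSE_BUDGET = 1_000   # tokens reserved for the model's reply

def build_request(system_prompt, history, retrieved_docs):
    budget = CONTEXT_WINDOW - RESPONSE_BUDGET
    fixed = estimate_tokens(system_prompt) + sum(estimate_tokens(d) for d in retrieved_docs)
    remaining = budget - fixed

    # Keep the most recent turns that still fit; older ones are truncated
    # (or, in a fuller implementation, summarized).
    kept = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()
    return [system_prompt, *retrieved_docs, *kept]

history = [
    "User: What is a context window?",
    "Assistant: It is the model's working memory, measured in tokens.",
    "User: How do I stay within it?",
]
request = build_request("You are a concise assistant.", history, ["...retrieved document text..."])
print(f"Request assembled from {len(request)} parts within the token budget.")
```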

Context window sizes vary significantly across models. Earlier models offered windows of 4,000 or 8,000 tokens, while many modern models support 128,000 tokens or more. Larger windows allow the model to consider more information simultaneously, but they also increase computational cost and latency.

Why It Matters

The context window is one of the most practical constraints in AI application design. It determines how much conversation history an agent can remember, how many documents it can analyze at once, and how detailed its instructions can be. Running out of context mid-conversation can cause the model to lose track of earlier instructions or forget important details.

In Practice

When deploying an AI assistant, managing the context window is critical. Strategies include summarizing older conversation turns, using retrieval-augmented generation to fetch only relevant information on demand, and keeping system prompts concise. Understanding your model's context window helps you design interactions that stay within limits while delivering high-quality responses. Monitoring token usage also helps control API costs, since most providers charge per token processed.
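One way to apply the summarization strategy: once the conversation grows past a threshold, fold the oldest turns into a short summary and keep only the recent ones verbatim. The sketch below assumes a hypothetical summarize() helper standing in for a call to the model, and the thresholds are illustrative.

```python
# A sketch of summarizing older turns to reclaim context space.
# summarize() is a hypothetical placeholder for an LLM call;
# thresholds and token estimates are illustrative.

def estimate_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(turns: list[str]) -> str:
    # Placeholder: in practice this would ask the model for a brief summary.
    return "Summary of earlier conversation: " + " / ".join(t[:40] for t in turns)

HISTORY_TOKEN_LIMIT = 2_000   # point at which we start compacting
KEEP_RECENT_TURNS = 4         # most recent turns kept verbatim

def compact_history(history: list[str]) -> list[str]:
    total = sum(estimate_tokens(t) for t in history)
    if total <= HISTORY_TOKEN_LIMIT or len(history) <= KEEP_RECENT_TURNS:
        return history
    older, recent = history[:-KEEP_RECENT_TURNS], history[-KEEP_RECENT_TURNS:]
    return [summarize(older), *recent]
```

Calling compact_history on each turn keeps the request size roughly stable, at the cost of some fidelity in the summarized portion.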