Long Context
The ability of AI models to process very large amounts of input text — typically 100K tokens or more — enabling analysis of entire books, codebases, or document collections.
Why It Matters
Long context eliminates the need to chunk and summarize inputs, enabling direct analysis of complete documents and reducing information loss.
Example
Claude processing a 200K-token legal contract in its entirety, cross-referencing clauses from page 3 with definitions on page 150, a task that would otherwise require lossy chunking with a shorter context window.
Think of it like...
Like having a desk large enough to spread out an entire newspaper versus only seeing one article at a time — more context enables better understanding.
Related Terms
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction. It includes both the input prompt and the generated output. Larger context windows allow models to handle longer documents.
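Because the input prompt and the generated output share one budget, a practical check is whether the prompt plus the maximum requested generation fits within the window. A minimal sketch, with hypothetical names and an assumed 200K-token window:

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 200_000) -> bool:
    """Input and output share one budget: the prompt plus the
    longest possible generation must fit inside the window."""
    return prompt_tokens + max_output_tokens <= context_window

fits_in_context(150_000, 4_096)   # a 150K prompt leaves room for a 4K reply
fits_in_context(199_000, 4_096)   # a 199K prompt does not
```

Real APIs enforce this server-side; the helper here just illustrates why long prompts reduce the room left for output.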
Token
The basic unit of text that language models process. A token can be a word, part of a word, or a punctuation mark. Text is broken into tokens before being fed into an LLM, and the model generates output one token at a time.
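The split into words, word fragments, and punctuation can be mimicked with a crude regex tokenizer. This is only an illustration: production models use learned subword schemes (such as byte-pair encoding), so their actual token boundaries differ.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Crude illustration: split into runs of word characters and
    # single punctuation marks. Real LLM tokenizers use learned
    # subword vocabularies, not a fixed rule like this.
    return re.findall(r"\w+|[^\w\s]", text)

toy_tokenize("Tokens aren't words!")
# → ['Tokens', 'aren', "'", 't', 'words', '!']
```

Note how the contraction breaks into three tokens, showing that tokens do not map one-to-one onto words.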
Retrieval-Augmented Generation
A technique that enhances LLM outputs by first retrieving relevant information from external knowledge sources and then using that information as context for generation. RAG combines the power of search with the fluency of language models.
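The retrieve-then-generate flow can be sketched with a toy retriever. The word-overlap scoring and the function names below are illustrative assumptions; real systems typically rank by embedding similarity over a vector store before building the prompt.

```python
import re

def _words(text: str) -> set[str]:
    # Lowercased word set, punctuation stripped, for overlap scoring.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    q = _words(query)
    return sorted(docs, key=lambda d: len(q & _words(d)), reverse=True)[:k]

def rag_prompt(query: str, docs: list[str]) -> str:
    # Step 2 of RAG: splice the retrieved passages into the prompt
    # so the model generates an answer grounded in them.
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
```

The key design point survives the simplification: retrieval narrows a large corpus down to a few relevant passages, and only those passages consume context-window budget.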
Flash Attention
An optimized implementation of the attention mechanism that reduces memory usage and increases speed by tiling the computation and avoiding materializing the full attention matrix in memory.
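The tiling idea can be shown in plain NumPy: process the keys and values one block at a time, maintaining a running maximum and normalizer for the softmax (the "online softmax" trick) so the full score matrix is never held at once. This sketch reproduces the math, not the performance; the real speedup comes from fusing these steps in GPU kernels.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: materializes the full (n x n) attention matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def tiled_attention(Q, K, V, block=4):
    # Flash-style: visit K/V in blocks, keeping only O(n) running
    # statistics (row max and row normalizer) instead of the full
    # score matrix.
    n, d = Q.shape
    out = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running softmax normalizer per row
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Q @ Kb.T / np.sqrt(d)                     # scores for this tile only
        new_max = np.maximum(row_max, s.max(axis=-1))
        correction = np.exp(row_max - new_max)        # rescale old partial sums
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=-1)
        out = out * correction[:, None] + p @ Vb
        row_max = new_max
    return out / row_sum[:, None]
```

Both functions compute identical outputs; the tiled version simply never builds the quadratic-size attention matrix, which is what makes long contexts tractable in memory.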
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.