Sparse Retrieval
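Information retrieval based on keyword matching and term-frequency statistics, using ranking functions such as BM25. Called 'sparse' because each document is represented as a vector with one dimension per vocabulary term, and most of those values are zero: any single document contains only a small fraction of the vocabulary.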
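To make the 'mostly zero' point concrete, here is a minimal sketch that builds term-count vectors over a tiny, made-up corpus. The documents, the whitespace tokenizer, and the helper name sparse_vector are assumptions for illustration only; real systems store such vectors in an inverted index rather than as full lists.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration.
docs = [
    "error code 404 not found",
    "server returned an error",
    "page not found",
]

# The vocabulary is every distinct term in the corpus (naive whitespace tokenizer).
vocab = sorted({term for doc in docs for term in doc.split()})


def sparse_vector(text):
    """Return one term count per vocabulary entry; absent terms count as 0."""
    counts = Counter(text.split())
    return [counts.get(term, 0) for term in vocab]


vec = sparse_vector(docs[2])  # "page not found"
zero_fraction = vec.count(0) / len(vec)
print(len(vocab), vec, round(zero_fraction, 2))
```

Even with only three short documents, most entries of the vector are zero. With a realistic vocabulary of hundreds of thousands of terms, the zero fraction is far higher.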
Why It Matters
Sparse retrieval remains the backbone of many production search engines. It excels at exact term matching and typically handles specific tokens (product IDs, names, error codes) better than dense retrieval.
Example
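Using BM25 to find documents containing the literal terms in 'error code 404'. Dense retrieval might return semantically related errors, but sparse retrieval ranks documents containing those exact terms highest. Note that BM25 scores individual terms and ignores word order, so strict phrase matching requires an additional phrase query on top of it.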
Think of it like...
Like looking up a specific word in a dictionary — you need an exact match, not a synonym or related concept.
Related Terms
BM25
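Best Matching 25: a widely used ranking function for keyword-based information retrieval. BM25 scores a document by how often each query term occurs in it, how rare that term is across the corpus (inverse document frequency), and how long the document is relative to the average document length.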
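The scoring described above can be sketched in a few lines. This is a minimal illustration of one common BM25 variant, not a production implementation: it assumes a whitespace tokenizer, the smoothed IDF form ln(1 + (N - n + 0.5) / (n + 0.5)), and the frequently used defaults k1 = 1.5 and b = 0.75. The function name bm25_scores and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "error code 404 page not found",
    "the server hit an unexpected failure",
    "error code 500 internal server error",
]
scores = bm25_scores("error code 404", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy corpus the first document scores highest because it contains the rare term '404', the third gets partial credit for 'error' and 'code', and the second scores zero even though it describes a related failure, since it shares no terms with the query.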
Hybrid Search
A search approach that combines keyword-based (lexical) search with semantic (vector) search to get the benefits of both — exact matching for specific terms and meaning-based matching for conceptual queries.
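One common way to combine the two result lists is reciprocal rank fusion (RRF), sketched below. RRF is only one of several fusion strategies and is chosen here as an assumption, not as the method any particular system uses. The document IDs, the two input rankings, and the constant k = 60 are illustrative values.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list adds 1 / (k + rank) to a document's score."""
    fused_scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused_scores[doc_id] = fused_scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused_scores, key=fused_scores.get, reverse=True)


# Hypothetical outputs of a keyword search and a vector search for the same query.
lexical_results = ["doc_404", "doc_500", "doc_timeout"]
semantic_results = ["doc_not_found", "doc_404", "doc_500"]

merged = reciprocal_rank_fusion([lexical_results, semantic_results])
print(merged)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept, lower in the merged list. Because RRF uses ranks rather than raw scores, it avoids having to put BM25 scores and vector similarities on the same scale.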
Dense Retrieval
Information retrieval using learned vector embeddings to find semantically similar documents. Called 'dense' because document representations are comparatively low-dimensional numerical vectors in which most or all values are non-zero.
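As a rough sketch of the ranking step, the example below orders documents by cosine similarity to a query vector. The three-dimensional vectors are hand-written stand-ins, not the output of a real embedding model, which would typically produce hundreds of dimensions. Cosine similarity is also an assumption here; some systems use dot product or Euclidean distance instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: every value is non-zero, unlike a sparse term vector.
doc_vectors = {
    "page not found": [0.9, 0.1, 0.3],
    "missing resource": [0.8, 0.2, 0.4],
    "payment declined": [0.1, 0.9, 0.2],
}
query_vector = [0.85, 0.15, 0.35]  # stand-in embedding for a "404"-style query

ranked = sorted(
    doc_vectors,
    key=lambda doc: cosine_similarity(query_vector, doc_vectors[doc]),
    reverse=True,
)
print(ranked)
```

Here 'missing resource' ranks close to 'page not found' despite sharing no words with it, which is the behavior that distinguishes dense from sparse retrieval.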