BM25
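Best Matching 25: a widely used ranking function for keyword-based information retrieval. BM25 scores a document against a query using three signals: how often each query term appears in the document (with diminishing returns for repeated occurrences), the document's length relative to the collection average, and how rare each term is across the collection (inverse document frequency).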
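One common way to write the BM25 score of a document D for a query Q with terms q_1, ..., q_n is shown below. Here f(q_i, D) is how often q_i occurs in D, |D| is the length of D in tokens, avgdl is the average document length in the collection, N is the number of documents, and n(q_i) is the number of documents containing q_i. The parameter k_1 controls how quickly repeated occurrences saturate and b controls how strongly length is normalized. k_1 = 1.2 and b = 0.75 are typical defaults (Lucene and Elasticsearch use them). The IDF shown is the smoothed variant Lucene uses; other implementations use slightly different IDF forms.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \, \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```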
Why It Matters
BM25 remains surprisingly competitive even in the era of neural search. It is fast, interpretable, and requires no training — making it an essential baseline.
Example
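Elasticsearch (where BM25 is the default similarity) ranking documents for the query 'machine learning optimization': documents that use these terms often score higher, though each extra occurrence adds less than the one before; rarer terms weigh more than common ones; and a match in a document much longer than average counts for less than the same match in a shorter one.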
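A minimal pure-Python sketch of this kind of ranking, assuming whitespace tokenization, the Lucene-style IDF, and k1 = 1.2, b = 0.75. The function name bm25_scores and the three toy documents are hypothetical, and this is not how Elasticsearch implements BM25 internally (it scores from an inverted index rather than looping over every document).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Assumption: Lucene-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: grows for docs longer than average
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy collection
docs = [
    "machine learning optimization with gradient descent".split(),
    "a survey of optimization methods in operations research and logistics planning".split(),
    "cooking recipes for the weekend".split(),
]
query = "machine learning optimization".split()

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # prints [0, 1, 2]
```

The document containing all three query terms ranks first, the one containing only "optimization" ranks second, and the unrelated document scores exactly zero.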
Think of it like...
Like a librarian who recommends books based on how often they mention your topic, with adjustments for book length — a short book mentioning your topic 10 times is more relevant than a 1000-page book mentioning it 10 times.
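To put numbers on the analogy, this sketch computes a single term's BM25 contribution for two hypothetical books that both mention the topic 10 times. The lengths are assumptions: about 20,000 words for the short book, 300,000 words for a 1000-page book at roughly 300 words per page, and an 80,000-word average across the collection. The IDF is fixed at 1.0 so that only term frequency and length matter; the helper name bm25_term_weight is hypothetical.

```python
def bm25_term_weight(tf, doc_len, avgdl, idf=1.0, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score."""
    # Above 1 for longer-than-average docs, below 1 for shorter ones
    length_norm = 1 - b + b * doc_len / avgdl
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


AVGDL = 80_000  # assumed average book length in words

short_book = bm25_term_weight(tf=10, doc_len=20_000, avgdl=AVGDL)
long_book = bm25_term_weight(tf=10, doc_len=300_000, avgdl=AVGDL)
print(f"short book: {short_book:.3f}, 1000-page book: {long_book:.3f}")
# prints: short book: 2.090, 1000-page book: 1.609
```

Raising tf from 10 to 20 in the short book would not double its weight: because tf also appears in the denominator, the contribution can never exceed idf * (k1 + 1), which is the saturation effect.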
Related Terms
Hybrid Search
A search approach that combines keyword-based (lexical) search with semantic (vector) search to get the benefits of both — exact matching for specific terms and meaning-based matching for conceptual queries.
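The definition does not say how the two result lists are combined. One common approach, swapped in here as an assumption, is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) over every list it appears in. Because it uses only ranks, it avoids having to make BM25 scores and vector similarities comparable. The document IDs and both rankings below are made up, and k = 60 is a conventional choice rather than a requirement.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list (best first)."""
    scores = {}
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = ["doc_a", "doc_b", "doc_c"]   # hypothetical BM25 order
semantic = ["doc_c", "doc_a", "doc_d"]  # hypothetical vector-search order

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)  # prints ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

doc_a comes first because it sits near the top of both lists, even though doc_c is first in the semantic list alone.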
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords. It uses embeddings to find results that are conceptually related even if they use different words.
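A toy sketch of the retrieval step, assuming documents and the query have already been turned into embedding vectors by some model and that cosine similarity is the comparison metric (a common choice, not the only one). The three-dimensional vectors are invented for illustration; real embedding models produce hundreds or thousands of dimensions. The two top results share almost no words with the hypothetical query "patching a leaky inner tube".

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; a real model would produce these vectors
doc_embeddings = {
    "how to fix a flat tire": [0.90, 0.10, 0.00],
    "repairing a punctured bicycle wheel": [0.85, 0.15, 0.05],
    "best pasta recipes": [0.00, 0.20, 0.95],
}
# Hypothetical embedding of the query "patching a leaky inner tube"
query_embedding = [0.88, 0.12, 0.02]

ranked = sorted(
    doc_embeddings,
    key=lambda text: cosine_similarity(query_embedding, doc_embeddings[text]),
    reverse=True,
)
print(ranked)
```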
TF-IDF
Term Frequency-Inverse Document Frequency — a statistical measure that evaluates how important a word is to a document within a collection. Words frequent in one document but rare across documents score high.
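A minimal sketch of one common TF-IDF variant: term frequency normalized by document length, multiplied by ln(N / df), where N is the number of documents and df is the number that contain the term. Libraries differ in smoothing and normalization, so treat the exact numbers as illustrative. The function name tf_idf and the documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Length-normalized term frequency times unsmoothed IDF
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

"the" appears in every document, so its IDF and therefore its weight are 0, while "mat", which appears only in the first document, outweighs "cat", which appears in two.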