Retrieval Evaluation

Methods for measuring how well a retrieval system finds relevant documents. Key metrics include recall at K (recall@K), mean reciprocal rank (MRR), and normalized discounted cumulative gain (nDCG).
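As a rough sketch, the three metrics can be computed from ranked result lists and a set of known-relevant documents per query. This assumes binary relevance (a document is either relevant or not); graded-relevance variants of nDCG exist but are not shown here.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant document, over all queries.

    `results` is a list of (retrieved_list, relevant_set) pairs.
    Queries with no relevant document retrieved contribute 0.
    """
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

def ndcg_at_k(retrieved, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)  # best case: all relevant docs ranked first
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

For a single query where the only relevant document appears at rank 2, recall@3 is 1.0, MRR is 0.5, and nDCG@3 is about 0.63, since the discount penalizes the document for not being ranked first.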

Why It Matters

Retrieval evaluation is the overlooked half of RAG quality: even a perfect LLM will produce wrong answers if retrieval hands it the wrong documents.

Example

Testing a RAG system on 500 questions whose correct source documents are known, and measuring how often the right document appears in the top 5 results; for example, 85% of the time (recall@5 = 0.85).

Think of it like...

Like grading a research assistant on whether they pulled the right files from the cabinet, before evaluating what they wrote with those files.

Related Terms