Data Science

Data Lineage

The tracking of data's origins, transformations, and movements throughout its lifecycle. Data lineage answers the question 'Where did this data come from and what happened to it?'

Why It Matters

Data lineage is critical for debugging, compliance, and trust. When a model produces strange results, lineage lets you trace the issue back to its source.

Example

Tracing a model prediction back through the pipeline: prediction ← model ← training data ← feature store ← ETL pipeline ← raw database ← CRM system.

Think of it like...

Like the chain of custody for evidence in a court case — every handoff and transformation is documented so you can verify the integrity of the final result.

Related Terms