Data Lineage
The tracking of data's origins, transformations, and movements throughout its lifecycle. Data lineage answers the question 'Where did this data come from and what happened to it?'
Why It Matters
Data lineage is critical for debugging, compliance, and trust. When a model produces strange results, lineage lets you trace the issue back to its source.
Example
Tracing a model prediction back through the pipeline: prediction ← model ← training data ← feature store ← ETL pipeline ← raw database ← CRM system.
Think of it like...
Like the chain of custody for evidence in a court case — every handoff and transformation is documented so you can verify the integrity of the final result.
Related Terms
Data Governance
The overall management of data availability, usability, integrity, and security in an organization. It includes policies, standards, and practices for how data is collected, stored, and used.
Data Pipeline
An automated workflow that extracts data from sources, transforms it through processing steps, and loads it into a destination for use. In ML, data pipelines ensure consistent data flow from raw sources to model training.
MLOps
Machine Learning Operations — the set of practices that combine ML, DevOps, and data engineering to deploy and maintain ML models in production reliably and efficiently.
Compliance
The process of ensuring AI systems meet regulatory requirements, industry standards, and organizational policies. AI compliance is becoming increasingly complex as regulations proliferate.