Data Science

Training Data

The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
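
As a rough illustration of "examples and labels", the sketch below stores training data as (features, label) pairs and uses a 1-nearest-neighbor rule to predict a label for a new input. The feature values, the label names, and the `predict` helper are all made up for illustration; real models and datasets are far larger.

```python
# Hypothetical toy training data: each example is (features, label).
# Features are (height_cm, weight_kg); all values are invented for illustration.
training_data = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features, data):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(data, key=lambda ex: squared_distance(ex[0], features))
    return nearest_label


print(predict((22.0, 4.5), training_data))   # prints "cat"
print(predict((58.0, 28.0), training_data))  # prints "dog"
```

Here the "learning" is just remembering the examples, so the predictions can only be as good as the stored data.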

Why It Matters

Garbage in, garbage out — training data quality is often the single biggest factor in model success. Biased or incomplete data leads to biased or unreliable models.
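
One simple way to spot skewed data is to check how labels are distributed before training. The sketch below assumes a hypothetical label list that is 95% "cat" and 5% "dog", and shows that a model which always predicts the majority label looks accurate while never identifying a dog. The helper name `label_shares` and the numbers are assumptions chosen for illustration.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical, heavily imbalanced label set (invented for illustration).
labels = ["cat"] * 95 + ["dog"] * 5

shares = label_shares(labels)
majority_label = max(shares, key=shares.get)

# A "model" that always predicts the majority label.
predictions = [majority_label] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
dogs_found = sum(p == "dog" and y == "dog" for p, y in zip(predictions, labels))

print(shares)      # {'cat': 0.95, 'dog': 0.05}
print(accuracy)    # 0.95
print(dogs_found)  # 0
```

The high accuracy comes entirely from the imbalance in the data, which is why label distribution is worth checking alongside headline metrics.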

Example

ImageNet is a dataset of more than 14 million labeled images across roughly 20,000 categories. A 1,000-category subset of about 1.2 million images was used to train many breakthrough computer vision models, such as AlexNet.

Think of it like...

Like the textbooks and practice problems a student uses to learn — better study materials lead to better understanding and test performance.
