Data Science

Validation Data

A subset of data held out from the training set and used during model development to tune hyperparameters and monitor performance without touching the test set. It acts as an intermediate checkpoint between training and final evaluation.

Why It Matters

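Validation data gives you a separate dataset to make design decisions against, so you never have to consult the test set while choosing a model. This keeps you from accidentally overfitting to the test set, which would make the final score look better than the model really is.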

Example

Using 60% of data for training, 20% for validation (to pick the best model configuration), and 20% for final testing.
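A minimal sketch of such a split, using only the Python standard library. The function name split_indices, the seed, and the sample count of 100 are assumptions made for illustration; in practice a library helper would often be used instead.

```python
import random


def split_indices(n_samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into three parts.

    Whatever is left after the training and validation parts
    becomes the test part.
    """
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(indices)

    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Hypothetical dataset of 100 samples, split 60/20/20.
train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Shuffling before cutting matters when the data is stored in some order (for example, sorted by label or date). The three index lists do not overlap, so a configuration chosen on the validation indices has never seen the test indices.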

Think of it like...

Practice exams before the real test: they help you gauge readiness and adjust your study strategy without using up the actual exam.

Related Terms