Annotation
The process of adding labels, tags, or metadata to raw data to make it suitable for supervised machine learning. Annotation can involve labeling images, transcribing audio, or tagging text.
Why It Matters
Annotation quality directly determines model quality. High-quality annotations are expensive but essential — they are the ground truth your model learns from.
Example
Annotators drawing bounding boxes around every pedestrian in thousands of dashcam frames, labeling each with attributes like 'walking,' 'standing,' or 'crossing.'
Think of it like...
Like an art teacher who marks up student drawings with corrections and guidance — the annotations show the model what 'correct' looks like.
Related Terms
Data Labeling
The process of assigning meaningful tags or annotations to raw data so it can be used for supervised learning. Labels tell the model what the correct answer should be for each training example.
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Crowdsourcing
Using a large group of distributed workers (often through platforms like Amazon Mechanical Turk or Scale AI) to perform data annotation and labeling tasks.
Active Learning
A training strategy where the model identifies the most informative unlabeled examples and requests human labels only for those. This minimizes labeling effort by focusing on the examples that matter most.