Machine Learning

Self-Supervised Learning

A training approach in which the labels are derived from the data itself, typically by masking or hiding parts of the input and training the model to predict them. No human-annotated labels are needed.

Why It Matters

Self-supervised learning made it practical to train foundation models on internet-scale data by removing the manual-labeling bottleneck from pre-training.

Example

BERT masks random tokens in a sentence and learns to predict them; GPT learns to predict the next token. Both create their own training signal from raw text.
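
The two label-creation schemes above can be sketched in a few lines of Python. This is a simplified illustration, not how BERT or GPT are actually implemented: it splits on whitespace instead of using subword tokens, and the function names, the `[MASK]` placeholder string, and the `mask_prob` parameter are assumptions made for this example.

```python
import random

MASK = "[MASK]"  # placeholder string, chosen here for illustration


def masked_pairs(tokens, mask_prob=0.15, rng=None):
    """Hide random tokens; return (masked_tokens, targets).

    targets maps each masked position to the original token,
    which serves as the label the model must predict.
    """
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix predicts the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = "the cat sat on the mat".split()

# Masked-prediction style: labels are the hidden tokens.
masked, targets = masked_pairs(tokens, mask_prob=0.3, rng=random.Random(42))
print(masked, targets)

# Next-token style: labels are the tokens that follow each prefix.
pairs = next_token_pairs(tokens)
for context, target in pairs:
    print(context, "->", target)
```

In both cases the labels come straight from the raw text, so no annotator is involved; only the rule for hiding part of the input differs.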

Think of it like...

Like a student who covers parts of a textbook page and quizzes themselves on the hidden content, creating their own learning exercises from the material.

Related Terms