Machine Learning

Masked Language Model

A pre-training objective in which a random subset of input tokens (about 15% in BERT) is replaced with a special [MASK] token, and the model learns to predict the original tokens from the surrounding context. This is how BERT was pre-trained.
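The masking step can be sketched in a few lines of Python. This is a minimal illustration of the BERT-style recipe (of the selected positions, 80% become [MASK], 10% become a random token, 10% stay unchanged); the function name and the toy vocabulary are invented for the example, not part of any library.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking sketch: select ~mask_prob of positions as
    prediction targets. Of those, 80% -> [MASK], 10% -> random token,
    10% -> unchanged. Returns (masked_tokens, labels); labels is None
    at positions the model is not asked to predict."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)
        else:
            labels.append(None)  # no loss computed here
            masked.append(tok)
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, vocab=tokens)
```

During training, the loss is computed only at the positions where labels is not None, so the model is never rewarded for simply copying visible tokens.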

Why It Matters

Masked language modeling enables bidirectional understanding: unlike left-to-right language models, which condition only on preceding tokens, the model learns from both left and right context simultaneously, producing richer representations.

Example

Input: 'The [MASK] sat on the mat.' The model learns to predict 'cat' by using context from both directions, such as the preceding 'The' and the following 'sat on the mat'.

Think of it like...

Like a fill-in-the-blank test where you use surrounding clues to figure out the missing word — the more context you consider, the better your guess.

Related Terms