Artificial Intelligence

Self-Attention

A mechanism where each element in a sequence attends to all other elements to compute a representation, determining how much focus to place on each part of the input. It is the core innovation of the transformer.
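The mechanism can be sketched concretely as scaled dot-product attention, the form used in transformers: each position's query is compared against every position's key, the scores are softmax-normalized into weights, and the output is a weighted sum of values. A minimal NumPy sketch, with illustrative (not standard) weight matrices and dimensions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # each position scored against every other position
    # softmax over each row: how much position i attends to each position j
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Illustrative example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = self_attention(X, Wq, Wk, Wv)
# weights[i] sums to 1: a distribution over which positions token i focuses on
```

Because every position attends to every other in a single step, no position is "far away" the way it is in a recurrent model.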

Why It Matters

Self-attention is what allows transformers to capture context and relationships across an entire document, not just between nearby words: any two positions in a sequence are connected directly, regardless of distance. It is arguably the most important mechanism in modern AI.

Example

In "The animal didn't cross the street because it was too tired," self-attention helps the model determine that "it" refers to "the animal," not "the street."

Think of it like...

Like a group discussion where each person considers what everyone else said before forming their own contribution — everyone's input influences everyone else.

Related Terms