
Knowledge Distillation
A model compression technique in which a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model. The student learns not just the correct answers but the teacher's full output probability distributions ('soft targets'), which encode how the teacher weighs the alternatives.
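The core idea can be sketched with a distillation loss that blends two terms: a soft-target term matching the student's distribution to the teacher's temperature-softened distribution, and a hard-label cross-entropy term. This is a minimal plain-Python sketch of the classic recipe from Hinton et al. (2015); the function and variable names are illustrative, and in practice you would compute this with a framework's autograd on batched tensors.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among wrong answers ("dark knowledge").
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss with hard-label cross-entropy.

    alpha weights the soft term; names and defaults are illustrative.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures, as in the original recipe.
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))
    soft_loss = kl * temperature ** 2
    # Standard cross-entropy against the one-hot ground-truth label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A student whose logits already match the teacher's incurs zero soft loss, so training pressure shifts entirely to the hard labels; mismatched students are pulled toward the teacher's distribution.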
Why It Matters
Distillation lets you deploy AI at a fraction of the cost and latency of the original model. A well-distilled model can often retain 90%+ of the teacher's capability at roughly 10% of its size.
Example
OpenAI's GPT-4o mini is widely believed to be a distilled version of GPT-4o: smaller, faster, and cheaper while retaining most of the larger model's capabilities.
Think of it like...
Like a senior mentor training a junior colleague — the junior person cannot replicate all the senior's experience but can learn their key decision-making patterns.
Related Terms
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
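The precision reduction can be illustrated with symmetric linear quantization to int8: map each float weight onto the integer range [-127, 127] using a single scale factor, then recover an approximation by multiplying back. A minimal sketch with illustrative helper names:

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to int8 range [-127, 127]."""
    # One scale per tensor; fall back to 1.0 for an all-zero input.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by the scale step.
    return [v * scale for v in q]
```

Each stored value shrinks from 32 bits to 8, at the cost of a rounding error no larger than one quantization step.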
Pruning
A model compression technique that removes unnecessary or redundant weights, neurons, or layers from a trained neural network. Like pruning a plant, it removes parts that are not contributing to overall health.
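One common variant of this idea is magnitude pruning: zero out the fraction of weights with the smallest absolute values, on the assumption that they contribute least to the output. A minimal sketch with an illustrative function name:

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    sparsity is the target fraction of weights to remove.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Threshold at the k-th smallest magnitude; ties at the threshold
    # are pruned as well in this simple sketch.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]
```

Real pipelines typically prune iteratively and fine-tune between rounds so the remaining weights can compensate for what was removed.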
Transfer Learning
A technique where a model trained on one task is repurposed as the starting point for a model on a different but related task. Instead of training from scratch, you leverage knowledge the model has already acquired.