Pruning
A model compression technique that removes unnecessary or redundant weights, neurons, or layers from a trained neural network. Like pruning a plant, it removes parts that are not contributing to overall health.
Why It Matters
Pruning can reduce model size by 50-90% with minimal accuracy loss, enabling deployment on resource-constrained devices.
Example
Removing the 80% of weights with the smallest magnitudes from a neural network, and finding that the remaining 20% of connections preserve 95% of the original model's accuracy.
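The example above is magnitude pruning: rank weights by absolute value and zero out the smallest fraction. A minimal sketch with NumPy (the function name and `sparsity` parameter are illustrative, not from any particular library):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the `sparsity` fraction of weights with the
    smallest absolute values (unstructured magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))          # stand-in for a trained weight matrix
pruned = magnitude_prune(w, sparsity=0.8)
print(float(np.mean(pruned == 0)))   # fraction of weights removed
```

In practice pruning is usually followed by fine-tuning, so the surviving weights can adjust and recover most of the lost accuracy.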
Think of it like...
Like editing a draft essay — cutting redundant sentences and filler words makes it shorter and punchier without losing the core message.
Related Terms
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
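The 32-bit-to-8-bit reduction mentioned above can be sketched as symmetric linear quantization: map floats onto the int8 range via a single scale factor (function names here are illustrative):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with one symmetric scale factor."""
    scale = np.max(np.abs(w)) / 127.0   # largest magnitude maps to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(float(np.max(np.abs(w - w_hat))))  # round-off error, at most scale / 2
```

The accuracy trade-off shows up as this round-off error: each weight moves by at most half a quantization step.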
Knowledge Distillation
A model compression technique where a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model. The student learns not just correct answers but the teacher's nuanced probability distributions.
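Learning the teacher's "nuanced probability distributions" is typically done by minimizing the KL divergence between temperature-softened teacher and student outputs. A minimal sketch (the T^2 scaling follows the standard distillation loss; function names are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                      # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.2]
student = [3.5, 1.2, 0.3]
print(distillation_loss(student, teacher))  # 0 only when the two match
```

In full training this term is usually mixed with the ordinary cross-entropy loss on the true labels, so the student learns from both the teacher and the data.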