Pruning
A model compression technique that removes unnecessary or redundant weights, neurons, or layers from a trained neural network. Like pruning a plant, it removes parts that are not contributing to overall health.
Why It Matters
Pruning can reduce model size by 50-90% with minimal accuracy loss, enabling deployment on resource-constrained devices.
Example
Removing the 80% of weights with the smallest magnitudes from a neural network, and finding that the remaining 20% of connections preserve 95% of the original model's accuracy.
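The example above is magnitude pruning: rank weights by absolute value and zero out the smallest fraction. A minimal sketch with NumPy (the function name and `sparsity` parameter are illustrative, not from any particular library):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the `sparsity` fraction of weights with the
    smallest absolute values (unstructured magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))          # stand-in for a trained weight matrix
pruned = magnitude_prune(w, sparsity=0.8)
print(float(np.mean(pruned == 0)))   # fraction of weights removed
```

In practice pruning is usually followed by fine-tuning, so the surviving weights can adjust and recover most of the lost accuracy.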
Think of it like...
Like editing a draft essay — cutting redundant sentences and filler words makes it shorter and punchier without losing the core message.
Related Terms
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
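The 32-bit-to-8-bit reduction mentioned above can be sketched as symmetric linear quantization: map floats onto the int8 range via a single scale factor (function names here are illustrative):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with one symmetric scale factor."""
    scale = np.max(np.abs(w)) / 127.0   # largest magnitude maps to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(float(np.max(np.abs(w - w_hat))))  # round-off error, at most scale / 2
```

The accuracy trade-off shows up as this round-off error: each weight moves by at most half a quantization step.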
Knowledge Distillation
A model compression technique where a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model. The student learns not just correct answers but the teacher's nuanced probability distributions.
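Learning the teacher's "nuanced probability distributions" is typically done by minimizing the KL divergence between temperature-softened teacher and student outputs. A minimal sketch (the T^2 scaling follows the standard distillation loss; function names are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                      # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.2]
student = [3.5, 1.2, 0.3]
print(distillation_loss(student, teacher))  # 0 only when the two match
```

In full training this term is usually mixed with the ordinary cross-entropy loss on the true labels, so the student learns from both the teacher and the data.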