Machine Learning

Quantization-Aware Training

Training a model while simulating the effects of quantization, so the model learns to maintain accuracy even when weights are later reduced to lower precision.

Why It Matters

QAT produces quantized models with minimal accuracy loss, typically far better than quantizing after training (post-training quantization), especially at aggressive bit widths (4-bit, 2-bit).

Example

Training a model where forward passes simulate INT8 precision, teaching the model to maintain accuracy within the constraints of reduced precision from the start.
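The core mechanism behind this is "fake quantization": during the forward pass, weights are rounded to an integer grid and mapped back to floats, so the model trains against the quantization error it will face at deployment. Here is a minimal NumPy sketch of that step; the function name and the symmetric per-tensor scaling scheme are illustrative assumptions, not a specific framework's API.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Illustrative fake-quantization step: round weights to a signed
    integer grid, then dequantize so training still runs in float."""
    qmin = -(2 ** (num_bits - 1))        # e.g. -128 for INT8
    qmax = 2 ** (num_bits - 1) - 1       # e.g.  127 for INT8
    scale = np.max(np.abs(w)) / qmax     # symmetric per-tensor scale (assumed scheme)
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale                     # floats that carry the quantization error

w = np.array([0.51, -0.32, 0.07])
w_q = fake_quantize(w, num_bits=8)
# The forward pass uses w_q in place of w; in QAT the backward pass
# typically passes gradients through the rounding step unchanged
# (the straight-through estimator), so the weights keep learning.
```

In full QAT frameworks this operation is inserted after each weight (and often each activation) tensor, and the rounding is made differentiable via the straight-through estimator.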

Think of it like...

Like a musician who practices on a small stage before performing there — they learn to adapt their performance to the constraints rather than being surprised by them.

Related Terms