Machine Learning

Gradient Accumulation

A technique that simulates larger batch sizes by accumulating gradients over multiple forward and backward passes before performing a single weight update. This enables large effective batch sizes on hardware whose memory only fits small batches.
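The loop pattern can be sketched in plain Python with a toy scalar model (fit w so that w * x approximates y under squared error); the names `accumulation_steps` and `micro_batches` are illustrative, not from any particular framework:

```python
def grad(w, batch):
    # Mean gradient of (w * x - y)^2 with respect to w over one micro-batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(micro_batches, accumulation_steps, lr=0.1, w=0.0):
    accumulated = 0.0
    for step, batch in enumerate(micro_batches, start=1):
        # Scale each micro-batch gradient by 1/accumulation_steps so the
        # running sum equals the mean gradient over the large effective batch.
        accumulated += grad(w, batch) / accumulation_steps
        if step % accumulation_steps == 0:
            w -= lr * accumulated   # one weight update per accumulation cycle
            accumulated = 0.0       # reset for the next cycle
    return w
```

In a framework like PyTorch the same pattern is usually written as dividing the loss by the number of accumulation steps before each backward pass, then calling the optimizer's step and gradient-reset methods only once per cycle.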

Why It Matters

Gradient accumulation lets you train with large effective batch sizes on a single GPU, closely matching the optimization dynamics of multi-GPU data-parallel training. It does not reduce wall-clock time, and layers that compute per-batch statistics (such as batch normalization) still see only the small micro-batches.

Example

Run 8 forward and backward passes with batch size 32, accumulating gradients each time, then perform one weight update. This gives an effective batch size of 256 on hardware that can only fit 32 examples at once.
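A quick numeric check of this equivalence, using a toy linear model on synthetic data with the same sizes as the example (8 micro-batches of 32, effective batch 256):

```python
import random

random.seed(0)
# 256 synthetic (x, y) pairs: 8 micro-batches of 32 examples each.
data = [(random.random(), random.random()) for _ in range(256)]
w = 0.5

def mean_grad(w, batch):
    # Mean gradient of the squared error (w * x - y)^2 over a batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Accumulate the 8 micro-batch gradients, scaling each by 1/8.
accumulated = sum(mean_grad(w, data[i:i + 32]) / 8 for i in range(0, 256, 32))

# One pass over all 256 examples gives the same gradient.
full = mean_grad(w, data)
assert abs(accumulated - full) < 1e-9
```

The scaling is what makes the two equal: summing eight micro-batch means, each divided by 8, reproduces the mean over all 256 examples exactly.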

Think of it like...

Like filling a swimming pool with a garden hose — each hose-fill is small, but by accumulating many fills, you achieve the same result as a fire hose.

Related Terms