Machine Learning

Stochastic Gradient Descent

A variant of gradient descent that updates model parameters using a single randomly chosen training example (or small mini-batch) at each step instead of the entire dataset. Each update is much cheaper to compute, and the noise in the gradient estimates can help the optimizer escape shallow local minima.
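The update rule above can be sketched in a few lines. This is a minimal, illustrative implementation (variable names and hyperparameters are my own, not from any specific library): it fits a 1-D linear model to synthetic data, updating the weight and bias after each randomly sampled example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2*x + 1 plus a little noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.05, size=200)

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # learning rate (step size)

for epoch in range(20):
    # Visit examples in a fresh random order each epoch
    for i in rng.permutation(len(X)):
        x_i, y_i = X[i, 0], y[i]
        err = (w * x_i + b) - y_i      # prediction error
        # Gradient of 0.5 * err**2 with respect to w and b,
        # computed from this single example only
        w -= lr * err * x_i
        b -= lr * err

print(w, b)  # should approach the true values 2.0 and 1.0
```

Full-batch gradient descent would instead average the gradient over all 200 examples before taking a single step; SGD takes 200 noisy steps per epoch for the same cost.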

Why It Matters

SGD (and its variants) is the most widely used optimization algorithm in deep learning. Counterintuitively, the noise from random sampling often leads to solutions that generalize better than those found by deterministic, full-batch gradient descent.

Example

Updating a neural network's weights after seeing each individual training image, rather than first computing the average error across all 1 million images in the dataset.

Think of it like...

Like adjusting your golf swing after every shot rather than waiting until the end of a round — more frequent adjustments lead to faster improvement.

Related Terms