
Gradient Descent

Definition

An optimization algorithm that iteratively adjusts model parameters in the direction that most reduces the loss function, guided by the gradient (slope) of the loss.
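The update described in this definition is conventionally written as:

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
```

where $\theta_t$ are the model parameters at step $t$, $L$ is the loss function, and $\eta$ is the learning rate (step size).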

Gradient descent is the optimization backbone of virtually all neural network training. The basic idea is simple: compute the gradient of the loss with respect to each parameter, then take a step in the opposite direction (downhill). Stochastic gradient descent (SGD) computes gradients on small batches of data rather than the full dataset, adding noise that often helps escape local minima. Modern variants like Adam, AdaGrad, and RMSprop adaptively adjust learning rates per parameter, converging faster and more reliably. The learning rate controls step size — too large causes divergence, too small causes slow convergence. Gradient descent, combined with backpropagation, has scaled to train models with hundreds of billions of parameters across thousands of GPUs.
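The ideas above can be shown in a minimal sketch: vanilla gradient descent minimizing the toy loss f(w) = (w − 3)², whose gradient is 2(w − 3) and whose minimum sits at w = 3. The function name, starting point, and learning rates are illustrative choices, not from the source.

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimize f(w) = (w - 3)**2 by repeated gradient steps."""
    w = 0.0  # arbitrary starting point
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # gradient of the loss at w
        w -= lr * grad          # step opposite the gradient (downhill)
    return w

print(gradient_descent(lr=0.1))   # converges toward the minimum at w = 3
print(gradient_descent(lr=1.1))   # too-large step size: the iterates diverge
```

With lr=0.1 each step shrinks the error by a constant factor, so the iterate settles near 3; with lr=1.1 each step overshoots and the error grows, illustrating the divergence described above.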
