
Learning Rate

Definition

A hyperparameter that controls how much model weights are adjusted in response to the estimated error during each training step.

The learning rate is arguably the most important hyperparameter in neural network training. It sets the step size during gradient descent — how aggressively the model updates its weights in the direction opposite the computed gradients. A learning rate that is too high causes the model to overshoot good parameter values and can make training diverge; one that is too low makes training extremely slow and increases the risk of getting stuck in poor solutions.

Modern practice rarely uses a fixed learning rate. Instead, schedules vary the rate over the course of training: warmup (gradually increasing from a small value), cosine decay, or step decay. Learning rate warmup in particular has become standard when training large transformers, because it prevents instability in the early stages when gradients are noisy and weights are far from a good solution.
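The update rule and a warmup-plus-cosine-decay schedule can be sketched as below. This is a minimal illustration, not any particular framework's API; the function name `lr_schedule` and the specific values (`max_lr`, `warmup_steps`, etc.) are arbitrary choices for the example.

```python
import math

def lr_schedule(step, max_lr=3e-4, min_lr=3e-5,
                warmup_steps=1000, total_steps=10000):
    """Linear warmup to max_lr, then cosine decay to min_lr (illustrative values)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to max_lr
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay: smoothly anneal from max_lr down to min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The learning rate scales each gradient-descent update:
#     w_new = w - lr * gradient
w, gradient = 0.8, 0.25
w = w - lr_schedule(step=1000) * gradient
```

At step 1000 the schedule returns the peak rate (3e-4 here), and by the final step it has decayed to the floor (3e-5); the same shape appears, with different constants, in most large-model training configurations.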
