Hyperparameter
Definition
A configuration value set before training begins that controls the training process itself, as opposed to model parameters (weights) that are learned from data.
Hyperparameters are the settings that define how a model trains and are not updated by the learning algorithm itself. Common hyperparameters include learning rate, batch size, number of layers, number of attention heads, dropout rate, weight decay, and number of training epochs. Choosing good hyperparameters is critical for model performance and often requires extensive experimentation. Techniques for hyperparameter optimization include grid search, random search, Bayesian optimization, and population-based training.

The distinction matters: model parameters (weights and biases) are learned automatically through gradient descent, while hyperparameters are chosen by engineers or through automated search. Modern LLM training involves hundreds of hyperparameters, and their tuning can meaningfully impact final model quality.
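As a minimal sketch of one of the optimization techniques mentioned above, random search samples configurations at random from a predefined space and keeps the best-scoring one. The search space, value ranges, and the toy evaluation function below are illustrative assumptions, not taken from any particular framework:

```python
import math
import random

# Hypothetical search space chosen for illustration only.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),    # continuous range, sampled log-uniformly
    "batch_size": [16, 32, 64, 128],  # discrete choices
    "dropout": (0.0, 0.5),            # continuous range, sampled uniformly
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    # Log-uniform sampling is the usual choice for learning rates,
    # since plausible values span several orders of magnitude.
    lr = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return {
        "learning_rate": lr,
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }

def random_search(evaluate, n_trials=20, seed=0):
    """Evaluate n_trials random configurations; return the best one found."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)  # e.g. validation accuracy after training
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy stand-in for "train a model and measure validation accuracy":
# scores highest near learning_rate=1e-3 and dropout=0.1.
def toy_evaluate(config):
    lr_penalty = abs(math.log10(config["learning_rate"]) - (-3))
    return 1.0 - 0.2 * lr_penalty - abs(config["dropout"] - 0.1)

best, score = random_search(toy_evaluate, n_trials=50)
```

In practice, `evaluate` would run a full training job and return a validation metric, which is why each trial is expensive and why smarter strategies such as Bayesian optimization try to spend trials more efficiently than pure random sampling.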
Related Terms
Learning Rate
A hyperparameter that controls how much model weights are adjusted in response to the estimated erro...
Epoch
One complete pass through the entire training dataset during model training.
Regularization
A set of techniques used to prevent overfitting by adding constraints or penalties to the model duri...
Batch Size
The number of training examples processed together in one forward and backward pass before the model...