Perplexity
Definition
A metric that measures how well a language model predicts text, calculated as the exponential of the average negative log-likelihood per token — lower perplexity indicates better prediction.
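Written out, the definition above corresponds to the standard formula (using $p(w_i \mid w_{<i})$ for the model's probability of token $w_i$ given the preceding tokens):

```latex
\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\left(w_i \mid w_{<i}\right) \right)
```

Equivalently, perplexity is the exponential of the cross-entropy loss per token, which is why it moves in lockstep with training loss.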
Perplexity is the most fundamental intrinsic evaluation metric for language models. Intuitively, it measures how "surprised" the model is by real text — a perplexity of 10 means the model is, on average, as uncertain as if choosing between 10 equally likely options for each next token. Lower perplexity indicates the model assigns higher probability to the actual text and therefore has a better understanding of language patterns. Perplexity is useful for comparing models during development, detecting distribution shift (test data that differs from training), and tracking improvement across model generations. However, perplexity alone does not fully capture model quality — a model can have low perplexity while still generating repetitive or unhelpful text. Modern evaluation increasingly relies on task-specific benchmarks and human preference ratings alongside perplexity.
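The "10 equally likely options" intuition can be checked directly. The sketch below (function name and inputs are illustrative, not from any particular library) computes perplexity from per-token log-probabilities, the quantities a language model's evaluation loop typically produces:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_logprobs: natural-log probabilities the model assigned to
    each actual next token in the evaluated text.
    """
    n = len(token_logprobs)
    avg_nll = -sum(token_logprobs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.1 to every token is exactly as
# uncertain as choosing among 10 equally likely options per step.
logprobs = [math.log(0.1)] * 5
print(perplexity(logprobs))  # ≈ 10.0
```

Note that the result is independent of sequence length here: averaging the log-likelihood before exponentiating is what makes perplexity comparable across texts of different lengths (given the same tokenizer).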
Related Terms
Loss Function
A mathematical function that measures how far a model's predictions are from the actual target value...
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understand...
Benchmark
A standardized test or dataset used to evaluate and compare the performance of AI models on specific...