MMLU
Definition
Massive Multitask Language Understanding — a benchmark testing AI models across 57 academic subjects ranging from STEM to humanities, measuring broad knowledge and reasoning ability.
MMLU (introduced in 2020 and published at ICLR 2021) has become one of the most cited benchmarks for evaluating large language models. It contains approximately 16,000 multiple-choice questions spanning 57 subjects, including mathematics, history, law, medicine, computer science, and philosophy. Questions range from elementary to professional difficulty, making the benchmark a broad measure of a model's world knowledge and reasoning. Top models in 2025 score above 90% on MMLU, compared with estimated expert human performance of around 89.8%. MMLU-Pro, a harder variant with more challenging questions and ten answer options instead of four, was introduced to preserve headroom as models improved. While widely used, MMLU has been criticized for occasional incorrect ground-truth answers and for being solvable in some cases through pattern matching rather than deep understanding.
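The evaluation protocol is straightforward: each question is shown with four lettered options, the model picks a letter, and the reported score is plain accuracy. The following sketch illustrates that flow; the sample questions and the fixed predictions are hypothetical stand-ins, not the official MMLU data or harness.

```python
def format_prompt(question: str, choices: list[str]) -> str:
    """Render a question in the standard four-option A-D layout."""
    letters = "ABCD"
    lines = [question]
    lines += [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Proportion of correct predictions -- MMLU's reported metric."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Toy example: two made-up questions and fixed "model" outputs.
questions = [
    ("What is 2 + 2?", ["3", "4", "5", "6"], "B"),
    ("Which planet is closest to the Sun?",
     ["Venus", "Earth", "Mercury", "Mars"], "C"),
]
print(format_prompt(*questions[0][:2]))

preds = ["B", "A"]  # pretend model picked B, then A
gold = [ans for _, _, ans in questions]
print(accuracy(preds, gold))  # 0.5
```

Real harnesses differ mainly in how they extract the letter from the model's output (log-probabilities over the four option tokens vs. parsing generated text), but the accuracy computation is exactly this simple.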
Related Terms
Accuracy
The proportion of correct predictions out of total predictions made by a model, the simplest and mos...
Benchmark
A standardized test or dataset used to evaluate and compare the performance of AI models on specific...
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understand...