MMLU

Definition

Massive Multitask Language Understanding: a benchmark that tests AI models across 57 academic subjects, from STEM to the humanities, measuring broad knowledge and reasoning ability.

MMLU, introduced by Hendrycks et al. in 2020 and published at ICLR 2021, has become one of the most cited benchmarks for evaluating large language models. It contains approximately 16,000 multiple-choice questions spanning 57 subjects, including mathematics, history, law, medicine, computer science, and philosophy. Questions range from elementary to professional difficulty, making the benchmark a broad measure of a model's world knowledge and reasoning. Top models in 2025 score above 90% on MMLU, compared with an estimated expert-level human accuracy of about 89.8%. MMLU-Pro, a harder variant that expands each question's answer choices from four to ten and emphasizes reasoning, was introduced in 2024 to keep the benchmark discriminative as models improved. While widely used, MMLU has been criticized for occasional incorrect ground-truth answers and because some questions can be answered by pattern matching rather than genuine understanding.
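
Because scoring is plain multiple-choice accuracy, an evaluation harness can be quite small. The sketch below shows the general shape: format each question with lettered answer choices, compare the model's letter prediction against the ground-truth index, and report the fraction correct. The ask_model function and the sample question are hypothetical placeholders standing in for a real LLM call and real MMLU data; this is an illustration of the format, not any official harness.

```python
# Minimal sketch of MMLU-style multiple-choice scoring.
# ask_model and the sample question are hypothetical placeholders.

LETTERS = ["A", "B", "C", "D"]

def format_question(question: str, choices: list[str]) -> str:
    """Render a question with lettered answer choices, MMLU-style."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call; a harness would parse the
    first A/B/C/D token out of the model's response."""
    return "A"

def score(items: list[dict]) -> float:
    """Accuracy over items shaped like {'question', 'choices', 'answer'},
    where 'answer' is the index of the correct choice."""
    correct = sum(
        ask_model(format_question(item["question"], item["choices"]))
        == LETTERS[item["answer"]]
        for item in items
    )
    return correct / len(items)

sample = [
    {"question": "What is the derivative of x^2?",
     "choices": ["2x", "x", "x^2", "2"],
     "answer": 0},
]
print(f"Accuracy: {score(sample):.1%}")
```

In practice, harnesses such as EleutherAI's lm-evaluation-harness often score MMLU by comparing the log-probabilities the model assigns to each answer choice rather than parsing generated text, which avoids answer-extraction errors.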
