
Accuracy

Definition

The proportion of correct predictions out of all predictions made by a model; the simplest and most intuitive classification metric.

Accuracy measures how often a model is right, calculated as (true positives + true negatives) / total predictions. While intuitive, accuracy can be misleading for imbalanced datasets — a model that always predicts "not fraud" achieves 99.9% accuracy on a dataset where 0.1% of transactions are fraudulent, despite being completely useless for fraud detection. For this reason, accuracy is typically supplemented with precision, recall, F1 score, and AUC-ROC for classification tasks. In the context of LLM evaluation, accuracy is commonly reported on multiple-choice benchmarks like MMLU, where it measures the percentage of questions answered correctly. Despite its limitations, accuracy remains the default reporting metric for many benchmarks due to its simplicity and universal understanding.
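The fraud example above can be sketched numerically. This is an illustrative snippet (not from the glossary) using hypothetical labels: a dataset where 0.1% of transactions are fraudulent and a "model" that always predicts "not fraud" scores 99.9% accuracy but 0% recall.

```python
def accuracy(y_true, y_pred):
    # (true positives + true negatives) / total predictions
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    # true positives / actual positives
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return tp / positives if positives else 0.0

# 1,000 transactions, 1 fraudulent (label 1) -- a 0.1% positive class
y_true = [0] * 999 + [1]
# A degenerate "model" that always predicts "not fraud"
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.999 -- looks excellent
print(recall(y_true, y_pred))    # 0.0   -- catches no fraud at all
```

Reporting recall (or precision, F1, AUC-ROC) alongside accuracy exposes the failure that accuracy alone hides.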
