F1 Score
Definition
The harmonic mean of precision and recall, providing a single metric that balances both the completeness and correctness of a model's predictions.
F1 Score is one of the most widely used classification metrics, particularly valuable when classes are imbalanced. It is calculated as 2 * (precision * recall) / (precision + recall), ranging from 0 (worst) to 1 (perfect). The harmonic mean ensures that the F1 score is low if either precision or recall is low, unlike a simple average, which could mask poor performance in one metric.

For multi-class problems, micro-F1 aggregates true positives, false positives, and false negatives across all classes before computing a single score, while macro-F1 computes F1 per class and averages those scores equally, giving every class the same weight regardless of size. F1 is standard for NLP tasks like named entity recognition, text classification, and question answering. When positive cases are rare (fraud detection, disease screening), F1 provides a more informative picture than accuracy alone, which can be misleadingly high simply by predicting the majority class.
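As a sketch of the definitions above, the following Python computes per-class counts from scratch and contrasts micro- and macro-averaged F1 (function names and the toy labels are illustrative, not from any particular library):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != label and t == label)
    return tp, fp, fn

def macro_f1(y_true, y_pred):
    """Compute F1 per class, then average the scores equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1_score(prec, rec))
    return sum(scores) / len(scores)

def micro_f1(y_true, y_pred):
    """Pool TP/FP/FN across all classes, then compute one F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for label in labels:
        t, f, n = per_class_counts(y_true, y_pred, label)
        tp, fp, fn = tp + t, fp + f, fn + n
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return f1_score(prec, rec)

y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
print(micro_f1(y_true, y_pred))  # pooled counts: equals accuracy here, 0.8
print(macro_f1(y_true, y_pred))  # unweighted mean of per-class F1 scores
```

Note that for single-label multi-class predictions, micro-F1 coincides with accuracy, which is why macro-F1 is usually the averaging mode quoted for imbalanced data.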
Related Terms
Accuracy
The proportion of correct predictions out of total predictions made by a model, the simplest and most widely used evaluation metric.
Precision
The proportion of positive predictions that are actually correct, measuring how reliable a model's positive predictions are.
Benchmark
A standardized test or dataset used to evaluate and compare the performance of AI models on specific tasks.
Recall
The proportion of actual positive cases that the model correctly identifies, measuring how completely it captures the positive class.