
F1 Score

Definition

The harmonic mean of precision and recall, providing a single metric that balances both the completeness and correctness of a model's predictions.

F1 Score is one of the most widely used classification metrics, and it is particularly valuable when classes are imbalanced. It is calculated as 2 * (precision * recall) / (precision + recall) and ranges from 0 (worst) to 1 (perfect). Because the harmonic mean is dominated by the smaller of its two inputs, the F1 score is low whenever either precision or recall is low, unlike a simple arithmetic average that can mask poor performance in one of them.

For multi-class problems there are two common aggregations: micro-F1 pools true positives, false positives, and false negatives across all classes before computing a single F1, while macro-F1 computes F1 per class and averages the results with equal weight, so rare classes count as much as common ones.

F1 is standard for NLP tasks like named entity recognition, text classification, and question answering. When positive cases are rare (fraud detection, disease screening), F1 provides a more informative picture than accuracy alone, which can be misleadingly high simply by predicting the majority class.
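The formula and the two averaging schemes can be sketched in plain Python. This is an illustrative implementation, not a reference one; in practice a library such as scikit-learn's `f1_score` would be used. All function names here are hypothetical.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def _counts(y_true, y_pred, cls):
    # True positives, false positives, false negatives for one class.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def macro_f1(y_true, y_pred):
    # Compute F1 per class, then average with equal weight per class.
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp, fp, fn = _counts(y_true, y_pred, cls)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1(p, r))
    return sum(scores) / len(scores)

def micro_f1(y_true, y_pred):
    # Pool TP/FP/FN over all classes, then compute a single F1.
    classes = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for cls in classes:
        t, f, n = _counts(y_true, y_pred, cls)
        tp, fp, fn = tp + t, fp + f, fn + n
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return f1(p, r)
```

Note that for single-label multi-class classification, every false positive for one class is a false negative for another, so micro-F1 collapses to plain accuracy; macro-F1 diverges from it exactly when per-class performance is uneven.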
