Annotation
Definition
The specific labels, tags, or metadata added to data elements during the data labeling process, providing the ground truth that supervised learning models learn from.
Annotations take many forms depending on the task: classification labels (positive/negative), bounding boxes (object detection), pixel-level masks (semantic segmentation), entity tags (NER), dependency trees (syntactic parsing), and preference rankings (RLHF). Annotation quality directly affects model quality: noisy or incorrect annotations lead to poorly performing models (garbage in, garbage out). Annotation guidelines must be carefully designed to ensure consistency across annotators. Modern annotation tools provide specialized interfaces for different data types and include quality assurance features such as reviewer workflows and inter-annotator agreement metrics. The cost and time required for high-quality annotation are often the bottleneck in ML projects. Active learning techniques can reduce annotation costs by selecting the most informative examples for human labeling.
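Two of the ideas above, agreement metrics and active learning, can be illustrated with a minimal sketch. It assumes Cohen's kappa as the agreement metric for two annotators and least-confidence sampling as the active learning strategy. Both are common choices, not the only ones. The function names, labels, and probabilities are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


def least_confident(probabilities, k):
    """Indices of the k items whose top predicted probability is lowest."""
    order = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return order[:k]


# Hypothetical sentiment labels from two annotators on six items.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_a, annotator_b)

# Hypothetical model probabilities for four unlabeled items (two classes).
unlabeled_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.5, 0.5]]
to_label = least_confident(unlabeled_probs, k=2)

print(f"kappa = {kappa:.3f}")
print(f"send to annotators: {to_label}")
```

In this toy data the annotators agree on four of six items, but half of that agreement is expected by chance, so kappa is much lower than the raw agreement rate. The selection step sends the items the model is least sure about to human annotators first.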
Related Terms
Named Entity Recognition
An NLP task that identifies and classifies named entities in text into predefined categories such as...
Training Data
The dataset used to teach a machine learning model, consisting of examples from which the model lear...
Data Labeling
The process of adding informative tags or annotations to raw data (images, text, audio) so that mach...
Dataset
A structured collection of data organized for training, evaluating, or testing machine learning mode...