
Data Augmentation

Definition

Techniques for artificially expanding training datasets by creating modified versions of existing data, such as rotating images, adding noise, or paraphrasing text.
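As a minimal illustration of the definition, the sketch below creates modified copies of one image with three simple transforms: a horizontal flip, a 90-degree rotation, and additive Gaussian noise. It assumes a grayscale image stored as a plain Python list of rows with intensities in [0, 1]. The function names, the 50% application probability, and the noise level (`sigma=0.05`) are illustrative choices, not a standard API. Real pipelines usually apply such transforms to tensors through a vision library.

```python
import random

# Assumed representation: grayscale image as rows of pixel intensities in [0, 1].
Image = list[list[float]]


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (reverse row order, then transpose)."""
    return [list(col) for col in zip(*img[::-1])]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel and clip back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Hypothetical pipeline: randomly flip and rotate, then always add light noise."""
    out = img
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = rotate90(out)
    return add_gaussian_noise(out, sigma=0.05, rng=rng)


# Usage: several randomized training variants derived from a single labeled image.
rng = random.Random(0)
image = [[0.0, 0.5], [1.0, 0.25]]
variants = [augment(image, rng) for _ in range(4)]
```

Each variant keeps the original label, which is what lets augmentation enlarge the training set without new annotation.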

Data augmentation helps prevent overfitting and improves model robustness by increasing the effective size and diversity of the training data without collecting new samples. In computer vision, common augmentations include random cropping, flipping, rotation, color jittering, and Cutout (masking out random square patches). More advanced techniques such as MixUp and CutMix combine two images: MixUp blends them pixel-wise, while CutMix pastes a rectangular patch of one image onto the other, and in both cases the labels are mixed in the same proportion. In NLP, augmentations include synonym replacement, back-translation, random insertion, and paraphrasing with language models. For speech, techniques include speed perturbation, noise injection, and SpecAugment. Data augmentation is especially valuable when labeled data is limited or expensive to obtain. Augmentation is also central to many self-supervised methods, particularly contrastive learning, where different augmented views of the same sample are trained to produce similar representations.
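MixUp and CutMix differ mainly in how the two images are combined. The sketch below implements both for a single pair of samples, assuming grayscale images stored as Python lists of rows and one-hot label vectors. The mixing weight is drawn from a Beta(alpha, alpha) distribution, as in the original methods; the function names and this list-based representation are illustrative assumptions rather than any library's API.

```python
import math
import random

# Assumed representation: grayscale image as a list of rows, and a label as a
# one-hot (or soft) probability vector.
Image = list[list[float]]
Label = list[float]


def mixup(x1: Image, y1: Label, x2: Image, y2: Label,
          alpha: float, rng: random.Random) -> tuple[Image, Label]:
    """MixUp: blend two samples pixel-wise and mix their labels by the same weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in (0, 1)
    x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def cutmix(x1: Image, y1: Label, x2: Image, y2: Label,
           alpha: float, rng: random.Random) -> tuple[Image, Label]:
    """CutMix: paste a rectangle from x2 into x1; mix labels by the pasted area."""
    h, w = len(x1), len(x1[0])
    lam = rng.betavariate(alpha, alpha)
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_h = int(h * math.sqrt(1 - lam))
    cut_w = int(w * math.sqrt(1 - lam))
    # Random box center; the box is clipped at the image border.
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = [row[:] for row in x1]  # copy so the input image is not modified
    for i in range(top, bottom):
        x[i][left:right] = x2[i][left:right]
    # Recompute the weight from the actual (possibly clipped) box area.
    lam_adj = 1 - (bottom - top) * (right - left) / (h * w)
    y = [lam_adj * a + (1 - lam_adj) * b for a, b in zip(y1, y2)]
    return x, y
```

In practice both are usually applied to a whole mini-batch by pairing each sample with a randomly shuffled partner; this single-pair version keeps the arithmetic visible. Because the CutMix weight is recomputed from the clipped box, the mixed label always matches the fraction of pixels actually taken from each image.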
