Data Augmentation
Definition
Techniques for artificially expanding training datasets by creating modified versions of existing data, such as rotating images, adding noise, or paraphrasing text.
Data augmentation helps prevent overfitting and improves model robustness by increasing the effective size and diversity of training data without collecting new samples. In computer vision, common augmentations include random cropping, flipping, rotation, color jittering, and cutout. Advanced techniques like CutMix and MixUp blend multiple images together. In NLP, augmentations include synonym replacement, back-translation, random insertion, and paraphrasing with AI models. For speech, techniques include speed perturbation, noise injection, and SpecAugment. Data augmentation is especially valuable when labeled data is limited or expensive to obtain. Modern self-supervised learning can be viewed as a form of data augmentation where different views of the same data are created for contrastive learning.
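A few of the augmentations above can be sketched in plain NumPy. This is a minimal illustration, not a production pipeline: the function names (`random_flip`, `add_noise`, `mixup`) and the parameter choices are this sketch's own, and real training code would typically use a library such as torchvision or albumentations instead.

```python
import numpy as np

def random_flip(img, rng):
    """Horizontally flip an image array with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def add_noise(img, rng, sigma=0.05):
    """Inject Gaussian noise, a simple robustness-oriented augmentation."""
    return img + rng.normal(0.0, sigma, size=img.shape)

def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """MixUp: convex-combine two examples and their one-hot labels.

    The mixing coefficient lam is drawn from a Beta(alpha, alpha)
    distribution; small alpha keeps most mixes close to one original.
    """
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2   # blend inputs
    y = lam * y1 + (1 - lam) * y2   # blend labels the same way
    return x, y

# Toy usage: two 2x2 "images" with one-hot class labels.
rng = np.random.default_rng(0)
a, b = np.ones((2, 2)), np.zeros((2, 2))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = mixup(a, ya, b, yb, rng)
```

Because the blended label `y` is itself a convex combination, its entries still sum to 1, which is why MixUp works with standard cross-entropy loss.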
Related Terms
Regularization
A set of techniques used to prevent overfitting by adding constraints or penalties to the model during training.
Overfitting
When a model learns the training data too well, including its noise and peculiarities, and fails to generalize to unseen data.
Training Data
The dataset used to teach a machine learning model, consisting of examples from which the model learns patterns.
Synthetic Data
Artificially generated data created by algorithms or AI models rather than collected from real-world sources.