Pre-Training
Definition
The initial phase of training a foundation model on a large, general-purpose dataset before it is fine-tuned for specific tasks.
Pre-training is the computationally expensive first stage of building a large AI model. For language models, pre-training typically means learning to predict the next token across trillions of tokens of text drawn from books, websites, code, and other sources. This phase can cost millions of dollars and take weeks or months on thousands of GPUs. During pre-training, the model acquires grammar, facts, reasoning patterns, and broad world knowledge. The resulting pre-trained model is a general-purpose system that can then be efficiently adapted through fine-tuning. The quality, scale, and composition of pre-training data are critical factors in model capability, and companies closely guard their pre-training data recipes as competitive advantages.
Related Terms
Fine-Tuning
The process of taking a pre-trained model and further training it on a smaller, task-specific datase...
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream ta...
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understand...
Training Data
The dataset used to teach a machine learning model, consisting of examples from which the model lear...