Distributed Training

Definition

The practice of training AI models across multiple GPUs or machines simultaneously, either to speed up training or to fit models too large for a single device's memory. The main distributed training techniques are data parallelism (each worker holds a full model copy and processes a different slice of the batch), model parallelism (the model's layers or tensors are split across devices), and pipeline parallelism (sequential model stages are placed on different devices and micro-batches flow through them). Training frontier models requires thousands of GPUs coordinated through frameworks such as DeepSpeed and Megatron-LM.
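The data-parallel pattern above can be sketched in a few lines. This is a minimal, single-process simulation, not a real multi-GPU implementation: the "workers" are just shards of a NumPy batch, the model is a toy linear regression, and the all-reduce is an in-memory mean. The function names `local_gradient` and `data_parallel_step` are illustrative, not from any library.

```python
import numpy as np

# Toy sketch of data parallelism: each "worker" holds a full copy of the
# model weights and computes a gradient on its own shard of the batch;
# an all-reduce (here, a simple mean) averages the gradients so every
# replica applies the identical update and stays in sync.

def local_gradient(w, X, y):
    """Mean-squared-error gradient on one worker's shard."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def data_parallel_step(w, X, y, num_workers, lr=0.1):
    """One synchronous SGD step with the batch split across workers."""
    X_shards = np.array_split(X, num_workers)
    y_shards = np.array_split(y, num_workers)
    # Each worker computes a gradient on its local shard...
    grads = [local_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # ...then gradients are averaged (the all-reduce step).
    avg_grad = np.mean(grads, axis=0)
    return w - lr * avg_grad
```

With equal-sized shards, the averaged gradient matches the full-batch gradient exactly, so the parallel step is mathematically equivalent to single-device training on the whole batch; real systems (e.g. PyTorch `DistributedDataParallel`) perform the same averaging over the network with NCCL all-reduce.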

