Infrastructure

Quantization

Definition

The process of reducing the numerical precision of model weights (e.g., from 32-bit floating point to 4-bit integers) to decrease model size and inference cost while maintaining acceptable accuracy. A 4-bit weight occupies one-eighth the memory of a 32-bit weight, which is what lets quantized large language models run on consumer GPUs and mobile devices that often cannot hold the full-precision version.
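As a rough illustration of the idea, the sketch below applies symmetric per-tensor quantization to a handful of weights using only the Python standard library. This is one simple scheme among many, not a description of how any particular framework does it; the function names `quantize` and `dequantize` and the sample weights are hypothetical.

```python
def quantize(weights, bits=4):
    """Symmetric per-tensor quantization (illustrative sketch).

    Maps each float to a signed integer in [-qmax, qmax] using a single
    scale factor shared by the whole list.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit signed values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.35, 0.07, -1.4, 0.51]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)          # [4, -2, 0, -7, 3]
print(max_error)  # at most half of one quantization step (scale / 2)
```

Each weight is now stored as a small integer plus one shared scale, and the rounding error per weight is bounded by half a step. That bounded error is the accuracy cost being traded for the smaller footprint.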
