Infrastructure
Quantization
Definition
The process of reducing the numerical precision of a model's weights, and sometimes its activations (e.g., from 32-bit floating point to 4-bit integers), to shrink memory footprint and inference cost while keeping accuracy within an acceptable range. Quantization lets large language models run on consumer GPUs and mobile devices, broadening access to capable models.
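Example
As a rough illustration of the definition, the sketch below maps a handful of floating-point weights to 4-bit signed integers and back. It assumes a symmetric, per-tensor scheme with a single scale factor, which is only one of several approaches in use. The function names `quantize_symmetric` and `dequantize` and the sample weights are hypothetical and do not come from any particular library.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale.

    Assumption: symmetric per-tensor quantization (hypothetical helper).
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round to the nearest integer level, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Illustrative weights only; real tensors hold millions of values.
weights = [0.70, -0.32, 0.11, 0.0, -0.70]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as a small integer plus one shared scale, which is where the size reduction comes from; the difference between `weights` and `restored` is the accuracy cost that the definition refers to. Production schemes typically use finer-grained scales (per channel or per group) to keep that error lower.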