Edge AI
Definition
Running AI models directly on local devices (phones, IoT sensors, vehicles) rather than in the cloud, enabling real-time processing without internet connectivity.
Edge AI brings inference to the device where data is generated, eliminating network round trips and cloud dependency. This is critical for applications that require real-time responses (autonomous vehicles), operate in connectivity-limited environments (remote sensors), or handle sensitive data that shouldn't leave the device (medical devices, security cameras). Running large models on edge hardware requires aggressive optimization through quantization (reducing precision from 32-bit floating point to 8-bit or even 4-bit integers), pruning, and specialized model architectures designed for efficiency. Apple's Neural Engine, Qualcomm's AI Engine, and Google's Edge TPU are examples of dedicated hardware accelerators for on-device AI. As models become more efficient and edge hardware more powerful, an increasing share of AI inference is moving from the cloud to the edge.
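To make the quantization idea above concrete, here is a minimal sketch of symmetric post-training quantization in plain Python: float weights are mapped to 8-bit integers using a single scale factor, which is roughly how 32-bit-to-8-bit compression works in practice (real frameworks add per-channel scales, calibration, and zero points; the function names here are illustrative, not from any specific library).

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: the largest-magnitude weight maps to 127.

    Returns the quantized integers and the scale needed to recover
    approximate float values. A sketch only; production toolchains
    (e.g. per-channel, calibrated quantization) are more involved.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-128, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]

# Example: a tiny weight vector stored in 1/4 the memory of float32.
weights = [0.52, -1.27, 0.03, 0.89, -0.44]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most one quantization step (the scale), which is the precision cost traded for a 4x smaller memory footprint and faster integer arithmetic on edge accelerators.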
Related Terms
Cloud AI
AI services and infrastructure provided through cloud computing platforms, allowing organizations to...
Distillation
A technique where a smaller "student" model is trained to replicate the behavior of a larger "teache...
Inference
The process of using a trained AI model to generate predictions or outputs on new data, as opposed t...
Latency
The time delay between sending a request to an AI model and receiving the first response, typically ...