Infrastructure
Inference Endpoint
Definition
A deployed API server that hosts a trained AI model and serves predictions or generated outputs in response to client requests. Inference endpoints typically handle load balancing, auto-scaling, and latency optimization, so developers can serve models without managing the underlying infrastructure. Major providers include AWS SageMaker, Hugging Face Inference Endpoints, and Replicate.
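From a client's perspective, an inference endpoint is just an authenticated HTTP API: send input, receive a prediction. The sketch below shows that pattern using only the Python standard library. The URL, bearer-token auth, and `{"inputs": ...}` JSON schema are illustrative assumptions; each provider (SageMaker, Hugging Face, Replicate) defines its own request format and authentication scheme.

```python
import json
import urllib.request

def build_request(endpoint_url, api_token, prompt):
    """Build an HTTP POST request for a hypothetical inference endpoint.

    The payload schema and Authorization header are assumptions for
    illustration; consult your provider's API reference for the real format.
    """
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def run_inference(request, timeout=30):
    # One HTTP round trip; the endpoint handles model loading,
    # batching, scaling, and routing behind this call.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

In practice you would use the provider's SDK rather than raw HTTP, but the shape is the same: serialize input, authenticate, POST, deserialize the model's output.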