Infrastructure

Inference Endpoint

Definition

A deployed API server that hosts a trained AI model and accepts requests to generate predictions or outputs. Managed inference endpoints handle load balancing, auto-scaling, and latency optimization. Offerings such as Amazon SageMaker, Hugging Face Inference Endpoints, and Replicate let developers serve models without operating the underlying infrastructure.
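
A minimal sketch of calling such an endpoint over HTTPS from Python, assuming a hypothetical URL, token, and payload shape; the exact request and response schema varies by provider:

# Sketch: sending an inference request to a hosted model endpoint.
# ENDPOINT_URL, API_TOKEN, and the payload fields are placeholder assumptions;
# each provider (SageMaker, Hugging Face, Replicate) defines its own schema.
import requests

ENDPOINT_URL = "https://example.com/v1/models/my-model/predict"  # placeholder
API_TOKEN = "YOUR_API_TOKEN"  # placeholder credential

def run_inference(prompt: str) -> dict:
    # POST the input to the endpoint with a bearer token and return the JSON result.
    response = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"inputs": prompt},  # payload shape depends on the provider
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(run_inference("What is an inference endpoint?"))

Because the endpoint is just an HTTP API, the client needs no ML dependencies; scaling and model hosting are handled server-side.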
