
Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Machine Learning Blog



This article announces capacity-aware instance pools for Amazon SageMaker AI inference endpoints, enabling automatic fallback to alternative instance types when preferred hardware is unavailable.

  • Define prioritized instance type lists; SageMaker automatically provisions on available capacity
  • Eliminates manual retries during endpoint creation, scale-out, and scale-in operations
  • Supports Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference
  • Fleet naturally trends toward preferred hardware during scale-in and subsequent scale-out events
  • CloudWatch metrics now include an InstanceType dimension for per-type monitoring
  • Two optimization approaches: bring your own optimized models or use SageMaker inference recommendations
  • Weighted scaling metrics handle mixed fleets with different throughput capacities
  • Blue/green and rolling deployments supported with automatic rollback on health check failures
  • Available in all commercial AWS Regions at no additional cost
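
The prioritized-list behavior in the first bullet can be pictured with a small, purely conceptual sketch. It is not the SageMaker API: the function name `pick_instance_type`, the `has_capacity` callback, and the capacity snapshot are all hypothetical, and the instance types are only illustrative. The idea is simply to walk the list in priority order and take the first type that currently has capacity.

```python
from typing import Callable, Optional, Sequence


def pick_instance_type(
    prioritized_types: Sequence[str],
    has_capacity: Callable[[str], bool],
) -> Optional[str]:
    """Return the first instance type, in priority order, that has capacity.

    Returns None when no type in the list has capacity.
    """
    for instance_type in prioritized_types:
        if has_capacity(instance_type):
            return instance_type
    return None


# Hypothetical pool, most preferred first (illustrative values only).
pool = ["ml.p5.48xlarge", "ml.p4d.24xlarge", "ml.g5.48xlarge"]

# Hypothetical capacity snapshot: the preferred type is unavailable.
available = {
    "ml.p5.48xlarge": False,
    "ml.p4d.24xlarge": True,
    "ml.g5.48xlarge": True,
}

chosen = pick_instance_type(pool, lambda t: available.get(t, False))
print(chosen)  # ml.p4d.24xlarge
```

In the service this resolution happens on the SageMaker side; the sketch only shows why a caller no longer needs a manual retry loop over instance types.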

Instance pools reduce operational overhead for ML inference by automating capacity resolution and providing flexible fallback strategies without manual intervention.
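
To see why a mixed fleet needs a weighted scaling metric, here is a minimal sketch under stated assumptions. The summary does not say how SageMaker computes the metric, so the formula, the function names, and the per-type weights below are hypothetical. The sketch assumes each instance type is assigned a relative throughput weight and that load is divided by the fleet's total weighted capacity rather than by the raw instance count.

```python
from typing import Mapping


def weighted_capacity(counts: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Sum of instance counts scaled by each type's relative throughput weight."""
    return sum(n * weights[t] for t, n in counts.items())


def load_per_capacity_unit(
    total_rps: float, counts: Mapping[str, int], weights: Mapping[str, float]
) -> float:
    """Requests per second carried by one unit of weighted capacity."""
    capacity = weighted_capacity(counts, weights)
    if capacity == 0:
        raise ValueError("fleet has no capacity")
    return total_rps / capacity


# Hypothetical mixed fleet and weights (assumed, not measured values).
counts = {"ml.p5.48xlarge": 2, "ml.g5.48xlarge": 4}
weights = {"ml.p5.48xlarge": 4.0, "ml.g5.48xlarge": 1.0}

capacity = weighted_capacity(counts, weights)              # 2*4.0 + 4*1.0 = 12.0
weighted_load = load_per_capacity_unit(600.0, counts, weights)
naive_load = 600.0 / sum(counts.values())                  # ignores instance type

print(capacity, weighted_load, naive_load)  # 12.0 50.0 100.0
```

With these assumed numbers, the unweighted per-instance figure (100 requests per second) is twice the weighted figure (50 per capacity unit), so a scaling policy keyed to the unweighted value would overestimate how loaded the fleet is.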





Related articles

  • May 4, 2026: Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback
  • Mar 24, 2026: Deploy SageMaker AI inference endpoints with set GPU capacity using training plans
  • May 21, 2026: Amazon SageMaker AI now supports OpenAI-compatible APIs for inference endpoints
  • Mar 19, 2026: Enhanced metrics for Amazon SageMaker AI endpoints: deeper visibility for better performance
