Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback
This article announces capacity-aware inference with automatic instance fallback for Amazon SageMaker AI endpoints.
- SageMaker AI automatically provisions from a prioritized list of instance types when the preferred capacity is unavailable
- Supports Single Model Endpoints, InferenceComponent-based endpoints, and Asynchronous Inference endpoints
- Scales down by removing lowest-priority instances first, preserving preferred infrastructure
- You can specify a different optimized model per instance type or use SageMaker inference recommendations
- Per-instance-type CloudWatch metrics provide visibility into latency, throughput, and GPU utilization
- Available in 16 AWS regions globally
SageMaker AI now handles capacity constraints automatically, ensuring reliable endpoint creation and scaling without manual intervention.
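The fallback and scale-down ordering described above can be illustrated with a small toy model. This is only a sketch of the idea, not the SageMaker AI API: the `FallbackPool` class, its method names, and the capacity numbers are all hypothetical, and the instance type names are used purely as labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FallbackPool:
    """Toy model of a prioritized instance pool (hypothetical, not the SageMaker API)."""
    priorities: List[str]        # instance types, most preferred first
    available: Dict[str, int]    # assumed free capacity per instance type
    running: Dict[str, int] = field(default_factory=dict)

    def scale_up(self, count: int) -> int:
        """Provision up to `count` instances, trying types in priority order.

        Returns the number of instances actually provisioned.
        """
        remaining = count
        for itype in self.priorities:
            take = min(remaining, self.available.get(itype, 0))
            if take:
                self.available[itype] -= take
                self.running[itype] = self.running.get(itype, 0) + take
                remaining -= take
            if remaining == 0:
                break
        return count - remaining

    def scale_down(self, count: int) -> int:
        """Remove up to `count` instances, lowest-priority types first.

        Returns the number of instances actually removed.
        """
        remaining = count
        for itype in reversed(self.priorities):
            drop = min(remaining, self.running.get(itype, 0))
            if drop:
                self.running[itype] -= drop
                self.available[itype] = self.available.get(itype, 0) + drop
                remaining -= drop
            if remaining == 0:
                break
        return count - remaining


# Illustrative run: the preferred type has room for only one instance,
# so the remaining three fall back to the lower-priority types.
pool = FallbackPool(
    priorities=["ml.p5.48xlarge", "ml.g6e.12xlarge", "ml.g5.12xlarge"],
    available={"ml.p5.48xlarge": 1, "ml.g6e.12xlarge": 2, "ml.g5.12xlarge": 4},
)
pool.scale_up(4)
pool.scale_down(2)
print(pool.running)
```

In this model, scaling down walks the priority list in reverse, so the preferred instance type is the last one to lose capacity.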
Related articles
- May 4, 2026: Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints
- Dec 6, 2024: Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference
- Dec 11, 2024: Amazon SageMaker AI announces availability of P5e and G6e instances for Inference
- Nov 27, 2025: Amazon SageMaker AI now supports Flexible Training Plans capacity for Inference