Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Machine Learning Blog

This article announces capacity-aware instance pools for Amazon SageMaker AI inference endpoints, enabling automatic fallback to alternative instance types when preferred hardware is unavailable.

Define prioritized instance type lists; SageMaker automatically provisions on available capacity
Eliminates manual retries during endpoint creation, scale-out, and scale-in operations
Supports Single Model Endpoints, Inference Component-based endpoints, and Asynchronous Inference
Fleet naturally trends toward preferred hardware during scale-in and subsequent scale-out events
CloudWatch metrics now include InstanceType dimension for per-type monitoring and observability
Two optimization approaches: bring your own optimized models or use SageMaker inference recommendations
Weighted scaling metrics handle mixed fleets with different throughput capacities
Blue/green and rolling deployments supported with automatic rollback on health check failures
Available in all commercial AWS Regions at no additional cost
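
The point about weighted scaling metrics can be made concrete with a small sketch. The summary does not say how SageMaker computes these weights, so this is only one plausible normalization, not the service's formula: each instance type gets an assumed per-instance throughput, and fleet utilization is the total load divided by the weighted capacity. The function name, the weights, and the fleet counts are all hypothetical.

```python
from typing import Mapping


def weighted_utilization(
    fleet: Mapping[str, int],
    throughput_per_instance: Mapping[str, float],
    total_requests_per_second: float,
) -> float:
    """Return load divided by the fleet's throughput-weighted capacity.

    Hypothetical helper for illustration; not a SageMaker API.
    """
    # Each instance contributes capacity in proportion to its assumed throughput.
    capacity = sum(
        count * throughput_per_instance[instance_type]
        for instance_type, count in fleet.items()
    )
    if capacity <= 0:
        raise ValueError("fleet has no capacity")
    return total_requests_per_second / capacity


# Assumed numbers: a mixed fleet where the preferred type is 4x faster.
fleet = {"ml.p5.48xlarge": 2, "ml.g5.48xlarge": 4}
throughput = {"ml.p5.48xlarge": 100.0, "ml.g5.48xlarge": 25.0}

utilization = weighted_utilization(fleet, throughput, total_requests_per_second=240.0)
print(f"{utilization:.2f}")
```

Counting instances alone would treat this fleet as six equal units; weighting by throughput reflects that the two faster instances carry two thirds of the capacity.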

Instance pools reduce operational overhead for ML inference by automating capacity resolution and providing flexible fallback strategies without manual intervention.
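
The fallback behavior described above can be sketched as a simple priority walk. This is a toy model under stated assumptions, not the SageMaker API: the function, the capacity map, and the example counts are hypothetical, and the instance type names are used only as illustrative labels.

```python
from typing import Mapping, Optional, Sequence


def resolve_instance_type(
    priorities: Sequence[str],
    available: Mapping[str, int],
    needed: int = 1,
) -> Optional[str]:
    """Return the first instance type, in priority order, with enough free capacity.

    Hypothetical helper for illustration; returns None if no type qualifies.
    """
    for instance_type in priorities:
        if available.get(instance_type, 0) >= needed:
            return instance_type
    return None


# Assumed inputs: a prioritized list and a snapshot of free capacity per type.
priorities = ["ml.p5.48xlarge", "ml.p4d.24xlarge", "ml.g5.48xlarge"]
available = {"ml.p5.48xlarge": 0, "ml.p4d.24xlarge": 2, "ml.g5.48xlarge": 8}

choice = resolve_instance_type(priorities, available, needed=2)
print(choice)  # prints "ml.p4d.24xlarge"
```

Because the walk always starts from the top of the list, a later scale-out picks the preferred type again as soon as it has capacity, which matches the summary's note that the fleet trends back toward preferred hardware.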



May 4, 2026
Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback

Mar 24, 2026
Deploy SageMaker AI inference endpoints with set GPU capacity using training plans

Jun 18, 2026
Amazon SageMaker AI Announces New observability capability For Inference Endpoints

Mar 19, 2026
Enhanced metrics for Amazon SageMaker AI endpoints: deeper visibility for better performance



Related articles