Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback

This article announces capacity-aware inference with automatic instance fallback for Amazon SageMaker AI endpoints.

SageMaker AI automatically provisions from a prioritized list of instance types when the preferred capacity is unavailable
Supports Single Model Endpoints, InferenceComponent-based endpoints, and Asynchronous Inference endpoints
Scales down by removing lowest-priority instances first, preserving preferred infrastructure
Specify different optimized models per instance type or use SageMaker inference recommendations
Per-instance-type CloudWatch metrics provide visibility into latency, throughput, and GPU utilization
Available in 16 AWS regions globally

SageMaker AI now handles capacity constraints automatically, ensuring reliable endpoint creation and scaling without manual intervention.
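
The fallback and scale-down behavior summarized above can be illustrated with a small local simulation. This is a sketch only: it does not call any SageMaker API, the priority list and the `provision` and `scale_down` helpers are hypothetical, and it assumes a fleet may mix instance types when the preferred type has only partial capacity.

```python
from collections import Counter

# Hypothetical priority list for illustration: earlier entries are preferred.
PRIORITY = ["ml.p5e.48xlarge", "ml.g6e.12xlarge", "ml.g5.12xlarge"]


def provision(count, available, priority=PRIORITY):
    """Place `count` instances, using the highest-priority types that have capacity."""
    fleet = Counter()
    remaining = count
    for instance_type in priority:
        take = min(remaining, available.get(instance_type, 0))
        if take:
            fleet[instance_type] = take
            remaining -= take
        if remaining == 0:
            break
    if remaining:
        raise RuntimeError(f"insufficient capacity: {remaining} instance(s) unplaced")
    return fleet


def scale_down(fleet, remove, priority=PRIORITY):
    """Remove `remove` instances, draining the lowest-priority types first."""
    fleet = Counter(fleet)
    for instance_type in reversed(priority):
        drop = min(remove, fleet[instance_type])
        fleet[instance_type] -= drop
        remove -= drop
        if remove == 0:
            break
    return +fleet  # unary plus drops zero-count entries


# Only one preferred instance is available, so the rest fall back to the next type.
available = {"ml.p5e.48xlarge": 1, "ml.g6e.12xlarge": 5}
fleet = provision(3, available)
smaller = scale_down(fleet, 2)
```

Here `fleet` holds one preferred instance plus two fallback instances, and scaling down by two removes the fallback instances while keeping the preferred one.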


May 4, 2026
Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Dec 6, 2024
Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference

Dec 11, 2024
Amazon SageMaker AI announces availability of P5e and G6e instances for Inference

Nov 27, 2025
Amazon SageMaker AI now supports Flexible Training Plans capacity for Inference

Related articles