Deploy SageMaker AI inference endpoints with set GPU capacity using training plans

Machine Learning Blog

This article explains how to use Amazon SageMaker AI training plans to reserve GPU capacity for inference endpoints, ensuring predictable availability for time-bound workloads.

Training plans now support inference endpoints, not just training jobs
Search for available p-family GPU capacity using the search-training-plan-offerings API
Create a training plan reservation and receive an ARN for capacity reference
Deploy inference endpoints using the reserved capacity ARN in endpoint configuration
Set CapacityReservationPreference to "capacity-reservations-only" to restrict to reserved capacity
Endpoint stops serving traffic when reservation expires if using capacity-reservations-only
Update endpoints to new model versions while maintaining reserved capacity
Migrate from reserved to on-demand capacity if needed beyond reservation window
Scale endpoints within reservation limits; scaling beyond fails with validation error
Full reservation cost charged upfront regardless of actual usage duration
Deleting the endpoint doesn't cancel or refund the training plan reservation
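The search, deploy, and scale steps above can be sketched as plain request payloads. This is a minimal illustration, not the article's own code: it only builds dictionaries and makes no AWS calls. `CapacityReservationPreference` and the value "capacity-reservations-only" come from the summary above. The `MlReservationArn` field name, the "endpoint" target-resource value, the helper names, and the example ARN are assumptions and should be checked against the current SageMaker API reference.

```python
from datetime import datetime, timezone


def build_offering_search(instance_type, instance_count, start_after, duration_hours):
    """Build arguments for a training plan offering search (hypothetical helper)."""
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "StartTimeAfter": start_after,
        "DurationHours": duration_hours,
        # Assumption: "endpoint" selects offerings usable by inference endpoints.
        "TargetResources": ["endpoint"],
    }


def build_endpoint_config(config_name, model_name, instance_type,
                          instance_count, reservation_arn):
    """Build an endpoint configuration pinned to reserved capacity (hypothetical helper)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "CapacityReservationConfig": {
                    # Restrict the variant to the reservation only.
                    "CapacityReservationPreference": "capacity-reservations-only",
                    # Assumption: field that carries the training plan ARN.
                    "MlReservationArn": reservation_arn,
                },
            }
        ],
    }


def check_scale_target(desired_count, reserved_count):
    """Local pre-check mirroring the rule that scaling past the reservation fails."""
    if desired_count > reserved_count:
        raise ValueError(
            f"requested {desired_count} instances, but only "
            f"{reserved_count} are reserved"
        )
    return desired_count


# Placeholder values for illustration only.
EXAMPLE_ARN = "arn:aws:sagemaker:us-east-1:111122223333:training-plan/example-plan"

search_args = build_offering_search(
    "ml.p5.48xlarge", 2, datetime(2026, 6, 1, tzinfo=timezone.utc), 72
)
endpoint_config = build_endpoint_config(
    "example-config", "example-model", "ml.p5.48xlarge", 2, EXAMPLE_ARN
)
```

With boto3 installed and credentials configured, dictionaries like these would typically be unpacked into the SageMaker client's `search_training_plan_offerings` and `create_endpoint_config` calls. Keeping the payloads as plain data makes it easy to review the reservation settings before anything is billed.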

SageMaker AI training plans provide transparent, upfront pricing for predictable GPU availability during model evaluation, testing, and burst workloads.


May 4, 2026

Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

May 18, 2026

Amazon SageMaker Studio now supports GPU capacity reservation through SageMaker Flexible Training Plans

Dec 3, 2024

Speed up your AI inference workloads with new NVIDIA-powered capabilities in Amazon SageMaker

May 21, 2026

Amazon SageMaker AI now supports OpenAI-compatible APIs for inference endpoints


Related articles