Deploy SageMaker AI inference endpoints with set GPU capacity using training plans

Machine Learning Blog



This article explains how to use Amazon SageMaker AI training plans to reserve GPU capacity for inference endpoints, ensuring predictable availability for time-bound workloads.

  • Training plans now support inference endpoints, not just training jobs
  • Search for available p-family GPU capacity using the search-training-plan-offerings API
  • Create a training plan reservation and receive an ARN that references the reserved capacity
  • Deploy inference endpoints by specifying the reserved capacity ARN in the endpoint configuration
  • Set CapacityReservationPreference to "capacity-reservations-only" to restrict the endpoint to reserved capacity
  • With capacity-reservations-only, the endpoint stops serving traffic when the reservation expires
  • Update endpoints to new model versions while keeping the reserved capacity
  • Migrate from reserved to on-demand capacity if the endpoint is needed beyond the reservation window
  • Scale endpoints within the reservation limits; scaling beyond them fails with a validation error
  • The full reservation cost is charged upfront, regardless of how long the capacity is actually used
  • Deleting an endpoint does not cancel or refund the training plan reservation
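The workflow in the list above can be sketched as plain request payloads for the boto3 SageMaker client. This is a minimal sketch, not the article's code. Check the following assumptions against the current SageMaker API reference:

- the TargetResources value "endpoint"
- the CapacityReservationConfig field name MlReservationArn
- the instance type
- the placeholder ARN

check_scale_request is a hypothetical local guard that mirrors the validation error described above. It is not a SageMaker call.

```python
from datetime import datetime, timedelta, timezone

# Request payloads for the boto3 SageMaker client (boto3.client("sagemaker")).
# Building plain dicts keeps the sketch runnable without AWS credentials:
#   sm.search_training_plan_offerings(**search_args)
#   sm.create_training_plan(TrainingPlanName=..., TrainingPlanOfferingId=...)
#   sm.create_endpoint_config(**endpoint_config)
#   sm.create_endpoint(EndpointName=..., EndpointConfigName=...)


def build_offering_search(instance_type, instance_count, duration_hours, start_after):
    """Arguments for searching p-family GPU offerings for an inference endpoint."""
    return {
        "InstanceType": instance_type,  # e.g. "ml.p5.48xlarge"
        "InstanceCount": instance_count,
        "DurationHours": duration_hours,
        "StartTimeAfter": start_after,
        # ASSUMPTION: "endpoint" selects offerings usable by inference endpoints.
        "TargetResources": ["endpoint"],
    }


def build_endpoint_config(config_name, model_name, instance_type, instance_count, reservation_arn):
    """Endpoint configuration that runs only on the reserved capacity."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "CapacityReservationConfig": {
                    # Restrict the variant to reserved capacity; it stops
                    # serving traffic when the reservation expires.
                    "CapacityReservationPreference": "capacity-reservations-only",
                    # ASSUMPTION: field name carrying the reservation ARN.
                    "MlReservationArn": reservation_arn,
                },
            }
        ],
    }


def check_scale_request(desired_count, reserved_count):
    """Hypothetical client-side guard: refuse to scale past the reservation."""
    if desired_count > reserved_count:
        raise ValueError(
            f"requested {desired_count} instances but only {reserved_count} are reserved"
        )
    return desired_count


# Placeholder ARN; in practice use the TrainingPlanArn returned by create_training_plan.
PLAN_ARN = "arn:aws:sagemaker:us-east-1:111122223333:training-plan/example-plan"

search_args = build_offering_search(
    "ml.p5.48xlarge", 2, 24, datetime.now(timezone.utc) + timedelta(days=7)
)
endpoint_config = build_endpoint_config(
    "example-config", "example-model", "ml.p5.48xlarge", 2, PLAN_ARN
)
```

Keeping payload construction separate from the boto3 calls lets you inspect or unit-test the configuration before any reservation is purchased.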

SageMaker AI training plans provide transparent, upfront pricing for predictable GPU availability during model evaluation, testing, and burst workloads.
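The full reservation cost is charged upfront, so the effective hourly price depends on how much of the window the endpoint actually serves traffic. A small worked example follows. It uses hypothetical figures (a 1,200 USD fee for a 24-hour reservation), not real AWS pricing:

```python
def effective_hourly_cost(upfront_fee, reserved_hours, hours_used):
    """Return (nominal, effective) cost per hour for a prepaid reservation."""
    if reserved_hours <= 0:
        raise ValueError("reserved_hours must be positive")
    if not 0 < hours_used <= reserved_hours:
        raise ValueError("hours_used must be in (0, reserved_hours]")
    nominal = upfront_fee / reserved_hours  # price per hour if the whole window is used
    effective = upfront_fee / hours_used    # price actually paid per hour of use
    return nominal, effective


# Hypothetical: 1,200 USD prepaid for 24 hours, endpoint used for only 6 hours.
nominal, effective = effective_hourly_cost(1200.0, 24, 6)
print(f"nominal {nominal:.2f}/h, effective {effective:.2f}/h")  # nominal 50.00/h, effective 200.00/h
```

Deleting the endpoint early does not change this. The reservation is neither cancelled nor refunded, so the fee is sunk once the plan is created.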


