Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads

Machine Learning Blog

This article reviews Amazon SageMaker AI's 2025 improvements across capacity, price performance, observability, and usability, focusing on training and inference enhancements.

Flexible Training Plans now support inference endpoints with transparent upfront pricing for GPU capacity reservations
Inference components add Multi-AZ high availability for fault tolerance across Availability Zones
Parallel scaling deploys multiple model copies simultaneously, shortening response time during traffic surges
NVMe caching accelerates model scaling and reduces inference latency during traffic spikes
EAGLE-3 speculative decoding predicts tokens from hidden layers, improving throughput without quality loss
Dynamic multi-adapter inference loads LoRA adapters on demand, optimizing resource utilization
Intelligent memory management automatically evicts the least popular adapters when capacity is reached
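
To illustrate the last two points, here is a minimal sketch of on-demand adapter loading with least-popular eviction. It is not SageMaker AI's implementation: the `AdapterCache` class, the `loader` callback, and the request-count popularity measure are all hypothetical, chosen only to show the general idea.

```python
from collections import Counter
from typing import Callable, Dict


class AdapterCache:
    """Hypothetical on-demand adapter cache with least-popular eviction."""

    def __init__(self, capacity: int, loader: Callable[[str], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader  # assumed stand-in for fetching adapter weights
        self.loaded: Dict[str, object] = {}
        self.hits: Counter = Counter()  # request count per adapter name

    def get(self, name: str) -> object:
        self.hits[name] += 1
        if name not in self.loaded:
            if len(self.loaded) >= self.capacity:
                # Evict the resident adapter with the fewest requests so far.
                victim = min(self.loaded, key=lambda n: self.hits[n])
                del self.loaded[victim]
            # Load the requested adapter only when it is first needed.
            self.loaded[name] = self.loader(name)
        return self.loaded[name]


cache = AdapterCache(capacity=2, loader=lambda name: f"weights:{name}")
for request in ["a", "a", "b", "c"]:
    cache.get(request)
print(sorted(cache.loaded))  # → ['a', 'c']
```

With room for two adapters, the request for `c` evicts `b` (one request) rather than `a` (two requests). A production system would likely weigh recency and adapter size as well as raw request counts.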

These enhancements make generative AI inference more accessible, reliable, and cost-effective for production workloads by addressing GPU availability, low-latency scaling, and multi-model deployment complexity.



Feb 20, 2026

Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting

Nov 27, 2025

Amazon SageMaker AI now supports Flexible Training Plans capacity for Inference

Jan 14, 2026

Transform AI development with new Amazon SageMaker AI model customization and large-scale training capabilities

Jul 9, 2024

Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 2



Related articles