Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads

Machine Learning Blog



This article reviews Amazon SageMaker AI's 2025 improvements across capacity, price performance, observability, and usability, focusing on training and inference enhancements.

  • Flexible Training Plans now support inference endpoints with transparent upfront pricing for GPU capacity reservations
  • Inference components add Multi-AZ high availability for fault tolerance across Availability Zones
  • Parallel scaling deploys multiple model copies simultaneously, shortening the time needed to respond to traffic surges
  • NVMe caching accelerates model scaling and reduces inference latency during traffic spikes
  • EAGLE-3 speculative decoding predicts tokens from hidden layers, improving throughput without quality loss
  • Dynamic multi-adapter inference loads LoRA adapters on demand, optimizing resource utilization
  • Intelligent memory management automatically evicts the least popular adapters when capacity is reached
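
The last two points (on-demand adapter loading and eviction of the least popular adapters) can be illustrated with a small sketch. This is not the SageMaker AI implementation or API: the `AdapterCache` class, the `load_fn` callback, and the choice to measure "popularity" as a simple request count are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterCache:
    """Toy cache holding at most `capacity` LoRA adapters (hypothetical, illustrative only)."""
    capacity: int
    loaded: dict = field(default_factory=dict)  # adapter name -> placeholder weights
    hits: dict = field(default_factory=dict)    # adapter name -> request count so far

    def get(self, name, load_fn):
        """Return the adapter, loading it on demand and evicting one if the cache is full."""
        if name not in self.loaded:
            if len(self.loaded) >= self.capacity:
                # Assumption: "least popular" means fewest requests among resident adapters.
                victim = min(self.loaded, key=lambda n: self.hits[n])
                del self.loaded[victim]
            self.loaded[name] = load_fn(name)
        self.hits[name] = self.hits.get(name, 0) + 1
        return self.loaded[name]


def fake_load(name):
    # Stand-in for reading adapter weights from storage.
    return f"weights:{name}"


cache = AdapterCache(capacity=2)
for requested in ["a", "a", "b", "c"]:
    cache.get(requested, fake_load)
print(sorted(cache.loaded))
```

In this run, adapter "b" has been requested once and "a" twice when "c" arrives, so "b" is the one evicted. A production system would likely weigh recency and adapter size as well; the request count is only the simplest possible popularity signal.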

These enhancements make generative AI inference more accessible, reliable, and cost-effective for production workloads by addressing GPU availability, low-latency scaling, and multi-model deployment complexity.
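
The claim that speculative decoding improves throughput without quality loss can be made concrete with a generic draft-then-verify loop. The sketch below is not EAGLE-3 itself, which according to the summary above drafts tokens from the model's hidden layers; here the draft is an arbitrary cheap function, decoding is greedy, and the names `speculative_decode`, `target`, and `draft` are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; the result equals target-only greedy decoding."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft proposes up to k tokens, one after another.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position
        #    (a real system would do this in one batched forward pass).
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out


def target_model(prefix: List[int]) -> int:
    # Toy "large model": counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    # Toy "small model": agrees with the target except at some positions.
    return (prefix[-1] + 1) % 10 if len(prefix) % 3 else 0


print(speculative_decode(target_model, draft_model, [1], max_new=8))
```

Because every emitted token is either confirmed or supplied by the target, the output is identical to what the target would produce alone. The speed-up in a real system comes from verifying several drafted tokens in a single pass of the large model.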





Related articles

  • Feb 20, 2026: Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting
  • Nov 27, 2025: Amazon SageMaker AI now supports Flexible Training Plans capacity for Inference
  • Jan 14, 2026: Transform AI development with new Amazon SageMaker AI model customization and large-scale training capabilities
  • Jul 9, 2024: Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 2
