Amazon SageMaker introduces Scale Down to Zero for AI inference to help customers save costs

Amazon SageMaker has introduced a new "Scale Down to Zero" feature for AI inference endpoints, designed to help customers reduce costs for running machine learning models.

  • Allows inference endpoints to automatically scale to zero instances during periods of inactivity
  • Particularly beneficial for applications with variable traffic like chatbots and content moderation systems
  • Endpoints can quickly scale back up when traffic resumes
  • Can be configured through the AWS SDK for Python (Boto3), the SageMaker Python SDK, or the AWS CLI
  • Supports scenarios with predictable or intermittent inference traffic
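
As a rough illustration of the SDK route, the sketch below builds an Application Auto Scaling request with a minimum capacity of zero. It rests on assumptions not stated in this summary: that the model is deployed as a SageMaker inference component, that scaling is registered on the `sagemaker:inference-component:DesiredCopyCount` dimension, and that the endpoint itself is allowed to run zero instances. The helper names and the component name `my-inference-component` are hypothetical.

```python
from typing import Any, Dict


def build_scalable_target(inference_component_name: str, max_copies: int) -> Dict[str, Any]:
    """Build keyword arguments for register_scalable_target.

    Assumption: scale-to-zero is applied to an inference component by
    registering it with a minimum capacity of 0 model copies.
    """
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{inference_component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": 0,  # allow the copy count to drop to zero when idle
        "MaxCapacity": max_copies,
    }


def register_scale_to_zero(client: Any, inference_component_name: str, max_copies: int) -> Dict[str, Any]:
    """Send the request through an Application Auto Scaling client and return it."""
    request = build_scalable_target(inference_component_name, max_copies)
    client.register_scalable_target(**request)
    return request


if __name__ == "__main__":
    # Requires boto3 and AWS credentials; the component name is hypothetical.
    import boto3

    autoscaling = boto3.client("application-autoscaling")
    register_scale_to_zero(autoscaling, "my-inference-component", max_copies=2)
```

Registering the target only sets the allowed range. Scaling policies that bring capacity down during idle periods and back up when requests arrive still have to be attached separately.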

The feature is now generally available across all AWS regions where SageMaker is supported, offering a cost-effective solution for managing generative AI deployments.


