Amazon SageMaker introduces Scale Down to Zero for AI inference to help customers save costs


Amazon SageMaker has introduced a new "Scale Down to Zero" feature for AI inference endpoints, designed to help customers reduce costs for running machine learning models.

Allows inference endpoints to automatically scale to zero instances during periods of inactivity
Particularly beneficial for applications with variable traffic, such as chatbots and content moderation systems
Endpoints can quickly scale back up when traffic resumes
Can be configured through the AWS SDK for Python (Boto3), the SageMaker Python SDK, or the AWS CLI
Supports scenarios with predictable or intermittent inference traffic
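
The summary does not show what the configuration looks like, so the following is only a minimal sketch of one way it might be done with the AWS SDK for Python. It assumes the model is deployed as a SageMaker inference component and that scaling is managed through Application Auto Scaling with a minimum capacity of zero copies. The component name, policy name, and target value are hypothetical, and the helper functions are illustrative, not part of any AWS library. Scaling back out from zero generally needs an additional policy triggered by a CloudWatch alarm, which is omitted here.

```python
from typing import Any, Dict

# Identifiers used by Application Auto Scaling for SageMaker inference components.
NAMESPACE = "sagemaker"
DIMENSION = "sagemaker:inference-component:DesiredCopyCount"


def scalable_target_params(component_name: str, max_copies: int) -> Dict[str, Any]:
    """Build the request for register_scalable_target with a floor of zero copies."""
    return {
        "ServiceNamespace": NAMESPACE,
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": DIMENSION,
        "MinCapacity": 0,  # allows the component to scale down to zero
        "MaxCapacity": max_copies,
    }


def target_tracking_params(component_name: str, invocations_per_copy: float) -> Dict[str, Any]:
    """Build the request for put_scaling_policy using target tracking."""
    return {
        "PolicyName": f"{component_name}-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "ServiceNamespace": NAMESPACE,
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": DIMENSION,
        "TargetTrackingScalingPolicyConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy"
            },
            "TargetValue": invocations_per_copy,
        },
    }


def apply_scaling(client: Any, component_name: str, max_copies: int,
                  invocations_per_copy: float) -> None:
    """Send both requests.

    `client` is expected to be boto3.client("application-autoscaling");
    it is passed in so the request builders can be inspected without AWS access.
    """
    client.register_scalable_target(**scalable_target_params(component_name, max_copies))
    client.put_scaling_policy(**target_tracking_params(component_name, invocations_per_copy))
```

With real credentials, this would be called as `apply_scaling(boto3.client("application-autoscaling"), "my-component", 4, 5.0)`, where `my-component` stands in for an existing inference component.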

The feature is now generally available across all AWS regions where SageMaker is supported, offering a cost-effective solution for managing generative AI deployments.



Dec 3, 2024
Unlock cost savings with the new scale down to zero feature in SageMaker Inference

Dec 6, 2024
Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference

Jul 25, 2024
Amazon SageMaker inference launches faster auto scaling for generative AI models

May 4, 2026
Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback

