Amazon SageMaker introduces Scale Down to Zero for AI inference to help customers save costs
Amazon SageMaker has introduced a new "Scale Down to Zero" feature for AI inference endpoints, designed to help customers reduce costs for running machine learning models.
- Allows inference endpoints to automatically scale to zero instances during periods of inactivity
- Particularly beneficial for applications with variable traffic like chatbots and content moderation systems
- Endpoints can quickly scale back up when traffic resumes
- Can be configured through the AWS SDK for Python (Boto3), the SageMaker Python SDK, or the AWS CLI
- Supports scenarios with predictable or intermittent inference traffic
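As a rough illustration of what such a configuration might look like with the AWS SDK for Python, the sketch below only builds the request parameters and does not call AWS. It rests on assumptions that this summary does not state: that scale to zero is applied to endpoints using inference components, that the endpoint variant uses `ManagedInstanceScaling` with a minimum of zero instances, and that the scalable target is registered with Application Auto Scaling at a minimum capacity of zero. The function names, the variant name `AllTraffic`, and the instance type and component name used in the usage note are hypothetical; check the current SageMaker documentation before relying on any field.

```python
from typing import Any, Dict


def build_variant_config(instance_type: str, max_instances: int) -> Dict[str, Any]:
    """Build a production variant whose managed instance scaling may reach zero.

    Assumption: the resulting dict is used as one entry of `ProductionVariants`
    in a SageMaker `create_endpoint_config` request.
    """
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    return {
        "VariantName": "AllTraffic",  # hypothetical variant name
        "InstanceType": instance_type,
        "InitialInstanceCount": 1,
        "ManagedInstanceScaling": {
            "Status": "ENABLED",
            "MinInstanceCount": 0,  # lets the endpoint drop to zero instances
            "MaxInstanceCount": max_instances,
        },
    }


def build_scalable_target(inference_component_name: str, max_copies: int) -> Dict[str, Any]:
    """Build keyword arguments for Application Auto Scaling `register_scalable_target`.

    Assumption: scaling acts on the copy count of an inference component,
    and a minimum capacity of zero is what permits scaling in to zero copies.
    """
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{inference_component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": 0,  # zero copies during periods of inactivity
        "MaxCapacity": max_copies,
    }
```

With credentials configured, the second dict could be passed as `boto3.client("application-autoscaling").register_scalable_target(**build_scalable_target("my-component", 4))`, and the first placed in the `ProductionVariants` list of `create_endpoint_config`. A scaling policy that brings capacity back when traffic resumes would still need to be attached separately.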
The feature is now generally available across all AWS regions where SageMaker is supported, offering a cost-effective solution for managing generative AI deployments.
Related articles
Dec 3, 2024
Unlock cost savings with the new scale down to zero feature in SageMaker Inference
Dec 6, 2024
Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference
Jul 25, 2024
Amazon SageMaker inference launches faster auto scaling for generative AI models
May 4, 2026
Amazon SageMaker AI Now Supports Capacity-Aware Inference with Automatic Instance Fallback