Unlock cost savings with the new scale down to zero feature in SageMaker Inference
Machine Learning Blog
AWS has announced a scale-to-zero feature for SageMaker inference endpoints. Endpoints can now scale down to zero instances, which reduces costs for machine learning deployments that sit idle part of the time. Key highlights of the feature include:
- Ability to automatically scale inference endpoints to zero instances during periods of inactivity
- Supports three primary use cases:
  - Predictable traffic patterns
  - Sporadic or variable traffic
  - Development and testing environments
- Performance metrics for Llama3 models show:
  - Scale-in time: 25 minutes total
  - Scale-out time: Approximately 5-6 minutes
- Requires configuring managed instance scaling and setting minimum instances to zero
- Can be implemented using target tracking and step scaling policies
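The two configuration points above can be sketched as request payloads. This is a minimal sketch, not the blog post's own code: it assumes the endpoint hosts its model as an inference component (the summary does not spell this out), and the component name, capacity limits, and target value are hypothetical. The functions only build dictionaries; nothing is sent to AWS.

```python
from typing import Any, Dict


def managed_instance_scaling(max_instances: int) -> Dict[str, Any]:
    """ManagedInstanceScaling block for a production variant, with a floor of zero."""
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    return {"Status": "ENABLED", "MinInstanceCount": 0, "MaxInstanceCount": max_instances}


def scalable_target(component_name: str, max_copies: int) -> Dict[str, Any]:
    """Arguments for Application Auto Scaling register_scalable_target.

    Assumption: the scaled resource is an inference component's copy count.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": 0,  # allow scaling in to zero copies
        "MaxCapacity": max_copies,
    }


def target_tracking_policy(component_name: str, target_value: float) -> Dict[str, Any]:
    """Target tracking policy arguments; target_value is a hypothetical tuning choice."""
    return {
        "PolicyName": f"{component_name}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "TargetTrackingScalingPolicyConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy"
            },
            "TargetValue": target_value,
        },
    }


def step_scaling_policy(component_name: str) -> Dict[str, Any]:
    """Step scaling policy arguments that add one copy when triggered.

    Assumption: a CloudWatch alarm (not shown) invokes this policy when
    requests arrive while the endpoint has zero capacity.
    """
    return {
        "PolicyName": f"{component_name}-scale-out-from-zero",
        "PolicyType": "StepScaling",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Maximum",
            "Cooldown": 60,
            "StepAdjustments": [{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}],
        },
    }
```

With boto3, the first dictionary would go under the `ManagedInstanceScaling` key of a production variant in `create_endpoint_config`, and the others would be unpacked into the `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` calls. Check the current AWS documentation for required fields before relying on these shapes.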
The feature helps organizations optimize machine learning infrastructure costs by closely matching compute resources to actual usage needs, especially for endpoints with inconsistent traffic patterns.
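To make the cost argument concrete, here is a rough estimate of daily savings compared with an always-on endpoint. It is a sketch under stated assumptions: one contiguous active window per day, a single instance that keeps billing during the 25-minute scale-in reported above, and a hypothetical hourly rate supplied by the caller. It ignores the 5-6 minute scale-out wait, which affects latency for the first requests rather than cost.

```python
def billed_hours_per_day(active_hours: float, scale_in_minutes: float = 25.0) -> float:
    """Hours billed per day for one instance that scales to zero when idle.

    Assumption: the instance stays billed for the full scale-in period
    after traffic stops, and there is one active window per day.
    """
    if not 0 <= active_hours <= 24:
        raise ValueError("active_hours must be between 0 and 24")
    if active_hours == 0:
        return 0.0
    return min(24.0, active_hours + scale_in_minutes / 60.0)


def daily_savings(active_hours: float, hourly_rate: float,
                  scale_in_minutes: float = 25.0) -> float:
    """Savings per day versus keeping the instance running for 24 hours.

    hourly_rate is a hypothetical price; substitute the real instance price.
    """
    always_on_cost = 24.0 * hourly_rate
    scaled_cost = billed_hours_per_day(active_hours, scale_in_minutes) * hourly_rate
    return always_on_cost - scaled_cost
```

For example, `daily_savings(8, hourly_rate)` estimates the saving for an endpoint that serves traffic eight hours a day. The shorter and more predictable the active window, the closer the billed time gets to actual usage.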