Amazon SageMaker HyperPod now supports continuous provisioning for Slurm-orchestrated clusters
This article announces continuous provisioning support for Amazon SageMaker HyperPod clusters using the Slurm orchestrator, improving efficiency for large-scale AI/ML training workloads.
- Training jobs start immediately on available instances while remaining capacity provisions in background
- Priority-based provisioning brings up controller node first, then login and worker nodes in parallel
- Failed node launches retry asynchronously; nodes added automatically as they become available
- Scaling runs concurrently across multiple instance groups, so provisioning one group does not block the others
- Enable by setting the NodeProvisioningMode parameter to "Continuous" in the CreateCluster API
- Available in all AWS Regions supporting SageMaker HyperPod
Continuous provisioning reduces time-to-training and manual intervention for Slurm-based HyperPod clusters.
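As a rough illustration of the CreateCluster setting mentioned above, the following Python sketch builds a request with NodeProvisioningMode set to "Continuous". The cluster name, instance group names, instance types, counts, role ARN, and S3 URI are all placeholder assumptions, not values from the announcement, and any Slurm-specific orchestrator settings are omitted because the announcement does not describe them. The request is only sent if `submit` is called, which assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List


def instance_group(name: str, instance_type: str, count: int,
                   role_arn: str, lifecycle_s3_uri: str) -> Dict[str, Any]:
    """Build one InstanceGroups entry for CreateCluster (all values are placeholders)."""
    return {
        "InstanceGroupName": name,
        "InstanceType": instance_type,
        "InstanceCount": count,
        "ExecutionRole": role_arn,
        "LifeCycleConfig": {
            "SourceS3Uri": lifecycle_s3_uri,
            "OnCreate": "on_create.sh",  # hypothetical lifecycle script name
        },
    }


def build_create_cluster_request(cluster_name: str,
                                 groups: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble CreateCluster arguments with continuous provisioning enabled."""
    return {
        "ClusterName": cluster_name,
        "InstanceGroups": groups,
        # The setting named in the announcement.
        "NodeProvisioningMode": "Continuous",
    }


def submit(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to SageMaker. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    return boto3.client("sagemaker").create_cluster(**request)


# Placeholder values for illustration only.
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleHyperPodRole"
LIFECYCLE_URI = "s3://example-bucket/lifecycle/"

request = build_create_cluster_request(
    "example-slurm-cluster",
    [
        instance_group("controller", "ml.c5.xlarge", 1, ROLE_ARN, LIFECYCLE_URI),
        instance_group("login", "ml.c5.xlarge", 1, ROLE_ARN, LIFECYCLE_URI),
        instance_group("workers", "ml.p5.48xlarge", 4, ROLE_ARN, LIFECYCLE_URI),
    ],
)
```

Keeping request construction separate from `submit` lets the arguments be inspected or logged before any cluster is created.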
Related articles
- May 27, 2026: Amazon SageMaker HyperPod Slurm clusters now support specifying minimum capacity requirements with continuous provisioning
- Aug 8, 2025: Amazon SageMaker HyperPod now supports continuous provisioning for enhanced cluster operations
- May 7, 2026: Amazon SageMaker HyperPod now supports AMI-based node lifecycle configuration for Slurm clusters
- Mar 3, 2026: Amazon SageMaker HyperPod now supports API-driven Slurm configuration