
Amazon SageMaker HyperPod Slurm clusters now support specifying minimum capacity requirements with continuous provisioning




This article announces that Amazon SageMaker HyperPod now supports minimum capacity requirements (MinCount) for Slurm clusters with continuous provisioning.

  • MinCount specifies the minimum number of instances that must be provisioned before the cluster becomes available for job scheduling
  • Particularly useful for distributed training frameworks such as PyTorch FSDP and Megatron-LM
  • Guarantees a baseline GPU count for meeting SLA and cost-efficiency targets
  • Set the MinInstanceCount parameter in CreateCluster or UpdateCluster API requests
  • The instance group remains in Creating status until the threshold is met, then transitions to InService
  • The system automatically rolls back if MinCount cannot be satisfied within 3 hours
  • Available in all AWS Regions where Amazon SageMaker HyperPod is supported
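
The sketch below shows roughly what a CreateCluster request carrying a minimum capacity might look like. It is a hedged illustration, not the documented request shape. It assumes that MinInstanceCount sits on each instance-group entry next to InstanceCount and that continuous provisioning is selected with NodeProvisioningMode="Continuous". The helper names, cluster name, instance type, S3 URI, role ARN and the `1 <= min <= count` check are all hypothetical. Confirm the field names against the current SageMaker API reference before passing such a dictionary to an SDK call.

```python
from typing import Any


def build_instance_group(
    name: str,
    instance_type: str,
    instance_count: int,
    min_instance_count: int,
    lifecycle_s3_uri: str,
    execution_role: str,
) -> dict[str, Any]:
    """Hypothetical helper: one instance-group entry with a minimum capacity."""
    # Illustrative sanity check (an assumption, not a documented API rule):
    # the minimum should be positive and no larger than the target count.
    if not 1 <= min_instance_count <= instance_count:
        raise ValueError(
            f"MinInstanceCount must be between 1 and InstanceCount ({instance_count})"
        )
    return {
        "InstanceGroupName": name,
        "InstanceType": instance_type,
        "InstanceCount": instance_count,         # target capacity
        "MinInstanceCount": min_instance_count,  # assumed field name and placement
        "LifeCycleConfig": {"SourceS3Uri": lifecycle_s3_uri, "OnCreate": "on_create.sh"},
        "ExecutionRole": execution_role,
    }


def build_create_cluster_request(
    cluster_name: str, groups: list[dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical helper: wraps instance groups into a request body."""
    return {
        "ClusterName": cluster_name,
        "InstanceGroups": groups,
        "NodeProvisioningMode": "Continuous",  # assumed value for continuous provisioning
    }


# Usage with placeholder values: ask for 16 GPU nodes, but accept scheduling
# once 12 are up.
workers = build_instance_group(
    "gpu-workers",
    "ml.p5.48xlarge",
    16,
    12,
    "s3://example-bucket/lifecycle/",
    "arn:aws:iam::111122223333:role/ExampleHyperPodRole",
)
request = build_create_cluster_request("example-training-cluster", [workers])
```

Validating the minimum on the client side is a design choice for the sketch only. The service performs its own validation, and a real Slurm cluster would also include a controller instance group, which is omitted here for brevity.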

MinCount provides greater control over cluster availability for distributed training workloads requiring guaranteed minimum node capacity.
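
The status behavior described in the list above can be modeled as a small state machine. This is a toy model of the announced semantics, not the service's implementation. "RollingBack" is a hypothetical label, since the real status name during rollback is not stated. Treating the 3-hour mark as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta

# Rollback window stated in the announcement.
ROLLBACK_WINDOW = timedelta(hours=3)


@dataclass
class InstanceGroupModel:
    """Toy model of an instance group waiting for its minimum capacity."""

    min_count: int
    status: str = "Creating"

    def observe(self, healthy_instances: int, elapsed: timedelta) -> str:
        # Once the group has left Creating, later observations do not change it.
        if self.status != "Creating":
            return self.status
        if healthy_instances >= self.min_count:
            # Threshold met: the group becomes available for job scheduling.
            self.status = "InService"
        elif elapsed >= ROLLBACK_WINDOW:
            # Threshold not met within the window (hypothetical status label).
            self.status = "RollingBack"
        return self.status

    def schedulable(self) -> bool:
        return self.status == "InService"


# Usage: a group that reaches its minimum of 12 nodes after one hour.
group = InstanceGroupModel(min_count=12)
group.observe(8, timedelta(minutes=30))  # still "Creating"
group.observe(12, timedelta(hours=1))    # now "InService"
```

Checking the threshold before the deadline means a group that reaches its minimum exactly at the 3-hour mark is treated as InService in this model.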





Related articles

  • Mar 25, 2026: Amazon SageMaker HyperPod now supports continuous provisioning for Slurm-orchestrated clusters
  • May 7, 2026: Amazon SageMaker HyperPod now supports AMI-based node lifecycle configuration for Slurm clusters
  • Aug 8, 2025: Amazon SageMaker HyperPod now supports continuous provisioning for enhanced cluster operations
  • Apr 23, 2026: Amazon SageMaker HyperPod now supports automatic Slurm topology management
