Amazon SageMaker HyperPod now supports automatic Slurm topology management
This article announces automatic Slurm topology management for Amazon SageMaker HyperPod, a feature that configures the Slurm network topology of distributed training clusters so that it matches the underlying hardware.
- Automatically selects optimal network topology based on GPU instance types
- Dynamically maintains topology as cluster scales or nodes are replaced
- Supports tree topology for instance types with hierarchical interconnects, such as ml.p5
- Supports block topology for instance types with uniform high-bandwidth connectivity, such as ml.p6e
- Handles mixed instance type clusters with compatible topology selection
- Enabled by default with no manual configuration required
- Available in all AWS Regions supporting SageMaker HyperPod
This feature improves distributed training performance by automatically optimizing GPU-to-GPU communication and NCCL operations without manual topology management.
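For readers unfamiliar with the two topology styles named above, the following Python sketch renders what a Slurm `topology.conf` can look like for the tree plugin (selected in `slurm.conf` with `TopologyPlugin=topology/tree`) and for the block plugin (`TopologyPlugin=topology/block`). It is only an illustration of the file format: the helper names `render_tree` and `render_block`, the switch, block, and node names, and the grouping of nodes are all hypothetical, and the announcement does not describe how HyperPod itself generates or stores this configuration.

```python
# Illustrative sketch only: renders Slurm topology.conf text for the two
# topology styles mentioned in the announcement. All names and groupings
# below are made up; HyperPod maintains the real configuration itself.


def render_tree(leaf_switches: dict[str, list[str]], root: str = "root") -> str:
    """topology/tree: each leaf switch lists its nodes; a root switch lists the leaves."""
    lines = [
        f"SwitchName={name} Nodes={','.join(nodes)}"
        for name, nodes in leaf_switches.items()
    ]
    # Joining the dict iterates over its keys, i.e. the leaf switch names.
    lines.append(f"SwitchName={root} Switches={','.join(leaf_switches)}")
    return "\n".join(lines)


def render_block(blocks: dict[str, list[str]], block_size: int) -> str:
    """topology/block: each block lists its nodes; BlockSizes gives the base block size."""
    lines = [
        f"BlockName={name} Nodes={','.join(nodes)}"
        for name, nodes in blocks.items()
    ]
    lines.append(f"BlockSizes={block_size}")
    return "\n".join(lines)


# Hypothetical hierarchical cluster: two leaf switches under one root.
tree_conf = render_tree({"s0": ["node1", "node2"], "s1": ["node3", "node4"]})

# Hypothetical uniform cluster: two blocks of two nodes each.
block_conf = render_block({"b0": ["node1", "node2"], "b1": ["node3", "node4"]}, block_size=2)

print(tree_conf)
print(block_conf)
```

With automatic topology management enabled, files like these would not need to be written or updated by hand when nodes are added or replaced; the sketch is meant only to show what kind of information a tree or block topology encodes.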