Schedule topology-aware workloads using Amazon SageMaker HyperPod task governance

Machine Learning Blog

The article introduces topology-aware scheduling for Amazon SageMaker HyperPod task governance, which improves the efficiency of AI workloads by taking the cluster's network topology into account when deciding where to place each job.
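To make "minimizing network hops" concrete, here is a minimal Python sketch under stated assumptions. Each instance is described by three network nodes, listed from layer 1 (top of the hierarchy) to layer 3 (directly above the instance), similar to the model that EC2 instance topology exposes. The instance IDs, node names, and the `layers_climbed` distance metric are hypothetical and chosen for illustration; they are not how HyperPod measures latency.

```python
from itertools import combinations

# Hypothetical topology data: each instance lists its network nodes from
# layer 1 (top of the hierarchy) down to layer 3 (directly above the instance).
NODES = {
    "i-a": ("nn-1", "nn-10", "nn-100"),
    "i-b": ("nn-1", "nn-10", "nn-100"),
    "i-c": ("nn-1", "nn-10", "nn-101"),
    "i-d": ("nn-1", "nn-11", "nn-110"),
    "i-e": ("nn-2", "nn-20", "nn-200"),
}


def layers_climbed(a: str, b: str) -> int:
    """Illustrative distance (not an AWS metric): how many switch layers traffic
    must climb before two instances share a network node.

    1 = same layer-3 node, 2 = same layer-2 node, 3 = same layer-1 node,
    4 = no shared node in the listed hierarchy.
    """
    path_a, path_b = NODES[a], NODES[b]
    # Walk from the bottom layer (index 2) upward until the paths meet.
    for depth in range(2, -1, -1):
        if path_a[depth] == path_b[depth]:
            return 3 - depth
    return 4


def placement_cost(instances: list[str]) -> int:
    """Sum of pairwise distances for a set of instances; lower is tighter."""
    return sum(layers_climbed(a, b) for a, b in combinations(instances, 2))
```

Under this toy metric, placing three pods on `i-a`, `i-b`, and `i-c` costs 5, while spreading them over `i-a`, `i-d`, and `i-e` costs 11. That gap is what a topology-aware scheduler tries to avoid.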

Reduces network latency by minimizing the number of network hops between instances
Improves training efficiency by placing a job's workloads on instances that are close to each other in the network
Supports two scheduling modes: required and preferred topology placement
Can be enabled by modifying a Kubernetes manifest file or by using the SageMaker HyperPod CLI
Helps data scientists get better GPU cluster performance when training large language models
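The difference between required and preferred placement can be modeled as follows. This is a simplified Python sketch, not HyperPod's or Kueue's actual algorithm. With required placement, a job waits unless all of its pods fit inside one network domain at the requested layer. With preferred placement, the scheduler tries that layer first, widens to broader layers, and finally places the pods anywhere. The node data, the `place` function, and the smallest-fitting-domain heuristic are hypothetical. In Kueue-based setups, the mode is typically selected per pod template with the `kueue.x-k8s.io/podset-required-topology` or `kueue.x-k8s.io/podset-preferred-topology` annotation; confirm the exact keys and label values in the HyperPod documentation.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical topology data: (layer-1, layer-2, layer-3) network nodes per instance.
NODES = {
    "i-a": ("nn-1", "nn-10", "nn-100"),
    "i-b": ("nn-1", "nn-10", "nn-100"),
    "i-c": ("nn-1", "nn-10", "nn-101"),
    "i-d": ("nn-1", "nn-11", "nn-110"),
    "i-e": ("nn-2", "nn-20", "nn-200"),
}


def domains(layer: int, free: list[str]) -> dict[str, list[str]]:
    """Group free instances by their network node at `layer` (1, 2 or 3)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for inst in free:
        groups[NODES[inst][layer - 1]].append(inst)
    return groups


def place(pods: int, layer: int, mode: str, free: list[str]) -> Optional[list[str]]:
    """Pick instances for `pods` single-instance pods, or return None if the job waits.

    mode="required": every pod must land in one domain at `layer`.
    mode="preferred": try `layer`, then each broader layer, then any free instances.
    """
    if mode not in ("required", "preferred"):
        raise ValueError(f"unknown mode: {mode}")
    # Required checks only the requested layer; preferred widens toward layer 1.
    layers = [layer] if mode == "required" else list(range(layer, 0, -1))
    for lvl in layers:
        fits = [g for g in domains(lvl, free).values() if len(g) >= pods]
        if fits:
            # Illustrative best-fit heuristic: use the smallest domain that fits,
            # leaving larger domains free for bigger jobs.
            best = min(fits, key=len)
            return sorted(best)[:pods]
    if mode == "preferred" and len(free) >= pods:
        # Last resort for preferred mode: topology-unaware placement.
        return sorted(free)[:pods]
    return None
```

For example, with all five instances free, a three-pod job with `mode="required"` at layer 3 returns `None` because no layer-3 domain has three free instances, whereas `mode="preferred"` falls back to layer 2 and returns `['i-a', 'i-b', 'i-c']`.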

The feature gives finer control over job placement. By reducing communication overhead and improving resource utilization, it helps organizations accelerate generative AI development.

Related articles

Aug 14, 2025: SageMaker HyperPod now supports Topology Aware Scheduling of LLM tasks
Aug 8, 2025: Amazon SageMaker HyperPod now supports continuous provisioning for enhanced cluster operations
Feb 19, 2025: Best practices for Amazon SageMaker HyperPod task governance
Dec 4, 2024: Task governance is now generally available for Amazon SageMaker HyperPod
