Amazon EKS support in Amazon SageMaker HyperPod to scale foundation model development
The article announces the general availability of Amazon EKS support in Amazon SageMaker HyperPod. Customers can now run and manage their Kubernetes workloads on SageMaker HyperPod, purpose-built infrastructure for foundation model development that AWS says reduces model training time by up to 40%.
Specifically, the article covers:
- Integration of Amazon EKS, a managed Kubernetes service, with SageMaker HyperPod, providing benefits like automated hardware failure management and containerization capabilities.
- Resiliency features such as deep health checks during cluster creation, automatic replacement of faulty nodes, and resumption of training from checkpoints.
- Flexibility to manage workloads with the HyperPod CLI or other preferred tools, plus a persistent cluster environment that supports customization and node access through AWS Systems Manager (SSM).
- Observability through integration with Amazon CloudWatch Container Insights, for monitoring node health and viewing metrics in dashboards.
- General availability in AWS Regions where SageMaker HyperPod is available, except Europe (London).
- Links to resources such as the product webpage, the AWS News Blog post, the documentation, and the GitHub repository.
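To make the cluster-level features listed above more concrete, here is a minimal sketch of a request body that could be passed to the SageMaker `CreateCluster` API (boto3's `create_cluster`) for an EKS-orchestrated HyperPod cluster, with deep health checks and automatic node recovery turned on. The helper function, cluster name, ARNs, S3 URI, subnet and security group IDs, instance type and count, and lifecycle script name are all hypothetical placeholders. The sketch only builds the dictionary and sends nothing to AWS, so check the HyperPod documentation for the fields your setup actually requires.

```python
import json


def build_hyperpod_eks_request(cluster_name, eks_cluster_arn, role_arn,
                               lifecycle_s3_uri, security_group_ids, subnet_ids):
    """Assemble a CreateCluster request body for an EKS-orchestrated HyperPod cluster.

    Hypothetical helper: all argument values are placeholders, and this
    function only returns a dict; it does not call AWS.
    """
    return {
        "ClusterName": cluster_name,
        # Use an existing Amazon EKS cluster as the orchestrator.
        "Orchestrator": {"Eks": {"ClusterArn": eks_cluster_arn}},
        # Ask HyperPod to reboot or replace faulty nodes automatically.
        "NodeRecovery": "Automatic",
        # Networking for the HyperPod nodes (placeholder IDs).
        "VpcConfig": {"SecurityGroupIds": security_group_ids, "Subnets": subnet_ids},
        "InstanceGroups": [
            {
                "InstanceGroupName": "gpu-workers",  # hypothetical name
                "InstanceType": "ml.p5.48xlarge",    # example instance type
                "InstanceCount": 2,
                "ExecutionRole": role_arn,
                # Lifecycle script run on each node at creation time.
                "LifeCycleConfig": {"SourceS3Uri": lifecycle_s3_uri,
                                    "OnCreate": "on_create.sh"},
                # Deep health checks run when the nodes are provisioned.
                "OnStartDeepHealthChecks": ["InstanceStress", "InstanceConnectivity"],
            }
        ],
    }


if __name__ == "__main__":
    request = build_hyperpod_eks_request(
        "my-hyperpod-cluster",
        "arn:aws:eks:us-west-2:111122223333:cluster/my-eks-cluster",
        "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
        "s3://my-bucket/lifecycle-scripts/",
        ["sg-0123456789abcdef0"],
        ["subnet-0123456789abcdef0"],
    )
    print(json.dumps(request, indent=2))
    # With AWS credentials configured, the same dict could be passed to
    # boto3.client("sagemaker").create_cluster(**request).
```

Keeping the request as plain data makes it easy to review or version the cluster definition before any API call is made. Once the cluster exists, workloads are submitted through Kubernetes tooling or the HyperPod CLI rather than through this API.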
Related articles
- Sep 12, 2024: Introducing Amazon EKS support in Amazon SageMaker HyperPod
- Sep 10, 2024: Amazon SageMaker HyperPod introduces Amazon EKS support
- Jul 10, 2025: Accelerate foundation model development with one-click observability in Amazon SageMaker HyperPod
- Dec 4, 2024: Accelerate foundation model training and fine-tuning with new Amazon SageMaker HyperPod recipes