Manage Amazon SageMaker HyperPod clusters using the HyperPod CLI and SDK
Machine Learning Blog
This article explains how to manage Amazon SageMaker HyperPod clusters using the new HyperPod CLI and SDK, which simplify distributed computing infrastructure management for ML practitioners.
- The HyperPod CLI and SDK abstract away the complexity of distributed systems for data scientists
- Multi-layered architecture uses CloudFormation for infrastructure and Kubernetes CRDs for workloads
- Install the sagemaker-hyperpod package (version 3.5.0 or later) with pip
- Create clusters with hyp init, configure settings, validate, then submit via hyp create
- Monitor cluster creation using hyp list and hyp describe commands
- Connect to clusters with hyp set-cluster-context for kubectl and CLI access
- Update instance groups and node recovery settings with hyp update cluster
- Delete clusters with hyp delete cluster-stack (irreversible action)
- Python SDK provides programmatic control for complex workflows and automation
- The CLI is best for quick experimentation; the SDK is better suited to production integration
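
The lifecycle in the list above can be sketched as an ordered, dry-run command plan. This is a minimal illustration and not the article's own code: it only builds and prints the command names mentioned in the summary and never executes them. The helper names `lifecycle_commands` and `render` are hypothetical, the `hyp configure` and `hyp validate` steps are assumed spellings of the "configure settings, validate" stage, and every required argument (cluster names, regions, instance-group settings) is deliberately left out because the summary does not list them.

```python
import shlex
from typing import List


def lifecycle_commands(include_delete: bool = False) -> List[List[str]]:
    """Return the cluster lifecycle as an ordered list of argv-style commands.

    Nothing is executed; real invocations need arguments that are omitted here.
    """
    steps = [
        # Install the CLI and SDK (package name and minimum version from the summary).
        ["pip", "install", "sagemaker-hyperpod>=3.5.0"],
        # Initialize a cluster configuration.
        ["hyp", "init"],
        # Assumption: command names for the "configure settings, validate" stage.
        ["hyp", "configure"],
        ["hyp", "validate"],
        # Submit the cluster for creation.
        ["hyp", "create"],
        # Monitor creation progress.
        ["hyp", "list"],
        ["hyp", "describe"],
        # Point kubectl and the CLI at the new cluster.
        ["hyp", "set-cluster-context"],
        # Update instance groups or node recovery settings.
        ["hyp", "update", "cluster"],
    ]
    if include_delete:
        # Deletion is irreversible, so it is opt-in in this sketch.
        steps.append(["hyp", "delete", "cluster-stack"])
    return steps


def render(steps: List[List[str]]) -> str:
    """Join each command with shell-safe quoting, one command per line."""
    return "\n".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    print(render(lifecycle_commands()))
```

Keeping the delete step behind an explicit flag mirrors the warning that deletion cannot be undone; a real automation script would likely add a confirmation prompt as well.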
The HyperPod CLI and SDK streamline cluster lifecycle management, provide declarative configuration control, and offer integrated observability for distributed training and inference workloads.