
Under the hood: Amazon EKS ultra scale clusters

Containers Blog



Amazon EKS now supports ultra scale Kubernetes clusters of up to 100,000 nodes, aimed at very large AI/ML training and inference workloads. Key highlights include:

  • Support for up to 1.6 million AWS Trainium chips or 800,000 NVIDIA GPUs in a single cluster
  • Architectural changes to the etcd data store, including offloaded consensus and an in-memory database
  • Significant improvements in API server throughput and controller performance
  • Enhanced Karpenter node management with static capacity and auto-repair features
  • Network scaling optimizations to support massive cluster sizes
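
The accelerator totals in the first bullet follow from the 100,000-node limit multiplied by a per-node accelerator count. The sketch below checks that arithmetic. The per-node figures (16 Trainium chips, 8 NVIDIA GPUs) are assumptions obtained by dividing the quoted totals by the node limit; this summary does not state them, and the helper name is hypothetical.

```python
# Back-of-the-envelope check of the accelerator totals quoted above.
MAX_NODES = 100_000  # node limit stated in the announcement

# ASSUMPTION: accelerators per node, inferred by dividing the quoted
# totals by the node limit. These values are not given in this summary.
TRAINIUM_CHIPS_PER_NODE = 16
NVIDIA_GPUS_PER_NODE = 8


def cluster_accelerators(nodes: int, per_node: int) -> int:
    """Total accelerators when every node carries `per_node` devices."""
    return nodes * per_node


trainium_total = cluster_accelerators(MAX_NODES, TRAINIUM_CHIPS_PER_NODE)
gpu_total = cluster_accelerators(MAX_NODES, NVIDIA_GPUS_PER_NODE)

print(f"Trainium chips: {trainium_total:,}")  # 1,600,000
print(f"NVIDIA GPUs:    {gpu_total:,}")       # 800,000
```

Both totals assume a homogeneous cluster in which every node is a fully populated accelerator instance; a mixed cluster would land somewhere below these figures.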

These improvements let customers such as Anthropic run AI model training and inference workloads on clusters of this size while maintaining performance, reliability, and Kubernetes conformance.




Related articles

  • Jul 16, 2025: Amazon EKS enables ultra scale AI/ML workloads with support for 100K nodes per cluster
  • Jul 21, 2025: Deep dive into cluster networking for Amazon EKS Hybrid Nodes
  • Jul 15, 2025: Amazon EKS now supports up to 100,000 worker nodes per cluster
  • Jun 3, 2025: Deep Dive: Amazon EKS Dashboard for Visibility into Multi-Cluster Operations and Governance
