Amazon EKS introduces node monitoring and auto repair capabilities
Containers Blog
Amazon EKS has introduced node monitoring and auto repair capabilities to improve Kubernetes cluster reliability and reduce operational overhead. This new feature provides automatic detection and remediation of node-level issues.
- Key capabilities include a Node Monitoring Agent (NMA) that detects a range of node failures, including:
  - GPU hardware errors
  - Kubelet and container runtime issues
  - Networking and storage problems
  - System resource constraints
- Node auto repair can:
  - Replace or reboot unhealthy nodes within 30 minutes
  - Respect Kubernetes Pod Disruption Budgets
  - Automatically log and audit repair actions
- Available for EKS Managed Node Groups, Karpenter, and EKS Auto Mode nodes
- Provides streamlined log collection through a Kubernetes Custom Resource Definition (CRD)
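
To make the detection side of the list above concrete, here is a minimal sketch of how a script might flag problem signals on a node, assuming the monitoring agent reports its findings as Kubernetes node conditions (entries with a `type` and a `status`). The pressure-style condition names are standard Kubernetes ones; `NetworkingReady` is used here only as an illustrative, assumed name for an agent-reported condition, and `unhealthy_conditions` is a hypothetical helper, not part of any EKS API.

```python
from typing import Dict, List

# Standard Kubernetes node conditions where status "True" signals a problem.
PROBLEM_WHEN_TRUE = {
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def unhealthy_conditions(conditions: List[Dict[str, str]]) -> List[str]:
    """Return the condition types that indicate a node problem.

    Assumption: any condition type not listed in PROBLEM_WHEN_TRUE is a
    "ready-style" condition that is healthy only when its status is "True".
    """
    problems = []
    for cond in conditions:
        ctype, status = cond["type"], cond["status"]
        if ctype in PROBLEM_WHEN_TRUE:
            if status == "True":
                problems.append(ctype)
        elif status != "True":
            # Ready-style condition: "False" and "Unknown" both count as unhealthy.
            problems.append(ctype)
    return problems


# Sample data shaped like the `status.conditions` list of a Node object.
sample = [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "False"},
    {"type": "NetworkingReady", "status": "False"},  # assumed agent-reported condition
]
print(unhealthy_conditions(sample))
```

A check like this only reads node state. The repair decision itself (reboot versus replace, and honoring Pod Disruption Budgets) is handled by the managed feature and is not modeled here.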
The feature aims to improve workload availability, reduce manual intervention, and help cluster administrators focus on higher-value tasks.