
Monitoring and automating recovery from AZ impairments in Amazon EKS with Istio and ARC Zonal Shift

Containers Blog



This article describes how to monitor for Availability Zone (AZ) impairments in Amazon EKS and automate recovery from them, using Istio, Prometheus, Grafana, and Amazon Application Recovery Controller (ARC) zonal shift.

  • Key solution components:
    • Istio service mesh for network observability
    • Prometheus for metrics collection
    • Grafana for visualization and alerting
    • ARC zonal shift for traffic redirection during AZ impairments
  • The monitoring strategy involves:
    • Evaluating network responses from Istio sidecar proxies
    • Tracking server-side errors in different AZs
    • Creating Grafana dashboards and alerts
  • Once started, the zonal shift process automatically:
    • Cordons nodes in the impaired AZ
    • Suspends AZ rebalancing
    • Removes Pod endpoints from the unhealthy AZ
  • Benefits include:
    • Improved application resilience
    • Automated traffic redirection
    • Minimal service disruption

Together, these components detect AZ-level infrastructure issues in Kubernetes environments and shift traffic away from the impaired AZ.
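
As a rough illustration of the detection step, the sketch below aggregates server-side (5xx) responses per AZ and flags a single AZ as impaired when only that AZ exceeds an error-ratio threshold. It is not the article's implementation: the sample tuples stand in for metrics that would come from the Istio sidecar proxies via Prometheus, and the function names, the 5% threshold, and the AZ IDs are all assumptions made for the example. The request dictionary uses field names that, to my understanding, match the boto3 `arc-zonal-shift` `start_zonal_shift` call; verify them against the current API reference before use.

```python
from collections import defaultdict


def error_ratio_by_az(samples):
    """Return {az: 5xx_requests / total_requests} for each AZ.

    `samples` is an iterable of (az, response_code, request_count) tuples,
    a hypothetical stand-in for per-AZ request metrics scraped from the mesh.
    """
    total = defaultdict(int)
    errors = defaultdict(int)
    for az, code, count in samples:
        total[az] += count
        if 500 <= code <= 599:  # server-side errors only
            errors[az] += count
    return {az: errors[az] / total[az] for az in total if total[az] > 0}


def pick_impaired_az(ratios, threshold=0.05):
    """Return the one AZ whose error ratio exceeds `threshold`, else None.

    If zero or several AZs exceed the threshold, the problem is probably not
    zonal, so no AZ is selected (an assumption of this sketch).
    """
    over = [az for az, ratio in ratios.items() if ratio > threshold]
    return over[0] if len(over) == 1 else None


def build_zonal_shift_request(resource_arn, az_id, expires_in="1h"):
    """Build the arguments for a zonal shift away from `az_id`.

    Field names are assumed to follow boto3's arc-zonal-shift
    start_zonal_shift parameters; nothing is sent to AWS here.
    """
    return {
        "resourceIdentifier": resource_arn,
        "awayFrom": az_id,
        "expiresIn": expires_in,
        "comment": f"Elevated 5xx ratio observed in {az_id}",
    }


# Hypothetical per-AZ samples: (AZ ID, HTTP response code, request count).
samples = [
    ("use1-az1", 200, 800), ("use1-az1", 503, 200),
    ("use1-az2", 200, 990), ("use1-az2", 500, 10),
    ("use1-az3", 200, 1000),
]
ratios = error_ratio_by_az(samples)
impaired = pick_impaired_az(ratios)
if impaired is not None:
    # Placeholder ARN; a real one identifies the EKS cluster registered with ARC.
    request = build_zonal_shift_request("arn:aws:eks:example", impaired)
```

In the setup the article outlines, this decision would more likely live in a Grafana alert rule over Prometheus metrics than in application code; the sketch only makes the comparison across AZs explicit.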




