Monitor Apache Spark applications on Amazon EMR with Amazon CloudWatch




This article explains how to monitor Apache Spark applications on Amazon EMR by publishing detailed metrics to Amazon CloudWatch for performance optimization and bottleneck identification.

  • A custom Spark metrics sink collects metrics and publishes them to CloudWatch every 30 seconds
  • The CloudWatch agent aggregates metric data from each EMR cluster node
  • Metricfilter.json defines which metrics to capture, so unneeded metrics are not collected
  • The solution includes a bootstrap script, a metrics library, and an AWS CloudFormation template
  • Supports Amazon EMR 5.x.x, 6.x.x, and 7.x.x releases, with a version-specific JAR file for each
  • A CloudWatch dashboard provides real-time visibility into Spark job, stage, and task performance
  • Metrics include I/O, garbage collection, memory, CPU, and executor-level data
  • Custom metrics incur CloudWatch charges; standard EMR metrics do not
  • Metrics can trigger CloudWatch alarms and Amazon SNS notifications for automation
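The filter-then-publish flow summarized above can be illustrated with a short Python sketch. This is not the article's metrics library, which runs as a Spark sink inside the JVM. It is a swapped-in illustration under stated assumptions. The filter format (an "include" allow-list), the helper names (`filter_metrics`, `to_metric_data`, `chunks`, `publish`), the `SparkMetrics` namespace, and the `ClusterId` dimension are all hypothetical. The only real API it relies on is the shape of the CloudWatch `PutMetricData` request (`Namespace`, plus `MetricData` entries with `MetricName`, `Dimensions`, `Timestamp`, `Value`, and `Unit`), as exposed by boto3's `put_metric_data`.

```python
import json
from datetime import datetime, timezone

# Hypothetical filter format: the real metricfilter.json schema used by the
# article's library may differ. Metric names here are illustrative only.
EXAMPLE_FILTER = json.loads("""
{
  "include": ["jvm.heap.used", "executor.cpuTime", "executor.jvmGCTime"]
}
""")


def filter_metrics(raw, metric_filter):
    """Keep only metrics whose names appear in the filter's allow-list."""
    allowed = set(metric_filter["include"])
    return {name: value for name, value in raw.items() if name in allowed}


def to_metric_data(metrics, dimensions, timestamp):
    """Shape filtered metrics into MetricData entries for PutMetricData."""
    dims = [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())]
    return [
        {
            "MetricName": name,
            "Dimensions": dims,
            "Timestamp": timestamp,
            "Value": float(value),
            "Unit": "None",  # a valid CloudWatch unit when no unit applies
        }
        for name, value in sorted(metrics.items())
    ]


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def publish(metric_data, namespace, client, batch_size=20):
    """Send metric data in small batches.

    `client` is assumed to be a boto3 CloudWatch client, for example
    boto3.client("cloudwatch"). The batch size of 20 is a conservative choice.
    """
    for batch in chunks(metric_data, batch_size):
        client.put_metric_data(Namespace=namespace, MetricData=batch)


if __name__ == "__main__":
    raw = {"jvm.heap.used": 512_000_000, "executor.cpuTime": 1.2e9, "other": 3}
    kept = filter_metrics(raw, EXAMPLE_FILTER)
    data = to_metric_data(kept, {"ClusterId": "j-EXAMPLE"}, datetime.now(timezone.utc))
    print([d["MetricName"] for d in data])
```

In the article's design, the sink runs this kind of publish step on a 30-second period. A caller of this sketch would invoke `publish` on the same schedule. Filtering before building the payload keeps unwanted metrics out of CloudWatch entirely, which matters because custom metrics are billed.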

This solution enables effective real-time monitoring of Spark applications on EMR through CloudWatch integration, helping identify performance issues and optimize resource utilization.


