
Build stateful streaming applications with Apache Spark 4.0 on Amazon EMR Serverless

Big Data Blog



This article demonstrates building stateful streaming applications using Apache Spark 4.0's new transformWithState API on Amazon EMR Serverless, with an IoT device heartbeat monitoring use case.

  • Spark 4.0 introduces native timer support, automatic state TTL, schema evolution, and multiple state variables per key
  • The transformWithState API provides first-class support for complex event processing without manual state management
  • Solution architecture: IoT devices → Kinesis → EMR Serverless → RocksDB state store → S3 checkpoints → SNS alerts
  • The HeartbeatMonitor class implements three methods: init() for state setup, handleInputRows() for timer management, and handleExpiredTimer() for alert generation
  • Automatic state persistence to RocksDB and S3 enables fault-tolerant recovery with exactly-once semantics
  • EMR Serverless provides automatic scaling and requires no cluster management; its streaming mode keeps the driver alive between micro-batches
  • Step-by-step implementation covers creating the EMR Serverless application, implementing the stateful processor, configuring IAM roles, uploading dependencies, and submitting the streaming job
  • Testing demonstrates normal operation, offline detection after 30 seconds, repeat alerts every 60 seconds, and device recovery
  • Real-world applications: telecom SLA monitoring, financial fraud detection, e-commerce cart abandonment detection
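The timer behavior described in the bullets above (a first alert after 30 seconds of silence, repeat alerts every 60 seconds, and recovery when a heartbeat returns) can be illustrated with a minimal plain-Python sketch. This is not the Spark API and not the article's HeartbeatMonitor code: the class name `HeartbeatMonitorSketch`, the `advance_to` helper, the in-memory dictionaries, and the `sensor-1` device ID are all assumptions made for illustration. In the real application, Spark would own the state and fire the timers.

```python
from dataclasses import dataclass, field

OFFLINE_AFTER_MS = 30_000  # first alert after 30 s without a heartbeat (from the summary)
REPEAT_EVERY_MS = 60_000   # repeat interval while the device stays offline (from the summary)


@dataclass
class HeartbeatMonitorSketch:
    """Plain-Python stand-in for the per-device logic (hypothetical, not Spark)."""

    last_seen: dict = field(default_factory=dict)  # device_id -> last heartbeat (ms)
    timers: dict = field(default_factory=dict)     # device_id -> next timer expiry (ms)
    offline: set = field(default_factory=set)      # devices currently flagged offline

    def handle_input_rows(self, device_id, now_ms):
        """A heartbeat arrived: record it and push the timer forward."""
        events = []
        if device_id in self.offline:
            self.offline.discard(device_id)
            events.append((device_id, "RECOVERED", now_ms))
        self.last_seen[device_id] = now_ms
        self.timers[device_id] = now_ms + OFFLINE_AFTER_MS
        return events

    def handle_expired_timer(self, device_id, expiry_ms):
        """The timer fired with no heartbeat: alert and schedule a repeat."""
        self.offline.add(device_id)
        self.timers[device_id] = expiry_ms + REPEAT_EVERY_MS
        return [(device_id, "OFFLINE", expiry_ms)]

    def advance_to(self, now_ms):
        """Simulate the engine: fire every timer due at or before now_ms."""
        events = []
        while True:
            due = [(t, d) for d, t in self.timers.items() if t <= now_ms]
            if not due:
                return events
            expiry, device_id = min(due)  # earliest expiry first
            events.extend(self.handle_expired_timer(device_id, expiry))


monitor = HeartbeatMonitorSketch()
monitor.handle_input_rows("sensor-1", 0)                     # heartbeat at t = 0 s
alerts = monitor.advance_to(100_000)                         # silence until t = 100 s
recovered = monitor.handle_input_rows("sensor-1", 110_000)   # device comes back
print(alerts)
print(recovered)
```

In this simulation the device is flagged offline at 30 s and again at 90 s, and the heartbeat at 110 s produces a recovery event and re-arms the 30-second timer.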

Spark 4.0 on EMR Serverless simplifies building production-ready stateful streaming applications with automatic scaling and minimal operational overhead.
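As a rough sketch of how the RocksDB state store and the streaming mode mentioned in the summary might come together at submission time, the helper below assembles a request in the shape of the EMR Serverless StartJobRun API. The function name, the placeholder IDs and S3 paths, and the choice to pass the checkpoint location as a script argument are assumptions for illustration, not details taken from the article. The block only builds a dictionary and does not call AWS.

```python
# Spark's built-in RocksDB state store provider class.
ROCKSDB_PROVIDER = (
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
)


def build_streaming_job_request(application_id, execution_role_arn,
                                entry_point, checkpoint_uri):
    """Build StartJobRun-style arguments for a streaming job (hypothetical helper)."""
    spark_conf = {
        # Select RocksDB as the state store backend.
        "spark.sql.streaming.stateStore.providerClass": ROCKSDB_PROVIDER,
    }
    params = " ".join(f"--conf {key}={value}" for key, value in spark_conf.items())
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "mode": "STREAMING",  # run as a streaming job rather than a batch job
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                # Assumption: the job script reads its checkpoint path from argv.
                "entryPointArguments": [checkpoint_uri],
                "sparkSubmitParameters": params,
            }
        },
    }


request = build_streaming_job_request(
    application_id="00example123",                                   # placeholder
    execution_role_arn="arn:aws:iam::111122223333:role/example-role",  # placeholder
    entry_point="s3://example-bucket/scripts/heartbeat_monitor.py",  # placeholder
    checkpoint_uri="s3://example-bucket/checkpoints/heartbeat/",     # placeholder
)
print(request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
```

A dictionary like this could be passed as keyword arguments to an EMR Serverless client's `start_job_run` call once the placeholders are replaced with real values.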





