Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR

Big Data Blog

This article shows that Amazon EMR 7.12 delivers about 2x faster Apache Spark and Apache Iceberg write performance than open source Spark and Iceberg on 3TB merge workloads.

Amazon EMR 7.12 runs 3TB merge workloads 2.08x faster than open source Spark 3.5.6
Benchmark tested 37 merge queries covering INSERT, UPDATE, and DELETE operations
Metadata-only delete operations eliminate unnecessary data file rewrites
Bloom filter joins reduce the amount of data read and processed during merge operations
Parallel file write optimization improves throughput to Amazon S3
1.7x cost efficiency improvement over open source Spark with Iceberg
Tested on 9 r5d.4xlarge instances with 144 vCPUs and 1,152 GB memory
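The bloom filter join item above can be illustrated with a minimal, single-process sketch of the general technique. It assumes a MERGE where the incoming (source) rows are few compared with the target table. A Bloom filter is built over the source join keys and used to discard target rows that cannot match before the exact join runs. This is not Amazon EMR's implementation, which the summary does not describe. The names `BloomFilter`, `bloom_filtered_merge_candidates`, and the `id` key column are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        n = max(1, expected_items)
        self.size = max(8, int(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_filtered_merge_candidates(target_rows, source_rows, key="id"):
    """Hypothetical helper: return (matched target rows, target rows pruned by the filter)."""
    # 1. Build the filter on the (typically smaller) source side of the MERGE.
    bf = BloomFilter(expected_items=len(source_rows))
    for row in source_rows:
        bf.add(row[key])
    # 2. Probe: drop target rows whose key definitely has no match in the source.
    candidates = [row for row in target_rows if bf.might_contain(row[key])]
    pruned = len(target_rows) - len(candidates)
    # 3. Exact join on the survivors removes the remaining false positives.
    source_keys = {row[key] for row in source_rows}
    matched = [row for row in candidates if row[key] in source_keys]
    return matched, pruned


if __name__ == "__main__":
    target = [{"id": i, "value": i} for i in range(10_000)]
    source = [{"id": i} for i in range(0, 10_000, 100)]
    matched, pruned = bloom_filtered_merge_candidates(target, source)
    print(len(matched), pruned)
```

In a distributed engine, a filter like this is usually small enough to broadcast to scan tasks, so non-matching rows can be dropped before they are shuffled into the join. The exact join in step 3 is still required because a Bloom filter can report false positives, though never false negatives.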

Amazon EMR 7.12 delivers these performance and cost improvements for data ingestion and ETL pipelines while preserving Iceberg's ACID guarantees and data consistency.
