Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR
Big Data Blog
This article demonstrates that Amazon EMR 7.12 delivers roughly 2x faster Apache Spark and Apache Iceberg write performance than open source Spark 3.5.6 with Iceberg on 3TB merge workloads.
- Amazon EMR 7.12 runs 3TB merge workloads 2.08x faster than open source Spark 3.5.6
- Benchmark tested 37 merge queries covering INSERT, UPDATE, and DELETE operations
- Metadata-only delete operations eliminate unnecessary data file rewrites
- Bloom filter joins reduce data read and processing during merge operations
- Parallel file write optimization improves throughput to Amazon S3
- 1.7x cost efficiency improvement over open source Spark with Iceberg
- Tested on 9 r5d.4xlarge instances, for a total of 144 vCPUs and 1,152 GB of memory
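
For readers less familiar with the operation being benchmarked, a merge can combine INSERT, UPDATE, and DELETE in a single statement. The sketch below builds an Iceberg-style `MERGE INTO` statement as a Python string. The table names, the `id` key, and the `op` change-flag column are hypothetical placeholders and are not taken from the benchmark's 37 queries. On a cluster, the resulting string would typically be passed to `spark.sql(...)`, which is left out here so the example stays self-contained.

```python
def build_merge_sql(target: str, source: str, key: str = "id") -> str:
    """Return a MERGE INTO statement covering DELETE, UPDATE, and INSERT.

    All identifiers are hypothetical placeholders, not the benchmark schema.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        # Source rows flagged 'D' (assumed convention) delete the matching target row.
        "WHEN MATCHED AND s.op = 'D' THEN DELETE\n"
        # Any other matching row overwrites the target row's columns.
        "WHEN MATCHED THEN UPDATE SET *\n"
        # Source rows with no match in the target are inserted as new rows.
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Hypothetical catalog, database, and table names.
    print(build_merge_sql("demo.db.orders", "order_updates"))
```

The three `WHEN` clauses map onto the three operation types the benchmark covers; a real pipeline would usually list columns explicitly rather than rely on `SET *` and `INSERT *`.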
Amazon EMR 7.12 delivers significant performance and cost improvements for data ingestion and ETL pipelines while maintaining Iceberg's ACID guarantees and data consistency.
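
The headline figures can be sanity-checked with a little arithmetic. The sketch below recomputes the cluster totals from the published per-instance size of r5d.4xlarge (16 vCPUs, 128 GiB) and then relates the 2.08x speedup to the 1.7x cost-efficiency figure. The second step rests on two assumptions that the summary does not state: that job cost is hourly price times runtime, and that "cost efficiency" means the ratio of total job costs. Under those assumptions only, the gap between the two numbers would correspond to a higher effective hourly price for the EMR cluster.

```python
# Published r5d.4xlarge instance size.
VCPUS_PER_INSTANCE = 16
MEMORY_GB_PER_INSTANCE = 128


def cluster_totals(instance_count: int) -> tuple[int, int]:
    """Return (total vCPUs, total memory in GB) for the cluster."""
    return (
        instance_count * VCPUS_PER_INSTANCE,
        instance_count * MEMORY_GB_PER_INSTANCE,
    )


def implied_hourly_price_ratio(speedup: float, cost_efficiency: float) -> float:
    """Return the EMR-to-open-source hourly price ratio implied by the figures.

    Assumes cost = hourly price * runtime, and that cost efficiency is
    (open source job cost) / (EMR job cost). Both are assumptions of this
    sketch, not definitions taken from the article.
    """
    return speedup / cost_efficiency


if __name__ == "__main__":
    vcpus, memory_gb = cluster_totals(9)
    print(f"{vcpus} vCPUs, {memory_gb} GB")
    print(f"implied hourly price ratio: {implied_hourly_price_ratio(2.08, 1.7):.2f}")
```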
Related articles
- Nov 27, 2025: Run Apache Spark and Iceberg 4.5x faster than open source Spark with Amazon EMR
- Aug 26, 2024: Amazon EMR 7.1 runtime for Apache Spark and Iceberg can run Spark workloads 2.7 times faster than Apache Spark 3.5.1 and Iceberg 1.5.2
- Dec 27, 2024: Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1
- Jul 30, 2025: Optimize Amazon EMR runtime for Apache Spark with EMR S3A