Run Apache Spark 3.5.1 workloads 4.5 times faster with Amazon EMR runtime for Apache Spark
Big Data Blog
This article discusses the performance improvements in the latest Amazon EMR runtime for Apache Spark, which is optimized to run Spark workloads faster than open-source Apache Spark.
Specifically, the article covers:
- Benchmark results showing that Amazon EMR 7.1 runs Apache Spark 3.5.1 workloads 4.5 times faster than open-source Apache Spark 3.5.1, with 2.8 times better price-performance
- Recent improvements in the Amazon EMR runtime, including optimizations to physical operators, query planning, and Amazon S3 requests, as well as the move to Java 17
- Methodology and configurations used for benchmarking Apache Spark 3.5.1 and Amazon EMR
- Instructions for running the TPC-DS benchmark on Apache Spark and Amazon EMR clusters
- A conclusion recommending the latest Amazon EMR release to benefit from these performance optimizations
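The headline numbers are ratios. The sketch below shows how they can be computed. It assumes that "times faster" means total baseline runtime divided by total EMR runtime, and that price-performance gain means baseline cost divided by EMR cost, where cost is runtime multiplied by an hourly cluster price. The function names, inputs, and the geometric-mean variant are hypothetical illustrations; the source does not state its exact formulas.

```python
import math


def _check(baseline_seconds, optimized_seconds):
    # Hypothetical helper: both runs must cover the same, non-empty query set.
    if not baseline_seconds or len(baseline_seconds) != len(optimized_seconds):
        raise ValueError("runtime lists must be non-empty and the same length")


def speedup(baseline_seconds, optimized_seconds):
    """Assumed definition: total baseline runtime / total optimized runtime."""
    _check(baseline_seconds, optimized_seconds)
    return sum(baseline_seconds) / sum(optimized_seconds)


def geomean_speedup(baseline_seconds, optimized_seconds):
    """Alternative metric: geometric mean of per-query speedup ratios."""
    _check(baseline_seconds, optimized_seconds)
    ratios = [b / o for b, o in zip(baseline_seconds, optimized_seconds)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def price_performance_gain(baseline_seconds, baseline_hourly_cost,
                           optimized_seconds, optimized_hourly_cost):
    """Assumed definition: baseline run cost / optimized run cost."""
    _check(baseline_seconds, optimized_seconds)
    baseline_cost = sum(baseline_seconds) / 3600 * baseline_hourly_cost
    optimized_cost = sum(optimized_seconds) / 3600 * optimized_hourly_cost
    return baseline_cost / optimized_cost
```

Under these assumptions, price-performance gain equals speedup multiplied by the ratio of baseline to optimized hourly price. A 4.5 times speedup paired with a 2.8 times price-performance gain would therefore imply that the EMR configuration costs about 4.5 / 2.8 ≈ 1.6 times as much per hour.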
Related articles
Aug 8, 2024: Amazon EMR 7.2 now supports Apache Spark 3.5.1
Aug 26, 2024: Amazon EMR 7.1 runtime for Apache Spark and Iceberg can run Spark workloads 2.7 times faster than Apache Spark 3.5.1 and Iceberg 1.5.2
Dec 27, 2024: Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1
May 27, 2026: Amazon EMR now supports Apache Spark 4.0.2 in general availability