Accelerate Apache Hive read and write on Amazon EMR using enhanced S3A

Big Data Blog

This article details performance improvements to Apache Hive on Amazon EMR that come from an enhanced S3A filesystem implementation, which replaces the legacy EMRFS.

Hive on Amazon EMR 7.10 is 1.5x faster for reads and 3x faster for writes than on EMR 7.0
Amazon EMR moves from the proprietary EMRFS to open-source S3A as the default filesystem
A new S3A-optimized committer uses multipart uploads to eliminate rename operations
S3A supports AWS SDK v2, S3 Glacier, S3 Express One Zone, vector reads, and prefetching
Read queries show a 33% cost improvement compared to EMR 7.0
Write performance improves 2.91x with the new committer, and the benefit grows with partition count
The S3A committer is unavailable for jobs that merge small files, for ACID tables, and for multi-filesystem partitions
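
The speedup and cost figures above can be related with a little arithmetic. The sketch below assumes that cost scales linearly with job runtime on a fixed-size cluster. That is an assumption made here for illustration, not something the summary states, and the helper name `runtime_reduction` is hypothetical. Under that assumption, a 1.5x read speedup corresponds to roughly one third less runtime, which is consistent with the 33% cost improvement quoted for reads.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of runtime saved when a job runs `speedup` times faster.

    Assumes cost is proportional to runtime (illustrative assumption only).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup figures quoted in the summary above.
for label, speedup in [("read, 1.5x", 1.5), ("write, 2.91x", 2.91)]:
    print(f"{label}: {runtime_reduction(speedup):.1%} less runtime")
```

By the same reasoning, the 2.91x write speedup would cut runtime by about two thirds, although the summary does not report a cost figure for writes.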

Amazon EMR's shift to S3A improves performance while retaining the standardization, portability, and community support that come with an open-source filesystem.

Dec 15
2025

Amazon EMR HBase on Amazon S3 transitioning to EMR S3A with comparable EMRFS performance

Nov 27
2025

Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR

Jul 30
2025

Optimize Amazon EMR runtime for Apache Spark with EMR S3A

Nov 28
2024

Amazon EMR streamlines big data processing with simplified Amazon S3 Glacier access

Related articles