Accelerate Apache Hive read and write on Amazon EMR using enhanced S3A

Big Data Blog



This article details performance improvements to Apache Hive on Amazon EMR, achieved through an enhanced S3A filesystem implementation that replaces the legacy EMRFS.

  • Hive on Amazon EMR 7.10 is 1.5x faster for reads and 3x faster for writes than on EMR 7.0
  • Transition from proprietary EMRFS to open-source S3A as default filesystem
  • New S3A-optimized committer uses multipart uploads to eliminate rename operations
  • S3A supports AWS SDK v2, S3 Glacier, S3 Express One Zone, vector reads, prefetching
  • Read queries show 33% cost improvement compared to EMR 7.0
  • Write performance improves 2.91x with new committer; benefits increase with partition count
  • S3A committer is unavailable for small-file merging, ACID tables, and partitions spread across multiple filesystems
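
The read figures in the list above can be cross-checked with simple arithmetic. The sketch below assumes that cost scales linearly with runtime on an otherwise identical cluster; the summary does not say how the 33% figure was measured, so this is only a plausibility check. The helper name `runtime_reduction` is hypothetical.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of runtime saved when a job runs `speedup` times faster.

    Assumption (not from the article): cost is proportional to runtime.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup figures quoted in the summary above.
read_saving = runtime_reduction(1.5)    # reads: 1.5x faster
write_saving = runtime_reduction(2.91)  # writes: 2.91x faster with the new committer

print(f"reads:  {read_saving:.0%} less runtime")   # reads:  33% less runtime
print(f"writes: {write_saving:.0%} less runtime")  # writes: 66% less runtime
```

Under that assumption, a 1.5x read speedup corresponds to roughly one third less runtime, which lines up with the 33% cost improvement quoted for read queries.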

Amazon EMR's shift to S3A improves performance while keeping the benefits of open-source standardization, portability, and community support.



Related articles

  • Dec 15, 2025: Amazon EMR HBase on Amazon S3 transitioning to EMR S3A with comparable EMRFS performance
  • Nov 27, 2025: Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR
  • Jul 30, 2025: Optimize Amazon EMR runtime for Apache Spark with EMR S3A
  • Nov 28, 2024: Amazon EMR streamlines big data processing with simplified Amazon S3 Glacier access
