Amazon EMR streamlines big data processing with simplified Amazon S3 Glacier access

Big Data Blog

Amazon EMR 7.2 introduces significant improvements in handling S3 Glacier objects, enabling more flexible and cost-effective big data processing across different storage tiers. The key enhancements include:

- Direct reading of restored S3 Glacier objects using the S3A protocol
- Intelligent handling of S3 Glacier objects with three configurable read modes:
  - READ_ALL (default): Attempts to read every object regardless of storage class
  - SKIP_ALL_GLACIER: Ignores objects stored in S3 Glacier storage classes
  - READ_RESTORED_GLACIER_OBJECTS: Reads S3 Glacier objects only if they have been restored and skips those that have not
- Ability to differentiate between S3 Glacier storage classes and prevent unnecessary exceptions
- Support for selective read operations on archived data
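
The three read modes can be pictured as a filter applied to an object listing before any data is read. The sketch below is a simplified Python model of that decision, not EMR's or S3A's actual implementation. `S3ObjectInfo`, `select_readable`, and the `restored` flag are hypothetical names introduced for illustration, and it assumes that only the `GLACIER` and `DEEP_ARCHIVE` storage classes require a restore before reading.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class GlacierReadMode(Enum):
    """The three read modes named in the summary above."""
    READ_ALL = "READ_ALL"
    SKIP_ALL_GLACIER = "SKIP_ALL_GLACIER"
    READ_RESTORED_GLACIER_OBJECTS = "READ_RESTORED_GLACIER_OBJECTS"


# Assumption: these S3 storage classes need a restore before they can be read.
ARCHIVE_CLASSES = frozenset({"GLACIER", "DEEP_ARCHIVE"})


@dataclass(frozen=True)
class S3ObjectInfo:
    """Hypothetical listing entry: key, storage class, and restore status."""
    key: str
    storage_class: str
    restored: bool = False


def select_readable(objects: Iterable[S3ObjectInfo],
                    mode: GlacierReadMode) -> List[S3ObjectInfo]:
    """Return the objects a job would attempt to read under the given mode."""
    selected = []
    for obj in objects:
        archived = obj.storage_class in ARCHIVE_CLASSES
        if not archived or mode is GlacierReadMode.READ_ALL:
            # Non-archived objects are always read; READ_ALL attempts everything.
            selected.append(obj)
        elif mode is GlacierReadMode.READ_RESTORED_GLACIER_OBJECTS and obj.restored:
            # Archived objects are read only once they have been restored.
            selected.append(obj)
        # Otherwise the archived object is skipped.
    return selected


listing = [
    S3ObjectInfo("logs/2024.parquet", "STANDARD"),
    S3ObjectInfo("logs/2019.parquet", "GLACIER"),
    S3ObjectInfo("logs/2018.parquet", "DEEP_ARCHIVE", restored=True),
]
for mode in GlacierReadMode:
    print(mode.value, [o.key for o in select_readable(listing, mode)])
```

In upstream Hadoop S3A documentation the mode appears to be set through the `fs.s3a.glacier.read.restored.objects` property. Treat that name as an assumption and confirm it against the EMR release guide for your version.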

These improvements enable organizations to optimize storage costs by seamlessly processing data across different Amazon S3 storage tiers, particularly for long-term archival and compliance use cases.


