
Amazon EMR streamlines big data processing with simplified Amazon S3 Glacier access

Big Data Blog



Amazon EMR 7.2 introduces significant improvements in handling S3 Glacier objects, enabling more flexible and cost-effective big data processing across different storage tiers. The key enhancements include:

  • Direct reading of restored S3 Glacier objects through the S3A connector (s3a:// paths)
  • Handling of S3 Glacier objects through three configurable read modes:
    • READ_ALL (default): Processes all objects regardless of storage class
    • SKIP_ALL_GLACIER: Skips objects whose storage class is S3 Glacier
    • READ_RESTORED_GLACIER_OBJECTS: Reads S3 Glacier objects that have been restored and skips those that have not
  • Ability to tell S3 Glacier storage classes apart from other storage classes, which avoids exceptions caused by attempting to read archived objects
  • Support for selective read operations on archived data
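
The three read modes can be pictured as a filter over an object listing. The following is a minimal sketch of that idea in plain Python, not the S3A implementation: the `ListedObject` type, the `select_objects` helper, and the choice of `GLACIER` and `DEEP_ARCHIVE` as the archived classes are assumptions made for illustration. The property name `fs.s3a.glacier.read.restored.objects` is taken from the Apache Hadoop S3A documentation as we understand it and should be verified against the EMR release in use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional

# Assumed property name (from the Hadoop S3A docs); verify for your EMR release.
GLACIER_READ_PROPERTY = "fs.s3a.glacier.read.restored.objects"

# Assumption for this sketch: these classes need a restore before reading.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


class GlacierReadMode(Enum):
    READ_ALL = "READ_ALL"
    SKIP_ALL_GLACIER = "SKIP_ALL_GLACIER"
    READ_RESTORED_GLACIER_OBJECTS = "READ_RESTORED_GLACIER_OBJECTS"


@dataclass(frozen=True)
class ListedObject:
    """Hypothetical stand-in for one entry of an S3 listing."""
    key: str
    storage_class: str = "STANDARD"
    # None: no restore requested; True: restore running; False: restore done.
    restore_in_progress: Optional[bool] = None


def is_archived(obj: ListedObject) -> bool:
    return obj.storage_class in ARCHIVED_CLASSES


def is_restored(obj: ListedObject) -> bool:
    return is_archived(obj) and obj.restore_in_progress is False


def select_objects(objects: Iterable[ListedObject],
                   mode: GlacierReadMode) -> List[ListedObject]:
    """Return the objects a job would attempt to read under the given mode."""
    if mode is GlacierReadMode.READ_ALL:
        # Nothing is filtered; unrestored archived objects may fail at read time.
        return list(objects)
    if mode is GlacierReadMode.SKIP_ALL_GLACIER:
        return [o for o in objects if not is_archived(o)]
    # READ_RESTORED_GLACIER_OBJECTS: keep non-archived and restored objects.
    return [o for o in objects if not is_archived(o) or is_restored(o)]


def spark_conf_arg(mode: GlacierReadMode) -> str:
    """Render the mode as a spark-submit --conf value (spark.hadoop.* prefix)."""
    return f"spark.hadoop.{GLACIER_READ_PROPERTY}={mode.value}"


listing = [
    ListedObject("logs/2025.parquet"),
    ListedObject("logs/2019.parquet", "GLACIER"),
    ListedObject("logs/2020.parquet", "GLACIER", restore_in_progress=False),
    ListedObject("logs/2018.parquet", "DEEP_ARCHIVE", restore_in_progress=True),
]
kept = select_objects(listing, GlacierReadMode.READ_RESTORED_GLACIER_OBJECTS)
print([o.key for o in kept])  # ['logs/2025.parquet', 'logs/2020.parquet']
```

In this model, `SKIP_ALL_GLACIER` keeps only `logs/2025.parquet`, while `READ_ALL` keeps all four keys and leaves any failure on unrestored objects to read time.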

These improvements let organizations reduce storage costs by keeping data in lower-cost Amazon S3 storage tiers while still processing it with Amazon EMR, particularly for long-term archival and compliance use cases.





