
Accelerate Apache Hadoop and Apache Iceberg on Amazon S3 with the Analytics Accelerator Library

Storage Blog



This article introduces the Analytics Accelerator Library for Amazon S3 (AAL), an open-source library that improves the performance of Apache Hadoop and Apache Iceberg analytics workloads on Amazon S3.

  • AAL implements S3 client-side best practices and intelligent prefetching strategies automatically
  • Included by default in Apache Hadoop 3.4.3 and available in Apache Iceberg 1.9.0+
  • Format-agnostic optimizations: request-reshaping, sequential prefetching, small object prefetching, metadata caching
  • Parquet-specific optimizations: footer prefetching, vector reads, predictive column prefetching
  • TPC-DS benchmark testing showed up to 1.2X faster query execution on IO-intensive workloads
  • Reduced data accessed by 44% on Hive/Parquet and 36% on Iceberg/Parquet
  • No application code changes required; works seamlessly with existing connectors
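Two of the ideas listed above, footer prefetching and request reshaping, can be illustrated with a small sketch. This is not AAL's implementation (AAL is a Java library); the function names `footer_range` and `coalesce_ranges` and the `max_gap` parameter are hypothetical and exist only for this example. The one fact relied on is the Parquet file layout: a file ends with a 4-byte little-endian footer length followed by the 4-byte magic `PAR1`.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_range(object_size: int, tail: bytes) -> tuple[int, int]:
    """Return the (start, end) byte range of the Parquet footer metadata.

    `tail` holds the final bytes of the object; only the last 8 are parsed.
    `end` is exclusive. Hypothetical helper, not part of any AAL API.
    """
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("tail does not end with the Parquet magic bytes")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    end = object_size - 8
    start = end - footer_len
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the object allows")
    return start, end


def coalesce_ranges(ranges, max_gap):
    """Merge byte ranges separated by at most `max_gap` bytes.

    Fewer, larger ranged GETs can replace many small ones. The threshold
    is an assumption for this sketch, not a value taken from AAL.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

A reader that knows the footer range up front can request it in a single ranged GET as soon as the object is opened, and merging nearby column-chunk ranges trades a few extra bytes for fewer round trips.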

Because these optimizations are applied on the client side, AAL improves performance and lowers cost for analytics workloads without requiring workflow modifications or new APIs.





