
How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3

Storage Blog



Amazon Ads optimized their Spark workload on Amazon S3 by adopting Apache Iceberg's new base-2 object store file layout, which improved both performance and cost:

  • Reduced total EMR processing time by 22% (from 11.5 to 9 hours)
  • Decreased EMR compute costs by 20%
  • Reduced S3 storage costs by 32%
  • Eliminated manual job retries
  • Reduced S3 5xx errors by 77%

The gains come from Iceberg's new 20-character base-2 hash (a string of 0s and 1s) in object key names, which spreads traffic more evenly across S3 prefixes and improves request scaling.
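To make the idea of a binary hash in the object key concrete, here is a minimal Python sketch. It is not Iceberg's implementation: the hash function (SHA-256 from the standard library), the key layout, and the names `binary_hash`, `object_key`, and `prefix_spread` are all assumptions chosen for illustration. It only shows how a fixed-width string of 0s and 1s, derived from the file name, can spread keys evenly across the leading characters of the key.

```python
import hashlib
from collections import Counter

HASH_BITS = 20  # width of the binary hash string, matching the 20 characters mentioned above


def binary_hash(file_name: str, bits: int = HASH_BITS) -> str:
    """Return a fixed-width string of 0s and 1s derived from file_name.

    Assumption: SHA-256 stands in for whatever hash Iceberg actually uses.
    Only the top `bits` bits of the first 32 bits are kept (bits <= 32).
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big") >> (32 - bits)
    return format(value, f"0{bits}b")


def object_key(data_root: str, file_name: str) -> str:
    """Build a hypothetical object key with the hash placed ahead of the file name."""
    return f"{data_root}/{binary_hash(file_name)}/{file_name}"


def prefix_spread(file_names, prefix_bits: int = 4) -> Counter:
    """Count how many files fall under each leading `prefix_bits` of the hash."""
    return Counter(binary_hash(name)[:prefix_bits] for name in file_names)


if __name__ == "__main__":
    names = [f"part-{i:05d}.parquet" for i in range(10_000)]
    print(object_key("s3://example-bucket/table/data", names[0]))
    for prefix, count in sorted(prefix_spread(names).items()):
        print(prefix, count)
```

Because each character of the hash is only 0 or 1, every additional character splits the key space in two, so sequentially named files end up spread over many distinct key prefixes rather than clustered under one.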





Related articles

  • Dec 3, 2024: Announcing Amazon S3 Tables – Fully managed Apache Iceberg tables optimized for analytics workloads
  • Apr 20, 2026: Accelerate Apache Hadoop and Apache Iceberg on Amazon S3 with the Analytics Accelerator Library
  • Nov 27, 2025: Run Apache Spark and Iceberg 4.5x faster than open source Spark with Amazon EMR
  • Jun 24, 2025: New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction
