Unlock the power of Apache Iceberg v3 deletion vectors on Amazon EMR
Big Data Blog
The article covers deletion vectors in Apache Iceberg v3 on Amazon EMR, which handle row-level deletes more efficiently than the positional delete files used in earlier format versions.
- Iceberg v3 introduces binary deletion vectors stored in compact Puffin files
- Replaces positional delete files with more efficient bitmap-based deletion tracking
- Performance tests showed significant improvements:
  - Delete operation time reduced by 55%
  - Delete file size reduced by 73.6%
  - Full table read performance improved by 28.5%
  - Filtered read performance improved by 23%
- Tested on Amazon EMR 7.10.0 with Spark 3.5.5
- Provides more efficient metadata handling and query performance
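As a rough illustration of how a table might opt in to the v3 format, the sketch below builds the Spark SQL statements as plain strings. The table name `glue_catalog.demo_db.events`, the column names, and the helper `v3_table_statements` are hypothetical and do not come from the article. `format-version` and `write.delete.mode` are standard Iceberg table properties, but whether a given EMR release actually writes deletion vectors depends on the Iceberg version it bundles, so treat this as an assumption to verify rather than the article's own setup.

```python
def v3_table_statements(table: str) -> list[str]:
    """Build illustrative Spark SQL for an Iceberg v3 table and a row-level delete.

    The schema and the delete predicate are made up for this example.
    """
    create = (
        f"CREATE TABLE {table} (id BIGINT, payload STRING) USING iceberg "
        # 'format-version'='3' requests the v3 table format;
        # merge-on-read makes DELETE write delete metadata instead of rewriting data files.
        "TBLPROPERTIES ('format-version'='3', 'write.delete.mode'='merge-on-read')"
    )
    delete = f"DELETE FROM {table} WHERE id % 10 = 0"
    return [create, delete]


statements = v3_table_statements("glue_catalog.demo_db.events")
for sql in statements:
    # On a cluster with an Iceberg catalog configured, each string could be
    # passed to spark.sql(sql); here the statements are only printed.
    print(sql)
```

Keeping the statements as strings lets the sketch run without a Spark session; on a real cluster the same strings would be submitted through an existing `SparkSession`.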
The new approach reduces I/O overhead, improves query performance, and increases storage efficiency for data management.
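To make the difference between the two delete representations concrete, here is a conceptual sketch. It is not Iceberg's on-disk encoding: a plain Python integer stands in for the compressed bitmap that a real deletion vector would store in a Puffin file, and the file name and row values are invented for the example.

```python
from typing import Iterable


def positional_delete_rows(data_file: str, positions: Iterable[int]) -> list[tuple[str, int]]:
    """Positional-delete style: one (file_path, position) record per deleted row."""
    return [(data_file, pos) for pos in sorted(positions)]


def deletion_vector(positions: Iterable[int]) -> int:
    """Bitmap style: bit i is set when row i of the data file is deleted."""
    bitmap = 0
    for pos in positions:
        bitmap |= 1 << pos
    return bitmap


def is_deleted(bitmap: int, pos: int) -> bool:
    """Check a single row position against the bitmap."""
    return (bitmap >> pos) & 1 == 1


def live_rows(rows: list[str], bitmap: int) -> list[str]:
    """Apply the bitmap at read time, keeping only rows that are not deleted."""
    return [row for pos, row in enumerate(rows) if not is_deleted(bitmap, pos)]


rows = ["a", "b", "c", "d", "e", "f"]
deleted = {1, 4}

dv = deletion_vector(deleted)
print(positional_delete_rows("data-0001.parquet", deleted))
print(bin(dv))             # 0b10010
print(live_rows(rows, dv)) # ['a', 'c', 'd', 'f']
```

The positional form repeats the file path for every deleted row, while the bitmap needs one bit per row position, which gives an intuition for why the reported delete file size shrinks.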
Related articles
- Nov 26, 2025: AWS announces support for Apache Iceberg V3 deletion vectors and row lineage
- Nov 21, 2025: Amazon EMR 7.12 now supports the Apache Iceberg v3 table format
- Nov 26, 2025: Accelerate data lake operations with Apache Iceberg V3 deletion vectors and row lineage
- Nov 27, 2025: Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR