Accelerate data lake operations with Apache Iceberg V3 deletion vectors and row lineage
Big Data Blog
This article explains how Apache Iceberg V3's deletion vectors and row lineage capabilities improve data lake operations across AWS services.
- Deletion vectors replace positional delete files with an efficient binary format stored in Puffin files, reducing write amplification and storage costs
- Row lineage adds _row_id and _last_updated_sequence_number metadata fields for precise change tracking and audit trails
- V3 support is available across EMR 7.12, AWS Glue, SageMaker, S3 Tables, and Glue Data Catalog
- Use cases include GDPR compliance, ecommerce inventory updates, healthcare data tracking, and media recommendation engines
- Enable V3 by setting the format-version=3 table property; existing V2 tables upgrade atomically without data rewrites
- Configure merge-on-read mode for delete, update, and merge operations to maximize deletion vector benefits
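The last two points can be sketched as a single Spark SQL statement. The minimal Python helper below only builds that statement as a string; it does not connect to any engine. The property names (`format-version`, `write.delete.mode`, `write.update.mode`, `write.merge.mode`) are Iceberg table properties, while the helper name `build_v3_upgrade_sql` and the table identifier `glue_catalog.sales.orders` are hypothetical placeholders, not taken from the article.

```python
from typing import Mapping

# Iceberg table properties for a V3 table that uses merge-on-read
# for row-level deletes, updates, and merges.
V3_PROPERTIES = {
    "format-version": "3",
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}


def build_v3_upgrade_sql(table: str, properties: Mapping[str, str] = V3_PROPERTIES) -> str:
    """Return a Spark SQL ALTER TABLE statement that sets the given properties.

    Hypothetical helper: it only formats text and never talks to a catalog.
    """
    assignments = ", ".join(f"'{key}'='{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


if __name__ == "__main__":
    # Placeholder identifier; substitute your own catalog, database, and table.
    print(build_v3_upgrade_sql("glue_catalog.sales.orders"))
```

The resulting statement would be run through whichever engine manages the table, for example with `spark.sql(...)` in a Spark session that already has an Iceberg catalog configured.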
Iceberg V3 delivers faster writes, lower storage costs, comprehensive audit trails, and efficient incremental processing across AWS analytics services without custom implementations.
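To make the incremental-processing idea concrete, here is a toy Python sketch of how a consumer might use the two row lineage fields. It is an in-memory model under stated assumptions, not how any engine implements lineage: the `Row` class, the `changed_since` function, and the sample data are hypothetical, and the attributes `row_id` and `last_updated_sequence_number` merely stand in for `_row_id` and `_last_updated_sequence_number`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Row:
    row_id: int                        # stands in for _row_id (stable row identity)
    last_updated_sequence_number: int  # stands in for _last_updated_sequence_number
    payload: str                       # hypothetical business data


def changed_since(rows: Iterable[Row], last_processed_sequence: int) -> List[Row]:
    """Return rows whose last update is newer than the checkpoint sequence number."""
    return [
        row for row in rows
        if row.last_updated_sequence_number > last_processed_sequence
    ]


if __name__ == "__main__":
    table = [
        Row(row_id=0, last_updated_sequence_number=1, payload="widget, qty 10"),
        Row(row_id=1, last_updated_sequence_number=3, payload="gadget, qty 4"),
        Row(row_id=2, last_updated_sequence_number=2, payload="gizmo, qty 7"),
    ]
    # A downstream job that last ran at sequence number 2 only needs row 1.
    for row in changed_since(table, last_processed_sequence=2):
        print(row.row_id, row.payload)
```

Because the row identifier stays stable across updates in this model, a consumer can match each changed row to its earlier version instead of diffing full snapshots.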
Related articles
- Nov 26, 2025: Achieve 2x faster data lake query performance with Apache Iceberg on Amazon Redshift
- Nov 26, 2025: AWS announces support for Apache Iceberg V3 deletion vectors and row lineage
- Apr 3, 2024: Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake
- Oct 30, 2024: Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg