Efficiently compare items across two Amazon DynamoDB tables
Database Blog
This article presents an efficient algorithm for comparing two Amazon DynamoDB tables to identify differences in items, implemented in the open-source Bulk Executor tool.
- Algorithm leverages DynamoDB's consistent hashing and ordered scan results for efficient comparison
- Compares scan sequences segment by segment, using parallel segmented scans so that each segment can be processed independently
- Handles schema validation, identical items, changed attributes, missing items, and partition key differences
- Demonstrated on 500M-item tables (~180GB each) compared in 6.5 minutes for under $10
- Uses AWS Glue to run hundreds of segmented scans in parallel
- Outputs differences as added (+), removed (-), or changed (*) items with optional full details
- Results can be stored in Amazon S3 for large datasets
- Useful for migrations, point-in-time recovery verification, and data propagation validation
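To make the segmented-scan idea concrete, here is a minimal sketch of how the request parameters for N parallel scan segments could be built. `Segment` and `TotalSegments` are real parameters of the DynamoDB `Scan` API; the function name `build_segment_scan_params` and the table name are hypothetical, and this is not taken from the Bulk Executor source.

```python
from typing import Any


def build_segment_scan_params(table_name: str, total_segments: int) -> list[dict[str, Any]]:
    """Return one Scan parameter dict per segment (hypothetical helper).

    Each dict describes an independent slice of the table, so the slices
    can be handed to separate workers and scanned in parallel.
    """
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,              # zero-based index of this slice
            "TotalSegments": total_segments,  # must be identical for every worker
        }
        for segment in range(total_segments)
    ]


# Example: plan four segments for an illustrative table name.
plan = build_segment_scan_params("orders", 4)
```

Each dict could then be passed to a DynamoDB client's `scan` call by one worker, following `LastEvaluatedKey` / `ExclusiveStartKey` to page through its segment. Comparing segment by segment assumes, as the summary suggests, that the same segment number covers the same portion of the key space in both tables.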
The Bulk Executor tool provides a fast, scalable, and cost-effective way to compare large DynamoDB tables, using a linear-time algorithm with minimal memory overhead.
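The linear-time, low-memory property can be illustrated with a two-pointer walk over two ordered sequences. This is a simplified sketch, not the Bulk Executor implementation: it assumes each item arrives with an explicit, comparable ordering key, whereas DynamoDB orders scan results by an internal hash of the partition key that is not exposed. The function name `diff_ordered` and the convention that "+" means "only in the second table" are also assumptions.

```python
from typing import Any, Iterable, Iterator

Item = dict[str, Any]


def diff_ordered(
    left: Iterable[tuple[Any, Item]],
    right: Iterable[tuple[Any, Item]],
) -> Iterator[tuple[str, Any]]:
    """Yield ("+", key), ("-", key) or ("*", key) for two key-ordered streams.

    Only one item per stream is held in memory at a time, and each item is
    visited once, so the walk is linear in the total number of items.
    """
    done = object()  # sentinel marking an exhausted stream
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, done)
    r = next(right_iter, done)
    while l is not done or r is not done:
        if r is done or (l is not done and l[0] < r[0]):
            yield ("-", l[0])  # present only in the first stream
            l = next(left_iter, done)
        elif l is done or r[0] < l[0]:
            yield ("+", r[0])  # present only in the second stream
            r = next(right_iter, done)
        else:
            if l[1] != r[1]:
                yield ("*", l[0])  # same key, different attributes
            l = next(left_iter, done)
            r = next(right_iter, done)


# Example with small in-memory streams standing in for two scans.
first = [(1, {"a": 1}), (2, {"a": 2}), (4, {"a": 4})]
second = [(1, {"a": 1}), (2, {"a": 9}), (3, {"a": 3})]
differences = list(diff_ordered(first, second))
```

Because the inputs are consumed as iterators, the same walk works whether the items come from lists, files, or paginated scan responses.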