Implementing AWS DataSync with hundreds of millions of objects
Storage Blog
This article describes techniques for using AWS DataSync to transfer and validate large datasets efficiently, including datasets that exceed 100 million objects, in hybrid cloud environments.
Specifically, the article covers:
- An overview of AWS DataSync and the new manifest file feature, which specifies the files to transfer
- A solution overview covering data restructuring, an event-driven approach with manifest files, and include filters for large batches
- A detailed walkthrough of implementing the event-driven approach with manifest files, using AWS services such as Lambda, SQS, and S3
- Techniques for batching objects with UUID prefixes and include filters while staying within DataSync quotas
- A conclusion highlighting the benefits of manifests and include filters for transferring and validating large datasets with DataSync
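The two batching ideas listed above can be illustrated with a minimal Python sketch. It is not the article's implementation: the helper names `build_manifest` and `batch_include_filters` are hypothetical, the manifest is assumed to be a CSV with one object key per row, and the 102,400-character cap on a pipe-delimited filter string is an assumption that should be checked against the current DataSync quotas.

```python
import csv
import io
from typing import Iterable, Iterator, List

# Assumption: a DataSync filter string is a pipe-delimited list of patterns
# with a maximum length; verify this value against the current quotas.
MAX_FILTER_CHARS = 102_400


def build_manifest(keys: Iterable[str]) -> str:
    """Return manifest text with one object key per CSV row.

    Assumption: standard CSV quoting is acceptable for keys with commas.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in keys:
        writer.writerow([key])
    return buf.getvalue()


def batch_include_filters(
    prefixes: Iterable[str], max_chars: int = MAX_FILTER_CHARS
) -> Iterator[str]:
    """Group prefixes into pipe-delimited filter strings of at most max_chars."""
    batch: List[str] = []
    length = 0
    for prefix in prefixes:
        pattern = "/" + prefix.strip("/")
        if len(pattern) > max_chars:
            raise ValueError(f"pattern longer than max_chars: {pattern!r}")
        # Account for the "|" separator when the batch is not empty.
        added = len(pattern) + (1 if batch else 0)
        if batch and length + added > max_chars:
            yield "|".join(batch)
            batch, length = [pattern], len(pattern)
        else:
            batch.append(pattern)
            length += added
    if batch:
        yield "|".join(batch)


if __name__ == "__main__":
    import uuid

    # Hypothetical UUID prefixes standing in for the restructured dataset.
    prefixes = [str(uuid.uuid4()) for _ in range(5)]
    for filter_value in batch_include_filters(prefixes, max_chars=80):
        print(len(filter_value), filter_value)
    print(build_manifest(f"{p}/object.bin" for p in prefixes[:2]), end="")
```

In an event-driven setup, a Lambda function could upload the manifest text to S3 and pass each filter string to a DataSync task execution (for example, through the `Includes` parameter of boto3's `start_task_execution`); that wiring is omitted here because it depends on the account's task and bucket configuration.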
Related articles
- Oct 30, 2024: AWS DataSync increases performance and scalability for data transfers
- Mar 13, 2024: Replicate objects Using AWS DataSync with Amazon S3 compatible storage on Snowball Edge
- May 29, 2025: AWS DataSync simplifies and accelerates cross-cloud data transfers
- Dec 12, 2025: AWS DataSync increases scalability and performance for on-premises file transfers