Managing duplicate objects in Amazon S3
Storage Blog
This article presents a solution for managing duplicate objects in Amazon S3 to reduce storage costs. It shows how to identify duplicate objects with Amazon Athena and delete them with AWS Lambda and S3 Batch Operations.
Specifically, the article covers:
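- Identifying duplicate objects by comparing their ETags using an Athena query on the S3 Inventory report (an ETag acts as a content hash for many objects, but not for multipart uploads or objects encrypted with SSE-KMS or SSE-C)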
- Creating a Lambda function to delete a single S3 object
- Configuring an S3 Batch Operations job to invoke the Lambda function and delete the identified duplicate objects
- Prerequisites, walkthrough steps, and things to know about the solution
- Cleaning up the resources created for the solution
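The identify-then-delete flow outlined above can be illustrated with a minimal Python sketch. It is not the article's code: the row field names (`key`, `etag`, `size`, `last_modified`), the choice to keep the oldest copy, and the injectable `s3_client` parameter are assumptions made for illustration. The handler follows the request and response shape that S3 Batch Operations uses when it invokes a Lambda function with one task per invocation.

```python
from collections import defaultdict
from urllib.parse import unquote_plus


def find_duplicates(rows):
    """Return the keys of redundant copies among inventory rows.

    `rows` is an iterable of dicts with hypothetical field names
    'key', 'etag', 'size', and 'last_modified' (ISO 8601 string).
    """
    groups = defaultdict(list)
    for row in rows:
        # Group on ETag plus size to lower the chance of a false match.
        groups[(row["etag"], row["size"])].append(row)

    duplicates = []
    for members in groups.values():
        if len(members) < 2:
            continue
        # Assumption: keep the oldest copy and flag the rest for deletion.
        members.sort(key=lambda r: (r["last_modified"], r["key"]))
        duplicates.extend(r["key"] for r in members[1:])
    return sorted(duplicates)


def lambda_handler(event, context, s3_client=None):
    """Delete the single object named in an S3 Batch Operations task."""
    if s3_client is None:
        # Imported lazily so the sketch can run with a stub client.
        import boto3
        s3_client = boto3.client("s3")

    task = event["tasks"][0]
    bucket = task["s3BucketArn"].split(":")[-1]
    key = unquote_plus(task["s3Key"])  # Batch Operations URL-encodes keys.

    try:
        s3_client.delete_object(Bucket=bucket, Key=key)
        code, message = "Succeeded", f"Deleted {key}"
    except Exception as exc:
        # Assumption: treat every error as permanent; a real job might
        # return "TemporaryFailure" for throttling so the task is retried.
        code, message = "PermanentFailure", str(exc)

    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": [
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        ],
    }
```

In this sketch the output of `find_duplicates` stands in for the Athena query result, which would be written out as the manifest for the Batch Operations job. In a versioned bucket, `delete_object` without a version ID adds a delete marker rather than removing data, so storage savings would also depend on how noncurrent versions are handled.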
Related articles
- Mar 27, 2024: Maintaining object immutability by automatically extending Amazon S3 Object Lock retention periods
- Jul 17, 2025: Copy objects between any Amazon S3 storage classes using S3 Batch Operations
- Jan 16, 2025: Preventing unintended encryption of Amazon S3 objects
- Jan 23, 2026: Applying Amazon S3 Object Lock at scale for petabytes of existing data