Identify and remove duplicate objects in Amazon S3 buckets to reduce storage costs: use Amazon Athena to query S3 Inventory reports for duplicates, then delete them with AWS Lambda and S3 Batch Operations.

<div>
<p>This article presents a solution for managing duplicate objects in Amazon S3 to reduce storage costs. Duplicates are identified with Amazon Athena and deleted with AWS Lambda and S3 Batch Operations.</p>
<p>Specifically, the article covers:</p>
<ul>
<li>Identifying duplicate objects by comparing their ETags (used here as content hashes) with an Athena query on the S3 Inventory report</li>
<li>Creating a Lambda function to delete a single S3 object</li>
<li>Configuring an S3 Batch Operations job to invoke the Lambda function and delete the identified duplicate objects</li>
<li>Prerequisites, walkthrough steps, and things to know about the solution</li>
<li>Cleaning up the resources created for the solution</li>
</ul>
</div>
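
The identification step above can be sketched in plain Python. This is a minimal illustration and not the article's Athena query: the `InventoryRow` fields, the `find_duplicates` helper, and the sample keys and ETags are all hypothetical. It assumes that objects sharing an ETag and size are duplicates, and that the oldest copy in each group should be kept. Note that ETags of multipart uploads and of some server-side-encrypted objects are not plain MD5 digests of the content, so an ETag match is best treated as a heuristic.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class InventoryRow(NamedTuple):
    """One object from an inventory listing (field names are illustrative)."""
    key: str
    etag: str
    size: int
    last_modified: str  # ISO 8601 timestamp, so string order matches time order


def find_duplicates(rows: Iterable[InventoryRow]) -> list[str]:
    """Return keys to delete, keeping the oldest object in each duplicate group."""
    groups: dict[tuple[str, int], list[InventoryRow]] = defaultdict(list)
    for row in rows:
        # Assumption: equal ETag plus equal size marks a duplicate.
        groups[(row.etag, row.size)].append(row)

    to_delete: list[str] = []
    for members in groups.values():
        if len(members) < 2:
            continue  # unique object, nothing to delete
        # Oldest first; the key breaks ties so the result is deterministic.
        members.sort(key=lambda r: (r.last_modified, r.key))
        to_delete.extend(r.key for r in members[1:])
    return sorted(to_delete)


# Hypothetical sample data: two copies of one file and one unique file.
rows = [
    InventoryRow("a/report.csv", "9b2cf5", 1024, "2024-01-01T00:00:00Z"),
    InventoryRow("b/report-copy.csv", "9b2cf5", 1024, "2024-02-01T00:00:00Z"),
    InventoryRow("c/other.csv", "1f3870", 2048, "2024-01-15T00:00:00Z"),
]
duplicates = find_duplicates(rows)
print(duplicates)
```

A list of keys like `duplicates` is the kind of input the deletion step would consume, for example as the object list that the S3 Batch Operations job hands to the Lambda function one object at a time. Keeping the oldest copy is only one possible policy; keeping the newest, or the copy under a preferred prefix, would change only the sort key.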


Managing duplicate objects in Amazon S3
