
How to enhance Amazon Macie data discovery capabilities using Amazon Textract

Security Blog



This article explains how to enhance Amazon Macie's data discovery capabilities by using Amazon Textract to extract text from images, enabling sensitive data detection in image files.

  • The solution uses AWS SAM to deploy a serverless workflow that processes images in S3 buckets
  • Amazon Textract extracts text from images with file extensions like .png, .jpg, and .jpeg
  • Extracted text is converted to a text file and stored in the same S3 bucket
  • Macie then scans the text file using managed and custom data identifiers
  • The solution supports detecting sensitive data like identification numbers in images
  • Text extraction is currently supported in English, German, French, Italian, and Portuguese
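
The extraction step in the list above can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the function names, the `.txt` naming rule, and the idea of passing the clients in as arguments are all assumptions made for the example. The only AWS calls used are the Textract `detect_document_text` and S3 `put_object` operations as exposed by boto3.

```python
# Extensions named in the summary; the deployed solution may accept others.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def is_supported_image(key):
    """Return True if the S3 object key ends with one of the image extensions."""
    return key.lower().endswith(IMAGE_EXTENSIONS)


def text_key_for(key):
    """Hypothetical naming rule for the output object: photo.png -> photo.png.txt."""
    return key + ".txt"


def lines_from_response(response):
    """Collect the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def extract_image_text(bucket, key, textract, s3):
    """Run Textract on one image and store the extracted text in the same bucket.

    `textract` and `s3` are boto3 clients (or test doubles with the same methods).
    Returns the key of the text object, or None if the key is not an image.
    """
    if not is_supported_image(key):
        return None
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    body = "\n".join(lines_from_response(response))
    out_key = text_key_for(key)
    s3.put_object(Bucket=bucket, Key=out_key, Body=body.encode("utf-8"))
    return out_key
```

With real clients this would be called as `extract_image_text(bucket, key, boto3.client("textract"), boto3.client("s3"))`. Because the output key ends in `.txt`, the extension check also keeps the generated text file from being processed again if the function is triggered by new objects in the same bucket.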

The workflow enables organizations to discover sensitive data embedded in images, addressing a significant challenge for regulated industries with strict data protection requirements.





