
Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler




This blog post demonstrates how to use Amazon SageMaker Data Wrangler and Amazon Comprehend to automatically redact personally identifiable information (PII) from tabular data as part of a machine learning operations (MLOps) workflow.

Specifically, the article covers:

  • Problem: ML training data often contains PII, which must be redacted for privacy and compliance reasons.
  • Solution overview: Amazon Comprehend detects the PII, and SageMaker Data Wrangler integrates the redaction step into an MLOps workflow.
  • Walkthrough: A step-by-step SageMaker Data Wrangler flow that redacts PII using custom transformations and Amazon Comprehend.
  • Conclusion: How to download the example flow and start redacting PII from tabular data.
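
The core of the redaction step described above can be sketched in Python. This is a minimal illustration, not the code from the article: the helper names `redact_text`, `detect_pii`, and `redact_column` are hypothetical, the `[TYPE]` placeholder format and the 0.5 score threshold are assumptions, and the sketch assumes that the offsets returned by Amazon Comprehend line up with Python string indices. The only AWS call used is `detect_pii_entities` on the boto3 Comprehend client, which returns a list of entities with `Type`, `Score`, `BeginOffset`, and `EndOffset`.

```python
from typing import Callable, Dict, Iterable, List, Optional


def redact_text(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span with a [TYPE] placeholder.

    `entities` uses the shape of the Comprehend DetectPiiEntities response:
    dicts with Type, Score, BeginOffset, and EndOffset keys.
    """
    # Work from the last span to the first so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent.get("Score", 1.0) < min_score:
            continue  # skip low-confidence detections (threshold is an assumption)
        start, end = ent["BeginOffset"], ent["EndOffset"]
        text = text[:start] + f"[{ent['Type']}]" + text[end:]
    return text


def detect_pii(text: str, language_code: str = "en") -> List[Dict]:
    """Call Amazon Comprehend for one string; requires AWS credentials."""
    import boto3  # imported here so the pure helpers work without boto3

    client = boto3.client("comprehend")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return response["Entities"]


def redact_column(
    values: Iterable[Optional[str]],
    detector: Callable[[str], List[Dict]] = detect_pii,
) -> List[Optional[str]]:
    """Redact every non-empty value of one text column."""
    return [redact_text(v, detector(v)) if v else v for v in values]
```

Inside a Data Wrangler custom transformation, a function like `redact_column` could be applied to the text column of the DataFrame that the transform receives. Calling Comprehend once per cell is the simplest form; for larger datasets, batching the calls is likely worthwhile because of per-request size and throughput limits.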

