Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler
Blog
This article demonstrates how to integrate AWS Lake Formation with Amazon SageMaker Data Wrangler and Amazon EMR to enable fine-grained data access controls for ML data preparation.
- SageMaker Data Wrangler now supports Lake Formation with Amazon EMR for data access control
- Lake Formation provides centralized governance with fine-grained permissions across accounts
- Solution uses two AWS accounts: data lake account (A) and data science account (B)
- CloudFormation template automates deployment of EMR, IAM roles, SageMaker Studio, and Lake Formation
- Different users see different tables based on Lake Formation permissions (e.g., David sees non-sensitive customer data only)
- Data Wrangler connects to the EMR cluster over Hive or Presto, using IAM authentication
- Users can transform data visually without writing code, then export to SageMaker Feature Store
- Encryption in transit must be enabled on the EMR cluster; self-signed certificates are suitable for proof-of-concept deployments only
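The permission model in the list above (for example, David seeing only non-sensitive customer data) can be illustrated with a Lake Formation column-level grant. The sketch below is a minimal example, assuming boto3's `lakeformation` client. It only builds the keyword arguments for `grant_permissions`, so it runs without AWS credentials. The database, table, column names, account ID, and role ARN are hypothetical placeholders, not values from the post. In the two-account layout, the principal could instead be account B's 12-digit ID (granted with `PermissionsWithGrantOption` so account B's Lake Formation administrator can pass the permission on to individual users).

```python
"""Sketch: build a Lake Formation column-level SELECT grant.

All names, IDs, and ARNs below are hypothetical placeholders.
"""
from typing import Any, Dict, List


def build_column_grant(
    principal_arn: str,
    catalog_id: str,
    database: str,
    table: str,
    columns: List[str],
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's lakeformation grant_permissions call.

    Only the listed columns become readable, which is how a user can be
    limited to the non-sensitive part of a table.
    """
    return {
        "CatalogId": catalog_id,
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    request = build_column_grant(
        principal_arn="arn:aws:iam::111122223333:role/david-data-scientist",  # hypothetical role
        catalog_id="111122223333",  # hypothetical data lake account (A) ID
        database="customer_db",  # hypothetical database
        table="customers",  # hypothetical table
        columns=["customer_id", "region", "signup_date"],  # hypothetical non-sensitive columns
    )
    print(request)
    # Applying it for real requires boto3 and credentials for the data lake account:
    # import boto3
    # boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call makes the grant easy to review or unit-test before it is applied to a real catalog.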
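For the encryption-in-transit requirement, the following is a minimal sketch of an EMR security configuration that enables TLS using PEM certificates stored as a zip file in Amazon S3. The S3 URI and configuration name are hypothetical. The dictionary is serialized to JSON, which is the form boto3's `create_security_configuration` expects for its `SecurityConfiguration` parameter. Self-signed certificates in that zip file are acceptable for a proof of concept, but production clusters should use certificates from a trusted certificate authority.

```python
"""Sketch: an EMR security configuration that enables encryption in transit.

The S3 URI and configuration name are hypothetical placeholders.
"""
import json
from typing import Any, Dict


def in_transit_security_config(pem_zip_s3_uri: str) -> Dict[str, Any]:
    """Return an EMR security configuration enabling TLS with PEM certificates from S3."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            # At-rest encryption is left off only to keep this sketch short.
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    # Zip of PEM files; self-signed is for proof of concept only.
                    "S3Object": pem_zip_s3_uri,
                }
            },
        }
    }


if __name__ == "__main__":
    config = in_transit_security_config("s3://example-bucket/emr-certs/certs.zip")  # hypothetical URI
    print(json.dumps(config, indent=2))
    # With boto3 and credentials, the JSON string is passed to EMR:
    # import boto3
    # boto3.client("emr").create_security_configuration(
    #     Name="wrangler-in-transit-tls",  # hypothetical name
    #     SecurityConfiguration=json.dumps(config),
    # )
```

The cluster then references this security configuration by name when it is launched, for example from the CloudFormation template.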
This solution enables data scientists to prepare ML data securely with granular access controls while maintaining a no-code visual interface.