AWS Clean Rooms launches privacy-enhancing synthetic dataset generation for ML model training
AWS News Blog
This article announces privacy-enhancing synthetic dataset generation for AWS Clean Rooms, enabling organizations to generate synthetic training datasets from collective data while protecting individual privacy.
- Generates synthetic datasets that preserve statistical patterns without exposing original records
- Addresses the tension between data utility and privacy protection in ML model training
- Uses advanced ML techniques to de-identify subjects while maintaining data characteristics
- Mitigates re-identification risk through model capacity reduction techniques
- Lets organizations control privacy parameters, including noise levels and protection against membership inference attacks
- Provides fidelity and privacy quality metrics using KL-divergence and protection scores
- Integrated into AWS Clean Rooms ML workflow with enhanced analysis templates
- Supports classification and regression models on tabular data
- Billed separately based on Synthetic Data Generation Units (SDGUs)
- Available in all AWS Regions where Clean Rooms is available
This capability enables secure, collaborative ML model training on sensitive data while quantifiably reducing re-identification risk and helping organizations meet compliance requirements.
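To make the KL-divergence fidelity metric mentioned above concrete, here is a minimal sketch of how such a score could be computed for a single categorical column. It is an illustration only, not the AWS Clean Rooms implementation: the function name `kl_divergence`, the `smoothing` parameter, and the sample columns are all assumptions made for this example. A lower value means the synthetic column's distribution is closer to the original.

```python
import math
from collections import Counter


def kl_divergence(real, synthetic, smoothing=1e-9):
    """Return KL(P_real || P_synthetic) in nats for one categorical column.

    Hypothetical helper for illustration; `smoothing` is an assumed
    additive constant that avoids division by zero when a category
    is missing from one of the two datasets.
    """
    if not real or not synthetic:
        raise ValueError("both columns must contain at least one value")

    categories = set(real) | set(synthetic)
    real_counts = Counter(real)
    synth_counts = Counter(synthetic)

    # Normalizers include the smoothing mass added to every category.
    real_total = len(real) + smoothing * len(categories)
    synth_total = len(synthetic) + smoothing * len(categories)

    kl = 0.0
    for category in categories:
        p = (real_counts[category] + smoothing) / real_total
        q = (synth_counts[category] + smoothing) / synth_total
        kl += p * math.log(p / q)
    return kl


# Made-up example columns, not real data.
original_column = ["a", "a", "b", "b"]
synthetic_column = ["a", "a", "a", "b"]

identical_score = kl_divergence(original_column, original_column)
shifted_score = kl_divergence(original_column, synthetic_column)
```

Identical distributions score 0.0, and the score grows as the synthetic distribution drifts from the original. A real fidelity report would likely aggregate over many columns and handle numeric features by binning, which this sketch omits.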