Build a serverless data quality pipeline using Deequ on AWS Lambda
Big Data Blog
This article explains how to build a serverless data quality pipeline on AWS Lambda using Deequ, an open-source data quality library from AWS, through PyDeequ, its Python interface. It covers why data quality checks matter and how to implement them.
Specifically, the article covers:
- Overview of the serverless data quality pipeline architecture using AWS services like Lambda, Step Functions, S3, and SNS
- Implementation of data quality checks like completeness, uniqueness, and non-negativity using PyDeequ
- Steps to deploy and run the sample application from the provided GitHub repository
- How to review data quality check results and metrics generated by Deequ
- Considerations for running PyDeequ on AWS Lambda
- Conclusion on the importance of data quality and using Deequ for data quality checks
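To make the three checks named above concrete, here is a minimal plain-Python sketch of what completeness, uniqueness, and non-negativity measure. It is not PyDeequ code and does not reproduce the article's implementation. The function names, the sample rows, and the handling of nulls (skipped for uniqueness and non-negativity) are assumptions made for illustration, and Deequ's own definitions may differ in detail.

```python
from collections import Counter


def completeness(values):
    """Fraction of values that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Fraction of non-null values that occur exactly once.

    Assumption: nulls are ignored; Deequ's exact definition may differ.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    counts = Counter(non_null)
    return sum(1 for v in non_null if counts[v] == 1) / len(non_null)


def non_negativity(values):
    """Fraction of non-null values that are >= 0 (nulls are skipped)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    return sum(v >= 0 for v in non_null) / len(non_null)


def column(rows, name):
    """Extract one column from a list of dict rows."""
    return [row.get(name) for row in rows]


# Hypothetical sample data, not taken from the article.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 4, "amount": 3.5},
]

report = {
    "amount_completeness": completeness(column(rows, "amount")),
    "order_id_uniqueness": uniqueness(column(rows, "order_id")),
    "amount_non_negativity": non_negativity(column(rows, "amount")),
}

if __name__ == "__main__":
    for metric, value in report.items():
        print(f"{metric}: {value:.2f}")
```

In a pipeline like the one the article describes, a check would typically pass only when such a metric meets a threshold (for example, completeness equal to 1.0), and a failure would trigger a notification.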