Home icon

Introducing AWS Glue Data Quality anomaly detection

Big Data Blog



This article introduces the new anomaly detection capability in AWS Glue Data Quality, which uses machine learning to detect anomalies in datasets based on patterns learned from historical data.

Specifically, the article covers:

  • The challenges of using fixed data quality rules that can't adapt to changing business environments
  • An overview of the anomaly detection solution using an example of monitoring NYC taxi ride data
  • Steps to set up resources using a CloudFormation template and generate sample NYC taxi data
  • Creating an AWS Glue visual ETL job with data quality rules and anomaly detection analyzers
  • Running the job over multiple days to train the anomaly detection model and analyze results
  • Updating data quality rules based on anomaly detection recommendations
  • Excluding anomalies from the training data to refine the model
  • Adding an anomaly detection rule to fail jobs or adjust data quality scores when anomalies are detected
  • A use case demonstrating detection of anomalies in seasonal patterns


Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Related articles

Aug 8
2024
AWS Glue announces GA of new ML-powered Glue Data Quality capability
Jun 28
2024
Announcing Data Quality Definition Language (DQDL) enhancements for AWS Glue Data Quality
Jul 28
2025
AWS Glue Data Quality now supports Amazon S3 Tables and Iceberg Tables
Nov 25
2025
AWS Glue Data Quality now supports rule labeling for enhanced reporting

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.