
AWS Glue Data Catalog now supports scheduled generation of column level statistics




This article announces that AWS Glue Data Catalog now supports scheduled generation of column-level statistics for Apache Iceberg tables and for tables stored in file formats such as Parquet, JSON, CSV, XML, ORC, and ION.

Specifically, the article covers:

  • The ability to create a recurring schedule in Glue Data Catalog for automated statistics generation
  • Integration with cost-based optimizers in Amazon Redshift Spectrum and Amazon Athena for improved query performance
  • Collection of statistics such as the number of distinct values, the number of nulls, maximum and minimum values, and average length
  • Visibility into the status and timing of statistics generation runs
  • Getting started using the Glue Data Catalog Console or APIs
  • General availability in regions where Amazon EventBridge Scheduler is available
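
The recurring schedule mentioned in the list above can also be set up through the APIs. The following is a minimal sketch of what that might look like from Python. It assumes the boto3 Glue client exposes `create_column_statistics_task_settings` with the parameter names shown here, so check the current Glue API reference before relying on it. The helper names (`build_schedule_settings`, `schedule_statistics`) and the database, table, and role values in the usage comment are hypothetical and do not come from the announcement.

```python
from typing import Any, Dict, List, Optional


def build_schedule_settings(
    database: str,
    table: str,
    role_arn: str,
    cron: str,
    columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a scheduled column-statistics task.

    Parameter names mirror what the Glue API is assumed to accept.
    """
    # Only a basic shape check; the service itself validates the expression.
    if not (cron.startswith("cron(") and cron.endswith(")")):
        raise ValueError("expected a schedule of the form cron(...)")

    settings: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
        "Schedule": cron,
    }
    # Leaving the column list out is assumed to mean "all columns".
    if columns:
        settings["ColumnNameList"] = list(columns)
    return settings


def schedule_statistics(glue_client: Any, settings: Dict[str, Any]) -> None:
    """Send the settings to Glue using a client supplied by the caller."""
    glue_client.create_column_statistics_task_settings(**settings)


# Example usage (hypothetical names; requires boto3 and AWS credentials):
#
#   import boto3
#   glue = boto3.client("glue")
#   settings = build_schedule_settings(
#       "sales_db",
#       "orders",
#       "arn:aws:iam::123456789012:role/GlueStatsRole",
#       "cron(0 2 * * ? *)",  # intended as once a day at 02:00 UTC
#   )
#   schedule_statistics(glue, settings)
```

Passing the client in as an argument keeps the request-building logic testable without AWS access. Once a schedule exists, run status and timing should be visible in the Glue Data Catalog console, as the summary notes.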




Related articles

  • Dec 3, 2024: AWS Glue Data catalog now automates generating statistics for new tables
  • Dec 3, 2024: Introducing AWS Glue Data Catalog automation for table statistics collection for improved query performance on Amazon Redshift and Amazon Athena
  • Jul 9, 2024: AWS Glue Data catalog now supports generating statistics for Apache Iceberg tables
  • Jun 26, 2025: AWS Glue Data Catalog usage metrics now available with Amazon CloudWatch
