AWS Glue Data Catalog now supports scheduled generation of column level statistics
This article announces that AWS Glue Data Catalog now supports scheduled generation of column-level statistics for Apache Iceberg tables and for file formats such as Parquet, JSON, CSV, XML, ORC, and ION.
Specifically, the article covers:
- The ability to create a recurring schedule in Glue Data Catalog for automated statistics generation
- Integration with cost-based optimizers in Amazon Redshift Spectrum and Amazon Athena for improved query performance
- Collection of statistics such as the number of distinct values, the number of nulls, maximum and minimum values, and average length
- Visibility into the status and timing of statistics generation runs
- Getting started using the Glue Data Catalog Console or APIs
- General availability in regions where Amazon EventBridge Scheduler is available
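As a rough illustration of the API route mentioned above, the following Python sketch builds the request for a recurring statistics schedule and, optionally, submits it with boto3. It assumes a recent boto3 release whose Glue client exposes `create_column_statistics_task_settings` and `get_column_statistics_task_settings`; the helper names, database, table, role ARN, and cron expression are hypothetical placeholders and do not come from the announcement.

```python
from typing import Any, Dict, List, Optional


def build_stats_schedule_request(
    database: str,
    table: str,
    role_arn: str,
    schedule: str,
    columns: Optional[List[str]] = None,
    sample_size: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a recurring column-statistics schedule.

    Hypothetical helper: the keys mirror the parameters that the boto3 Glue
    call create_column_statistics_task_settings is assumed to accept.
    """
    # Assumption: the schedule is passed as a cron(...) expression.
    if not (schedule.startswith("cron(") and schedule.endswith(")")):
        raise ValueError("schedule must look like 'cron(...)'")

    request: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
        "Schedule": schedule,
    }
    # Optionally restrict the run to specific columns.
    if columns:
        request["ColumnNameList"] = list(columns)
    # Optionally sample a percentage of rows instead of scanning everything.
    if sample_size is not None:
        if not 0 < sample_size <= 100:
            raise ValueError("sample_size is a percentage in (0, 100]")
        request["SampleSize"] = float(sample_size)
    return request


def create_stats_schedule(request: Dict[str, Any], region_name: str) -> Dict[str, Any]:
    """Submit the schedule and read the stored settings back.

    Needs AWS credentials and boto3; imported lazily so that the request
    builder above can be used without either.
    """
    import boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.create_column_statistics_task_settings(**request)
    return glue.get_column_statistics_task_settings(
        DatabaseName=request["DatabaseName"],
        TableName=request["TableName"],
    )
```

A call such as `build_stats_schedule_request("sales_db", "orders", role_arn, "cron(0 2 * * ? *)")` only assembles the request dictionary; nothing is sent to AWS until `create_stats_schedule` is invoked. Keeping the two steps separate makes the request easy to inspect or log before any schedule is created.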
Related articles
- Dec 3, 2024: AWS Glue Data catalog now automates generating statistics for new tables
- Dec 3, 2024: Introducing AWS Glue Data Catalog automation for table statistics collection for improved query performance on Amazon Redshift and Amazon Athena
- Jul 9, 2024: AWS Glue Data catalog now supports generating statistics for Apache Iceberg tables
- Jun 26, 2025: AWS Glue Data Catalog usage metrics now available with Amazon CloudWatch