Introducing AWS Glue Data Catalog automation for table statistics collection for improved query performance on Amazon Redshift and Amazon Athena

Big Data Blog

AWS introduces automated table statistics collection for the AWS Glue Data Catalog, improving query performance on Amazon Redshift and Amazon Athena. Key highlights include:

Automatic generation of table statistics for new and updated tables
Support for Parquet, ORC, JSON, and CSV files, as well as Apache Iceberg tables
Statistics that the cost-based optimizer (CBO) uses to improve query performance and efficiency
Catalog-level and table-level configuration options
Sampling of 20% of records for statistics collection by default
Customizable collection frequency, target columns, and sampling percentage
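To make "statistics on a 20% sample" concrete, here is a minimal illustrative sketch of per-column statistics computed over a random sample of rows. It is not how AWS Glue computes its statistics. The function `sample_column_stats`, the `ColumnStats` fields, and the Bernoulli row sampling are all assumptions made for illustration. Counts are reported for the sample only and are not extrapolated to the full table.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    # Hypothetical field names: the kinds of per-column numbers a catalog
    # might store for a cost-based optimizer, not Glue API field names.
    sampled_rows: int
    null_count: int
    distinct_count: int
    min_value: Optional[int]
    max_value: Optional[int]


def sample_column_stats(
    values: Sequence[Optional[int]], sample_pct: float = 20.0, seed: int = 0
) -> ColumnStats:
    """Compute simple column statistics over a random sample of rows."""
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in (0, 100]")
    rng = random.Random(seed)
    # Bernoulli sampling: keep each row independently with probability sample_pct/100.
    sample = [v for v in values if rng.random() < sample_pct / 100]
    non_null = [v for v in sample if v is not None]
    return ColumnStats(
        sampled_rows=len(sample),
        null_count=len(sample) - len(non_null),
        distinct_count=len(set(non_null)),
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
    )
```

For example, `sample_column_stats(list(range(10_000)))` inspects roughly 2,000 rows instead of 10,000. This is the trade-off a sampling percentage controls: lower collection cost in exchange for approximate statistics.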

This feature simplifies statistics management for data lake administrators and provides flexibility for data owners to optimize their data platforms.
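As a sketch of the table-level customization a data owner might apply, the helper below assembles request parameters for a scheduled column-statistics task. It assumes the AWS Glue `CreateColumnStatisticsTaskSettings` API with the parameters `DatabaseName`, `TableName`, `Role`, `Schedule`, `ColumnNameList`, and `SampleSize`, and that omitting `ColumnNameList` targets all columns. Verify these assumptions against the current boto3 Glue reference before relying on them. The helper name, the default cron schedule, and the example database, table, and role names are hypothetical.

```python
from typing import Optional, Sequence


def build_column_stats_task_settings(
    database: str,
    table: str,
    role_arn: str,
    schedule: str = "cron(0 2 * * ? *)",  # hypothetical default: daily at 02:00 UTC
    columns: Optional[Sequence[str]] = None,
    sample_size_pct: float = 20.0,  # mirrors the 20% default described above
) -> dict:
    """Build (but do not send) parameters for a scheduled statistics task.

    Parameter names are an assumption based on the Glue
    CreateColumnStatisticsTaskSettings API; check the boto3 docs.
    """
    if not 0 < sample_size_pct <= 100:
        raise ValueError("sample_size_pct must be in (0, 100]")
    params = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
        "Schedule": schedule,
        "SampleSize": sample_size_pct,
    }
    # Assumption: leaving ColumnNameList out means "collect for all columns".
    if columns:
        params["ColumnNameList"] = list(columns)
    return params
```

Usage, assuming AWS credentials and an IAM role with Glue access are configured: `boto3.client("glue").create_column_statistics_task_settings(**build_column_stats_task_settings("sales_db", "orders", "arn:aws:iam::123456789012:role/GlueStatsRole", columns=["order_date", "customer_id"]))`. Building the parameters separately keeps the validation testable without calling AWS.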


