AWS Glue Data Catalog now supports generating statistics for Apache Iceberg tables

News

The article discusses the new support in AWS Glue Data Catalog for generating column-level aggregated statistics for Apache Iceberg tables. This feature aims to improve query performance and potentially reduce costs when using Amazon Redshift Spectrum.

Specifically, the article covers:

AWS Glue Data Catalog now supports generating statistics, such as the number of distinct values (NDV), for Apache Iceberg tables.
These statistics are stored in Apache Iceberg Puffin files and integrated with Amazon Redshift Spectrum's cost-based optimizer (CBO).
The CBO uses these statistics to optimize queries by applying filters early in query processing, which reduces memory usage and the number of records read.
Users can generate statistics for Iceberg tables through the AWS Glue console or the AWS Glue APIs, and the statistics are updated with each table snapshot.
The feature is generally available in several AWS Regions, which the article lists.
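Computing an exact NDV means remembering every distinct value, which is why engines usually estimate it with a sketch. As a rough illustration only, the Python below implements a k-minimum-values (KMV) estimator, a simpler relative of the Theta sketches that the Apache Iceberg Puffin specification defines for NDV blobs. The function names and the default k are assumptions made for this example; this is not how AWS Glue computes its statistics.

```python
import hashlib
import heapq


def _hash_to_unit(value) -> float:
    # Deterministic, well-mixed hash mapped into (0, 1]. Note: str(value) means
    # 1 and "1" hash identically in this sketch.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def kmv_ndv_estimate(values, k: int = 256) -> int:
    """Estimate the number of distinct values with a k-minimum-values sketch."""
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []        # max-heap (negated) of the k smallest distinct hashes
    members = set()  # hashes currently held in the heap
    for v in values:
        h = _hash_to_unit(v)
        if h in members:
            continue  # repeat of a value already tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest retained hash with this smaller one.
            evicted = -heapq.heapreplace(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: the count is exact
    kth_smallest = -heap[0]
    # Standard KMV estimator: (k - 1) / (k-th smallest normalized hash).
    return int(round((k - 1) / kth_smallest))
```

The sketch keeps only k hashes regardless of table size; its relative error shrinks roughly as 1/sqrt(k), so larger k trades memory for accuracy.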
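Puffin is an open file format from the Apache Iceberg project for storing statistics blobs. Per the Puffin specification, a file starts with the 4-byte magic `PFA1`, followed by the blobs and a footer laid out as magic, a JSON footer payload, a 4-byte little-endian payload size, 4 bytes of flags, and the magic again; for NDV the spec defines an `apache-datasketches-theta-v1` blob type whose metadata may carry an `ndv` property. The sketch below reads only the footer metadata from a file's bytes. It assumes an uncompressed footer payload (flag bit 0 unset), and the helper names `read_puffin_footer` and `blob_summaries` are hypothetical.

```python
import json
import struct

MAGIC = b"PFA1"  # 0x50 0x46 0x41 0x31 per the Puffin spec


def read_puffin_footer(data: bytes) -> dict:
    """Parse the JSON footer of an in-memory Puffin file (uncompressed footer only)."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file: missing magic")
    # Trailing 12 bytes: payload size (int32, little-endian), flags (4 bytes), magic.
    payload_size = struct.unpack("<i", data[-12:-8])[0]
    flags = data[-8:-4]
    if flags[0] & 0x01:
        raise NotImplementedError("compressed footer payload is not handled in this sketch")
    payload_end = len(data) - 12
    payload_start = payload_end - payload_size
    if payload_size < 0 or data[payload_start - 4:payload_start] != MAGIC:
        raise ValueError("footer magic not found where expected")
    return json.loads(data[payload_start:payload_end].decode("utf-8"))


def blob_summaries(footer: dict) -> list:
    """Return (type, field IDs, properties) for each blob described in the footer."""
    return [
        (b["type"], b.get("fields", []), b.get("properties", {}))
        for b in footer.get("blobs", [])
    ]
```

For a statistics file you have downloaded locally (the path is up to you), usage would be `blob_summaries(read_puffin_footer(open(path, "rb").read()))`.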
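A common textbook way for a cost-based optimizer to use NDV is to estimate an equality predicate's selectivity as 1/NDV, assuming uniformly distributed values, and then evaluate the most selective filters first so fewer rows flow into later steps. The Python below is a generic sketch of that idea under those assumptions (plus independence between columns); it is not Redshift Spectrum's actual cost model, and the `Predicate` type and sample NDV figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    column: str
    value: object  # equality predicate: column = value


def equality_selectivity(ndv: int) -> float:
    # Uniformity assumption: each distinct value covers 1/NDV of the rows.
    return 1.0 / max(ndv, 1)


def order_filters(predicates, ndv_by_column):
    # Most selective (smallest surviving fraction) first.
    return sorted(predicates, key=lambda p: equality_selectivity(ndv_by_column[p.column]))


def estimated_rows(row_count, predicates, ndv_by_column) -> float:
    # Independence assumption: per-column selectivities multiply.
    rows = float(row_count)
    for p in predicates:
        rows *= equality_selectivity(ndv_by_column[p.column])
    return rows


ndv = {"country": 200, "customer_id": 1_000_000, "status": 4}
preds = [Predicate("status", "shipped"), Predicate("customer_id", 42), Predicate("country", "DE")]
print([p.column for p in order_filters(preds, ndv)])
```

Without NDV statistics, an optimizer has to fall back on fixed guesses for selectivity, which is the gap that catalog-supplied statistics are meant to close.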
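AWS Glue exposes column-statistics runs through the `StartColumnStatisticsTaskRun` and `GetColumnStatisticsTaskRun` APIs (in boto3, `start_column_statistics_task_run` and `get_column_statistics_task_run`). The sketch below assumes, without the article confirming it, that these same calls trigger statistics generation for Iceberg tables; the helper names, polling interval, and any database, table, or role names are placeholders. The Glue client is passed in so the polling loop can be exercised with a stand-in object.

```python
import time

# Run states reported by GetColumnStatisticsTaskRun that end polling.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def build_stats_request(database, table, role_arn, columns=None, sample_size=None):
    """Keyword arguments for start_column_statistics_task_run (hypothetical helper)."""
    request = {"DatabaseName": database, "TableName": table, "Role": role_arn}
    if columns:
        request["ColumnNameList"] = list(columns)
    if sample_size is not None:
        request["SampleSize"] = float(sample_size)  # percentage of rows to sample
    return request


def generate_statistics(glue_client, database, table, role_arn, poll_seconds=15, **options):
    """Start a statistics run, then poll until it reaches a terminal state."""
    request = build_stats_request(database, table, role_arn, **options)
    response = glue_client.start_column_statistics_task_run(**request)
    run_id = response["ColumnStatisticsTaskRunId"]
    while True:
        run = glue_client.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)
        status = run["ColumnStatisticsTaskRun"]["Status"]
        if status in TERMINAL_STATES:
            return run_id, status
        time.sleep(poll_seconds)
```

Against a real account this would be called as `generate_statistics(boto3.client("glue"), "sales_db", "orders", "arn:aws:iam::123456789012:role/MyGlueStatsRole")`, where every name is a placeholder and the role needs permission to read the table's data.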

Accelerate query performance with Apache Iceberg statistics on the AWS Glue Data Catalog (Jul 9, 2024)

AWS Glue Data Catalog offers advanced automatic optimization for Apache Iceberg tables (Dec 19, 2024)

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables (Sep 12, 2024)

Related articles