AWS Glue Data Catalog now supports generating statistics for Apache Iceberg tables
The article discusses the new support in AWS Glue Data Catalog for generating column-level aggregated statistics for Apache Iceberg tables. This feature aims to improve query performance and potentially reduce costs when using Amazon Redshift Spectrum.
Specifically, the article covers:
- AWS Glue Data Catalog now supports generating column-level statistics, such as the number of distinct values (NDV), for Apache Iceberg tables.
- These statistics are stored in Apache Iceberg Puffin files and integrated with Amazon Redshift Spectrum's cost-based optimizer (CBO).
- The CBO uses these statistics to optimize queries by applying filters as early as possible in query processing, which reduces memory usage and the number of records read.
- Users can generate statistics for Iceberg tables using the AWS Glue console or APIs, and the statistics are updated with each table snapshot.
- The feature is generally available in several AWS regions, as listed in the article.
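As a rough illustration of the API route mentioned above, the following Python sketch starts a column statistics run through the Glue `StartColumnStatisticsTaskRun` operation. The article itself does not show any code, so the database name, table name, role ARN, and the helper names `build_stats_request` and `start_stats_run` are all hypothetical placeholders, and the sketch assumes this is the operation used for Iceberg tables as well.

```python
from typing import Any, Dict, List, Optional


def build_stats_request(
    database: str,
    table: str,
    role_arn: str,
    columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Glue's StartColumnStatisticsTaskRun.

    All argument values are placeholders supplied by the caller; nothing
    here comes from the article itself.
    """
    request: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,  # IAM role that Glue assumes to read the table
    }
    if columns:
        # Restrict the run to specific columns; when omitted, the request
        # does not name any columns.
        request["ColumnNameList"] = list(columns)
    return request


def start_stats_run(
    glue_client: Any,
    database: str,
    table: str,
    role_arn: str,
    columns: Optional[List[str]] = None,
) -> str:
    """Start a statistics run and return its task run ID."""
    request = build_stats_request(database, table, role_arn, columns)
    response = glue_client.start_column_statistics_task_run(**request)
    return response["ColumnStatisticsTaskRunId"]


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#
#   import boto3
#   glue = boto3.client("glue")
#   run_id = start_stats_run(
#       glue,
#       "sales_db",
#       "orders_iceberg",
#       "arn:aws:iam::123456789012:role/GlueStatsRole",
#       ["customer_id"],
#   )
#   status = glue.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub; the run's progress can then be checked with `get_column_statistics_task_run`, as in the commented usage.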
Related articles
Jul 9, 2024
Accelerate query performance with Apache Iceberg statistics on the AWS Glue Data Catalog
Dec 19, 2024
AWS Glue Data Catalog offers advanced automatic optimization for Apache Iceberg tables
Sep 12, 2024
The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables
Sep 12, 2024
AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables