Introducing AWS Glue 5.0 for Apache Spark
Big Data Blog
AWS has announced AWS Glue 5.0, a new version of its serverless data integration service with significant performance and feature improvements.
- Upgraded to Apache Spark 3.5.2, Python 3.11, and Java 17
- 32% faster performance and 22% cost reduction compared to Glue 4.0
- Updated open table format libraries (Hudi, Iceberg, Delta Lake)
- Added support for Amazon SageMaker Unified Studio and Lakehouse
- Introduced Spark-native fine-grained access control with AWS Lake Formation
- Added support for S3 Access Grants and for installing Python libraries from a requirements.txt file
- Preview support for data lineage in Amazon DataZone
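As a rough illustration of the requirements.txt support listed above, the sketch below builds the arguments for a Glue 5.0 job definition. The helper name, job name, role ARN, and S3 paths are hypothetical placeholders, and the two job parameters reflect how the Glue documentation appears to describe the feature, so check them against the current docs before relying on them.

```python
# Sketch only: the helper name, role ARN, and S3 URIs are hypothetical.
def glue5_job_definition(name, role_arn, script_s3_uri, requirements_s3_uri):
    """Return keyword arguments in the shape boto3's glue.create_job expects."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "5.0",  # selects the Spark 3.5 / Python 3.11 runtime
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Assumption: point at the requirements file and pass "-r"
            # so the installer treats it as a requirements list.
            "--additional-python-modules": requirements_s3_uri,
            "--python-modules-installer-option": "-r",
        },
    }


job = glue5_job_definition(
    name="example-glue5-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_uri="s3://example-bucket/scripts/job.py",
    requirements_s3_uri="s3://example-bucket/deps/requirements.txt",
)
```

With credentials configured, the resulting dictionary could be passed on as `boto3.client("glue").create_job(**job)`.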
Key improvements include performance optimizations, enhanced Structured Streaming, and support for Apache Arrow-optimized Python UDFs and user-defined table functions.
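To make the Arrow-optimized UDFs and user-defined table functions more concrete, here is a minimal sketch using the PySpark 3.5 API. The function and class names are made up for illustration, and the example assumes a Spark session is already available, as it would be inside a Glue job.

```python
# Sketch only: to_upper, SplitWords, and register_functions are hypothetical names.
def to_upper(s):
    """Plain Python logic; Spark wraps it as a UDF below."""
    return s.upper() if s is not None else None


class SplitWords:
    """UDTF handler: emits one (word, length) row per word in the input."""

    def eval(self, text: str):
        for word in text.split():
            yield (word, len(word))


def register_functions(spark):
    """Register both functions on an existing SparkSession (Spark 3.5+)."""
    # Imported lazily so the plain Python parts work without PySpark installed.
    from pyspark.sql.functions import udf, udtf

    # useArrow=True requests Arrow-based data transfer for this Python UDF.
    upper_udf = udf(to_upper, returnType="string", useArrow=True)
    split_udtf = udtf(SplitWords, returnType="word: string, length: int")

    spark.udf.register("to_upper", upper_udf)
    spark.udtf.register("split_words", split_udtf)
    return upper_udf, split_udtf
```

After calling `register_functions(spark)`, a query such as `SELECT * FROM split_words('hello glue')` should return one row per word.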