Apache Spark 4.0.1 preview now available on Amazon EMR Serverless
Big Data Blog
This article announces Apache Spark 4.0.1 preview availability on Amazon EMR Serverless, introducing major enhancements for analytics, data engineering, and governance.
- ANSI SQL mode now default, enforcing standard SQL behavior for data integrity
- VARIANT data type efficiently handles JSON/XML without repeated parsing overhead
- SQL scripting enables loops, conditionals, and session variables directly in SQL
- Pipe syntax (|>) chains SQL operations for improved readability and maintainability
- Python data source API allows building custom connectors without Scala expertise
- Queryable streaming state enables debugging and monitoring of stateful applications
- Apache Iceberg v3 support provides transaction guarantees and audit trails
- Amazon S3 Tables integration with automatic optimization and maintenance
- Lake Formation full table access supported for Iceberg, Delta, and Hive tables
- Runtime requirements: Scala 2.13.16, Java 17+, Python 3.9+, Pandas 2.0.0+
- Preview limitations: No fine-grained access control, Spark Connect, Hudi, or interactive applications
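A few of the SQL features listed above can be sketched in PySpark. This is a minimal illustration, not code from the announcement: the table names (`orders`, `raw_events`), their columns, and the helper `run_examples` are hypothetical, and the caller is assumed to supply an existing Spark 4.x `SparkSession`.

```python
# Hypothetical tables and columns; adjust to your own schema.

# With ANSI mode on by default, CAST('abc' AS INT) raises an error.
# TRY_CAST returns NULL instead of failing on malformed input.
ANSI_SAFE_CAST = "SELECT TRY_CAST('abc' AS INT) AS parsed"

# parse_json produces a VARIANT value; variant_get extracts a typed field
# by path without re-parsing the JSON text for every access.
VARIANT_QUERY = """
SELECT variant_get(parse_json(payload), '$.device.id', 'string') AS device_id
FROM raw_events
"""

# Pipe syntax: each |> step consumes the result of the previous step.
PIPE_QUERY = """
FROM orders
|> WHERE status = 'shipped'
|> AGGREGATE SUM(amount) AS total GROUP BY region
|> ORDER BY total DESC
"""


def run_examples(spark):
    """Run each query on the given SparkSession; return results keyed by name."""
    queries = {
        "ansi": ANSI_SAFE_CAST,
        "variant": VARIANT_QUERY,
        "pipe": PIPE_QUERY,
    }
    return {name: spark.sql(query) for name, query in queries.items()}
```

Calling `run_examples(spark)` inside a job returns one DataFrame per query. The queries are kept as plain strings so they can also be pasted into a SQL editor unchanged.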
Spark 4.0.1 on EMR Serverless simplifies data engineering workflows with SQL enhancements, Python improvements, and streaming capabilities, while adding efficient handling of semi-structured data and maintaining governance controls.