Announcing general availability of Apache Spark 4.0 on Amazon EMR

Big Data Blog

This article announces the general availability of Apache Spark 4.0 on Amazon EMR, introducing major improvements for data processing, semi-structured data handling, and streaming workloads.

Spark Connect enables remote PySpark development from IDEs without local Spark installation
VARIANT data type natively supports semi-structured JSON without upfront schema definition
Apache Iceberg V3 integration enables efficient semi-structured storage with schema evolution
SQL scripting adds procedural logic (variables, conditionals, loops) directly in SQL
Python Data Source API allows building custom connectors entirely in Python
Queryable state for streaming enables live state inspection without stopping jobs
EMR Serverless runs Spark workloads up to 4.5× faster than open-source Apache Spark
EMR-spark-8.0 includes Python 3.11, Java 17, and simplified patch management
Available across EMR on EC2, EMR on EKS, and EMR Serverless deployment options
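
As a rough illustration of two of the features listed above, the following sketch connects to a cluster through Spark Connect and reads a nested field from a JSON document by way of the VARIANT type, without declaring a schema first. The endpoint URL, the sample document, and the helper names (`sample_event`, `extract_device_id`, `main`) are assumptions made for this example and do not come from the announcement. `parse_json` and `variant_get` are Spark 4.0 SQL functions, and `SparkSession.builder.remote` is the PySpark entry point for Spark Connect.

```python
import json

# Hypothetical endpoint: replace with the Spark Connect URL of your own cluster.
SPARK_CONNECT_URL = "sc://localhost:15002"

# parse_json turns a JSON string into a VARIANT value; variant_get extracts
# one path from it and casts the result to the requested SQL type.
VARIANT_QUERY = (
    "SELECT variant_get(parse_json(:doc), '$.device.id', 'string') AS value"
)


def sample_event() -> str:
    """Return a small JSON document; the fields are made up for illustration."""
    return json.dumps({"device": {"id": "sensor-7"}, "reading": 21.5})


def extract_device_id(spark, doc: str):
    """Parse `doc` into a VARIANT and pull out the nested device id.

    `spark` is any SparkSession; the document is passed as a named SQL
    parameter so it does not need to be quoted by hand.
    """
    row = spark.sql(VARIANT_QUERY, args={"doc": doc}).first()
    return row["value"]


def main() -> None:
    # Imported here so the helpers above can be read and tested without
    # PySpark installed. Assumes pyspark 4.0 with the Spark Connect extra.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(SPARK_CONNECT_URL).getOrCreate()
    print(extract_device_id(spark, sample_event()))
    spark.stop()
```

Calling `main()` requires a reachable Spark Connect endpoint. The query itself runs remotely, so the local machine needs only the PySpark client and no local Spark installation.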

Spark 4.0 on Amazon EMR simplifies data processing by reducing schema complexity, enabling interactive development at production scale, and providing better observability for streaming workloads.



May 27, 2026: Amazon EMR now supports Apache Spark 4.0.2 in general availability

Jan 26, 2026: Apache Spark 4.0.1 preview now available on Amazon EMR Serverless

Nov 21, 2025: Amazon EMR Serverless now supports Apache Spark 4.0.1 (preview)

Aug 8, 2024: Amazon EMR 7.2 now supports Apache Spark 3.5.1


