Use Apache Spark on Amazon EMR Serverless directly from Amazon SageMaker Studio

This article announces the ability to run Apache Spark on Amazon EMR Serverless directly from Amazon SageMaker Studio notebooks, enabling petabyte-scale data analytics and machine learning.

Specifically, the article covers:

  • EMR Serverless automatically provisions and scales resources, allowing users to focus on data and models without managing clusters
  • Users can create and browse EMR Serverless applications directly from SageMaker Studio and connect to them with a few clicks
  • Once connected, users can use Spark SQL, Scala, and Python to interactively query, explore, and visualize data, and to run Apache Spark jobs
  • Jobs run faster because they use EMR's performance-optimized versions of Spark (e.g., Spark on EMR 7.1 is 4.5x faster than open-source Spark)
  • EMR Serverless offers fine-grained automatic scaling, and users pay only for what they use
  • This feature is supported on SageMaker Distribution 1.10+ in all regions where SageMaker Studio is available
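
As a rough illustration of the interactive querying mentioned in the list above, the following Python sketch runs a Spark SQL aggregation from a notebook. It is not taken from the article: the function name, the temporary view name `events`, the S3 path, and the column name are all hypothetical, and it assumes the notebook already exposes a Spark session object (conventionally named `spark`) once it is connected to an EMR Serverless application.

```python
# Sketch only: the view name, S3 path, and column names are hypothetical.

def summarize_by_column(spark, path, column, limit=10):
    """Count rows per distinct value of `column` using Spark SQL.

    `spark` is an existing SparkSession, assumed to be provided by the
    notebook after it connects to an EMR Serverless application.
    """
    # Load the dataset; Parquet is assumed here for illustration.
    df = spark.read.parquet(path)

    # Register the DataFrame so it can be queried with Spark SQL.
    df.createOrReplaceTempView("events")

    query = (
        f"SELECT `{column}` AS value, COUNT(*) AS n "
        f"FROM events GROUP BY `{column}` ORDER BY n DESC "
        f"LIMIT {int(limit)}"
    )
    # Returns a DataFrame; nothing executes until an action such as show().
    return spark.sql(query)
```

In a connected notebook, a call might look like `summarize_by_column(spark, "s3://example-bucket/events/", "event_type").show()`, where the bucket and column are placeholders for your own data.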


Related articles

Nov 21, 2025: Amazon EMR Serverless now supports Apache Spark 4.0.1 (preview)
Jan 26, 2026: Apache Spark 4.0.1 preview now available on Amazon EMR Serverless
Dec 2, 2025: Amazon EMR Serverless eliminates local storage provisioning for Apache Spark workloads
Dec 10, 2024: Run Apache Spark Structured Streaming jobs at scale on Amazon EMR Serverless
