Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters

Big Data Blog



This article discusses building end-to-end Apache Spark pipelines using Amazon Managed Workflows for Apache Airflow (MWAA), Batch Processing Gateway (BPG), and Amazon EMR on EKS clusters.

  • Enables routing Spark workloads across multiple EMR on EKS clusters
  • Introduces a custom Airflow operator (BPGOperator) for seamless job submission
  • Presents a solution for a healthcare analytics company that needs separate data processing environments
  • Offers benefits such as separation of responsibilities and centralized code management
  • Demonstrates incremental migration strategy for existing Airflow DAGs

The solution lets organizations build flexible, scalable data processing pipelines on AWS services, with a clear separation of responsibilities between infrastructure teams and data engineering teams.
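The routing idea in the summary above can be illustrated with a small, self-contained sketch. It does not use the real BPG API or the article's BPGOperator. The `Cluster` and `SparkJob` types, the `route` function, the queue names, and the cluster names are all hypothetical, chosen only to show how a gateway might map a job's queue to one of several EMR on EKS clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of one EMR on EKS cluster behind a gateway."""
    name: str
    queues: frozenset  # queue names this cluster accepts (assumed concept)


@dataclass(frozen=True)
class SparkJob:
    """Hypothetical job request; field names are illustrative only."""
    app_name: str
    queue: str
    main_file: str


def route(job, clusters):
    """Return the first cluster that serves the job's queue.

    Raises LookupError when no cluster is configured for that queue.
    """
    for cluster in clusters:
        if job.queue in cluster.queues:
            return cluster
    raise LookupError(f"no cluster serves queue {job.queue!r}")


# Illustrative registry: two clusters, each serving a separate environment.
CLUSTERS = [
    Cluster(name="emr-eks-a", queues=frozenset({"team-a"})),
    Cluster(name="emr-eks-b", queues=frozenset({"team-b"})),
]

if __name__ == "__main__":
    job = SparkJob(
        app_name="nightly-etl",
        queue="team-b",
        main_file="s3://example-bucket/etl.py",  # placeholder path
    )
    print(route(job, CLUSTERS).name)
```

In this sketch the caller names only a queue, never a cluster, which is one way to keep cluster details with the infrastructure team while data engineers work only with job definitions.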




