Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro

Big Data Blog

The article discusses a data processing framework developed by Wipro that leverages AWS services to build scalable and automated ETL pipelines, addressing challenges with traditional data processing tools.

Uses Amazon EMR with the Apache Spark runtime for data processing
Integrates Amazon MWAA for pipeline orchestration and scheduling
Implements a fully automated CI/CD workflow for data pipelines
Supports flexible data transformations via JSON configuration
Provides fault tolerance with the ability to resume jobs from the last successful phase

Key benefits include full automation, scalability, concurrent job execution, proactive error notifications, and support for multiple data formats. The solution helps businesses overcome traditional ETL tool limitations by leveraging AWS managed services.
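As a rough illustration of two of the ideas summarized above (transformations driven by JSON configuration, and resuming a job from the last successful phase), here is a minimal sketch in plain Python. It is not Wipro's framework and does not use Spark, Amazon EMR, or Amazon MWAA. The phase names, the configuration keys (`op`, `column`, `min`, `from`, `to`, `columns`), the helpers `apply_phase` and `run_pipeline`, and the in-memory `checkpoint` dictionary are all hypothetical and chosen only for this example. A real pipeline would presumably persist its checkpoint in durable storage.

```python
import json

# Hypothetical pipeline definition; phase names and keys are illustrative only.
CONFIG = json.loads("""
{
  "phases": [
    {"name": "ingest", "op": "filter", "column": "amount", "min": 0},
    {"name": "transform", "op": "rename", "from": "amount", "to": "amount_usd"},
    {"name": "publish", "op": "select", "columns": ["id", "amount_usd"]}
  ]
}
""")


def apply_phase(rows, phase):
    """Apply one configured transformation to a list of dict rows."""
    op = phase["op"]
    if op == "filter":
        # Keep rows whose column value is at least the configured minimum.
        return [r for r in rows if r[phase["column"]] >= phase["min"]]
    if op == "rename":
        # Rename a single column, leaving all other columns untouched.
        return [
            {(phase["to"] if k == phase["from"] else k): v for k, v in r.items()}
            for r in rows
        ]
    if op == "select":
        # Project each row onto the configured columns.
        return [{k: r[k] for k in phase["columns"]} for r in rows]
    raise ValueError(f"unknown operation: {op!r}")


def run_pipeline(rows, config, checkpoint):
    """Run phases in order, skipping any that the checkpoint marks as done.

    `checkpoint` is a plain dict (an assumption for this sketch) that stores
    the names of completed phases and the output of the last successful one.
    """
    completed = checkpoint.setdefault("completed", [])
    data = checkpoint.get("rows", rows)
    for phase in config["phases"]:
        if phase["name"] in completed:
            continue  # already succeeded in an earlier run
        data = apply_phase(data, phase)  # may raise; checkpoint stays intact
        completed.append(phase["name"])
        checkpoint["rows"] = data
    return data
```

If a phase raises, the checkpoint still holds the output of the last phase that succeeded, so calling `run_pipeline` again with the same `checkpoint` (and a corrected configuration) continues from the failed phase rather than starting over.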

Related articles

May 1, 2025
Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters

Aug 22, 2024
How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse

Apr 10, 2024
Run complex queries on massive amounts of data stored on your Amazon DocumentDB clusters using Apache Spark running on Amazon EMR

Sep 15, 2025
Streamline Spark application development on Amazon EMR with the Data Solutions Framework on AWS