Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro

Big Data Blog

The article describes a data processing framework, developed by Wipro, that uses AWS services to build scalable, automated ETL pipelines and addresses the limitations of traditional data processing tools.

  • Uses Apache Spark on Amazon EMR for data processing
  • Integrates Amazon MWAA for pipeline orchestration and scheduling
  • Implements a fully automated CI/CD workflow for data pipelines
  • Supports flexible data transformations defined through JSON configuration
  • Provides fault tolerance through the ability to resume a job from its last successful phase
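The summary does not show Wipro's configuration format or its resume logic. As a rough illustration of how JSON-defined transformations and resume-from-last-successful-phase can fit together, here is a minimal Python sketch. Everything in it is an assumption for illustration: the JSON schema, the transformation names in the hypothetical `TRANSFORMS` registry, the `fail_at` crash simulator, and the in-memory `checkpoint` dictionary. A real deployment would run each phase as a Spark job on Amazon EMR and persist phase status durably rather than in memory.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical pipeline definition; this schema is illustrative, not Wipro's.
CONFIG_JSON = """
{
  "pipeline": "orders_daily",
  "phases": [
    {"name": "extract",   "transform": "identity"},
    {"name": "clean",     "transform": "drop_nulls"},
    {"name": "aggregate", "transform": "sum_amount"}
  ]
}
"""

Rows = List[dict]

# Hypothetical registry mapping config names to transformation functions.
TRANSFORMS: Dict[str, Callable[[Rows], Rows]] = {
    "identity": lambda rows: rows,
    "drop_nulls": lambda rows: [r for r in rows if r.get("amount") is not None],
    "sum_amount": lambda rows: [{"total": sum(r["amount"] for r in rows)}],
}


def run_pipeline(config: dict, rows: Rows, checkpoint: dict,
                 fail_at: Optional[str] = None) -> Rows:
    """Run phases in config order, skipping phases already in `checkpoint`.

    `checkpoint` maps phase name -> that phase's output and stands in for
    durable state that survives a failed run. `fail_at` simulates a crash
    just before the named phase runs (for demonstration only).
    """
    data = rows
    for phase in config["phases"]:
        name = phase["name"]
        if name in checkpoint:
            # Phase succeeded in an earlier run: reuse its output instead of recomputing.
            data = checkpoint[name]
            continue
        if name == fail_at:
            raise RuntimeError(f"simulated failure in phase {name!r}")
        data = TRANSFORMS[phase["transform"]](data)
        # Record success only after the phase has completed.
        checkpoint[name] = data
    return data


if __name__ == "__main__":
    config = json.loads(CONFIG_JSON)
    rows = [{"amount": 5}, {"amount": None}, {"amount": 7}]
    state: dict = {}
    try:
        run_pipeline(config, rows, state, fail_at="aggregate")
    except RuntimeError as err:
        print(err)
    print(sorted(state))                      # ['clean', 'extract']
    print(run_pipeline(config, rows, state))  # [{'total': 12}]
```

Because a phase is recorded only after it finishes, a failed run leaves behind exactly the phases that succeeded, and the next run resumes at the first unrecorded phase instead of reprocessing everything.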

Key benefits include full automation, scalability, concurrent execution of multiple jobs, proactive error notifications, and support for multiple data formats. By building on AWS managed services, the solution helps businesses overcome the limitations of traditional ETL tools.

Related articles

May 1, 2025: Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters
Aug 22, 2024: How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse
Apr 10, 2024: Run complex queries on massive amounts of data stored on your Amazon DocumentDB clusters using Apache Spark running on Amazon EMR
Sep 15, 2025: Streamline Spark application development on Amazon EMR with the Data Solutions Framework on AWS