Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro
Big Data Blog
The article discusses a data processing framework developed by Wipro that leverages AWS services to build scalable and automated ETL pipelines, addressing challenges with traditional data processing tools.
- Uses Amazon EMR with Apache Spark runtime for data processing
- Integrates Amazon MWAA for pipeline orchestration and scheduling
- Implements a fully automated CI/CD workflow for data pipelines
- Supports flexible data transformations via JSON configuration
- Provides fault tolerance, with the ability to resume jobs from the last successful phase
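
The last two points, JSON-configured transformations and resuming from the last successful phase, can be illustrated with a small sketch. This is not Wipro's framework: the actual solution runs Spark jobs on Amazon EMR under Amazon MWAA, whereas the sketch below is plain Python over in-memory rows. The configuration fields (`phases`, `op`, `column`, `to`), the helper names, and the in-memory checkpoint are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical pipeline definition; every field name here is an assumption.
CONFIG = json.loads("""
{
  "pipeline": "orders_daily",
  "phases": [
    {"name": "ingest", "op": "strip", "column": "customer"},
    {"name": "transform", "op": "upper", "column": "customer"},
    {"name": "publish", "op": "rename", "column": "customer", "to": "customer_name"}
  ]
}
""")


def apply_phase(rows, phase):
    """Apply one configured transformation to a list of row dicts."""
    op, col = phase["op"], phase["column"]
    if op == "strip":
        return [{**r, col: r[col].strip()} for r in rows]
    if op == "upper":
        return [{**r, col: r[col].upper()} for r in rows]
    if op == "rename":
        return [
            {(phase["to"] if k == col else k): v for k, v in r.items()}
            for r in rows
        ]
    raise ValueError(f"unknown op: {op}")


def run_pipeline(config, rows, checkpoint):
    """Run phases in order, reusing outputs already saved in the checkpoint.

    `checkpoint` is a plain dict standing in for a durable store; a real
    pipeline would persist phase outputs outside the process.
    """
    executed = []
    for phase in config["phases"]:
        name = phase["name"]
        if name in checkpoint["outputs"]:
            # Phase finished in an earlier run: reuse its output and skip it.
            rows = checkpoint["outputs"][name]
            continue
        rows = apply_phase(rows, phase)
        checkpoint["outputs"][name] = rows  # record success before moving on
        executed.append(name)
    return rows, executed
```

If a run fails partway through, calling `run_pipeline` again with the same `checkpoint` skips the phases that already completed and continues from the first unfinished one. Changing a transformation only requires editing the JSON, not the code.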
Key benefits include full automation, scalability, concurrent job execution, proactive error notifications, and support for multiple data formats. The solution helps businesses overcome traditional ETL tool limitations by leveraging AWS managed services.