Run complex queries on massive amounts of data stored on your Amazon DocumentDB clusters using Apache Spark running on Amazon EMR
Database Blog
This article explains how to set up Amazon EMR to run complex queries on large amounts of data stored in Amazon DocumentDB clusters using Apache Spark.
Specifically, the article covers:
- Creating an Amazon DocumentDB cluster and loading data
- Creating IAM roles for EMR
- Creating an EMR cluster with Apache Spark and configuring it to connect to the DocumentDB cluster
- Running a sample Spark application and query on the DocumentDB data
- Cleaning up the resources
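The connection and query steps above can be sketched in PySpark. This is a minimal sketch under stated assumptions, not the article's own code: it assumes the MongoDB Spark Connector 10.x is on the EMR cluster's classpath (the summary does not say which connector is used), that the cluster JVM already trusts the Amazon DocumentDB certificate authority, and that the host, credentials, database, and collection names are hypothetical placeholders. The helper names `build_docdb_uri` and `read_collection` are introduced here for illustration only.

```python
from urllib.parse import quote_plus


def build_docdb_uri(host, user, password, port=27017):
    """Build a MongoDB-style connection URI for a DocumentDB cluster.

    Credentials are percent-encoded so characters such as '@' or '/'
    do not break URI parsing. The query options follow the connection
    string format commonly shown for DocumentDB (TLS on, replica set
    rs0, retryable writes off).
    """
    options = "tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/?{options}"


def read_collection(spark, uri, database, collection):
    """Load a collection as a Spark DataFrame.

    Assumption: MongoDB Spark Connector 10.x, which registers the
    'mongodb' data source and reads these three option keys.
    `spark` is an existing SparkSession supplied by the caller.
    """
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .load()
    )


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    uri = build_docdb_uri(
        "my-cluster.cluster-example.us-east-1.docdb.amazonaws.com",
        "spark_user",
        "p@ss/word",
    )
    print(uri)
```

On an EMR node with a running `SparkSession`, `read_collection(spark, uri, "mydb", "mycollection")` would return a DataFrame that can be queried with Spark SQL or the DataFrame API. The function takes the session as an argument so the URI helper can be checked without a Spark installation.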
Related articles
- Apr 16, 2024: Scale write performance on Amazon DocumentDB elastic clusters
- Jun 19, 2024: Unlock the power of parallel indexing in Amazon DocumentDB
- Jun 21, 2024: Run Apache Spark 3.5.1 workloads 4.5 times faster with Amazon EMR runtime for Apache Spark
- Nov 27, 2025: Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR