<div>
<p>This article explains how to use Apache Spark on Amazon EMR to run complex queries on large volumes of data stored in Amazon DocumentDB clusters.</p>
<p>Specifically, the article covers:</p>
<ul>
<li>Creating an Amazon DocumentDB cluster and loading data</li>
<li>Creating IAM roles for EMR</li>
<li>Creating an EMR cluster with Apache Spark and configuring it to connect to the DocumentDB cluster</li>
<li>Running a sample Spark application that queries the DocumentDB data</li>
<li>Cleaning up the resources</li>
</ul>
</div>
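
As a rough illustration of the step that connects Spark to DocumentDB, the following Python sketch builds a DocumentDB connection URI, the Spark settings that point the JVMs at a truststore, and the reader options for loading a collection as a DataFrame. It assumes the MongoDB Spark Connector 10.x (data source name `mongodb`) is available to the job and that the DocumentDB CA bundle has been imported into a Java truststore; the helper names, truststore path, and database and collection names are hypothetical. The URI options (TLS on, replica set `rs0`, `secondaryPreferred` reads, retryable writes off) follow the connection string pattern AWS documents for DocumentDB.

```python
from urllib.parse import quote_plus, urlencode


def docdb_connection_uri(user, password, endpoint, port=27017):
    """Build a MongoDB-style connection URI for an Amazon DocumentDB cluster.

    No CA file is referenced: with the Java driver that Spark uses, certificate
    trust is assumed to come from a JVM truststore instead.
    """
    query = urlencode({
        "tls": "true",                           # TLS is enabled by default on DocumentDB clusters
        "replicaSet": "rs0",                     # DocumentDB's replica set name
        "readPreference": "secondaryPreferred",  # send reads to replicas when available
        "retryWrites": "false",                  # DocumentDB does not support retryable writes
    })
    # Percent-encode credentials so characters such as '@' or ':' do not break the URI.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{endpoint}:{port}/?{query}"


def spark_truststore_conf(truststore_path, truststore_password):
    """Hypothetical helper: Spark settings that make driver and executor JVMs
    trust the DocumentDB CA through a truststore (path is an assumption)."""
    flags = (f"-Djavax.net.ssl.trustStore={truststore_path} "
             f"-Djavax.net.ssl.trustStorePassword={truststore_password}")
    return {
        "spark.driver.extraJavaOptions": flags,
        "spark.executor.extraJavaOptions": flags,
    }


def mongo_read_options(uri, database, collection):
    """Reader options, assuming MongoDB Spark Connector 10.x option names."""
    return {"connection.uri": uri, "database": database, "collection": collection}


def read_collection(spark, uri, database, collection):
    """Load a DocumentDB collection as a DataFrame; `spark` is an existing SparkSession."""
    return (spark.read.format("mongodb")  # connector 10.x data source name (assumed version)
            .options(**mongo_read_options(uri, database, collection))
            .load())
```

On EMR, the truststore settings could be supplied as `--conf` flags to `spark-submit` or through the `spark-defaults` configuration classification when the cluster is created. The connector package must also be available to the job (for example via `--packages`), with its version matched to the cluster's Spark and Scala versions.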


Run complex queries on massive amounts of data stored on your Amazon DocumentDB clusters using Apache Spark running on Amazon EMR