Run complex queries on massive amounts of data stored on your Amazon DocumentDB clusters using Apache Spark running on Amazon EMR

Database Blog



This article explains how to set up Amazon EMR to run complex queries on large amounts of data stored in Amazon DocumentDB clusters using Apache Spark.

Specifically, the article covers:

  • Creating an Amazon DocumentDB cluster and loading data
  • Creating IAM roles for EMR
  • Creating an EMR cluster with Apache Spark and configuring it to connect to the DocumentDB cluster
  • Running a sample Spark application and query on the DocumentDB data
  • Cleaning up the resources
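
As a rough illustration of the "connect and query" steps listed above, the following is a minimal PySpark sketch, not the code from the article itself. It assumes the MongoDB Spark Connector 10.x is on the Spark classpath (its data source name is "mongodb"), and that the JVM on the EMR nodes trusts the Amazon RDS certificate bundle so that TLS connections succeed. The host name, credentials, database, collection, and field names are all hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def build_docdb_uri(host, user, password, port=27017):
    """Build an Amazon DocumentDB connection URI.

    All argument values are placeholders supplied by the caller. The query
    options follow the connection string that DocumentDB documents for
    TLS-enabled clusters.
    """
    options = urlencode({
        "tls": "true",
        "replicaSet": "rs0",
        "readPreference": "secondaryPreferred",
        "retryWrites": "false",  # DocumentDB does not support retryable writes
    })
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"mongodb://{user_enc}:{password_enc}@{host}:{port}/?{options}"


def read_collection(spark, uri, database, collection):
    """Load one collection as a DataFrame.

    Assumes the MongoDB Spark Connector 10.x is available to the session.
    """
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .load()
    )


def count_by_field(spark, uri, database, collection, field):
    """Example aggregation: number of documents per distinct value of `field`."""
    df = read_collection(spark, uri, database, collection)
    return df.groupBy(field).count().orderBy("count", ascending=False)
```

On the EMR primary node, this could be used by creating a session with SparkSession.builder.getOrCreate(), building the URI with build_docdb_uri, and calling count_by_field(...).show(). Keeping the URI builder separate from the Spark calls makes it possible to check the connection string without a running cluster.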




Related articles

  • Apr 16, 2024: Scale write performance on Amazon DocumentDB elastic clusters
  • Jun 19, 2024: Unlock the power of parallel indexing in Amazon DocumentDB
  • Jun 21, 2024: Run Apache Spark 3.5.1 workloads 4.5 times faster with Amazon EMR runtime for Apache Spark
  • Nov 27, 2025: Run Apache Spark and Apache Iceberg write jobs 2x faster with Amazon EMR
