
Deploying an EMR cluster on AWS Outposts to process data from an on-premises database

Compute Blog



This article explains how to deploy an Amazon EMR cluster on AWS Outposts to process data from an on-premises PostgreSQL database using a PySpark Step, while keeping network traffic local. It covers the architecture, the prerequisites, and the deployment steps, including the networking, storage, and security configuration.

Specifically, the article covers:

  • Architecture overview with networking, storage, and security components
  • Prerequisites for deploying the EMR cluster on AWS Outposts
  • Step-by-step instructions for deploying the EMR cluster using the AWS Management Console
  • Setting up a PySpark Step to process data from the on-premises PostgreSQL database
  • Submitting the PySpark Step to the EMR cluster
  • Cleaning up resources after use
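
As a rough illustration of the step-submission item in the list above, the following Python sketch builds the kind of step definition that EMR accepts for a PySpark job launched through `spark-submit`. The script location, host name, database name, table name, and the `--jdbc-url` / `--table` script arguments are hypothetical placeholders, not values from the article, and the article's own procedure may differ.

```python
def build_pyspark_step(script_uri, db_host, db_port, db_name, table):
    """Return an EMR step definition that runs a PySpark script via spark-submit.

    All argument values are supplied by the caller; nothing here comes from
    the original article.
    """
    # Standard PostgreSQL JDBC URL form: jdbc:postgresql://host:port/database
    jdbc_url = f"jdbc:postgresql://{db_host}:{db_port}/{db_name}"

    return {
        "Name": "Process on-premises PostgreSQL data",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                # Hypothetical arguments that the PySpark script itself would parse
                "--jdbc-url", jdbc_url,
                "--table", table,
            ],
        },
    }


# Example with placeholder values (assumptions for illustration only)
step = build_pyspark_step(
    "s3://example-bucket/scripts/process_orders.py",
    "db.example.internal",
    5432,
    "sales",
    "public.orders",
)
```

A dictionary of this shape could be passed in the `Steps` list of the boto3 EMR client's `add_job_flow_steps` call. The sketch assumes the PostgreSQL JDBC driver is already available to Spark on the cluster.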




Related articles

  • Jan 29, 2025: Hybrid big data analytics with Amazon EMR on AWS Outposts
  • Aug 18, 2025: Achieve low-latency data processing with Amazon EMR on AWS Local Zones
  • Nov 21, 2024: Run high-availability long-running clusters with Amazon EMR instance fleets
  • Sep 10, 2024: Amazon EMR on EC2 improves cluster launch experience with intelligent subnet selection
