Secure Apache Spark writes to Amazon S3 on Amazon EMR with dynamic AWS KMS encryption

Big Data Blog

This article explains how to securely write Apache Spark data to Amazon S3 with dynamic AWS KMS encryption on Amazon EMR, addressing file system caching challenges in multi-tenant environments.
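The caching problem can be illustrated with a toy Python model. It is not EMRFS or Hadoop code: `ToyFileSystemCache` and `ToyS3Client` are hypothetical stand-ins. The model assumes two things that Hadoop's `FileSystem` cache also does. Clients are keyed by scheme and bucket (the real cache key also includes the calling user), and the encryption key is read only when a client is created.

```python
from dataclasses import dataclass, field


@dataclass
class ToyS3Client:
    """Hypothetical stand-in for an S3 file system client."""
    kms_key_id: str  # captured once, when the client is created

    def write(self, path: str) -> str:
        # Every object this client writes uses the key captured at creation time.
        return f"{path} encrypted with {self.kms_key_id}"


@dataclass
class ToyFileSystemCache:
    """Toy model of a process-wide file system cache keyed by (scheme, bucket).

    The encryption setting is NOT part of the cache key, so a second lookup
    for the same bucket returns the client built with the first key.
    """
    disable_cache: bool = False
    _clients: dict = field(default_factory=dict, init=False, repr=False)

    def get(self, scheme: str, bucket: str, conf: dict) -> ToyS3Client:
        if self.disable_cache:
            # Cache disabled: build a fresh client that sees the current key.
            return ToyS3Client(conf["kms_key_id"])
        key = (scheme, bucket)
        if key not in self._clients:
            self._clients[key] = ToyS3Client(conf["kms_key_id"])
        return self._clients[key]


if __name__ == "__main__":
    cache = ToyFileSystemCache()
    a = cache.get("s3", "shared-bucket", {"kms_key_id": "key-tenant-a"})
    print(a.write("s3://shared-bucket/a/"))
    b = cache.get("s3", "shared-bucket", {"kms_key_id": "key-tenant-b"})
    print(b.write("s3://shared-bucket/b/"))  # still uses key-tenant-a
```

Running the script reports both paths as encrypted with `key-tenant-a`, because the second lookup returns the cached client. Setting `disable_cache=True` mirrors the first method below, which gives up that reuse in exchange for a new client per lookup.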

File system clients are cached together with their encryption settings, so a later write can silently reuse the KMS key of an earlier one
Method 1: Disable the file system cache so that each write creates a new S3 client
Method 2: Use a separate Spark application or session for each distinct encryption key
Method 3: Configure S3 bucket-level default encryption with a single KMS key
Method 1 suits testing and low-volume jobs, but creating a client per write adds latency
Method 2 is recommended for production; it keeps the benefits of caching and provides strong isolation between keys
Method 3 is the simplest approach but is limited to one KMS key per bucket
Choose an approach based on your security requirements, performance needs, and operational constraints
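The Spark-side settings for Methods 1 and 2 can be sketched as plain Hadoop property maps. This sketch assumes the job writes through EMRFS (`s3://` paths). It uses the EMRFS properties `fs.s3.enableServerSideEncryption` and `fs.s3.serverSideEncryption.kms.keyId`, plus Hadoop's generic `fs.<scheme>.impl.disable.cache` switch. The helper names are hypothetical, and a job on S3A (`s3a://`) would need different property names.

```python
# Assumption: writes go through EMRFS (s3:// scheme on Amazon EMR).
# S3A (s3a://) uses other names, e.g. fs.s3a.encryption.algorithm / fs.s3a.encryption.key.
SSE_ENABLED = "fs.s3.enableServerSideEncryption"
SSE_KMS_KEY = "fs.s3.serverSideEncryption.kms.keyId"
DISABLE_CACHE = "fs.s3.impl.disable.cache"  # Hadoop's fs.<scheme>.impl.disable.cache


def per_write_hadoop_conf(kms_key_arn: str) -> dict[str, str]:
    """Method 1 (hypothetical helper): properties to set before a single write.

    With the cache disabled, the next file system lookup builds a new client,
    which reads the key set here instead of reusing an earlier client's key.
    """
    return {
        DISABLE_CACHE: "true",
        SSE_ENABLED: "true",
        SSE_KMS_KEY: kms_key_arn,
    }


def per_application_submit_args(kms_key_arn: str) -> list[str]:
    """Method 2 (hypothetical helper): spark-submit --conf flags for one key.

    The spark.hadoop. prefix copies a property into the application's Hadoop
    configuration. The cache stays enabled because the application uses one key.
    """
    args: list[str] = []
    for name, value in ((SSE_ENABLED, "true"), (SSE_KMS_KEY, kms_key_arn)):
        args += ["--conf", f"spark.hadoop.{name}={value}"]
    return args


if __name__ == "__main__":
    key = "arn:aws:kms:us-east-1:111122223333:key/example"  # placeholder ARN
    print(per_write_hadoop_conf(key))
    print(" ".join(per_application_submit_args(key)))
```

In PySpark, one way to apply the Method 1 map is to call `set(name, value)` on `spark.sparkContext._jsc.hadoopConfiguration()` before each write. `_jsc` is an internal handle, so verify this against your EMR release. Because that configuration is shared across the whole application, concurrent writes that set different keys could interfere with each other; separate applications (Method 2) avoid sharing it. The Method 2 flags are meant to be appended to a `spark-submit` command line.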

The article provides implementation steps and trade-offs for each approach to handling multiple KMS keys when writing Spark output to S3.
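For Method 3, the bucket-level setting corresponds to an S3 `PutBucketEncryption` request. The sketch below only builds the `ServerSideEncryptionConfiguration` payload, so it runs without AWS credentials. The function name is hypothetical, and enabling S3 Bucket Keys is an optional choice that the summary does not specify.

```python
def bucket_default_encryption(kms_key_arn: str, bucket_key: bool = True) -> dict:
    """Method 3 (hypothetical helper): default-encryption payload for one bucket.

    S3 applies this single KMS key to objects written without their own
    encryption headers, which is why the approach allows one key per bucket.
    """
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number of requests S3 sends to AWS KMS.
                "BucketKeyEnabled": bucket_key,
            }
        ]
    }


if __name__ == "__main__":
    print(bucket_default_encryption("arn:aws:kms:us-east-1:111122223333:key/example"))
```

Assuming boto3 and suitable IAM permissions, the payload could be applied with `boto3.client("s3").put_bucket_encryption(Bucket="example-bucket", ServerSideEncryptionConfiguration=bucket_default_encryption(key_arn))`, where `example-bucket` and `key_arn` are placeholders. Spark writes to that bucket then need no per-write key setting, since S3 applies the default to objects that arrive without encryption headers.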





