
Secure Apache Spark writes to Amazon S3 on Amazon EMR with dynamic AWS KMS encryption

Big Data Blog



This article explains how to securely write Apache Spark data to Amazon S3 with dynamic AWS KMS encryption on Amazon EMR, addressing file system caching challenges in multi-tenant environments.

  • File system clients cache encryption settings, so a later write can reuse the wrong KMS key from an earlier write
  • Method 1: Disable the file system cache so each write creates a fresh S3 client
  • Method 2: Use a separate Spark application or session for each distinct encryption key
  • Method 3: Configure S3 bucket-level default encryption with a single KMS key
  • Method 1 suits testing and low-volume jobs; creating a new client for every write adds latency
  • Method 2 is recommended for production; it keeps the benefits of caching and provides strong isolation
  • Method 3 is the simplest approach but is limited to a single key per bucket
  • Choose based on security requirements, performance needs, and operational constraints
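
As a rough illustration of Methods 1 and 2, the sketch below builds Spark properties for a single KMS key and then one spark-submit command per key. It assumes the Hadoop S3A connector property names (`fs.s3a.encryption.algorithm` and `fs.s3a.encryption.key`, available in Hadoop 3.3.2 and later, plus `fs.s3a.impl.disable.cache`); this summary does not show the article's exact settings, and EMRFS uses different property names. The helper names, key ARNs, script name, and output paths are hypothetical.

```python
from typing import Dict, List


def kms_write_conf(kms_key_arn: str, disable_fs_cache: bool = False) -> Dict[str, str]:
    """Build Spark properties for SSE-KMS writes with one key (S3A names assumed)."""
    conf = {
        "spark.hadoop.fs.s3a.encryption.algorithm": "SSE-KMS",
        "spark.hadoop.fs.s3a.encryption.key": kms_key_arn,
    }
    if disable_fs_cache:
        # Method 1: skip the cached FileSystem instance so a fresh S3 client
        # is created and the current key setting is picked up.
        conf["spark.hadoop.fs.s3a.impl.disable.cache"] = "true"
    return conf


def spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as repeated --conf arguments for spark-submit."""
    args: List[str] = []
    for name, value in sorted(conf.items()):
        args += ["--conf", f"{name}={value}"]
    return args


def commands_per_key(job_script: str, outputs: Dict[str, str]) -> List[List[str]]:
    """Method 2: one Spark application per key, so no cached client is shared."""
    return [
        ["spark-submit", *spark_submit_args(kms_write_conf(key_arn)), job_script, path]
        for key_arn, path in outputs.items()
    ]


if __name__ == "__main__":
    # Hypothetical key ARNs and output prefixes, one per tenant.
    outputs = {
        "arn:aws:kms:us-east-1:111122223333:key/tenant-a": "s3a://example-bucket/tenant-a/",
        "arn:aws:kms:us-east-1:111122223333:key/tenant-b": "s3a://example-bucket/tenant-b/",
    }
    for command in commands_per_key("write_job.py", outputs):
        print(" ".join(command))
```

Passing `disable_fs_cache=True` to `kms_write_conf` corresponds to Method 1; leaving it off and launching one application per key corresponds to Method 2, where each application keeps its own cache.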

The article provides implementation steps and trade-offs for each approach to handle multiple KMS keys when writing Spark outputs to S3.
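
For Method 3, a minimal sketch of the default-encryption document that Amazon S3 accepts for a bucket is shown here. The helper name and key ARN are hypothetical, and enabling S3 Bucket Keys is an assumption of this sketch rather than something stated in the summary.

```python
import json
from typing import Any, Dict


def default_encryption_config(kms_key_arn: str, bucket_key_enabled: bool = True) -> Dict[str, Any]:
    """Build an S3 ServerSideEncryptionConfiguration that applies SSE-KMS with one key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Assumption: S3 Bucket Keys are enabled to reduce KMS request volume.
                "BucketKeyEnabled": bucket_key_enabled,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical key ARN; every object written to the bucket would use it by default.
    config = default_encryption_config("arn:aws:kms:us-east-1:111122223333:key/shared")
    print(json.dumps(config, indent=2))
```

The resulting document can be supplied as the `ServerSideEncryptionConfiguration` argument of the S3 `put_bucket_encryption` call in boto3, or as the `--server-side-encryption-configuration` value of `aws s3api put-bucket-encryption`. Spark jobs then need no per-write key setting, at the cost of one key per bucket.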





