Apache Spark workloads on Amazon EKS can cut job runtime by 60% and raise CPU utilization by 30% by tuning S3 read byte ranges, Kubernetes Pod resources, and DNS configuration.


<div><p>This article describes optimization techniques for Apache Spark workloads that run on Amazon EKS and read from Amazon S3, reporting a 60% runtime reduction and a 30% CPU utilization improvement.</p><ul><li>Set the Parquet block size to 512 MB so reads become larger and more sequential</li><li>Increase the Parquet read allocation size from the default 8 MB to 128 MB</li><li>Set maxPartitionBytes to 512 MB so each input partition covers a larger byte range</li><li>Size Kubernetes Pod vCPU requests according to observed usage</li><li>Reduce the Kubernetes DNS ndots value from 5 to 2 for faster name resolution</li><li>Monitor job runtime, CPU utilization, and network usage throughout tuning</li></ul><p>After tuning these three areas (data byte ranges, Kubernetes resources, and DNS configuration), job runtime dropped from 10 to 5 minutes, with an 82% throughput increase.</p></div>
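The settings listed above can be sketched as configuration. The snippet below is a minimal illustration, not the article's exact setup: it assumes the three byte-range settings map to the properties `spark.hadoop.parquet.block.size`, `spark.hadoop.parquet.read.allocation.size`, and `spark.sql.files.maxPartitionBytes`, and that the ndots change is applied through the Pod spec's `dnsConfig` field. The helper names (`spark_read_tuning_conf`, `to_submit_args`, `pod_dns_config`) are hypothetical and exist only for this example.

```python
import json

MB = 1024 * 1024


def spark_read_tuning_conf():
    """Return Spark properties for the byte-range settings (assumed mapping)."""
    return {
        # Parquet row-group (block) size used when writing files: 512 MB.
        "spark.hadoop.parquet.block.size": str(512 * MB),
        # Parquet reader allocation size: 128 MB instead of the 8 MB default.
        "spark.hadoop.parquet.read.allocation.size": str(128 * MB),
        # Maximum bytes packed into one input partition when reading files.
        "spark.sql.files.maxPartitionBytes": str(512 * MB),
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def pod_dns_config(ndots=2):
    """Return a Pod spec fragment that sets the resolver's ndots option."""
    # Kubernetes expects the option value as a string.
    return {"dnsConfig": {"options": [{"name": "ndots", "value": str(ndots)}]}}


if __name__ == "__main__":
    # Arguments that could be appended to a spark-submit command line.
    print(" ".join(to_submit_args(spark_read_tuning_conf())))
    # Fragment that could be merged into a driver or executor Pod template.
    print(json.dumps({"spec": pod_dns_config()}, indent=2))
```

The vCPU request is left out on purpose because the right value depends on measured usage; in Spark on Kubernetes it is typically set through `spark.kubernetes.executor.request.cores`. Whichever values are chosen, job runtime, CPU utilization, and network usage should be re-measured after each change.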

