Improving throughput of serverless streaming workloads for Kafka

Compute Blog

This article provides guidance on optimizing AWS Lambda for processing high-volume Apache Kafka streams, with a focus on throughput and scaling.

Use Provisioned Mode rather than on-demand mode for bursty workloads to get predictable, fast scaling.
Apply event source mapping (ESM) filtering to drop irrelevant records before Lambda is invoked, reducing cost and concurrency.
Configure the batch window and batch size so that each invocation processes more records, improving efficiency.
Optimize handler code by reducing per-record work and increasing memory allocation, which also provides more CPU.
Monitor the OffsetLag, Duration, Concurrency, and Errors metrics to detect issues and guide tuning.
A single provisioned poller can process up to 5 MB/s of Kafka data.
Follow an iterative optimization loop: baseline, filter, batch, speed up, test spikes, alert, re-evaluate.
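
As a rough illustration of how the filtering, batching, and Provisioned Mode points above could fit together, the sketch below sizes the poller count from the 5 MB/s per-poller figure and assembles event source mapping settings. The batch size, batching window, filter pattern (`status` equal to `active`), minimum poller count, and function names are illustrative assumptions, not values taken from the article.

```python
import json
import math

# Per-poller ceiling quoted in the summary above.
POLLER_MB_PER_S = 5.0


def pollers_needed(peak_mb_per_s: float) -> int:
    """Smallest poller count whose combined ceiling covers the peak rate."""
    return max(1, math.ceil(peak_mb_per_s / POLLER_MB_PER_S))


def build_esm_params(function_name, cluster_arn, topic, peak_mb_per_s):
    """Assemble keyword arguments for Lambda's CreateEventSourceMapping.

    All concrete values below are placeholders chosen for illustration.
    """
    # Assumed filter: only forward records whose JSON value has status "active".
    pattern = json.dumps({"value": {"status": ["active"]}})
    return {
        "FunctionName": function_name,
        "EventSourceArn": cluster_arn,
        "Topics": [topic],
        "StartingPosition": "LATEST",
        # Batching: more records per invocation (assumed values).
        "BatchSize": 500,
        "MaximumBatchingWindowInSeconds": 5,
        # Filtering: drop irrelevant records before the function is invoked.
        "FilterCriteria": {"Filters": [{"Pattern": pattern}]},
        # Provisioned Mode: cap pollers at what the expected peak requires.
        "ProvisionedPollerConfig": {
            "MinimumPollers": 1,
            "MaximumPollers": pollers_needed(peak_mb_per_s),
        },
    }


params = build_esm_params(
    "orders-consumer",                 # hypothetical function name
    "arn:aws:kafka:region:acct:cluster/example",  # placeholder ARN
    "orders",                          # hypothetical topic
    peak_mb_per_s=22,
)
```

With boto3 installed, a dictionary like this could be passed as `boto3.client("lambda").create_event_source_mapping(**params)`; real values should come from measured baselines rather than the placeholders used here.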

The article emphasizes that effective Kafka-Lambda optimization requires understanding the poll-filter-batch-invoke workflow and using observability metrics to drive configuration decisions.
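
To make the invoke end of that workflow concrete, here is a minimal handler sketch. It relies on the shape of the Kafka event that Lambda delivers (records grouped under topic-partition keys, with base64-encoded values) and assumes the values are JSON. `write_batch` is a hypothetical stand-in for a downstream sink, used to show one bulk call per batch instead of one call per record.

```python
import base64
import json


def decode_batch(event):
    """Flatten a Lambda Kafka event into a list of decoded JSON payloads."""
    payloads = []
    # Records arrive grouped under "topic-partition" keys.
    for partition_records in event.get("records", {}).values():
        for record in partition_records:
            # Record values are base64-encoded; JSON content is an assumption.
            payloads.append(json.loads(base64.b64decode(record["value"])))
    return payloads


def write_batch(payloads):
    """Hypothetical sink: stands in for a single bulk write downstream."""
    return len(payloads)


def handler(event, context):
    """Decode the whole batch, then do one downstream call for all records."""
    payloads = decode_batch(event)
    written = write_batch(payloads) if payloads else 0
    return {"records": len(payloads), "written": written}
```

Keeping the per-record loop limited to decoding, and pushing the expensive work into a single batched call, is one way to reduce per-record work as the summary recommends.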


Related articles

Mar 21, 2024: Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python

Jan 16, 2024: Real-time serverless data ingestion from your Kafka clusters into Amazon Timestream using Kafka Connect

Aug 2, 2024: Improve Apache Kafka scalability and resiliency using Amazon MSK tiered storage

Jun 3, 2024: Optimize write throughput for Amazon Kinesis Data Streams