Use Amazon MSK Connect and Iceberg Kafka Connect to build a real-time data lake

Big Data Blog

This article demonstrates how to build a real-time data lake using Amazon MSK Connect and Iceberg Kafka Connect for continuous data synchronization from transactional databases to Apache Iceberg tables on Amazon S3.
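As a rough sketch of the capture stage, the Python function below builds a configuration map for a Debezium MySQL source connector. The map is in the string-to-string form that MSK Connect expects for a connector configuration. The property names follow the Debezium 2.x MySQL connector documentation. Older 1.x releases used `database.server.name` instead of `topic.prefix`, and `database.history.*` instead of `schema.history.internal.*`. The host, credentials, server ID, topic prefix, and table names are hypothetical placeholders, not values from the article.

```python
from typing import Dict, List


def debezium_mysql_source_config(
    hostname: str,
    user: str,
    password: str,
    server_id: int,
    topic_prefix: str,
    tables: List[str],
    bootstrap_servers: str,
) -> Dict[str, str]:
    """Return a Debezium 2.x MySQL source connector configuration.

    `tables` holds fully qualified "database.table" names. Every value is a
    string because MSK Connect takes the configuration as a string map.
    All arguments are hypothetical placeholders.
    """
    if not tables:
        raise ValueError("at least one table must be captured")
    # Derive the database filter from the table list, e.g. "shop.orders" -> "shop".
    databases = sorted({name.split(".", 1)[0] for name in tables})
    return {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        # The MySQL connector reads a single binlog, so it runs one task.
        "tasks.max": "1",
        "database.hostname": hostname,
        "database.port": "3306",
        "database.user": user,
        "database.password": password,
        # Must differ from the server IDs of all other MySQL replication clients.
        "database.server.id": str(server_id),
        # Change events go to topics named <prefix>.<database>.<table>.
        "topic.prefix": topic_prefix,
        "database.include.list": ",".join(databases),
        "table.include.list": ",".join(tables),
        # Debezium records the captured tables' DDL history in a Kafka topic.
        "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
        "schema.history.internal.kafka.topic": f"schema-history.{topic_prefix}",
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cfg = debezium_mysql_source_config(
        hostname="mysql.example.internal",
        user="cdc_user",
        password="change-me",
        server_id=184054,
        topic_prefix="rds",
        tables=["shop.orders", "shop.customers"],
        bootstrap_servers="broker.example.internal:9092",
    )
    for key, value in sorted(cfg.items()):
        print(f"{key}={value}")
```

In a real deployment, the password would normally be resolved from a secrets store through a configuration provider rather than passed as plain text.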

Captures CDC data from Amazon RDS MySQL using Debezium connector
Streams data through Amazon MSK to Iceberg tables with exactly-once delivery
Supports single-table and multi-table synchronization modes
Automatically handles schema evolution and field changes
Achieves throughput of approximately 10,000 records per second per MSK Connect Unit (MCU)
Requires custom Kafka Connect plugins built from open-source code
Integrates with AWS Glue Data Catalog for table management
Includes compaction workflows to optimize query performance
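Several of the points above can be expressed as sink-connector properties: the single-table and multi-table modes, schema evolution, and the Glue Data Catalog integration. The sketch below assumes the Apache Iceberg Kafka Connect sink (`org.apache.iceberg.connect.IcebergSinkConnector`). Because the article builds its own plugin from open source, its class name and settings may differ. The bucket, topic, table, and routing-field names are hypothetical. The sketch covers only routing, table creation, schema evolution, catalog, and commit settings. It does not show how Debezium update and delete events are applied to the tables.

```python
from typing import Dict, Optional


def iceberg_sink_config(
    topics: str,
    warehouse: str,
    table: Optional[str] = None,
    route_field: Optional[str] = None,
    commit_interval_ms: int = 300_000,
) -> Dict[str, str]:
    """Return an Iceberg Kafka Connect sink configuration (sketch).

    Pass `table` for single-table mode or `route_field` for multi-table
    mode, but not both. All argument values are hypothetical.
    """
    if (table is None) == (route_field is None):
        raise ValueError("pass exactly one of table or route_field")
    config = {
        "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
        "topics": topics,
        # Register tables in the AWS Glue Data Catalog; write data files to S3.
        "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "iceberg.catalog.warehouse": warehouse,
        "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Create missing tables and evolve their schemas as source fields change.
        "iceberg.tables.auto-create-enabled": "true",
        "iceberg.tables.evolve-schema-enabled": "true",
        # How often the sink's coordinator commits buffered data files as a snapshot.
        "iceberg.control.commit.interval-ms": str(commit_interval_ms),
    }
    if table is not None:
        # Single-table mode: every record goes to one fixed table.
        config["iceberg.tables"] = table
    elif route_field is not None:
        # Multi-table mode: each record's route field names its destination table.
        config["iceberg.tables.dynamic-enabled"] = "true"
        config["iceberg.tables.route-field"] = route_field
    return config


if __name__ == "__main__":
    warehouse = "s3://example-bucket/warehouse"  # hypothetical bucket
    single = iceberg_sink_config("rds.shop.orders", warehouse, table="shop.orders")
    multi = iceberg_sink_config(
        "rds.shop.orders,rds.shop.customers", warehouse, route_field="_dest_table"
    )
    print(single["iceberg.tables"], multi["iceberg.tables.route-field"])
```

The function takes mutually exclusive `table` and `route_field` arguments to mirror the two modes: one fixed destination table, or per-record routing based on a field that an upstream transform would have to populate.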

This solution provides a fully managed, low-operational-complexity approach for real-time data ingestion into data lakes, suitable for high-volume transactional workloads.
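To put the per-MCU figure in context, the following sketch estimates MSK Connect capacity for a target peak rate. It makes several assumptions. Throughput is taken to scale linearly at the approximate 10,000 records per second per MCU quoted above, and the 30% headroom is an arbitrary choice. It also assumes that each worker can have 1, 2, 4, or 8 MCUs and that a connector can have at most 10 workers. Check these limits against current MSK Connect quotas before relying on them.

```python
import math
from typing import Tuple

MCU_OPTIONS = (1, 2, 4, 8)  # assumed valid MCU sizes per worker
MAX_WORKERS = 10            # assumed per-connector worker cap


def plan_capacity(
    peak_records_per_sec: float,
    records_per_sec_per_mcu: float = 10_000.0,
    headroom: float = 0.3,
) -> Tuple[int, int]:
    """Return (mcus_per_worker, worker_count) for a target peak rate.

    Assumes throughput grows linearly with total MCUs; real throughput also
    depends on record size, partition count, and commit settings.
    """
    if peak_records_per_sec < 0 or records_per_sec_per_mcu <= 0 or headroom < 0:
        raise ValueError("invalid sizing input")
    # Total MCUs needed, rounded up, never fewer than one.
    total = max(1, math.ceil(peak_records_per_sec * (1 + headroom) / records_per_sec_per_mcu))
    # Prefer small workers; move to a larger MCU size only when the worker cap is hit.
    for mcus in MCU_OPTIONS:
        workers = math.ceil(total / mcus)
        if workers <= MAX_WORKERS:
            return mcus, workers
    raise ValueError(f"{total} MCUs exceeds {MAX_WORKERS} workers x {MCU_OPTIONS[-1]} MCUs")


if __name__ == "__main__":
    print(plan_capacity(25_000))
```

Take a hypothetical peak of 25,000 records per second. With 30% headroom, 25,000 × 1.3 / 10,000 = 3.25, which rounds up to 4 MCUs, or four 1-MCU workers. The sketch prefers more small workers over fewer large ones. Either arrangement is valid, and the better choice also depends on how many topic partitions the sink tasks can spread across.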



Jul 30, 2026

Deliver Apache Kafka data to streaming tables for Apache Iceberg with Amazon MSK Express brokers

Apr 3, 2024

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Jul 14, 2025

Build real-time data lakes with Snowflake and Amazon S3 Tables

Jul 30, 2026

Amazon MSK Express brokers now deliver data to streaming tables for Apache Iceberg


