Use Amazon MSK Connect and Iceberg Kafka Connect to build a real-time data lake
Big Data Blog
This article demonstrates how to build a real-time data lake using Amazon MSK Connect and Iceberg Kafka Connect for continuous data synchronization from transactional databases to Apache Iceberg tables on Amazon S3.
- Captures CDC data from Amazon RDS MySQL using Debezium connector
- Streams data through Amazon MSK to Iceberg tables with exactly-once delivery
- Supports single-table and multi-table synchronization modes
- Automatically handles schema evolution and field changes
- Achieves a throughput of approximately 10,000 records per second per MSK Connect Unit (MCU)
- Requires custom Kafka Connect plugins built from open-source code
- Integrates with AWS Glue Data Catalog for table management
- Includes compaction workflows to optimize query performance
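The single-table and multi-table modes listed above mostly come down to how the Iceberg sink connector is configured. The following is a minimal sketch, not the article's actual configuration: it builds the property map you would paste into an MSK Connect connector definition. The property names follow the Apache Iceberg Kafka Connect sink documentation as commonly published, so check them against the plugin version you build. The helper name `iceberg_sink_config`, the topic names, the bucket, and the `target_table` route field are all hypothetical.

```python
from typing import Dict, List, Optional


def iceberg_sink_config(
    topics: List[str],
    glue_database: str,
    warehouse: str,
    table: Optional[str] = None,
    route_field: str = "target_table",  # hypothetical record field naming the destination table
) -> Dict[str, str]:
    """Build an Iceberg sink connector property map (illustrative only).

    If `table` is given, all records go to one Iceberg table (single-table mode).
    Otherwise each record is routed by the value of `route_field` (multi-table mode).
    """
    config = {
        # Assumed class name for the Apache Iceberg sink; older builds use a different package.
        "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
        "topics": ",".join(topics),
        # AWS Glue Data Catalog as the Iceberg catalog, with data files on Amazon S3.
        "iceberg.catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "iceberg.catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "iceberg.catalog.warehouse": warehouse,
        # Create missing tables and add new columns as the source schema changes.
        "iceberg.tables.auto-create-enabled": "true",
        "iceberg.tables.evolve-schema-enabled": "true",
    }
    if table is not None:
        config["iceberg.tables"] = f"{glue_database}.{table}"
    else:
        config["iceberg.tables.dynamic-enabled"] = "true"
        config["iceberg.tables.route-field"] = route_field
    return config


if __name__ == "__main__":
    single = iceberg_sink_config(
        ["cdc.shop.orders"], "lake", "s3://example-bucket/warehouse", table="orders"
    )
    for key, value in sorted(single.items()):
        print(f"{key}={value}")
```

Omitting the `table` argument switches the sketch to multi-table routing, in which case the records must actually carry the route field, for example by adding it with a single message transform upstream of the sink.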
This solution provides a fully managed, low-operational-complexity approach for real-time data ingestion into data lakes, suitable for high-volume transactional workloads.
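For capacity planning, the per-MCU figure quoted above can be turned into a rough first estimate. The sketch below assumes throughput scales linearly with MCUs, which real workloads may not do (record size, schema complexity, and commit interval all matter). The function name `estimate_mcus` and the 25% headroom default are assumptions for illustration, not values from the article.

```python
import math

# Approximate figure quoted in the article; treat it as a starting point, not a guarantee.
RECORDS_PER_SECOND_PER_MCU = 10_000


def estimate_mcus(peak_records_per_second: float, headroom: float = 0.25) -> int:
    """Estimate the MCUs needed for a peak ingest rate, assuming linear scaling."""
    if peak_records_per_second < 0 or headroom < 0:
        raise ValueError("rate and headroom must be non-negative")
    required = peak_records_per_second * (1 + headroom) / RECORDS_PER_SECOND_PER_MCU
    # At least one MCU is always needed to run the connector.
    return max(1, math.ceil(required))


if __name__ == "__main__":
    print(estimate_mcus(35_000))  # 35,000 rec/s plus 25% headroom
```

Validate any estimate like this with a load test against your own tables before settling on connector capacity.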