Real-time CDC from Aurora PostgreSQL to Amazon S3 Tables using Debezium and Firehose
Big Data Blog
This article shows how to build a real-time change data capture (CDC) pipeline from Aurora PostgreSQL to Apache Iceberg tables in Amazon S3 Tables using Debezium, Amazon MSK, Firehose, and Lambda.
- Debezium captures row-level changes from the Aurora PostgreSQL write-ahead log (WAL) through logical replication
- The ByLogicalTableRouter SMT consolidates change events from multiple tables into a single MSK topic
- Lambda transforms the Debezium envelope into flattened JSON with routing metadata
- Firehose delivers records to S3 Tables with automatic Iceberg operations
- S3 Tables handles compaction and snapshot management automatically
- Supports inserts, updates, and deletes as row-level Iceberg operations
- Lake Formation provides fine-grained access control for analytics teams
- Infrastructure deployed via AWS CDK with six modular stacks
- Single Firehose stream serves multiple destination tables, reducing costs
- Iceberg time travel enables querying historical table states
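As an illustration of the topic consolidation described in the list above, here is a minimal sketch of a Debezium PostgreSQL connector configuration that routes every captured table into one topic. The property keys are standard Debezium options, but the topic prefix, merged topic name, host, and table names are hypothetical placeholders, and credentials are omitted; the article's actual configuration may differ.

```python
import re


def build_connector_config(host, database, tables,
                           prefix="aurora", merged_topic="cdc.all_tables"):
    """Build a flat Debezium connector config (hypothetical values throughout)."""
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",  # logical decoding plugin built into PostgreSQL
        "database.hostname": host,
        "database.port": "5432",
        "database.dbname": database,
        # database.user / database.password intentionally left out of this sketch
        "topic.prefix": prefix,
        "table.include.list": ",".join(tables),
        # Route every per-table topic (<prefix>.<schema>.<table>) to one topic.
        "transforms": "route",
        "transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
        "transforms.route.topic.regex": re.escape(prefix) + r"\.[^.]+\.[^.]+",
        "transforms.route.topic.replacement": merged_topic,
        # Adds the original table identity to the record key so keys from
        # different tables do not collide in the merged topic.
        "transforms.route.key.enforce.uniqueness": "true",
    }


config = build_connector_config(
    "example-cluster.internal", "appdb", ["public.orders", "public.customers"]
)
```

Because all tables share one topic, each event still carries its origin in the Debezium `source` block, which a downstream consumer can use to pick the destination table.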
The solution provides a fully managed, governed lakehouse architecture enabling near real-time analytics on transactional data without impacting OLTP performance.
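The Lambda transformation step summarized above could look roughly like the following sketch. It assumes the standard Debezium envelope fields (`before`, `after`, `source`, `op`) and the Firehose transformation response shape with an `otfMetadata` routing block, as I understand it from the Firehose documentation for Iceberg destinations. The database name `cdc_db` and the mapping of table name to destination table are hypothetical, not taken from the article.

```python
import base64
import json

# Debezium op codes mapped to row-level operations ("r" is a snapshot read).
OP_MAP = {"c": "insert", "r": "insert", "u": "update", "d": "delete"}


def flatten_envelope(envelope, database="cdc_db"):
    """Return (flat_row, routing_metadata) for one Debezium change event."""
    # With schemas enabled, the event is wrapped in a "payload" field.
    payload = envelope.get("payload", envelope)
    op = OP_MAP[payload["op"]]
    # Deletes carry the old row in "before"; other operations use "after".
    row = payload["before"] if op == "delete" else payload["after"]
    routing = {
        "destinationDatabaseName": database,  # hypothetical name
        "destinationTableName": payload["source"]["table"],
        "operation": op,
    }
    return dict(row), routing


def handler(event, context):
    """Firehose data-transformation handler (sketch under the assumptions above)."""
    out = []
    for record in event["records"]:
        try:
            envelope = json.loads(base64.b64decode(record["data"]))
            if envelope is None:
                # Debezium tombstone events have a null value; skip them.
                out.append({"recordId": record["recordId"],
                            "result": "Dropped", "data": record["data"]})
                continue
            row, routing = flatten_envelope(envelope)
            data = base64.b64encode(json.dumps(row).encode("utf-8")).decode("ascii")
            out.append({"recordId": record["recordId"], "result": "Ok",
                        "data": data, "metadata": {"otfMetadata": routing}})
        except (KeyError, TypeError, ValueError):
            out.append({"recordId": record["recordId"],
                        "result": "ProcessingFailed", "data": record["data"]})
    return {"records": out}
```

Note that for deletes, the `before` image contains only the primary key unless the source table uses `REPLICA IDENTITY FULL`, which is usually enough to identify the row to remove.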
Related articles
- Jun 3, 2026: Implementing real-time change data capture with Debezium for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL
- Mar 23, 2026: Extract data from Amazon Aurora MySQL to Amazon S3 Tables in Apache Iceberg format
- Aug 16, 2023: Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena
- May 30, 2023: Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB