How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances
Architecture Blog
This article explains how Synthesia optimizes AI video generation on Amazon EC2 G7e instances using an asynchronous frame decoding pipeline.
- Synthesia creates AI video avatars using latent diffusion models on GPU-intensive EC2 G7e instances
- Traditional sequential decoding stalls the GPU while each batch of frames is copied from GPU memory to host memory
- An asynchronous frame generation pipeline overlaps GPU compute, data transfers, and host-side processing
- Implementation uses dual CUDA streams, pinned memory buffers, and dedicated worker threads
- Benchmarks show GPU kernel utilization increased from 82% to 99.9% on G7e instances
- Results: an 8.2% latency reduction and approximately $896 in savings per 1,000 hours of video decoding
- Technique applicable to any chunked video generation pipeline transferring frames to host memory
The optimization eliminates GPU stalls through changes to the pipeline architecture alone. Model weights and inference quality are unaffected, so the latency and cost improvements listed above do not trade off output quality.
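The overlap described in the summary can be illustrated with a CPU-only sketch. This is not Synthesia's implementation: `time.sleep` stands in for GPU decode and for the device-to-host copy plus host processing, a bounded `queue.Queue` stands in for the pinned memory buffers, and a single worker thread stands in for the dedicated host workers. The stage durations, the buffer depth of 2, and all function names are assumptions made for illustration.

```python
import queue
import threading
import time

DECODE_S = 0.02    # assumed time for the "GPU" to decode one chunk
TRANSFER_S = 0.02  # assumed time to copy and post-process one chunk on the host


def decode_chunk(i):
    """Stand-in for decoding one chunk of latents into frames on the GPU."""
    time.sleep(DECODE_S)
    return f"frames-{i}"


def transfer_and_process(chunk):
    """Stand-in for the device-to-host copy and host-side processing."""
    time.sleep(TRANSFER_S)
    return chunk.upper()


def run_sequential(n):
    """Decode, then transfer, one chunk at a time: the decoder idles during transfers."""
    return [transfer_and_process(decode_chunk(i)) for i in range(n)]


def run_pipelined(n, depth=2):
    """Decode chunk i+1 while a worker thread transfers and processes chunk i."""
    handoff = queue.Queue(maxsize=depth)  # bounded, like a fixed pool of buffers
    out = []

    def worker():
        while True:
            chunk = handoff.get()
            if chunk is None:  # sentinel: no more chunks
                return
            out.append(transfer_and_process(chunk))

    t = threading.Thread(target=worker)
    t.start()
    for i in range(n):
        handoff.put(decode_chunk(i))  # blocks only if every buffer is in use
    handoff.put(None)
    t.join()
    return out


def timed(fn, n):
    """Run fn(n) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    seq, t_seq = timed(run_sequential, 8)
    pipe, t_pipe = timed(run_pipelined, 8)
    print(f"same output: {seq == pipe}")
    print(f"sequential {t_seq:.2f}s, pipelined {t_pipe:.2f}s")
```

Both variants produce the same frames in the same order. The pipelined variant finishes sooner because the decode stage no longer waits for each transfer to complete. In a real GPU pipeline the same role would be played by asynchronous copies on a second CUDA stream into pinned host buffers, as the summary describes.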