How Synthesia optimizes generative AI video inference on Amazon EC2 G7e instances

Architecture Blog

This article explains how Synthesia optimizes AI video generation on Amazon EC2 G7e instances using an asynchronous frame decoding pipeline.

Synthesia creates AI video avatars using latent diffusion models on GPU-intensive EC2 G7e instances
Traditional sequential decoding causes GPU stalls when transferring frames from GPU to host memory
Asynchronous Frame Generation Pipeline overlaps GPU compute, data transfers, and host-side processing
Implementation uses dual CUDA streams, pinned memory buffers, and dedicated worker threads
Benchmarks show GPU kernel utilization increased from 82% to 99.9% on G7e instances
Results: 8.2% latency reduction, approximately $896 savings per 1,000 hours of video decoding
Technique applicable to any chunked video generation pipeline transferring frames to host memory
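
The overlap described in the points above can be illustrated with a minimal CPU-only sketch. This is not Synthesia's implementation: it uses only the Python standard library, and every name in it (decode_chunk, postprocess, run_pipeline, NUM_BUFFERS) is hypothetical. The "GPU" decode step is simulated by a plain function, and a bounded two-slot queue stands in for the pinned memory buffers, so the producer only waits when both slots are still in use. In a real CUDA pipeline the handoff would instead be an asynchronous device-to-host copy issued on a second stream and synchronized with events.

```python
import queue
import threading

NUM_BUFFERS = 2  # assumption: two slots, mirroring a double-buffered handoff


def decode_chunk(chunk_id, frames_per_chunk=4):
    """Stand-in for GPU decoding of one chunk into frames (simulated)."""
    return [f"chunk{chunk_id}-frame{i}" for i in range(frames_per_chunk)]


def postprocess(frame):
    """Stand-in for host-side work such as encoding or writing a frame."""
    return frame.upper()


def run_pipeline(num_chunks):
    # Bounded queue: the producer blocks only when both slots are full,
    # so decoding chunk N+1 can proceed while chunk N is post-processed.
    handoff = queue.Queue(maxsize=NUM_BUFFERS)
    results = []

    def worker():
        while True:
            frames = handoff.get()
            if frames is None:  # sentinel: no more chunks
                break
            results.extend(postprocess(f) for f in frames)

    consumer = threading.Thread(target=worker)
    consumer.start()

    for chunk_id in range(num_chunks):
        # Producer side: "decode" and hand off without waiting for the
        # host-side processing of earlier chunks to finish.
        handoff.put(decode_chunk(chunk_id))

    handoff.put(None)
    consumer.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(3)
    print(len(out), out[0], out[-1])
```

A single worker consuming from a FIFO queue keeps frames in their original order, which matters for video output. The queue bound also caps how many chunks are in flight, in the same way a fixed pool of pinned buffers would.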

The optimization eliminates GPU stalls through architectural changes alone, leaving model weights and inference quality untouched. The reported 8.2% latency reduction translates to roughly $896 saved per 1,000 hours of video decoding.

Apr 20, 2026

Accelerate Generative AI Inference on Amazon SageMaker AI with G7e Instances

Jul 15, 2025

5 ways Prime Video improves the viewing experience with generative AI on AWS

Oct 29, 2024

Build a video insights and summarization engine using generative AI with Amazon Bedrock

Mar 14, 2024

Super slow motion video creation using generative AI on AWS

Related articles