Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch

Machine Learning Blog

This article shows how to build a cost-effective, scalable audio transcription pipeline using NVIDIA's Parakeet-TDT model on AWS Batch with GPU instances.

Parakeet-TDT-0.6B-v3 supports 25 European languages with automatic language detection
Achieves inference speeds of 0.24 seconds per minute of audio
Event-driven architecture using EventBridge triggers AWS Batch jobs on S3 uploads
Costs fractions of a cent per hour: $0.00011 on-demand, $0.00005 with Spot Instances
Buffered streaming inference enables processing long audio on standard g6.xlarge instances
Local attention mode supports up to 3 hours of audio on a GPU with 80 GB of VRAM
EC2 Spot Instances provide up to 90% cost savings for stateless, idempotent workloads
CloudFormation template automates infrastructure deployment with GPU monitoring
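
To make the event-driven trigger in the summary more concrete, here is a minimal sketch of an EventBridge event pattern that matches S3 "Object Created" events for audio files, plus a helper that pulls the bucket and key out of a matched event. The bucket name, the file suffixes, and both function names are assumptions for illustration and do not come from the article; the S3 bucket would also need EventBridge notifications enabled.

```python
import json


def build_s3_upload_pattern(bucket_name, suffixes=(".wav", ".mp3", ".flac")):
    """Return an EventBridge event pattern for S3 'Object Created' events.

    bucket_name and suffixes are hypothetical values chosen for this sketch.
    """
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Match any object key ending in one of the given suffixes.
            "object": {"key": [{"suffix": s} for s in suffixes]},
        },
    }


def job_parameters_from_event(event):
    """Extract the values a transcription job would need from a matched event."""
    detail = event["detail"]
    return {"bucket": detail["bucket"]["name"], "key": detail["object"]["key"]}


if __name__ == "__main__":
    pattern = build_s3_upload_pattern("example-audio-bucket")
    print(json.dumps(pattern, indent=2))
```

The resulting JSON could be used as the event pattern of a rule whose target is an AWS Batch job queue, with the extracted bucket and key passed to the job as parameters.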

This solution provides a practical, cost-efficient alternative to managed ASR services for high-volume multilingual transcription workloads.
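
The general idea behind the buffered streaming inference mentioned in the summary can be sketched as follows: a long recording is cut into fixed-size chunks, each chunk is read together with some surrounding context, and only the transcript for the central part is kept. This is an illustrative sketch, not the model's actual implementation; the function name `plan_chunks` and the chunk and context lengths are assumptions and are not taken from the article.

```python
def plan_chunks(total_ms, chunk_ms=30_000, context_ms=5_000):
    """Plan buffered windows over an audio file of total_ms milliseconds.

    Returns a list of (read_start, read_end, keep_start, keep_end) tuples.
    The model would see [read_start, read_end), but only the output for
    [keep_start, keep_end) would be kept, so chunk borders get extra context.
    All default values are hypothetical.
    """
    if chunk_ms <= 0 or context_ms < 0:
        raise ValueError("chunk_ms must be positive and context_ms non-negative")
    windows = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        read_start = max(0, start - context_ms)
        read_end = min(total_ms, end + context_ms)
        windows.append((read_start, read_end, start, end))
        start = end
    return windows


if __name__ == "__main__":
    for window in plan_chunks(70_000):
        print(window)
```

Because each window has a bounded length, peak memory depends on the chunk and context sizes rather than on the total duration of the recording, which is what makes long files feasible on a smaller GPU.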


Sep 30, 2024
Reducing transcription costs by 60% using AWS AI/ML services

May 28, 2025
Enhanced Performance for Whisper Audio Transcription on AWS Batch and AWS Inferentia

Sep 16, 2024
Whisper audio transcription powered by AWS Batch and AWS Inferentia

May 29, 2025
Automating audio editing and transcoding using AWS


Related articles