
Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch

Machine Learning Blog



This article shows how to build a cost-effective, scalable audio transcription pipeline using NVIDIA's Parakeet-TDT model on AWS Batch with GPU instances.

  • Parakeet-TDT-0.6B-v3 supports 25 European languages with automatic language detection
  • Achieves inference speeds of 0.24 seconds per minute of audio
  • An event-driven architecture uses Amazon EventBridge to trigger AWS Batch jobs when audio files are uploaded to Amazon S3
  • Costs fractions of a cent per hour: $0.00011 on-demand, $0.00005 with Spot Instances
  • Buffered streaming inference enables processing long audio on standard g6.xlarge instances
  • Local attention mode supports up to 3 hours of audio on a GPU with 80 GB of VRAM
  • EC2 Spot Instances provide up to 90% cost savings for stateless, idempotent workloads
  • CloudFormation template automates infrastructure deployment with GPU monitoring
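
The event-driven flow in the list above (S3 upload, EventBridge rule, AWS Batch job) can be sketched roughly as follows. This is a minimal illustration and not the article's actual template: the bucket, queue, and job definition names, the environment variable names, and the `build_batch_job` helper are all hypothetical. The event pattern follows the standard shape of S3 "Object Created" events delivered through EventBridge, and the returned dictionary uses the parameter names accepted by the AWS Batch `SubmitJob` API.

```python
import re

# Hypothetical resource names, used only for illustration.
INPUT_BUCKET = "example-audio-input-bucket"
JOB_QUEUE = "example-transcription-queue"
JOB_DEFINITION = "example-parakeet-job"

# EventBridge rule pattern matching new objects in the input bucket.
# The bucket must have EventBridge notifications enabled for this to fire.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": [INPUT_BUCKET]}},
}


def build_batch_job(event: dict) -> dict:
    """Map an S3 'Object Created' event to AWS Batch SubmitJob parameters."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]

    # Batch job names allow letters, digits, hyphens, and underscores
    # (up to 128 characters), so replace anything else in the object key.
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)
    job_name = ("transcribe-" + safe_key)[:128]

    return {
        "jobName": job_name,
        "jobQueue": JOB_QUEUE,
        "jobDefinition": JOB_DEFINITION,
        "containerOverrides": {
            # Hypothetical variable names the container would read
            # to locate the audio file it should transcribe.
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }
```

With boto3 installed and credentials configured, the result could be passed to `boto3.client("batch").submit_job(**build_batch_job(event))`. EventBridge can also target a Batch job queue directly, so a deployed template may not need any glue code like this at all; the function is here only to make the mapping from event to job explicit.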

This solution provides a practical, cost-efficient alternative to managed ASR services for high-volume multilingual transcription workloads.
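
To put the quoted speed in perspective for high-volume workloads, the short sketch below converts the figure of 0.24 seconds per minute of audio into processing time and a real-time factor. It assumes the figure covers inference only; job startup, model loading, and S3 transfers would add to the wall-clock time, and the function names are illustrative.

```python
# Inference speed quoted in the summary: seconds of compute per minute of audio.
SECONDS_PER_AUDIO_MINUTE = 0.24


def processing_seconds(audio_minutes: float) -> float:
    """Estimated inference time, in seconds, for the given audio duration."""
    return audio_minutes * SECONDS_PER_AUDIO_MINUTE


def realtime_factor() -> float:
    """How many seconds of audio are transcribed per second of compute."""
    return 60.0 / SECONDS_PER_AUDIO_MINUTE


if __name__ == "__main__":
    print(f"1 hour of audio: ~{processing_seconds(60):.1f} s of inference")
    print(f"Real-time factor: ~{realtime_factor():.0f}x")
```

At this rate, one hour of audio needs about 14.4 seconds of inference, which works out to roughly 250 times faster than real time on a single GPU.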





