
Accelerated PyTorch inference with torch.compile on AWS Graviton processors

Machine Learning Blog



This article describes how PyTorch's torch.compile feature was optimized for AWS Graviton3 processors. On Graviton3-based EC2 instances, the optimizations deliver up to 2x faster inference for Hugging Face models and up to 1.35x faster inference for TorchBench models, compared with eager mode.

Specifically, the article covers:

  • Why torch.compile exists and what its goals are
  • Optimizations done by the AWS Graviton team
  • Performance results on TorchBench and Hugging Face models
  • How to run inference with torch.compile on Graviton3 instances
  • Conclusion and future work
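Speedup figures such as "up to 2x compared to eager mode" are ratios of measured latencies. The sketch below shows one way such a ratio could be computed. It is not the benchmark harness used in the article: the helper names `median_latency` and `speedup`, the warm-up count, and the iteration count are all assumptions made for illustration.

```python
import statistics
import time
from typing import Any, Callable


def median_latency(fn: Callable[..., Any], *args: Any,
                   warmup: int = 3, iters: int = 10) -> float:
    """Return the median wall-clock time, in seconds, of one call to fn(*args).

    Hypothetical helper, not taken from the article.
    """
    # Warm-up calls are not timed. For a compiled model, the first calls
    # typically include compilation time, which would distort the result.
    for _ in range(warmup):
        fn(*args)

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Return baseline/optimized; 2.0 means the optimized path takes half the time."""
    if baseline_s <= 0 or optimized_s <= 0:
        raise ValueError("latencies must be positive")
    return baseline_s / optimized_s
```

With PyTorch 2.x installed, the two callables would typically be an eager model and the result of `torch.compile(model)`, both called on the same inputs with gradients disabled; the ratio of their median latencies is then the speedup over eager mode.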




Related articles

May 15, 2024: Accelerate NLP inference with ONNX Runtime on AWS Graviton processors
Feb 29, 2024: Accelerating large-scale neural network training on CPUs with ThirdAI and AWS Graviton
Jul 15, 2025: Accelerate generative AI inference with NVIDIA Dynamo and Amazon EKS
Mar 27, 2024: Accelerating simulated quantum annealing on AWS Graviton processors
