AWS optimized PyTorch's torch.compile feature for AWS Graviton3 processors. Compared with eager mode on Graviton3-based EC2 instances, this yields up to 2x better inference performance for Hugging Face models and up to 1.35x better inference performance for TorchBench models.

<div>
<p>This article discusses optimizing PyTorch's torch.compile feature for AWS Graviton3 processors. Compared with eager mode on Graviton3-based EC2 instances, the optimizations deliver up to a 2x performance improvement for Hugging Face model inference and up to 1.35x for TorchBench model inference.</p>
<p>Specifically, the article covers:</p>
<ul>
<li>Why torch.compile, and what it aims to achieve</li>
<li>Optimizations done by the AWS Graviton team</li>
<li>Performance results on TorchBench and Hugging Face models</li>
<li>How to run inference with torch.compile on Graviton3 instances</li>
<li>Conclusion and future work</li>
</ul>
</div>


Accelerated PyTorch inference with torch.compile on AWS Graviton processors
