Ray Integration for AWS Trainium and AWS Inferentia is Now Available

This article announces the integration of Ray, an open-source unified compute framework, with AWS Trainium and AWS Inferentia accelerators on Amazon EC2 instances. With this integration, Ray automatically detects and uses these accelerators, improving performance and cost-efficiency when scaling machine learning and generative AI workloads.
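As a rough illustration of what using the detected accelerators can look like, Ray exposes NeuronCores as a custom resource named `neuron_cores` that a task, actor, or Ray Serve deployment can request. The sketch below only builds the options dictionary; the helper name `neuron_actor_options` is hypothetical, and the commented usage assumes Ray is installed on a Trn1 or Inf2 instance.

```python
def neuron_actor_options(num_neuron_cores: int) -> dict:
    """Build Ray actor options that reserve NeuronCores (hypothetical helper)."""
    if num_neuron_cores < 1:
        raise ValueError("num_neuron_cores must be at least 1")
    # "neuron_cores" is the custom resource name Ray uses for NeuronCores.
    return {"resources": {"neuron_cores": num_neuron_cores}}


# Possible usage on a Trn1/Inf2 node with Ray installed (not executed here):
#   import ray
#   from ray import serve
#
#   @ray.remote(**neuron_actor_options(2))
#   class Worker: ...
#
#   @serve.deployment(ray_actor_options=neuron_actor_options(2))
#   class Model: ...
```

Keeping the resource request in one place makes it easy to change the number of cores per replica without touching the deployment code.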

Specifically, the article covers:

Ray integration with AWS Trainium on Trn1 instances for distributed training and fine-tuning of PyTorch models
Ray integration with AWS Inferentia on Inf2 instances for low-latency and low-cost inference pipelines via tensor parallelism
Support for sharding large language models across Trainium/Inferentia accelerators using tensor parallelism
Integration with Transformers Neuron for large language model inference on Neuron hardware
An example of deploying the Open LLAMA-3B language model on an Inferentia instance using Ray Serve
Upcoming integration of AWS Trainium with the high-level Ray Train API
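The tensor parallelism mentioned in the list above can be sketched in plain Python. This is a toy illustration of the general idea (splitting a layer's weight matrix column-wise across shards and concatenating the partial outputs), not the implementation used by Ray or Transformers Neuron; all function names here are hypothetical, and the shards run sequentially rather than on separate accelerator cores.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for a length-k vector and a k x n matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w into num_shards equal column blocks, one per shard."""
    n = len(w[0])
    if n % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = n // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]


def tensor_parallel_matvec(x: Vector, w: Matrix, num_shards: int) -> Vector:
    """Column-parallel x @ w: each shard computes a slice of the output."""
    # On real hardware each shard would live on its own accelerator core;
    # here the shards are simply evaluated one after another.
    partials = [matvec(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenating the slices in shard order reproduces the full output.
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
sharded = tensor_parallel_matvec(x, w, num_shards=2)
```

Because each shard holds only a fraction of the weights, a model too large for one accelerator's memory can still be served, at the cost of gathering the partial outputs after each sharded layer.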

