This article announces the integration of Ray, an open-source unified compute framework, with AWS Trainium and AWS Inferentia accelerators on Amazon EC2 instances. With this integration, Ray automatically detects these accelerators and can schedule work on them. This improves performance and cost efficiency when scaling machine learning and generative AI workloads.
Specifically, the article covers:
- Ray integration with AWS Trainium on Trn1 instances for distributed training and fine-tuning of PyTorch models
- Ray integration with AWS Inferentia on Inf2 instances for low-latency and low-cost inference pipelines via tensor parallelism
- Support for sharding large language models across Trainium/Inferentia accelerators using tensor parallelism
- Integration with Transformers Neuron for large language model inference on Neuron hardware
- An example of deploying the OpenLLaMA 3B language model on an Inferentia (Inf2) instance using Ray Serve
- Upcoming integration of AWS Trainium with the high-level Ray Train API
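Tensor parallelism is what allows a model too large for one accelerator to be sharded across several NeuronCores. The sketch below is a minimal, framework-free illustration in plain Python of the two standard ways to shard a linear layer's weight matrix: column-parallel and row-parallel. It is not the Ray, Neuron SDK, or Transformers Neuron API. The "shards" are ordinary Python lists standing in for hypothetical devices, and all function names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (unsharded) matrix product a @ b, used as the reference result."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w into num_shards contiguous column blocks, one per hypothetical device."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    step = cols // num_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_shards)]


def split_rows(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w into num_shards contiguous row blocks, one per hypothetical device."""
    assert len(w) % num_shards == 0, "rows must divide evenly across shards"
    step = len(w) // num_shards
    return [w[i * step:(i + 1) * step] for i in range(num_shards)]


def column_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Each shard computes x @ w_i independently. Concatenating the partial outputs
    column-wise (an all-gather on real hardware) reproduces x @ w."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    out: Matrix = []
    for r in range(len(x)):
        row: List[float] = []
        for p in partials:
            row.extend(p[r])
        out.append(row)
    return out


def row_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Split w by rows and x by the matching columns. Each shard produces a
    full-size partial product, and summing them (an all-reduce on real hardware)
    reproduces x @ w."""
    step = len(w) // num_shards
    w_shards = split_rows(w, num_shards)
    x_shards = [[row[i * step:(i + 1) * step] for row in x] for i in range(num_shards)]
    partials = [matmul(xs, ws) for xs, ws in zip(x_shards, w_shards)]
    cols = len(w[0])
    return [[sum(p[r][c] for p in partials) for c in range(cols)] for r in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 1.0, 0.0, 2.0],
         [1.0, 1.0, 1.0, 1.0],
         [2.0, 0.0, 0.0, 1.0]]
    print(matmul(x, w))                     # [[12.0, 5.0, 5.0, 11.0]]
    print(column_parallel_linear(x, w, 2))  # same result, computed in 2 column shards
    print(row_parallel_linear(x, w, 2))     # same result, computed in 2 row shards
```

Both schemes give the same answer as the unsharded product. They differ in the collective operation needed to combine results: concatenation for column-parallel, summation for row-parallel. In practice, frameworks that shard transformer layers across accelerators typically alternate the two so that communication between layers stays small.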