Boost inference performance for Mixtral and Llama 2 models with new Amazon SageMaker containers

Machine Learning Blog

This article discusses the new version 0.26.0 of Amazon SageMaker Large Model Inference (LMI) Deep Learning Containers, which offers improved performance and new features for efficient inference with large language models.

Specifically, the article covers:

New models supported, including Mixtral, Llama2-70B, and more
Context window extension using rotary position embeddings (RoPE)
Additional generation details like token log probabilities and finish reasons
Performance improvements and new features for various LMI backends (LMI-Distributed, TensorRT-LLM, vLLM, NeuronX)
A detailed guide on deploying the Mixtral model with LMI and utilizing the new generation details
Conclusion highlighting the benefits of the new LMI capabilities
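
To make the context window extension item above more concrete, here is a minimal sketch of rotary position embeddings with linear position scaling. It is written in plain Python for illustration only and is not taken from the LMI containers: the function names, the `scaling_factor` parameter, and the choice of linear scaling (rather than another RoPE scaling variant) are assumptions.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scaling_factor=1.0):
    """Rotation angles for one token position (hypothetical helper).

    scaling_factor > 1 compresses positions (linear scaling), so a position
    beyond the original context length maps back into the trained range.
    """
    scaled_position = position / scaling_factor
    return [scaled_position * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0, scaling_factor=1.0):
    """Rotate consecutive pairs of a query/key vector by the RoPE angles."""
    angles = rope_angles(position, len(vec), base, scaling_factor)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

# With a scaling factor of 2, position 8192 is rotated like position 4096.
extended = apply_rope([1.0, 0.0, 0.5, 0.5], position=8192, scaling_factor=2.0)
original = apply_rope([1.0, 0.0, 0.5, 0.5], position=4096)
```

In this sketch, a model trained on 4,096 positions sees familiar rotation angles for inputs up to 8,192 tokens, which is the basic idea behind extending the context window through RoPE scaling.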


Jan 9, 2024

Inference Llama 2 models with real-time response streaming using Amazon SageMaker

Aug 21, 2024

Fine-tune Meta Llama 3.1 models for generative AI inference using Amazon SageMaker JumpStart

Apr 22, 2025

Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15

Jan 17, 2024

Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

