Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency


The article introduces new Amazon SageMaker inference capabilities that help reduce foundation model deployment costs and latency.

Specifically, the article covers:

  • Key components of the new inference capabilities, including the ability to deploy multiple foundation models on the same SageMaker endpoint and control resource allocation for each model
  • How to use the new capabilities from SageMaker Studio, Python SDK, AWS SDKs, AWS CLI, and CloudFormation
  • A demo showing how to deploy two large language models (Dolly v2 7B and FLAN-T5 XXL) on a SageMaker endpoint using the new inference capabilities
  • Benefits such as improved resource utilization, deployment costs reduced by 50% on average, and inference latency lowered by 20% on average
  • Availability and pricing details for the new capabilities
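
To make the idea of per-model resource allocation on a shared endpoint more concrete, here is a minimal Python sketch that only builds request payloads and makes no AWS calls. It assumes the payload shape accepted by the boto3 SageMaker client's `create_inference_component` operation. The endpoint name, variant name, component names, model names, and resource figures are all hypothetical placeholders, not values taken from the article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelSlot:
    """Resources reserved for one model on a shared endpoint (illustrative values only)."""
    component_name: str   # hypothetical inference component name
    model_name: str       # hypothetical name of an already-created SageMaker model
    accelerators: int     # accelerator devices reserved per copy
    min_memory_mb: int    # minimum memory reserved per copy
    copies: int = 1       # number of copies of the model to run


def build_request(endpoint_name: str, variant_name: str, slot: ModelSlot) -> Dict:
    """Build keyword arguments shaped for create_inference_component (assumed shape)."""
    return {
        "InferenceComponentName": slot.component_name,
        "EndpointName": endpoint_name,
        "VariantName": variant_name,
        "Specification": {
            "ModelName": slot.model_name,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": slot.accelerators,
                "MinMemoryRequiredInMb": slot.min_memory_mb,
            },
        },
        "RuntimeConfig": {"CopyCount": slot.copies},
    }


def accelerators_needed(slots: List[ModelSlot]) -> int:
    """Total accelerators the endpoint must provide to host every copy of every model."""
    return sum(s.accelerators * s.copies for s in slots)


# Two models sharing one endpoint; all numbers are placeholders, not sizing advice.
slots = [
    ModelSlot("dolly-component", "dolly-v2-7b", accelerators=1, min_memory_mb=16384),
    ModelSlot("flan-component", "flan-t5-xxl", accelerators=2, min_memory_mb=32768),
]
requests = [build_request("shared-llm-endpoint", "AllTraffic", s) for s in slots]
total_accelerators = accelerators_needed(slots)
```

With AWS credentials and an existing endpoint, each payload could then be passed along as `boto3.client("sagemaker").create_inference_component(**request)`. Checking `total_accelerators` against the accelerator count of the chosen instance type before deploying is one simple way to confirm that the models fit together.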

