Introducing latency-optimized inference for foundation models in Amazon Bedrock
News
Amazon Bedrock introduces latency-optimized inference for foundation models, enhancing AI application performance.
- Supports Anthropic's Claude 3.5 Haiku and Meta's Llama 3.1 405B and 70B models
- Provides faster response times without compromising accuracy
- Leverages AWS Trainium2 and advanced software optimizations
- Requires no additional setup or model fine-tuning
- Particularly beneficial for latency-sensitive applications like chatbots
The feature is currently available in the US East (Ohio) Region, offering improved inference speed for generative AI applications.
The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.
Related articles
Mar 5
2025
2025
Announcing latency-optimized inference for Amazon Nova Pro foundation model in Amazon Bedrock
Dec 23
2024
2024
Amazon Bedrock Agents, Flows, and Knowledge Bases now supports Latency Optimized Models
Nov 6
2024
2024
Integrate foundation models into your code with Amazon Bedrock
Jan 28
2025
2025
Optimizing AI responsiveness: A practical guide to Amazon Bedrock latency-optimized inference
The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.