Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

AWS News Blog

Amazon Bedrock has introduced two new preview capabilities to help reduce costs and latency for generative AI applications:

Amazon Bedrock Intelligent Prompt Routing: Dynamically routes requests between different-sized models in the same family to optimize quality and cost
Prompt Caching: Allows caching of frequently used context across multiple model invocations

Key features and benefits include:

Can reduce costs by up to 30% with prompt routing
Prompt caching can reduce costs by up to 90% and latency by up to 85%
Currently supports English language prompts
Available in preview in US East and US West regions
Works with models like Anthropic Claude and Meta Llama

These capabilities make it easier to build cost-effective and high-performing generative AI applications by intelligently managing model selection and caching.

Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Apr 22
2025

Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits

Dec 4
2024

Amazon Bedrock Intelligent Prompt Routing is now available in preview

Dec 4
2024

Amazon Bedrock announces preview of prompt caching

Apr 22
2025

Amazon Bedrock Intelligent Prompt Routing is now generally available

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

Related articles