
Reduce costs and latency with Amazon Bedrock Intelligent Prompt Routing and prompt caching (preview)

AWS News Blog



Amazon Bedrock has introduced two new preview capabilities to help reduce costs and latency for generative AI applications:

  • Amazon Bedrock Intelligent Prompt Routing: Dynamically routes requests between different-sized models in the same family to optimize quality and cost
  • Prompt Caching: Allows caching of frequently used context across multiple model invocations
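The routing idea in the first bullet can be illustrated with a small sketch. This is not how Amazon Bedrock implements routing; the model names, prices, `predict_quality_gap` callback, and `max_gap` threshold are all hypothetical, chosen only to show the general pattern of preferring a smaller model when the expected quality loss is acceptable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical label, not a real Bedrock model ID
    cost_per_1k_tokens: float  # illustrative price, not an AWS figure


def route(prompt: str, small: Model, large: Model,
          predict_quality_gap: Callable[[str], float],
          max_gap: float = 0.1) -> Model:
    """Pick the cheaper model unless the predicted quality loss is too high."""
    gap = predict_quality_gap(prompt)
    return small if gap <= max_gap else large


def toy_gap(prompt: str) -> float:
    # Placeholder heuristic (an assumption): longer prompts are treated as harder.
    return min(len(prompt.split()) / 100.0, 1.0)


small = Model("small-model", 0.25)
large = Model("large-model", 3.00)

chosen_short = route("What is the capital of France?", small, large, toy_gap)
chosen_long = route(" ".join(["word"] * 80), small, large, toy_gap)
print(chosen_short.name, chosen_long.name)
```

In this toy setup the short question goes to the smaller model and the 80-word prompt goes to the larger one. A real router would replace `toy_gap` with a learned prediction of response quality.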

Key features and benefits include:

  • Intelligent Prompt Routing can reduce costs by up to 30%
  • Prompt caching can reduce costs by up to 90% and latency by up to 85%
  • Currently supports English language prompts
  • Available in preview in US East and US West regions
  • Works with models like Anthropic Claude and Meta Llama
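To see how an "up to 90%" figure might translate into savings for a given workload, here is a rough back-of-the-envelope sketch. It assumes the reduction applies as a per-token discount on cached input tokens only; that billing model, the function name, and the prices and token counts below are illustrative assumptions, not figures from the article.

```python
import math


def estimated_cost(total_tokens: int, cached_fraction: float,
                   price_per_1k: float, cached_discount: float) -> float:
    """Estimate input cost when a share of tokens is served from the cache.

    Assumption: cached tokens are billed at (1 - cached_discount) of the
    normal price; uncached tokens are billed at the full price.
    """
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    billable = uncached + cached * (1.0 - cached_discount)
    return billable * price_per_1k / 1000.0


# Hypothetical workload: 10,000 input tokens, 80% of them a reusable context.
baseline = estimated_cost(10_000, 0.0, 3.0, 0.9)
with_cache = estimated_cost(10_000, 0.8, 3.0, 0.9)
savings = 1.0 - with_cache / baseline
print(f"baseline={baseline:.2f} with_cache={with_cache:.2f} savings={savings:.0%}")
```

Under these assumptions the overall saving is about 72%, not 90%, because only the cached share of the prompt gets the discount. The headline figure is an upper bound that depends on how much of each request is reusable context.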

These capabilities make it easier to build cost-effective and high-performing generative AI applications by intelligently managing model selection and caching.




Related articles

  • Apr 22, 2025: Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits
  • Dec 4, 2024: Amazon Bedrock Intelligent Prompt Routing is now available in preview
  • Dec 4, 2024: Amazon Bedrock announces preview of prompt caching
  • Apr 22, 2025: Amazon Bedrock Intelligent Prompt Routing is now generally available
