Home icon

Effectively use prompt caching on Amazon Bedrock

Machine Learning Blog



The article discusses the new prompt caching feature on Amazon Bedrock, which helps reduce response latency and costs for generative AI applications by caching frequently used prompt contexts.

  • Prompt caching can reduce latency up to 85% and costs up to 90%
  • Works by caching specific portions of prompts across multiple API calls
  • Best suited for workloads with long, repetitive context like document Q&A, coding assistants, and agentic workflows
  • Requires careful prompt structuring with static and dynamic content
  • Available for select Anthropic Claude and Nova models

The feature allows developers to optimize AI application performance by strategically caching prompt prefixes, with CloudWatch metrics providing insights into cache performance and potential cost savings.



Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Related articles

Apr 7
2025
Amazon Bedrock announces general availability of prompt caching
Dec 4
2024
Amazon Bedrock announces preview of prompt caching
Jan 26
2026
Amazon Bedrock now supports 1-hour duration for prompt caching
Apr 23
2025
Prompt Optimization in Amazon Bedrock now generally available

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.