Effectively use prompt caching on Amazon Bedrock

Machine Learning Blog

The article discusses the new prompt caching feature on Amazon Bedrock, which helps reduce response latency and costs for generative AI applications by caching frequently used prompt contexts.

Prompt caching can reduce latency up to 85% and costs up to 90%
Works by caching specific portions of prompts across multiple API calls
Best suited for workloads with long, repetitive context like document Q&A, coding assistants, and agentic workflows
Requires careful prompt structuring with static and dynamic content
Available for select Anthropic Claude and Nova models

The feature allows developers to optimize AI application performance by strategically caching prompt prefixes, with CloudWatch metrics providing insights into cache performance and potential cost savings.

Go to article

The AWS News Feed is currently looking for gold sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Apr 7
2025

Amazon Bedrock announces general availability of prompt caching

Dec 4
2024

Amazon Bedrock announces preview of prompt caching

Jan 26
2026

Amazon Bedrock now supports 1-hour duration for prompt caching

Apr 23
2025

Prompt Optimization in Amazon Bedrock now generally available

The AWS News Feed is currently looking for silver sponsors. If you want to support the AWS community and reach a large audience of AWS professionals, consider sponsoring the AWS News Feed.

Effectively use prompt caching on Amazon Bedrock

Related articles