Lower cost and latency for AI using Amazon ElastiCache as a semantic cache with Amazon Bedrock

Database Blog

This article explains how to implement semantic caching using Amazon ElastiCache for Valkey with Amazon Bedrock to reduce AI application costs and latency.

Semantic caching reuses LLM responses for identical or semantically similar queries using vector embeddings
Experiments showed up to 86% LLM cost reduction and 88% latency improvement with semantic caching
ElastiCache for Valkey provides microsecond-latency vector search with 95%+ recall at best price-performance
The solution uses Amazon Titan embeddings, the Amazon Nova Premier LLM, and LangGraph for orchestration
Cache hits return responses in milliseconds without LLM inference; cache misses invoke the LLM and store the result
Evaluation on 63,796 real chatbot queries achieved 91% accuracy at a similarity threshold of 0.75
Best practices include caching stable responses, handling multi-turn conversations, setting TTLs, and personalizing outputs

Semantic caching with ElastiCache enables significant cost and performance improvements for generative AI applications by intelligently reusing cached responses for similar queries.
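
The hit/miss flow summarized above can be illustrated with a minimal, in-memory sketch. It is not the article's implementation: the real solution uses Amazon Titan embeddings and ElastiCache for Valkey vector search, while this sketch assumes a toy bag-of-words embedding and a linear scan over a Python list. The names `SemanticCache`, `toy_embed`, `answer`, and `call_llm` are hypothetical; only the 0.75 similarity threshold and the idea of a TTL come from the summary.

```python
import math
import time
from typing import Callable, List, Optional, Tuple

Vector = List[float]

# Hypothetical fixed vocabulary for the toy embedding; a real system would
# call an embedding model instead.
VOCAB = ["reset", "password", "change", "refund", "order", "status"]


def toy_embed(text: str) -> Vector:
    """Count vocabulary words in the text (a stand-in for a real embedding)."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


class SemanticCache:
    """In-memory semantic cache with a similarity threshold and per-entry TTL."""

    def __init__(
        self,
        embed: Callable[[str], Vector],
        threshold: float = 0.75,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._embed = embed
        self._threshold = threshold
        self._ttl = ttl_seconds
        self._clock = clock
        # Each entry is (embedding, cached response, expiry timestamp).
        self._entries: List[Tuple[Vector, str, float]] = []

    def lookup(self, query: str) -> Optional[str]:
        now = self._clock()
        # Drop expired entries before searching.
        self._entries = [e for e in self._entries if e[2] > now]
        query_vec = self._embed(query)
        best_score, best_response = 0.0, None
        for vec, response, _ in self._entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        # Only a sufficiently similar entry counts as a cache hit.
        return best_response if best_score >= self._threshold else None

    def store(self, query: str, response: str) -> None:
        expires_at = self._clock() + self._ttl
        self._entries.append((self._embed(query), response, expires_at))


def answer(query: str, cache: SemanticCache, call_llm: Callable[[str], str]) -> str:
    """Return a cached response on a hit; otherwise call the LLM and cache it."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    response = call_llm(query)
    cache.store(query, response)
    return response
```

In this sketch, raising the threshold trades fewer cache hits for a lower risk of returning a response to a question that only looks similar, and the TTL bounds how long a stale answer can be served.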


Related articles

Nov 18, 2025
New Amazon Bedrock service tiers help you match AI workload performance with cost

Aug 16, 2024
Improve speed and reduce cost for generative AI workloads with a persistent semantic cache in Amazon MemoryDB

Nov 26, 2024
Build a read-through semantic cache with Amazon OpenSearch Serverless and Amazon Bedrock

Jun 17, 2026
Introducing Amazon Bedrock Managed Knowledge Base for faster, more accurate enterprise AI applications