
Build a read-through semantic cache with Amazon OpenSearch Serverless and Amazon Bedrock

Machine Learning Blog



This article describes how to build a read-through semantic cache with Amazon OpenSearch Serverless and Amazon Bedrock to reduce the latency and cost of large language model (LLM) applications.

  • Addresses latency and cost challenges in generative AI applications
  • Uses a serverless cache that stores and retrieves semantically similar queries
  • Leverages Amazon Bedrock embedding models to transform queries into vector embeddings
  • Enables quick lookups of previously generated responses, so repeated or semantically similar queries can skip the LLM call
  • Demonstrates significant performance improvements, reducing response times from 2 seconds to under 0.5 seconds

The solution provides a flexible, cost-effective approach to improving LLM-based applications by implementing a semantic caching strategy that can be customized based on specific use cases.
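
The read-through flow summarized above can be sketched in a few lines. This is a minimal in-memory illustration, not the article's implementation: `toy_embed` is a hypothetical stand-in for an Amazon Bedrock embedding model, the Python list inside `ReadThroughSemanticCache` stands in for an OpenSearch Serverless vector index, and the `generate` callable stands in for the LLM call. All names and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
import math
import zlib
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def toy_embed(text: str, dims: int = 64) -> Vector:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words."""
    vec = [0.0] * dims
    for word in text.lower().split():
        word = word.strip(".,?!")
        if word:
            # crc32 is deterministic across runs, unlike the built-in hash().
            vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ReadThroughSemanticCache:
    """Check the cache first; on a miss, call the generator and store the result."""

    def __init__(
        self,
        embed: Callable[[str], Vector],
        generate: Callable[[str], str],
        threshold: float = 0.8,  # assumed value; tune per use case
    ) -> None:
        self.embed = embed
        self.generate = generate
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []  # stand-in for a vector index

    def _lookup(self, vec: Vector) -> Optional[str]:
        """Return the stored response most similar to vec, if above threshold."""
        best_score, best_response = -1.0, None
        for stored_vec, response in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def get(self, query: str) -> Tuple[str, bool]:
        """Return (response, cache_hit)."""
        vec = self.embed(query)
        cached = self._lookup(vec)
        if cached is not None:
            return cached, True
        response = self.generate(query)  # cache miss: read through to the LLM
        self.entries.append((vec, response))
        return response, False


if __name__ == "__main__":
    cache = ReadThroughSemanticCache(toy_embed, lambda q: f"(generated) {q}")
    for q in ["What is the capital of France?", "what is the capital of france"]:
        response, hit = cache.get(q)
        print(f"hit={hit} response={response}")
```

The threshold is the main tuning knob in a design like this: a higher value returns cached answers only for near-identical queries, while a lower value raises the hit rate at the risk of serving a response that does not quite match the question.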





Related articles

  • Jul 2, 2024: Create an end-to-end serverless digital assistant for semantic search with Amazon Bedrock
  • Jul 15, 2024: Amazon OpenSearch Serverless levels up speed and efficiency with smart caching
  • Aug 4, 2025: Amazon OpenSearch Serverless introduces automatic semantic enrichment
  • Nov 26, 2025: Lower cost and latency for AI using Amazon ElastiCache as a semantic cache with Amazon Bedrock
