New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

AWS News Blog

AWS has announced two new evaluation capabilities in Amazon Bedrock for improving generative AI applications:

Retrieval-Augmented Generation (RAG) evaluation in Knowledge Bases
Model Evaluation with LLM-as-a-judge capabilities

Key features include:

Automatic knowledge base assessment using large language models
Evaluation metrics covering quality and responsible AI criteria
Comparison of different configuration settings
Normalized scoring from 0 to 1
Natural language explanations for evaluation results

Both capabilities launched in preview in multiple AWS Regions. They aim to help developers streamline testing and improve AI-powered applications.
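To illustrate how normalized 0-to-1 scores make configuration comparison straightforward, here is a minimal Python sketch. It does not call Amazon Bedrock. The configuration names, metric names, and score values are invented for illustration, and `compare_configs` is a hypothetical helper, not part of any AWS SDK. It assumes each configuration's per-metric scores have already been collected from an evaluation run.

```python
from statistics import mean


def validate_scores(scores):
    """Reject any score outside the normalized 0-to-1 range."""
    for metric, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{metric} score {value} is outside [0, 1]")


def compare_configs(results):
    """Average each configuration's metric scores and pick the highest.

    `results` maps a configuration name to a dict of metric -> score.
    Returns (best_config_name, {config_name: average_score}).
    """
    summary = {}
    for name, scores in results.items():
        validate_scores(scores)
        summary[name] = mean(scores.values())
    best = max(summary, key=summary.get)
    return best, summary


# Hypothetical scores for two knowledge base configurations (made-up values).
results = {
    "config_a": {"correctness": 0.8, "completeness": 0.7, "helpfulness": 0.9},
    "config_b": {"correctness": 0.9, "completeness": 0.6, "helpfulness": 0.6},
}

best, summary = compare_configs(results)
print(best, summary)
```

An unweighted average is only one possible way to aggregate the scores. In practice, a team might weight the metrics that matter most for its application, or compare configurations metric by metric alongside the natural language explanations.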


Dec 2, 2024: Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (Preview)

Dec 2, 2024: Amazon Bedrock Knowledge Bases now supports RAG evaluation (Preview)

Mar 20, 2025: Amazon Bedrock now supports RAG Evaluation (generally available)

Mar 20, 2025: Amazon Bedrock Model Evaluation LLM-as-a-judge is now generally available


Related articles