Evaluate models or RAG systems using Amazon Bedrock Evaluations – Now generally available

Machine Learning Blog

AWS has announced the general availability of Amazon Bedrock Evaluations, which lets organizations systematically evaluate foundation models and Retrieval Augmented Generation (RAG) systems, whether they run on Amazon Bedrock or elsewhere. Key highlights include:

Bring Your Own Inference (BYOI) capabilities for evaluating models and RAG systems from any provider
New citation metrics for RAG systems: citation precision and citation coverage
Ability to evaluate model responses using an LLM-as-a-judge (LLMaaJ) workflow
Support for evaluating retrieval and generation quality across different platforms
Flexible evaluation through the Amazon Bedrock console, the Python SDK, and APIs

The release enables organizations to assess generative AI application performance systematically, regardless of where the models or systems are deployed, helping improve quality and reliability.
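To give a feel for the two citation metrics named in the highlights, here is a minimal sketch in Python. It is not Bedrock's implementation: the summary does not define the metrics, and Bedrock Evaluations scores them with an LLM judge. The sketch assumes a simplified set-based reading, where citation precision is the share of cited passages that actually support the claim they are attached to, and citation coverage is the share of claims backed by at least one supporting citation. The `Claim` class, its fields, and the passage IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement in a RAG response (hypothetical structure)."""
    text: str
    # Passage IDs the response cites for this claim.
    cited: set[str] = field(default_factory=set)
    # Passage IDs that a judge (human or LLM) says support the claim.
    supporting: set[str] = field(default_factory=set)


def citation_precision(claims: list[Claim]) -> float:
    """Fraction of citations that point at a passage supporting their claim."""
    total = sum(len(c.cited) for c in claims)
    if total == 0:
        return 0.0
    correct = sum(len(c.cited & c.supporting) for c in claims)
    return correct / total


def citation_coverage(claims: list[Claim]) -> float:
    """Fraction of claims backed by at least one supporting citation."""
    if not claims:
        return 0.0
    covered = sum(1 for c in claims if c.cited & c.supporting)
    return covered / len(claims)


# Toy response with three claims (illustrative data only).
claims = [
    Claim("Claim A", cited={"p1", "p2", "p5"}, supporting={"p1"}),
    Claim("Claim B", cited={"p3"}, supporting={"p3"}),
    Claim("Claim C", cited=set(), supporting={"p4"}),  # supported but uncited
]

precision = citation_precision(claims)  # 2 correct citations out of 4
coverage = citation_coverage(claims)    # 2 covered claims out of 3
print(f"citation precision: {precision:.2f}, citation coverage: {coverage:.2f}")
```

In this toy case the two metrics diverge: extra irrelevant citations on Claim A lower precision, while the uncited Claim C lowers coverage. Tracking both is what lets an evaluation separate over-citing from under-citing.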

Mar 20, 2025: Amazon Bedrock now supports RAG Evaluation (generally available)

Apr 23, 2024: Amazon Bedrock model evaluation is now generally available

Mar 14, 2025: Evaluating RAG applications with Amazon Bedrock knowledge base evaluation

Oct 9, 2024: Amazon Bedrock Model Evaluation now supports evaluating custom models
