The LLM-as-a-judge capability in Amazon Bedrock Model Evaluation is now generally available, allowing users to evaluate and compare AI models using customizable quality and responsible AI metrics.


<div>
<p>
Amazon Bedrock Model Evaluation's LLM-as-a-judge capability is now generally available. Key points:
</p>
<ul>
<li>Allows evaluating and comparing models using other LLMs as judges</li>
<li>Supports quality metrics like correctness, completeness, and professional tone</li>
<li>Includes responsible AI metrics such as harmfulness and answer refusal</li>
<li>Can evaluate all Amazon Bedrock models, including serverless and marketplace models</li>
<li>Adds a "bring your own inference responses" option for evaluating responses already generated by any model or system</li>
</ul>
<p>
LLM-as-a-judge provides human-like evaluation quality at a lower cost than human evaluation and shortens evaluation time, which makes model selection faster.
</p>
</div>
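
The workflow summarized above (a judge scores pre-generated responses on several metrics, and the scores are compared across models) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Amazon Bedrock API: the record shape, the `Judge` signature, the metric identifiers, and `stub_judge` are all hypothetical, and a real setup would replace `stub_judge` with a call to a judge model.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical record shape: responses were generated elsewhere
# ("bring your own inference responses"). Keys: "model", "prompt", "response".
Record = Dict[str, str]

# Hypothetical judge signature: (metric, prompt, response) -> score in [0, 1].
Judge = Callable[[str, str, str], float]

# Metric names taken from the summary above; the identifiers are illustrative.
METRICS = ["correctness", "completeness", "professional_tone",
           "harmfulness", "answer_refusal"]


def evaluate(records: List[Record], judge: Judge,
             metrics: List[str] = METRICS) -> Dict[str, Dict[str, float]]:
    """Average each metric's judge score per model."""
    scores = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for metric in metrics:
            score = judge(metric, rec["prompt"], rec["response"])
            scores[rec["model"]][metric].append(score)
    return {
        model: {m: mean(vals) for m, vals in by_metric.items()}
        for model, by_metric in scores.items()
    }


def stub_judge(metric: str, prompt: str, response: str) -> float:
    """Placeholder for a real judge-model call (assumption, not a real API).

    Flags refusals by a simple prefix check; returns 0.5 for every other metric.
    """
    if metric == "answer_refusal":
        return 1.0 if response.lower().startswith("i can't") else 0.0
    return 0.5


if __name__ == "__main__":
    records = [
        {"model": "model-a", "prompt": "Capital of France?", "response": "I can't help with that."},
        {"model": "model-a", "prompt": "Capital of France?", "response": "Paris."},
        {"model": "model-b", "prompt": "Capital of France?", "response": "Paris."},
    ]
    for model, by_metric in evaluate(records, stub_judge).items():
        print(model, by_metric)
```

Passing the judge as a callable keeps the aggregation logic independent of whichever judge model is used, so the same comparison can be rerun with a different judge.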


Amazon Bedrock Model Evaluation LLM-as-a-judge is now generally available

Related articles

Dec 2, 2024: Amazon Bedrock Model Evaluation now includes LLM-as-a-judge (Preview)

Feb 12, 2025: LLM-as-a-judge on Amazon Bedrock Model Evaluation

Apr 23, 2024: Amazon Bedrock model evaluation is now generally available

Dec 2, 2024: New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock