Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

Machine Learning Blog

This article shows how to evaluate large language models (LLMs) with FMEval and track the results in Amazon SageMaker managed MLflow, so that model performance can be assessed and compared across several dimensions.

FMEval is an open-source library for evaluating foundation models. It is organized around three main components:
- Data config
- Model runner
- Evaluation algorithm
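Here is a minimal, self-contained sketch of how the three roles fit together in plain Python. `DataConfigSketch`, `ModelRunnerSketch`, `EchoRunner`, `evaluate`, and `contains_target` are hypothetical stand-ins, not FMEval's API. FMEval ships its own classes for these roles (for example `DataConfig`), and their exact signatures should be taken from the library's documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class DataConfigSketch:
    """Hypothetical data config: records plus the fields holding prompt and reference."""

    records: List[Dict[str, str]]
    model_input_key: str
    target_output_key: str


class ModelRunnerSketch(Protocol):
    """Hypothetical model runner: anything that turns a prompt into a completion."""

    def predict(self, prompt: str) -> str: ...


class EchoRunner:
    """Toy runner standing in for a Bedrock or JumpStart model."""

    def predict(self, prompt: str) -> str:
        # Echo the last word of the prompt as the "answer".
        return f"The answer is {prompt.split()[-1]}"


def contains_target(output: str, target: str) -> float:
    # Per-record score: 1.0 if the reference appears in the output (case-insensitive).
    return float(target.lower() in output.lower())


def evaluate(
    data: DataConfigSketch,
    runner: ModelRunnerSketch,
    score: Callable[[str, str], float],
) -> float:
    """Hypothetical evaluation algorithm: run every record and average the scores."""
    scores = []
    for record in data.records:
        output = runner.predict(record[data.model_input_key])
        scores.append(score(output, record[data.target_output_key]))
    return sum(scores) / len(scores) if scores else 0.0


data = DataConfigSketch(
    records=[
        {"question": "Capital of France is Paris", "answer": "Paris"},
        {"question": "Largest planet is Jupiter", "answer": "Saturn"},
    ],
    model_input_key="question",
    target_output_key="answer",
)
print(evaluate(data, EchoRunner(), contains_target))  # 0.5
```

Keeping the three roles separate means the same dataset and scoring logic can be reused unchanged when the model behind the runner is swapped.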
Amazon SageMaker managed MLflow provides a tracking server for logging and comparing machine learning experiments, such as the parameters and metrics of each evaluation run.
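The tracking side can be pictured with a toy in-memory tracker. `ExperimentTracker` and `Run` below are hypothetical and only mimic how MLflow organizes experiments, runs, parameters, and metrics. With real MLflow the corresponding calls are `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric`. For SageMaker managed MLflow, AWS documents pointing the client at the tracking server by passing the server's ARN to `mlflow.set_tracking_uri`, which relies on the `sagemaker-mlflow` plugin.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Run:
    """One evaluation run: its parameters (as strings) and numeric metrics."""

    name: str
    params: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

    def log_param(self, key: str, value: object) -> None:
        # MLflow stores parameter values as strings; mirror that here.
        self.params[key] = str(value)

    def log_metric(self, key: str, value: float) -> None:
        self.metrics[key] = float(value)


@dataclass
class ExperimentTracker:
    """Hypothetical stand-in for an MLflow experiment on a tracking server."""

    experiment: str
    runs: List[Run] = field(default_factory=list)

    @contextmanager
    def start_run(self, name: str) -> Iterator[Run]:
        # Record the run even if the body raises, so partial results survive.
        run = Run(name=name)
        try:
            yield run
        finally:
            self.runs.append(run)


tracker = ExperimentTracker(experiment="llm-evaluation")
with tracker.start_run("candidate-a") as run:
    run.log_param("model_id", "model-a")  # hypothetical model identifier
    run.log_metric("factual_knowledge", 0.82)  # illustrative value
print(tracker.runs[0].metrics)  # {'factual_knowledge': 0.82}
```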
The approach enables systematic evaluation of LLMs for:
- Accuracy
- Toxicity
- Factual knowledge
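Both accuracy and factual knowledge reduce to comparing a model output with a reference answer. The sketch below is a simplified stand-in, not FMEval's implementation. `factual_knowledge_score` assumes that acceptable answers are joined by an `<OR>` delimiter and counts a case-insensitive substring match after light normalization. `token_f1` is the SQuAD-style token-overlap F1 commonly used for question-answering accuracy. Toxicity scoring needs a trained classifier, so it is left out.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    # Lowercase, drop punctuation and English articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def factual_knowledge_score(model_output: str, target: str, delimiter: str = "<OR>") -> float:
    """1.0 if any acceptable answer appears in the output, else 0.0 (assumed rule)."""
    output = normalize(model_output)
    answers = [normalize(a) for a in target.split(delimiter)]
    return float(any(a and a in output for a in answers))


def token_f1(model_output: str, target: str) -> float:
    """Harmonic mean of token-level precision and recall against the reference."""
    pred = normalize(model_output).split()
    gold = normalize(target).split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(factual_knowledge_score("It is Paris, of course.", "Paris<OR>Paree"))  # 1.0
print(round(token_f1("the capital is Paris", "Paris"), 3))  # 0.5
```

The binary check rewards any output that mentions the fact, while F1 also penalizes extra tokens, which is why the two scores can disagree on verbose answers.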
The workflow supports models from both Amazon Bedrock and Amazon SageMaker JumpStart.
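Because Amazon Bedrock and SageMaker JumpStart models expect different request shapes, a model runner typically fills a per-model request template with the prompt. The sketch below assumes a `$prompt` placeholder filled through Python's `string.Template`, with the prompt JSON-encoded first so that quotes and newlines remain valid JSON. `build_request_body` and both templates are hypothetical; check each model's documentation for its actual request schema.

```python
import json
from string import Template
from typing import Any, Dict


def build_request_body(content_template: str, prompt: str) -> Dict[str, Any]:
    """Fill a $prompt placeholder and return the parsed JSON request body."""
    # json.dumps adds the surrounding quotes and escapes quotes and newlines.
    filled = Template(content_template).substitute(prompt=json.dumps(prompt))
    # Parsing the result fails fast if the filled template is not valid JSON.
    return json.loads(filled)


# Hypothetical templates for two differently shaped model APIs.
TEMPLATES = {
    "bedrock-style": '{"prompt": $prompt, "max_gen_len": 256}',
    "jumpstart-style": '{"inputs": $prompt, "parameters": {"max_new_tokens": 256}}',
}

for name, template in TEMPLATES.items():
    print(name, build_request_body(template, 'Say "hi"\nthen stop'))
```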

The method helps developers create a robust, scalable workflow for assessing LLM performance, tracking results, and making data-driven decisions in generative AI development.
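Once metrics for several candidate models are tracked, the decision step can be as simple as filtering and ranking. The policy below (a toxicity ceiling of 0.1, then the highest factual-knowledge score) is a hypothetical example rather than guidance from the article, and `EvalRun`, `select_model`, the model names, and the scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvalRun:
    model: str
    factual_knowledge: float  # higher is better
    toxicity: float  # lower is better


def select_model(runs: List[EvalRun], max_toxicity: float = 0.1) -> Optional[str]:
    """Return the most factual model whose toxicity stays within the ceiling."""
    eligible = [r for r in runs if r.toxicity <= max_toxicity]
    if not eligible:
        return None  # no candidate meets the safety constraint
    return max(eligible, key=lambda r: r.factual_knowledge).model


runs = [
    EvalRun("model-a", factual_knowledge=0.78, toxicity=0.02),
    EvalRun("model-b", factual_knowledge=0.85, toxicity=0.15),
    EvalRun("model-c", factual_knowledge=0.81, toxicity=0.05),
]
print(select_model(runs))  # model-c
```

Filtering before ranking treats toxicity as a hard constraint, so a more accurate but more toxic model (here `model-b`) cannot win on accuracy alone.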


