
Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

Machine Learning Blog



This article discusses how to evaluate large language models (LLMs) using Amazon SageMaker managed MLflow and FMEval, focusing on tracking and assessing model performance across various dimensions.

  • FMEval is an open-source library for evaluating foundation models with three main components:
    • Data config
    • Model runner
    • Evaluation algorithm
  • SageMaker managed MLflow provides a tracking server for recording and comparing machine learning experiments
  • The approach enables systematic evaluation of LLMs for:
    • Accuracy
    • Toxicity
    • Factual knowledge
  • The approach supports evaluating models from Amazon Bedrock and SageMaker JumpStart
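
The three-component structure listed above (data config, model runner, evaluation algorithm) feeding a tracking call can be sketched roughly as follows. This is a minimal illustration in plain Python, not the FMEval or MLflow API: `DataConfig`, `CannedModelRunner`, `factual_knowledge_score`, and `evaluate_and_track` are hypothetical stand-ins, the scoring rule (expected answer appears in the output) is a simplification, and the sample records are invented for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# NOTE: every name here is a hypothetical stand-in for illustration only;
# none of these classes or functions come from FMEval or MLflow.


@dataclass
class DataConfig:
    """Data config: where the evaluation records are and which fields to read."""
    records: List[Dict[str, str]]
    input_field: str
    target_field: str


class CannedModelRunner:
    """Model runner: a real one would invoke a hosted model endpoint.

    This stand-in returns canned answers so the sketch runs offline.
    """

    def __init__(self, answers: Dict[str, str]) -> None:
        self.answers = answers

    def predict(self, prompt: str) -> str:
        return self.answers.get(prompt, "")


def factual_knowledge_score(config: DataConfig, runner: CannedModelRunner) -> float:
    """Evaluation algorithm (simplified): fraction of records whose expected
    answer appears, case-insensitively, in the model output."""
    if not config.records:
        return 0.0
    hits = 0
    for record in config.records:
        output = runner.predict(record[config.input_field])
        if record[config.target_field].lower() in output.lower():
            hits += 1
    return hits / len(config.records)


def evaluate_and_track(
    config: DataConfig,
    runner: CannedModelRunner,
    log_metric: Callable[[str, float], None],
) -> float:
    """Run the evaluation and hand the score to a tracking callback."""
    score = factual_knowledge_score(config, runner)
    log_metric("factual_knowledge", score)
    return score


# Demo with invented data: one answer is right, one is wrong.
config = DataConfig(
    records=[
        {"question": "Capital of France?", "answer": "Paris"},
        {"question": "Capital of Japan?", "answer": "Tokyo"},
    ],
    input_field="question",
    target_field="answer",
)
runner = CannedModelRunner(
    {
        "Capital of France?": "The capital is Paris.",
        "Capital of Japan?": "I believe it is Kyoto.",
    }
)
logged: Dict[str, float] = {}  # stands in for a tracking server
score = evaluate_and_track(config, runner, logged.__setitem__)
```

The tracking callback is passed in so the sketch does not depend on a running server. With MLflow installed and a tracking URI configured, `mlflow.log_metric`, which takes a key and a value, could be passed in place of `logged.__setitem__` inside an active run.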

The method helps developers create a robust, scalable workflow for assessing LLM performance, tracking results, and making data-driven decisions in generative AI development.





Related articles

  • Jul 24, 2024: LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow
  • Apr 22, 2025: Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15
  • Apr 24, 2024: Improve LLM performance with human and AI feedback on Amazon SageMaker for Amazon Engineering
  • Jun 19, 2024: Amazon SageMaker now offers a fully managed MLflow Capability
