
Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

Machine Learning Blog



This article discusses how to efficiently fine-tune the ESM-2 protein language model (pLM) using Amazon SageMaker to predict protein subcellular localization. Specifically, the article covers:

  • An introduction to protein language models and their use in life sciences research
  • Four methods to improve the efficiency of fine-tuning large models like ESM-2: weighted class training, gradient accumulation, gradient checkpointing, and Low-Rank Adaptation (LoRA)
  • How to prepare the training data and create a training script for SageMaker
  • How to submit a SageMaker training job and compare the results of the different efficiency methods
  • A conclusion on the benefits of fine-tuning and of efficient methods for adapting large language models
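
To make the arithmetic behind three of the four methods concrete, here is a minimal standard-library sketch. It is not the article's code: the function names, the two example labels, and the layer size and rank are all hypothetical, chosen only for illustration. The class-weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), and the article may weight classes differently. Gradient checkpointing is left out because it trades extra compute for lower memory use and does not reduce to a one-line formula.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def effective_batch_size(per_device_batch, accumulation_steps, n_devices=1):
    """Number of samples that contribute to each optimizer step when
    gradients are accumulated over several micro-batches."""
    return per_device_batch * accumulation_steps * n_devices


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors that LoRA trains in place of
    a full d_out x d_in weight update: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical, imbalanced two-class label set (not data from the article).
labels = ["membrane"] * 2 + ["cytosol"] * 8
weights = balanced_class_weights(labels)

# Hypothetical micro-batch size and accumulation steps.
batch = effective_batch_size(per_device_batch=8, accumulation_steps=4)

# Hypothetical square layer and rank, for illustration only.
full = 1280 * 1280
lora = lora_trainable_params(1280, 1280, rank=8)

print(weights)
print(batch)
print(lora, full, lora / full)
```

In this toy setting the minority class is weighted four times as heavily as the majority class, four accumulation steps turn a micro-batch of 8 into an effective batch of 32, and the rank-8 factors hold a small fraction of the parameters of the full weight matrix.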




Related articles

  • May 31, 2024: Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker
  • May 29, 2024: Fine-tune large multimodal models using Amazon SageMaker
  • Nov 27, 2024: Efficiently train models with large sequence lengths using Amazon SageMaker model parallel
  • Jun 28, 2024: EvolutionaryScale’s ESM3, a frontier language model family for biology, now available on AWS
