Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

Machine Learning Blog

This article shows how to fine-tune the ESM-2 protein language model (pLM) efficiently on Amazon SageMaker to predict protein subcellular localization. It covers:

An introduction to protein language models and their use in life sciences research
Four methods for fine-tuning large models like ESM-2 more efficiently: weighted class training, gradient accumulation, gradient checkpointing, and Low-Rank Adaptation (LoRA)
How to prepare the training data and create a training script for SageMaker
How to submit a SageMaker training job and compare the results of the different efficiency methods
A conclusion on the benefits of fine-tuning and of efficient methods for adapting large language models
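
As a rough illustration of the arithmetic behind three of the methods named above, the following standard-library Python sketch computes inverse-frequency class weights, the effective batch size under gradient accumulation, and the number of trainable parameters LoRA adds to a single weight matrix. It is not the article's training script: the function names, the label values, and the 1280 x 1280 matrix size are hypothetical choices made for the example, and gradient checkpointing is left out because it changes memory use rather than any quantity computed here.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1, so a
    weighted loss does not simply favor the majority class.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def effective_batch_size(per_device_batch, accumulation_steps, n_devices=1):
    """Gradient accumulation sums gradients over several small batches
    before each optimizer step, which emulates one larger batch."""
    return per_device_batch * accumulation_steps * n_devices


def lora_trainable_params(d_in, d_out, rank):
    """LoRA freezes the d_out x d_in weight and trains two low-rank
    factors instead: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical, deliberately imbalanced localization labels.
labels = ["membrane"] * 3 + ["cytosol"] * 1
weights = class_weights(labels)

batch = effective_batch_size(per_device_batch=8, accumulation_steps=4)

# Hypothetical 1280 x 1280 projection matrix adapted with rank 8.
lora_params = lora_trainable_params(d_in=1280, d_out=1280, rank=8)
full_params = 1280 * 1280

print(weights)
print(batch)
print(lora_params, full_params, lora_params / full_params)
```

In this toy setting the rank-8 adapter trains 20,480 parameters in place of the 1,638,400 in the full matrix, which is the kind of reduction that makes LoRA attractive for large models.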

May 31, 2024: Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

Jul 1, 2026: Accelerate protein design with BoltzGen on Amazon SageMaker AI

May 29, 2024: Fine-tune large multimodal models using Amazon SageMaker

Nov 27, 2024: Efficiently train models with large sequence lengths using Amazon SageMaker model parallel

Related articles