Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker
Machine Learning Blog
This article discusses how to efficiently fine-tune the ESM-2 protein language model (pLM) using Amazon SageMaker to predict protein subcellular localization. Specifically, the article covers:
- An introduction to protein language models and their use in life sciences research
- Four methods to improve the efficiency of fine-tuning large models like ESM-2: weighted class training, gradient accumulation, gradient checkpointing, and Low-Rank Adaptation (LoRA)
- How to prepare the training data and create a training script for SageMaker
- Submitting a SageMaker training job and comparing the results of different efficiency methods
- A conclusion highlighting the benefits of fine-tuning and of efficient methods for adapting large language models
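To make the efficiency methods listed above more concrete, here is a minimal, framework-free sketch of the arithmetic behind three of them: inverse-frequency class weights, the effective batch size under gradient accumulation, and the number of trainable parameters LoRA adds for one weight matrix. It is not the article's training script. The function names, the example labels, and the sizes (a 1280 x 1280 matrix, rank 8) are illustrative assumptions. Gradient checkpointing is left out because it trades compute for memory inside the framework and has no comparable one-line formula.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1, so each
    class contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def effective_batch_size(per_device_batch, accumulation_steps, n_devices=1):
    """Gradient accumulation sums gradients over several small batches
    before one optimizer step, so the update behaves like a larger batch."""
    return per_device_batch * accumulation_steps * n_devices


def lora_trainable_params(d_in, d_out, rank):
    """LoRA freezes a d_out x d_in weight and trains two low-rank factors,
    B (d_out x rank) and A (rank x d_in), instead."""
    return rank * (d_in + d_out)


# Hypothetical, imbalanced localization labels (assumption, not article data).
labels = ["membrane"] * 3 + ["cytosol"] * 1
weights = class_weights(labels)
print(weights["cytosol"])            # 2.0

print(effective_batch_size(8, 4))    # 32

full = 1280 * 1280                   # assumed size of one frozen weight matrix
lora = lora_trainable_params(1280, 1280, rank=8)
print(full, lora)                    # 1638400 20480
```

With these assumed sizes, the LoRA factors hold 1.25% as many parameters as the matrix they adapt, which is why the method reduces the memory needed for gradients and optimizer state.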
Related articles
- May 31, 2024: Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker
- May 29, 2024: Fine-tune large multimodal models using Amazon SageMaker
- Nov 27, 2024: Efficiently train models with large sequence lengths using Amazon SageMaker model parallel
- Jun 28, 2024: EvolutionaryScale’s ESM3, a frontier language model family for biology, now available on AWS