Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

Machine Learning Blog

This article describes how to pre-train a genomic language model, HyenaDNA, using AWS HealthOmics and Amazon SageMaker.

The article covers:

Background on genomic language models like DNABERT, Nucleotide Transformer, and HyenaDNA
AWS HealthOmics and its capabilities for storing and organizing genomic data
Using Amazon SageMaker for training machine learning models like HyenaDNA
A step-by-step solution for pre-training HyenaDNA on genomic data, including data preparation, loading data into HealthOmics, configuring and running the training job on SageMaker, deploying the trained model, and performing inference
Results and evaluation metrics from pre-training HyenaDNA on a mouse genome dataset
Conclusion highlighting the benefits of pre-training genomic models and using AWS services for this purpose
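
To make the data preparation step more concrete, here is a minimal sketch of how a raw DNA sequence might be turned into training examples for next-token pre-training. It is not taken from the article: the vocabulary mapping, the window length, and the function names encode and training_windows are all hypothetical, chosen only to illustrate single-nucleotide tokenization of the kind HyenaDNA uses.

```python
from typing import Iterator, List, Tuple

# Hypothetical vocabulary (an assumption, not the article's): one token per
# nucleotide, plus N for unknown or ambiguous bases.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(sequence: str) -> List[int]:
    """Map each base to an integer id; anything unrecognized becomes N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def training_windows(
    sequence: str, window: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (inputs, targets) pairs for next-token prediction.

    Targets are the inputs shifted left by one position. Trailing bases that
    do not fill a complete window are dropped.
    """
    ids = encode(sequence)
    for start in range(0, len(ids) - window, window):
        chunk = ids[start : start + window + 1]
        yield chunk[:-1], chunk[1:]


# Toy usage: a 10-base sequence split into windows of 4 input tokens.
pairs = list(training_windows("ACGTNACGTA", window=4))
for inputs, targets in pairs:
    print(inputs, targets)
```

In a real pipeline the sequences would be read from the genomic data store rather than from a string literal, and the window would be far longer than four bases; the toy values here only keep the example easy to follow.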


Related articles

Feb 6, 2024
Deploy large language models for a healthtech use case on Amazon SageMaker

Mar 18, 2024
Protein language model training with NVIDIA BioNeMo framework on AWS ParallelCluster

Mar 6, 2024
Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

Nov 20, 2025
Accelerating genomics variant interpretation with AWS HealthOmics and Amazon Bedrock AgentCore