
Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

Machine Learning Blog



This article discusses pre-training genomic language models, specifically HyenaDNA, using AWS HealthOmics and Amazon SageMaker.

The article covers:

  • Background on genomic language models like DNABERT, Nucleotide Transformer, and HyenaDNA
  • AWS HealthOmics and its capabilities for storing and organizing genomic data
  • Using Amazon SageMaker for training machine learning models like HyenaDNA
  • A step-by-step solution for pre-training HyenaDNA on genomic data, including data preparation, loading data into HealthOmics, configuring and running the training job on SageMaker, deploying the trained model, and performing inference
  • Results and evaluation metrics from pre-training HyenaDNA on a mouse genome dataset
  • Conclusion highlighting the benefits of pre-training genomic models and using AWS services for this purpose
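The data preparation step mentioned above can be illustrated with a minimal sketch. It assumes single-nucleotide (character-level) tokenization and a next-token prediction objective, which is how HyenaDNA is generally described; the vocabulary, function names, and window sizes here are hypothetical and are not taken from the article's actual pipeline.

```python
# Hypothetical vocabulary: one token per nucleotide, plus N for unknown bases.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def tokenize(sequence: str) -> list[int]:
    """Map each base to an integer ID; any unexpected character becomes N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def make_windows(token_ids: list[int], window: int, stride: int) -> list[list[int]]:
    """Slice a tokenized sequence into fixed-length windows, dropping the ragged tail."""
    last_start = len(token_ids) - window
    return [token_ids[i:i + window] for i in range(0, last_start + 1, stride)]


def next_token_pairs(window_ids: list[int]) -> tuple[list[int], list[int]]:
    """Build (inputs, targets) for next-token prediction by shifting one position."""
    return window_ids[:-1], window_ids[1:]


if __name__ == "__main__":
    ids = tokenize("ACGTNACGTA")
    for w in make_windows(ids, window=4, stride=3):
        print(next_token_pairs(w))
```

In a real training job the windows would be far longer and would be streamed from stored sequence data rather than built from an in-memory string; the sketch only shows the shape of the transformation.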




Related articles

  • Feb 6, 2024: Deploy large language models for a healthtech use case on Amazon SageMaker
  • Mar 18, 2024: Protein language model training with NVIDIA BioNeMo framework on AWS ParallelCluster
  • Mar 6, 2024: Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker
  • Nov 20, 2025: Accelerating genomics variant interpretation with AWS HealthOmics and Amazon Bedrock AgentCore
