
Efficiently train models with large sequence lengths using Amazon SageMaker model parallel

Machine Learning Blog



This article explains how to train large language models on long input sequences with the Amazon SageMaker model parallel (SMP) library. It highlights two features: context parallelism and FP8 mixed-precision training.

  • Context parallelism partitions model activations along the sequence dimension, enabling training with longer input sequences
  • FP8 mixed-precision training reduces memory and computational requirements by using 8-bit floating point formats
  • The article lists Llama 3.1, Mixtral, and Mistral among the supported models
  • The approach is demonstrated on the PubMed scientific papers dataset with a sequence length of 16,384 tokens
  • Observed training throughput:
    • Without context parallelism: Out of memory error
    • With context parallelism: 2.03 samples/second
    • With context parallelism and FP8: 3.05 samples/second
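
To make the first bullet concrete, here is a minimal sketch of what partitioning along the sequence dimension means. It is plain Python and does not use the SMP API. The context-parallel degree of 8 and the helper name `shard_sequence` are assumptions chosen for illustration; only the 16,384-token sequence length comes from the summary above. Real implementations must also exchange attention keys and values between ranks, and may split the sequence differently to balance work, neither of which is modeled here.

```python
# Illustrative only: plain Python, not the SMP library API.
SEQ_LEN = 16_384  # sequence length reported in the summary
CP_DEGREE = 8     # hypothetical context-parallel degree (assumption)


def shard_sequence(tokens, cp_degree):
    """Split a token sequence into cp_degree contiguous, equal-sized chunks."""
    if len(tokens) % cp_degree != 0:
        raise ValueError("sequence length must be divisible by cp_degree")
    chunk = len(tokens) // cp_degree
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(cp_degree)]


tokens = list(range(SEQ_LEN))            # stand-in for one tokenized sample
shards = shard_sequence(tokens, CP_DEGREE)
tokens_per_rank = len(shards[0])
print(tokens_per_rank)  # 2048
```

Under these assumptions each rank holds activations for 2,048 tokens instead of 16,384, which is the kind of reduction that lets a sequence fit in memory when it otherwise would not.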

Together, the two features address the main obstacles to long-sequence training: context parallelism removes the memory limit that caused the out-of-memory error, and FP8 raises throughput on top of it.
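
The relative gain from FP8 can be worked out directly from the two throughput figures above. The short calculation below is a sketch. The tokens-per-second figure assumes every sample is a full 16,384-token sequence, which the summary does not state explicitly.

```python
# Throughput figures taken from the summary above (samples/second).
cp_only = 2.03   # context parallelism only
cp_fp8 = 3.05    # context parallelism plus FP8

# Relative speedup from adding FP8 on top of context parallelism.
speedup = cp_fp8 / cp_only
print(f"{speedup:.2f}x")  # 1.50x

# Assumption: each sample is a full 16,384-token sequence.
SEQ_LEN = 16_384
tokens_per_second = int(cp_fp8 * SEQ_LEN)
print(tokens_per_second)
```

In other words, adding FP8 yields roughly a 50% throughput increase over context parallelism alone in this experiment.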





Related articles

  • Dec 13, 2024: How Amazon trains sequential ensemble models at scale with Amazon SageMaker Pipelines
  • Nov 25, 2024: Amazon SageMaker launches Multi-Adapter Model Inference
  • May 29, 2024: Fine-tune large multimodal models using Amazon SageMaker
  • Apr 16, 2024: Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries
