
Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries

Machine Learning Blog



This article discusses the performance benefits of the Amazon SageMaker Model Parallel (SMP) and SageMaker Distributed Data Parallel (SMDDP) libraries for efficiently training large language models on Amazon SageMaker. It demonstrates near-linear scaling efficiency on up to 128 ml.p4d.24xlarge instances, with benchmarks on Llama 2 models of 7B, 13B, and 70B parameters.

Specifically, the article covers:

  • Near-linear scaling with SageMaker, showing robust scaling efficiencies across different model sizes and cluster sizes
  • SMP 2.0 performance on the 70B Llama 2 model, analyzing contributions from SMDDP, hybrid sharding, Transformer Engine integration, and activation offloading
  • Enabling training with long sequences of up to 32,768 tokens using SMP tensor parallelism
  • A conclusion presenting SageMaker as an efficient platform for large language model training
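"Scaling efficiency" is commonly defined as measured throughput divided by the throughput you would get if it grew exactly in proportion to the number of instances, relative to a smaller baseline cluster; a value close to 1.0 is what "near-linear" means. The sketch below assumes that common definition (the article's exact formula is not reproduced here), and the function name and numbers are hypothetical, not benchmark results from the article.

```python
def scaling_efficiency(base_nodes: int, base_throughput: float,
                       nodes: int, throughput: float) -> float:
    """Ratio of measured throughput to ideal linear scaling from a baseline.

    Assumes the common definition: efficiency = throughput / (base_throughput * nodes / base_nodes).
    Throughput can be in any consistent unit (e.g. tokens/s or samples/s).
    """
    if base_nodes <= 0 or nodes <= 0 or base_throughput <= 0:
        raise ValueError("node counts and baseline throughput must be positive")
    # Throughput we would expect if it grew exactly in proportion to the node count.
    ideal = base_throughput * (nodes / base_nodes)
    return throughput / ideal


# Hypothetical numbers: 8 instances reach 100.0 units/s; 128 instances reach 1440.0 units/s.
# Ideal throughput at 128 instances would be 100.0 * 16 = 1600.0, so efficiency is 0.9 (90%).
print(f"{scaling_efficiency(8, 100.0, 128, 1440.0):.0%}")
```

Measuring against a multi-instance baseline rather than a single GPU is a design choice that isolates inter-node communication overhead, which is where libraries such as SMDDP aim to help.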




