Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries

Machine Learning Blog

This article discusses the performance benefits of the Amazon SageMaker Model Parallel (SMP) and Data Parallel (SMDDP) libraries for training large language models efficiently on Amazon SageMaker. It demonstrates near-linear scaling efficiency on up to 128 ml.p4d.24xlarge instances, with benchmarks on three sizes of the Llama 2 model (7B, 13B, and 70B parameters).

Specifically, the article covers:

Near-linear scaling with SageMaker, showing robust scaling efficiencies across different model sizes and cluster sizes
SMP 2.0 performance on the 70B Llama 2 model, analyzing contributions from SMDDP, hybrid sharding, Transformer Engine integration, and activation offloading
Enabling training with sequence lengths up to 32,768 using SMP tensor parallelism
Conclusion highlighting SageMaker as a powerful tool for efficient large language model training
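
To make the phrase "near-linear scaling" concrete, here is a minimal sketch of how scaling efficiency is commonly computed: measured throughput divided by the throughput that perfect linear scaling from a baseline cluster would give. The function name and the throughput figures are illustrative assumptions, not numbers or code from the article, and the article may define efficiency differently.

```python
def scaling_efficiency(base_instances, base_throughput, instances, throughput):
    """Return measured throughput as a fraction of ideal linear scaling.

    Hypothetical helper for illustration only; not part of SMP or SMDDP.
    """
    if base_instances <= 0 or instances <= 0 or base_throughput <= 0:
        raise ValueError("instance counts and baseline throughput must be positive")
    # Ideal throughput if every added instance contributed as much as a baseline one.
    ideal = base_throughput * (instances / base_instances)
    return throughput / ideal


# Placeholder throughput values (samples per second), NOT results from the article.
runs = {8: 100.0, 16: 196.0, 32: 380.0}
base_n = min(runs)

for n, tput in sorted(runs.items()):
    eff = scaling_efficiency(base_n, runs[base_n], n, tput)
    print(f"{n:>4} instances: {eff:.1%} of linear")
```

An efficiency close to 1.0 at every cluster size is what "near-linear" means here: doubling the number of instances roughly doubles training throughput.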


Related articles

Efficiently train models with large sequence lengths using Amazon SageMaker model parallel (Nov 27, 2024)
Building ML excellence: A practical training guide for Amazon SageMaker AI (Oct 3, 2025)
Accelerate large-scale AI training with Amazon SageMaker HyperPod training operator (Oct 21, 2025)
Optimize model training on Amazon SageMaker AI with NVIDIA Blackwell (Jun 25, 2026)