
Scaling seismic foundation models on AWS: Distributed training with Amazon SageMaker HyperPod and expanding context windows

Machine Learning Blog



This article describes how TGS, a geoscience data provider, partnered with AWS to optimize seismic foundation model training on Amazon SageMaker HyperPod, cutting training time from 6 months to 5 days and expanding the model's context window 4.5x.

  • Reduced training time from 6 months to 5 days using distributed training across 16 EC2 P5 instances
  • Achieved near-linear scaling (90-95% parallel efficiency) across 128 GPUs with DeepSpeed ZeRO-2
  • Streamed data directly from Amazon S3 instead of FSx, reducing storage costs by over 90%
  • Implemented ring attention and context parallelism to expand the model's context window 4.5x
  • Enabled processing of larger 3D seismic volumes for better geological pattern detection
  • Key optimization: DeepSpeed ZeRO-2 outperformed FSDP2 and ZeRO-3 for this workload
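
The scaling figures in the list above can be checked with a little arithmetic. The sketch below is illustrative only: the 30-day month and the throughput numbers are assumptions chosen for the example, not measurements from the article. The 8 GPUs per instance follows from the article's own figures (128 GPUs across 16 P5 instances).

```python
def speedup(baseline_days: float, optimized_days: float) -> float:
    """Ratio of wall-clock training time before and after optimization."""
    return baseline_days / optimized_days


def parallel_efficiency(single_gpu_throughput: float,
                        n_gpus: int,
                        measured_throughput: float) -> float:
    """Measured throughput divided by ideal (perfectly linear) throughput."""
    return measured_throughput / (single_gpu_throughput * n_gpus)


# 128 GPUs over 16 instances implies 8 GPUs per P5 instance.
GPUS_PER_INSTANCE = 8
instances = 16
total_gpus = instances * GPUS_PER_INSTANCE

# Assumption: a month is counted as 30 days.
overall_speedup = speedup(baseline_days=6 * 30, optimized_days=5)

# Hypothetical throughputs (samples/s), picked to fall inside the
# 90-95% efficiency band quoted in the article.
efficiency = parallel_efficiency(single_gpu_throughput=1.0,
                                 n_gpus=total_gpus,
                                 measured_throughput=118.0)

print(f"GPUs: {total_gpus}")
print(f"Overall speedup: {overall_speedup:.0f}x")
print(f"Parallel efficiency: {efficiency:.1%}")
```

Note that the overall speedup reflects every optimization combined (data pipeline, framework choice, and scale-out), so it should not be read as the speedup from adding GPUs alone.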

The solution demonstrates how optimized data pipelines, distributed training frameworks, and advanced parallelization techniques enable efficient scaling of foundation models for specialized scientific domains.
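
To make the ring attention idea concrete, here is a minimal single-process NumPy sketch. It is not the TGS or AWS implementation: it simulates the devices with a loop, uses non-causal attention for simplicity, and assumes the sequence length divides evenly by the number of simulated devices. Each "device" keeps one block of queries while key/value blocks rotate around the ring, and a running softmax (maximum, denominator, weighted sum) lets the partial results be merged exactly.

```python
import numpy as np


def full_attention(q, k, v):
    """Reference: standard softmax attention over the whole sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def ring_attention(q, k, v, n_devices):
    """Simulated ring attention: each device owns one sequence block."""
    d = q.shape[-1]
    q_blocks = np.split(q, n_devices)
    kv_blocks = list(zip(np.split(k, n_devices), np.split(v, n_devices)))

    outputs = []
    for rank, q_blk in enumerate(q_blocks):
        rows = q_blk.shape[0]
        running_max = np.full((rows, 1), -np.inf)
        denom = np.zeros((rows, 1))
        acc = np.zeros((rows, v.shape[-1]))

        for step in range(n_devices):
            # At this step the device holds the K/V block that started
            # on device (rank - step) mod n_devices.
            k_blk, v_blk = kv_blocks[(rank - step) % n_devices]
            scores = q_blk @ k_blk.T / np.sqrt(d)

            # Online softmax: rescale earlier partial sums to the new max.
            new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
            rescale = np.exp(running_max - new_max)
            probs = np.exp(scores - new_max)
            denom = denom * rescale + probs.sum(axis=-1, keepdims=True)
            acc = acc * rescale + probs @ v_blk
            running_max = new_max

        outputs.append(acc / denom)
    return np.concatenate(outputs)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
matches = np.allclose(ring_attention(q, k, v, n_devices=4),
                      full_attention(q, k, v))
print("ring attention matches full attention:", matches)
```

In this scheme no device ever materializes the full attention matrix, only one block-by-block tile at a time, which is why spreading the sequence over more devices allows a longer context. In a real multi-GPU setup the inner loop's block lookup would be a peer-to-peer send and receive overlapped with compute.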




Related articles

  • Dec 15, 2025: Adaptive infrastructure for foundation model training with elastic training on SageMaker HyperPod
  • Jun 19, 2025: Accelerate foundation model training and inference with Amazon SageMaker HyperPod and Amazon SageMaker Studio
  • Jul 10, 2025: Accelerate foundation model development with one-click observability in Amazon SageMaker HyperPod
  • Sep 10, 2024: Amazon EKS support in Amazon SageMaker HyperPod to scale foundation model development
