Announcing the Preview of Amazon SageMaker Profiler: Track and visualize detailed hardware performance data for your model training workloads


This article announces the preview of Amazon SageMaker Profiler, a tool for tracking and visualizing hardware performance during deep learning model training on AWS.

Tracks CPU and GPU activities including utilization, kernel runs, memory operations, and data transfers
Provides Python modules for annotating PyTorch and TensorFlow training scripts
Offers a UI dashboard with visualizations of GPU/CPU active time, utilization trends, and kernel performance
Includes a timeline interface showing detailed kernel launches and runs at the operation level
Supports PyTorch 2.0.0 and 1.13.1, and TensorFlow 2.12.0 and 2.11.1, on specific GPU instance types
Available in US East, US West, and Europe regions
Generates up to 10x less profiling data than open-source alternatives

SageMaker Profiler helps ML practitioners optimize resource utilization and reduce training costs by identifying performance bottlenecks in large-scale distributed training jobs.
