Accelerate Generative AI Inference on Amazon SageMaker AI with G7e Instances

Machine Learning Blog

This article announces the availability of G7e instances powered by NVIDIA RTX PRO 6000 Blackwell GPUs on Amazon SageMaker AI for generative AI inference workloads.

G7e instances offer 96 GB of memory per GPU, double that of G6e instances
Delivers up to 2.3x higher inference performance than the previous-generation G6e
The single-GPU G7e.2xlarge can host 35B-parameter models; the 8-GPU G7e.48xlarge supports 300B-parameter models
1,600 Gbps of networking throughput enables low-latency multi-node inference
Benchmarks show a 2.6x cost reduction ($0.79 vs. $2.06 per million tokens) at production concurrency
Combined with EAGLE speculative decoding, G7e achieves 2.4x higher throughput and a 75% cost reduction
Well suited for chatbots, RAG pipelines, long-context inference, and multimodal AI workloads
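The model-size figures above follow from simple weight-memory arithmetic. The sketch below assumes 16-bit (BF16/FP16) weights at 2 bytes per parameter and deliberately ignores KV cache, activations, and runtime overhead, so it gives a lower bound rather than a sizing tool. The function names are hypothetical, chosen for illustration.

```python
# Rough weight-memory estimate for hosting an LLM on G7e GPUs.
# Assumption: 16-bit weights (2 bytes/parameter); KV cache, activations,
# and framework overhead are NOT included.

GPU_MEMORY_GB = 96  # per-GPU memory on G7e, as stated in the summary


def weight_memory_gb(num_params_billion: float, bytes_per_param: float = 2.0) -> float:
    """GB (1 GB = 1e9 bytes) needed just to hold the weights.

    Billions of parameters x bytes per parameter = gigabytes.
    """
    return num_params_billion * bytes_per_param


def headroom_gb(num_params_billion: float, num_gpus: int, bytes_per_param: float = 2.0) -> float:
    """Memory left (GB) across all GPUs after the weights are loaded."""
    total = GPU_MEMORY_GB * num_gpus
    return total - weight_memory_gb(num_params_billion, bytes_per_param)


for params_b, gpus in [(35, 1), (300, 8)]:
    need = weight_memory_gb(params_b)
    free = headroom_gb(params_b, gpus)
    print(f"{params_b}B params on {gpus} GPU(s): weights ~{need:.0f} GB, ~{free:.0f} GB left")
# 35B params on 1 GPU(s): weights ~70 GB, ~26 GB left
# 300B params on 8 GPU(s): weights ~600 GB, ~168 GB left
```

The remaining headroom is what is left for the KV cache, which grows with context length and concurrency; quantized weights (fewer bytes per parameter) would leave more room.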

G7e instances provide significant cost and performance improvements for LLM inference, letting workloads that previously required multiple GPUs run efficiently on a single GPU and reducing operational complexity.
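Cost per million tokens is the instance's hourly price divided by the number of tokens it generates per hour. The sketch below shows that arithmetic and checks the headline ratio from the benchmark figures quoted in the summary. The hourly price and throughput passed in the first call are hypothetical placeholders, not published G7e pricing or benchmark results, and the helper names are illustrative.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def improvement(old_cost: float, new_cost: float) -> tuple[float, float]:
    """Return (x-fold cost reduction, percent saved) between two costs."""
    return old_cost / new_cost, (1 - new_cost / old_cost) * 100


# Hypothetical inputs for illustration only.
print(round(cost_per_million_tokens(hourly_price_usd=3.60, tokens_per_second=1000), 2))

# Benchmark figures from the summary: $2.06 vs. $0.79 per million tokens.
fold, pct = improvement(old_cost=2.06, new_cost=0.79)
print(f"{fold:.1f}x cheaper, {pct:.0f}% saved")  # 2.6x cheaper, 62% saved
```

Because price is fixed per instance-hour, cost per token falls in direct proportion to throughput gains, which is why higher concurrency and speculative decoding both show up as lower cost per million tokens.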



Related articles

Jul 23, 2026: Amazon SageMaker AI inference now supports G7 instances

Dec 6, 2024: Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference

Jul 9, 2024: Amazon SageMaker introduces a new generative AI inference optimization capability

Jul 25, 2024: Amazon SageMaker inference launches faster auto scaling for generative AI models
