Accelerate Generative AI Inference on Amazon SageMaker AI with G7e Instances

Machine Learning Blog



This article announces the availability of G7e instances powered by NVIDIA RTX PRO 6000 Blackwell GPUs on Amazon SageMaker AI for generative AI inference workloads.

  • G7e instances offer 96 GB of GPU memory per GPU, double that of G6e instances
  • They deliver up to 2.3x higher inference performance than previous-generation G6e instances
  • A single-GPU G7e.2xlarge can host 35B-parameter models; the 8-GPU G7e.48xlarge supports 300B-parameter models
  • 1,600 Gbps networking throughput enables low-latency multi-node inference scenarios
  • Benchmarks show a 2.6x cost reduction ($0.79 vs $2.06 per million tokens) at production concurrency
  • Combined with EAGLE speculative decoding, G7e achieves 2.4x throughput and a 75% cost reduction
  • Well-suited for chatbots, RAG pipelines, long-context inference, and multimodal AI workloads
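The model-size figures in the list above can be sanity-checked with a rough weight-memory estimate. The sketch below assumes 16-bit (2-byte) weights, which the summary does not state, and it ignores KV cache, activations, and runtime overhead, so it only shows that the weights themselves fit. The function names are illustrative, not part of any AWS API.

```python
# Rough weight-memory check for the model sizes quoted above.
# Assumption (not stated in the summary): weights are stored in 16-bit
# precision, i.e. 2 bytes per parameter. KV cache and activations are ignored.

GPU_MEMORY_GB = 96          # per-GPU memory quoted for G7e
BYTES_PER_PARAM_16BIT = 2   # assumed precision


def weight_footprint_gb(params_billions, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def weights_fit(params_billions, gpu_count, gpu_memory_gb=GPU_MEMORY_GB):
    """True if the weights alone fit in the combined GPU memory."""
    return weight_footprint_gb(params_billions) <= gpu_count * gpu_memory_gb


if __name__ == "__main__":
    for params, gpus in [(35, 1), (300, 8)]:
        need = weight_footprint_gb(params)
        have = gpus * GPU_MEMORY_GB
        print(f"{params}B params: ~{need} GB of weights, {have} GB available, "
              f"fits={weights_fit(params, gpus)}")
```

Under these assumptions a 35B model needs about 70 GB of the 96 GB on one GPU, and a 300B model needs about 600 GB of the 768 GB across eight GPUs. The remainder is what is left for KV cache and runtime overhead.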

G7e instances provide significant cost and performance improvements for LLM inference. Workloads that previously required multiple GPUs can run efficiently on a single GPU, which also reduces operational complexity.
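The per-million-token prices quoted above can be related to the "2.6x" figure with a few lines of arithmetic. This sketch assumes $0.79 is the G7e price and $2.06 is the G6e baseline, which the summary implies but does not state. The helper names are illustrative.

```python
# Relating the quoted per-million-token costs to the reduction factors.
# Assumption: $0.79 is the G7e figure and $2.06 is the G6e baseline.

def cost_reduction_factor(baseline_cost, new_cost):
    """How many times cheaper the new cost is than the baseline."""
    return baseline_cost / new_cost


def percent_savings(baseline_cost, new_cost):
    """Cost reduction expressed as a percentage of the baseline."""
    return (1 - new_cost / baseline_cost) * 100


def factor_from_percent(percent):
    """Convert a percentage reduction into an 'N times cheaper' factor."""
    return 1 / (1 - percent / 100)


if __name__ == "__main__":
    factor = cost_reduction_factor(2.06, 0.79)
    savings = percent_savings(2.06, 0.79)
    print(f"reduction factor: {factor:.1f}x")
    print(f"savings vs baseline: {savings:.1f}%")
    # The separate 75% reduction quoted for EAGLE speculative decoding
    # corresponds to a 4x factor, whatever baseline that figure uses.
    print(f"75% reduction as a factor: {factor_from_percent(75):.1f}x")
```

A "2.6x" reduction and a "75%" reduction are different ways of expressing savings: the first is a ratio of costs, the second a fraction of the baseline removed. Converting both to the same form makes them easier to compare.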



Related articles

  • Dec 6, 2024: Amazon SageMaker introduces new capabilities to accelerate scaling of Generative AI Inference
  • Jul 9, 2024: Amazon SageMaker introduces a new generative AI inference optimization capability
  • Jul 25, 2024: Amazon SageMaker inference launches faster auto scaling for generative AI models
  • Aug 29, 2024: Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker
