
Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers

Public Sector Blog



This article is a detailed guide to deploying large language models (LLMs) in AWS GovCloud (US) Regions using Hugging Face Inference Containers. It covers hosting LLMs on Amazon EC2 instances and serving custom LLMs with the Hugging Face Text Generation Inference (TGI) container.

Specifically, the article covers:

  • Prerequisites for deploying LLMs in AWS GovCloud (US)
  • Optional steps for downloading custom LLM weights to Amazon S3
  • Creating an Amazon EC2 instance for LLM hosting
  • Configuring the EC2 instance for hosting and deploying the TGI Container
  • Testing the inference server with a SageMaker notebook instance
  • Cleanup process to terminate resources
  • Usage in cloud applications and potential integrations
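
To make the testing step concrete, here is a minimal Python sketch of how a client (for example, code running in a notebook) might query a TGI server once the container is running. It is not taken from the article. It assumes the server is reachable over HTTP from the client and uses TGI's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field. The function names, host address, and port are hypothetical placeholders.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=64):
    """Build a POST request for the TGI /generate route.

    base_url is the address of the EC2-hosted server (hypothetical here).
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_new_tokens=64, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]


# Example call (assumed private IP and host port; replace with your own):
# print(generate("http://10.0.0.12:8080", "What is AWS GovCloud (US)?"))
```

Building the request separately from sending it lets you inspect the payload without a live server. The sketch uses only the Python standard library, so it needs no extra packages on the client.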




Related articles

  • Jun 19, 2024: Fine-tuning an LLM using QLoRA in AWS GovCloud (US)
  • Aug 22, 2025: Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers
  • Aug 14, 2025: Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers
  • Dec 2, 2024: Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS
