Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers

Public Sector Blog

This article is a detailed guide to deploying large language models (LLMs) in AWS GovCloud (US) Regions using Hugging Face Inference Containers. It covers hosting LLMs on Amazon EC2 instances and serving custom LLMs with the Hugging Face Text Generation Inference (TGI) container.

Specifically, the article covers:

Prerequisites for deploying LLMs in AWS GovCloud (US)
Optional steps for downloading custom LLM weights to Amazon S3
Creating an Amazon EC2 instance for LLM hosting
Configuring the EC2 instance for hosting and deploying the TGI Container
Testing the inference server with a SageMaker notebook instance
Cleanup process to terminate resources
Usage in cloud applications and potential integrations
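
As a rough illustration of the testing step listed above, the sketch below builds and sends a request to a TGI server's /generate route using only the Python standard library. The host address, the port (8080), and the helper names build_generate_request and generate are assumptions made for this example, not details taken from the article; the JSON shape ("inputs" plus a "parameters" object, with "generated_text" in the response) follows TGI's documented /generate API.

```python
import json
import urllib.request

# Hypothetical address of the EC2 instance running the TGI container.
# The host and port are assumptions; use whatever your deployment exposes.
TGI_URL = "http://10.0.0.10:8080"


def build_generate_request(prompt, max_new_tokens=64, base_url=TGI_URL):
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text from the response."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["generated_text"]
```

From a notebook in the same VPC, a call such as generate("What is AWS GovCloud?", max_new_tokens=32) would return the model's completion, provided the instance's security group allows traffic on the chosen port.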



Related articles

Jun 19, 2024
Fine-tuning an LLM using QLoRA in AWS GovCloud (US)

Aug 22, 2025
Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Aug 14, 2025
Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

Dec 2, 2024
Scaling your LLM inference workloads: multi-node deployment with TensorRT-LLM and Triton on Amazon EKS