Fine-tuning an LLM using QLoRA in AWS GovCloud (US)

Public Sector Blog

This article describes how to fine-tune a large language model (LLM) using the Quantized Low-Rank Adaptation (QLoRA) technique in the AWS GovCloud (US) Regions. It covers the following key points:

Setting up an Amazon SageMaker notebook instance and preprocessing a dataset for instruction-based fine-tuning
Launching a SageMaker training job using a Hugging Face container and QLoRA to efficiently adapt an LLM to a specific domain
Key configurations for quantization and low-rank adapters to enable efficient fine-tuning and inference
Merging the adapted weights into the base model for simplified deployment to an inference endpoint
Relevant resources and next steps for deploying the fine-tuned model
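
As a rough illustration of why the quantization and low-rank adapter settings listed above make fine-tuning cheaper, the following sketch works through the arithmetic with the Python standard library only. The layer dimensions, rank, and alpha are hypothetical values chosen for the example, not figures from the article, and the helper names are invented here. The merge function shows the standard LoRA update W + (alpha / r) * B A on small nested lists; it is not the Hugging Face API that the article uses.

```python
"""Back-of-the-envelope QLoRA arithmetic (illustrative sketch, stdlib only)."""


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def quantized_weight_bytes(num_params: int, bits: int = 4) -> float:
    """Approximate storage for quantized base weights.

    Ignores the small overhead of per-block quantization constants.
    """
    return num_params * bits / 8


def merge_lora(weight, lora_b, lora_a, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A) for matrices given as nested lists."""
    scale = alpha / rank
    merged = []
    for i, row in enumerate(weight):
        new_row = []
        for j, w in enumerate(row):
            # Entry (i, j) of the low-rank product B @ A.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            new_row.append(w + scale * delta)
        merged.append(new_row)
    return merged


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection matrix with a rank-8 adapter.
    full = 4096 * 4096
    added = lora_trainable_params(4096, 4096, rank=8)
    print(f"full matrix params: {full}")
    print(f"adapter params:     {added} ({added / full:.2%} of full)")
    print(f"4-bit base storage: {quantized_weight_bytes(full) / 2**20:.1f} MiB")

    # Tiny rank-1 merge: the adapter is folded into the base weight once,
    # so inference needs no separate adapter matrices afterwards.
    w = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0], [2.0]]   # d_out x rank
    a = [[0.5, -1.0]]    # rank x d_in
    print(merge_lora(w, b, a, alpha=2.0, rank=1))
```

The point of the sketch is the ratio: only the small A and B matrices are trained while the base weights stay frozen in 4-bit form, and merging folds the adapter back into a single weight matrix for deployment.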

May 9, 2024

Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers

Jul 24, 2024

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

Nov 26, 2024

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips

Mar 19, 2026

AWS adds support for NIXL with EFA to accelerate LLM inference at scale

Related articles