Fine-tuning an LLM using QLoRA in AWS GovCloud (US)
Public Sector Blog
This article describes how to fine-tune a large language model (LLM) using the Quantized Low-Rank Adaptation (QLoRA) technique in the AWS GovCloud (US) Regions. It covers the following key points:
- Setting up an Amazon SageMaker notebook instance and preprocessing a dataset for instruction-based fine-tuning
- Launching a SageMaker training job using a Hugging Face container and QLoRA to efficiently adapt an LLM to a specific domain
- Key configurations for quantization and low-rank adapters to enable efficient fine-tuning and inference
- Merging the adapted weights into the base model for simplified deployment to an inference endpoint
- Relevant resources and next steps for deploying the fine-tuned model
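To make the preprocessing point concrete, here is a minimal sketch of how a record might be rendered into a single instruction-style training string. The field names (`instruction`, `context`, `response`) and the `### ...` section markers are assumptions chosen for illustration; the article's actual dataset and prompt template may differ.

```python
# Hypothetical record layout: "instruction", "context", and "response" are
# assumed field names, not ones confirmed by the article.
PROMPT_WITH_CONTEXT = (
    "### Instruction:\n{instruction}\n\n"
    "### Context:\n{context}\n\n"
    "### Response:\n{response}"
)
PROMPT_NO_CONTEXT = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)


def format_example(record: dict) -> str:
    """Render one dataset record as a single training string."""
    context = record.get("context", "").strip()
    # Skip the context section entirely when the record has no context.
    template = PROMPT_WITH_CONTEXT if context else PROMPT_NO_CONTEXT
    return template.format(
        instruction=record["instruction"].strip(),
        context=context,
        response=record["response"].strip(),
    )


# Illustrative sample record (made up for this sketch).
sample = {
    "instruction": "Summarize the policy in one sentence.",
    "context": "",
    "response": "Access requires multi-factor authentication.",
}
formatted = format_example(sample)
print(formatted)
```

In practice a function like this would be mapped over every record before the dataset is uploaded for the training job. Keeping the template in one place makes it easier to reuse the same prompt layout at inference time.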
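The low-rank adapter and merging points can be illustrated with a small, dependency-free sketch. It uses the standard LoRA formulation, in which a frozen weight matrix W is adapted by a scaled low-rank product, W + (alpha / r) * B @ A, and merging folds that product back into W. The toy matrices, the `alpha` and `r` values, and the helper names are illustrative assumptions rather than the article's settings. In QLoRA the base weights are additionally stored in a quantized form, which this sketch does not model.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, b, a, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in one adapter pair: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


# Toy 2x3 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
B = [[1.0],
     [2.0]]             # d_out x r
A = [[0.5, 0.0, 1.0]]   # r x d_in
alpha, r = 2, 1

W_merged = merge_lora(W, B, A, alpha, r)

# The merged matrix gives the same output as running the base and adapter paths separately.
x = [[1.0], [1.0], [1.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[bo[0] + (alpha / r) * ao[0]] for bo, ao in zip(base_out, adapter_out)]
assert matmul(W_merged, x) == separate

# Adapter size versus full matrix size for a hypothetical 4096 x 4096 layer at rank 16.
adapter_params = lora_param_count(4096, 4096, 16)
full_params = 4096 * 4096
print(adapter_params, full_params, adapter_params / full_params)
```

Because the merged matrix has the same shape as the original, the deployed model needs no adapter-specific code path, which is what makes merging attractive before hosting on an inference endpoint. The parameter count shows why training only the adapters is inexpensive: at rank 16 the adapter pair holds under 1% of the parameters of the full matrix.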
Related articles
- May 9, 2024: Deploy LLMs in AWS GovCloud (US) Regions using Hugging Face Inference Containers
- Jul 24, 2024: LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow
- Nov 26, 2024: Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips
- Mar 19, 2026: AWS adds support for NIXL with EFA to accelerate LLM inference at scale