The art and science of hyperparameter optimization on Amazon Nova Forge
Machine Learning Blog
This article is a guide to hyperparameter optimization for Amazon Nova Forge, AWS's platform for building custom domain-specific language models. It addresses the central challenge of fine-tuning on domain data while preserving the model's general capabilities.
- Catastrophic forgetting occurs when domain training overwrites general capabilities; mixing general-purpose data into training mitigates it
- Learning rate is the most sensitive hyperparameter; service defaults are recommended starting points
- Three customization techniques: Continued Pre-Training (CPT), Supervised Fine-Tuning (SFT), Reinforcement Fine-Tuning (RFT)
- Checkpoint selection is the most impactful decision for CPT; match flexibility to data scale
- Data mixing: start with customer data at ~50% of the mix and Nova curated data for the remainder; always include reasoning-instruction-following data
- LoRA training offers lower cost and faster iteration; graduate to Full Rank after validation
- Recommended workflow: SFT with LoRA, then RFT for optimal results on labeled data
- Batch size targets: 2-20 million tokens per step for CPT; monitor validation loss for overfitting
- Common mistakes: skipping SFT before RFT, deviating from default learning rates, poor reward functions
- Data and reward quality matter more than volume; prioritize quality over scale
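The batch size and data mixing figures in the list above reduce to simple arithmetic. The sketch below is only an illustration in plain Python, not Nova Forge code: the function names, the 8,192-token sequence length, and the token counts are all assumptions chosen for the example. The only values taken from the article are the 2-20 million tokens-per-step range for CPT and the ~50% customer data share.

```python
import math

# Range quoted in the article for CPT, in tokens per optimizer step.
CPT_MIN_TOKENS_PER_STEP = 2_000_000
CPT_MAX_TOKENS_PER_STEP = 20_000_000


def sequences_per_step(target_tokens: int, seq_len: int) -> int:
    """Number of sequences per step needed to reach at least target_tokens."""
    if target_tokens <= 0 or seq_len <= 0:
        raise ValueError("target_tokens and seq_len must be positive")
    return math.ceil(target_tokens / seq_len)


def in_cpt_range(tokens_per_step: int) -> bool:
    """True if tokens_per_step falls inside the 2-20 million token range."""
    return CPT_MIN_TOKENS_PER_STEP <= tokens_per_step <= CPT_MAX_TOKENS_PER_STEP


def mix_token_budget(customer_tokens: int, customer_share: float = 0.5) -> dict:
    """Curated tokens needed so customer data makes up customer_share of the mix."""
    if not 0 < customer_share <= 1:
        raise ValueError("customer_share must be in (0, 1]")
    total = round(customer_tokens / customer_share)
    return {
        "customer": customer_tokens,
        "curated": total - customer_tokens,
        "total": total,
    }


if __name__ == "__main__":
    # Hypothetical run: 4M tokens per step with 8,192-token sequences.
    n_seqs = sequences_per_step(4_000_000, 8192)
    print(n_seqs, in_cpt_range(n_seqs * 8192))
    # Hypothetical corpus: 1B customer tokens at a 50% share.
    print(mix_token_budget(1_000_000_000))
```

Working in tokens rather than sequences keeps the target comparable across runs with different sequence lengths, which is why the helper converts in that direction.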
Successful Nova Forge customization requires balancing strategic decisions (checkpoint, data mixing, training mode) with systematic hyperparameter tuning, while maintaining data quality and reward function rigor.
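On the systematic tuning side, monitoring validation loss for overfitting can be sketched as a patience-based check over the loss recorded at each evaluation point. This is a generic early-stopping heuristic, not a Nova Forge feature; the function name, the `patience` and `min_delta` parameters, and the sample losses are assumptions made for illustration.

```python
def best_checkpoint(val_losses, patience=3, min_delta=0.0):
    """Return (index of lowest validation loss seen, whether overfitting was flagged).

    Overfitting is flagged once the loss has failed to improve on the best
    value by more than min_delta for `patience` consecutive evaluations.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_loss = float("inf")
    best_idx = 0
    stale = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_idx, stale = loss, i, 0
        else:
            stale += 1
            if stale >= patience:
                return best_idx, True
    return best_idx, False


if __name__ == "__main__":
    # Hypothetical validation losses, one per evaluation interval.
    losses = [2.1, 1.8, 1.6, 1.65, 1.7, 1.9]
    idx, overfit = best_checkpoint(losses)
    print(f"best evaluation index: {idx}, overfitting flagged: {overfit}")
```

A rising validation loss alongside a still-falling training loss is the usual signal; the checkpoint at the returned index is the natural candidate to keep.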