Nova Forge SDK series part 2: Practical guide to fine-tune Nova models using data mixing capabilities

Machine Learning Blog

This article provides a comprehensive guide to fine-tuning Amazon Nova models using the Nova Forge SDK with data mixing capabilities, enabling domain specialization while preserving general intelligence.

Data mixing blends domain-specific training data with Amazon-curated datasets to prevent catastrophic forgetting
Five-stage workflow: environment setup, data preparation, training configuration, model training, and evaluation
Requires SageMaker HyperPod cluster with GPU instances (ml.p5.48xlarge recommended)
Data sanitization removes tokens conflicting with Nova's chat template (e.g., "System:", "Assistant:")
LoRA (Low-Rank Adaptation) is parameter-efficient; full-rank SFT available for deeper adaptation
Configurable data mixing ratios control batch composition (e.g., 50% customer data, 50% Nova-curated)
Evaluation measures both domain performance and general capability retention using MMLU and IFEval benchmarks
Previous results showed 12-point F1 improvement on domain task while maintaining near-baseline MMLU scores
Best practice: iterate on mixing ratios, not just hyperparameters; always evaluate on both axes
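
The sanitization and mixing-ratio points above can be illustrated with a small, self-contained sketch. This is plain Python, not the Nova Forge SDK API: the function names `sanitize` and `mixed_batch`, the marker list, and the per-batch sampling strategy are all assumptions made for illustration, and the SDK may handle both steps differently.

```python
import random
import re

# Role markers named in the summary. The complete list the SDK checks for
# is not given here, so this tuple is an assumption.
ROLE_MARKERS = ("System:", "Assistant:")
_MARKER_RE = re.compile("|".join(re.escape(m) for m in ROLE_MARKERS))


def sanitize(text: str) -> str:
    """Remove role markers and collapse the whitespace they leave behind."""
    without_markers = _MARKER_RE.sub("", text)
    return re.sub(r"\s+", " ", without_markers).strip()


def mixed_batch(customer, curated, batch_size, customer_ratio, rng):
    """Build one batch with `customer_ratio` of its items drawn from `customer`.

    Items are sampled with replacement, so either pool may be smaller
    than the number of items requested from it.
    """
    n_customer = round(batch_size * customer_ratio)
    batch = rng.choices(customer, k=n_customer)
    batch += rng.choices(curated, k=batch_size - n_customer)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
customer = [sanitize("System: you are helpful. Assistant: ok"), "domain sample"]
curated = ["curated sample A", "curated sample B"]
batch = mixed_batch(customer, curated, batch_size=8, customer_ratio=0.5, rng=rng)
```

With `customer_ratio=0.5` and a batch size of 8, four items come from each pool, matching the 50/50 example in the list above.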

The guide demonstrates that data mixing enables practical production fine-tuning by balancing domain expertise with general intelligence through systematic iteration and dual-axis evaluation.
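
One way to make dual-axis evaluation concrete is to treat general-capability retention as a constraint and domain performance as the objective. The sketch below assumes that framing. `RunResult`, `pick_ratio`, and the tolerance parameter are hypothetical names, and the scores in the usage lines are placeholders, not results from the article.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RunResult:
    customer_ratio: float  # share of customer data in each training batch
    domain_f1: float       # score on the domain task
    mmlu: float            # score on the general-capability benchmark


def pick_ratio(
    runs: Sequence[RunResult], baseline_mmlu: float, max_mmlu_drop: float
) -> Optional[RunResult]:
    """Return the run with the best domain score among those that stay
    within `max_mmlu_drop` of the baseline MMLU, or None if none qualify."""
    acceptable = [r for r in runs if baseline_mmlu - r.mmlu <= max_mmlu_drop]
    return max(acceptable, key=lambda r: r.domain_f1, default=None)


# Placeholder scores for illustration only.
runs = [
    RunResult(customer_ratio=1.0, domain_f1=0.80, mmlu=0.60),
    RunResult(customer_ratio=0.5, domain_f1=0.78, mmlu=0.69),
    RunResult(customer_ratio=0.25, domain_f1=0.72, mmlu=0.70),
]
best = pick_ratio(runs, baseline_mmlu=0.70, max_mmlu_drop=0.02)
```

Here the run trained only on customer data has the highest domain score but is rejected for losing too much general capability, so the 50/50 run is selected.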


Related articles

Mar 18, 2026
Introducing Nova Forge SDK, a seamless way to customize Nova models for enterprise AI

Mar 2, 2026
Building specialized AI without sacrificing intelligence: Nova Forge data mixing in action

Mar 18, 2026
Kick off Nova customization experiments using Nova Forge SDK

Dec 2, 2025
Introducing Amazon Nova Forge: Build your own frontier models using Nova