Accelerating LLM fine-tuning with unstructured data using SageMaker Unified Studio and S3

Machine Learning Blog

This article demonstrates how to fine-tune Llama 3.2 11B Vision Instruct for visual question answering using Amazon SageMaker Unified Studio and its Amazon S3 integration.

Integrates S3 general purpose buckets with SageMaker Catalog for unstructured data access
Uses DocVQA dataset with three training sizes (1k, 5k, 10k images) for fine-tuning
Data producer project catalogs S3 data; consumer project subscribes and develops models
Fine-tuned model achieves 0.902 ANLS score, an absolute gain of 0.049 (4.9 points, roughly 5.7% relative) over the base model (0.853)
MLflow tracks experimentation and evaluation metrics across all training runs
S3 Access Grants provide secure data access without complex permission management
Requires ml.p4de.24xlarge instance type; training takes approximately 4 hours per model
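
The summary does not spell out how ANLS is computed. The sketch below assumes the Average Normalized Levenshtein Similarity definition commonly used for DocVQA: case-insensitive comparison, a 0.5 threshold, and the best match over the accepted answers for each question. The article's own evaluation code may differ in details such as text normalization. The names `levenshtein`, `anls`, and `gain` are illustrative and do not come from the article.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[Sequence[str]],
         threshold: float = 0.5) -> float:
    """Mean over questions of the best thresholded similarity to any accepted answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            nl = levenshtein(p, a) / longest if longest else 0.0
            # Similarities at or beyond the threshold distance count as zero.
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


def gain(base: float, tuned: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain) of tuned over base."""
    absolute = tuned - base
    return absolute, absolute / base


if __name__ == "__main__":
    # Toy predictions, not data from the article.
    print(anls(["abcd", "xyz"], [["abce"], ["abc"]]))
    # Scores reported in the summary: base 0.853, fine-tuned 0.902.
    print(gain(0.853, 0.902))
```

With the reported scores, `gain(0.853, 0.902)` gives an absolute difference of about 0.049 and a relative improvement of about 0.057, which is why the "4.9%" figure reads most naturally as percentage points of ANLS.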

This integration streamlines ML workflows by enabling secure data discovery, cataloging, and collaboration between data producers and consumers while maintaining governance controls.



May 13, 2026

Fine-tune LLM with Databricks Unity Catalog and Amazon SageMaker AI

Jul 24, 2024

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

Apr 22, 2025

Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15

Jun 24, 2025

Power Your LLM Training and Evaluation with the New SageMaker AI Generative AI Tools



Related articles