
Accelerating LLM fine-tuning with unstructured data using SageMaker Unified Studio and S3

Machine Learning Blog



This article demonstrates how to fine-tune Llama 3.2 11B Vision Instruct for visual question answering using SageMaker Unified Studio and S3 integration.

  • Integrates S3 general purpose buckets with SageMaker Catalog for unstructured data access
  • Uses the DocVQA dataset at three training-set sizes (1k, 5k, and 10k images) for fine-tuning
  • Data producer project catalogs S3 data; consumer project subscribes and develops models
  • Fine-tuned model achieves a 0.902 ANLS score, a gain of 0.049 (4.9 points absolute, about 5.7% relative) over the base model's 0.853
  • MLflow tracks experimentation and evaluation metrics across all training runs
  • S3 Access Grants provide secure data access without complex permission management
  • Requires an ml.p4de.24xlarge instance; training takes approximately 4 hours per model

This integration streamlines ML workflows by enabling secure data discovery, cataloging, and collaboration between data producers and consumers while maintaining governance controls.
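
The ANLS figures quoted above refer to Average Normalized Levenshtein Similarity, the metric commonly reported for DocVQA. The following is a minimal sketch of how such a score can be computed, not the article's own evaluation code. It assumes the usual conventions: case-insensitive comparison, a threshold of 0.5 on the normalized edit distance, and taking the best score across all accepted answers for a question. The function names `levenshtein` and `anls` are illustrative.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def anls(
    predictions: Sequence[str],
    references: Sequence[Sequence[str]],
    threshold: float = 0.5,  # assumed default; answers farther than this score 0
) -> float:
    """Average, over questions, of the best thresholded similarity to any accepted answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for prediction, answers in zip(predictions, references):
        pred = prediction.strip().lower()
        best = 0.0
        for answer in answers:
            ref = answer.strip().lower()
            longest = max(len(pred), len(ref))
            distance = levenshtein(pred, ref) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    # Hypothetical predictions and accepted answers, for illustration only.
    preds = ["Invoice", "invoce", "abc"]
    refs = [["invoice"], ["invoice", "bill"], ["xyz"]]
    print(round(anls(preds, refs), 3))
```

Because of the threshold, a prediction that is mostly wrong contributes zero rather than partial credit, so small spelling or OCR slips are tolerated while unrelated answers are not.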





Related articles

  • May 13, 2026: Fine-tune LLM with Databricks Unity Catalog and Amazon SageMaker AI
  • Jul 24, 2024: LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow
  • Apr 22, 2025: Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15
  • Jun 24, 2025: Power Your LLM Training and Evaluation with the New SageMaker AI Generative AI Tools
