Automating data classification in Amazon SageMaker Catalog using an AI agent
Big Data Blog
This article explains how Amazon SageMaker Catalog's AI agent automates data classification using business glossaries and Amazon Bedrock language models.
- AI agent analyzes table metadata and schema to suggest relevant business glossary terms automatically
- Reduces manual tagging effort and improves metadata consistency across organizations
- Agent uses a reasoning-driven approach: it reviews context, searches the catalog, evaluates results, and refines its search strategy
- Suggestions include functional terms and sensitive data classifications like PII and PHI
- Data producers review and accept or modify suggestions before publishing assets
- Integrated directly into publish workflow with no separate ETL processes required
- Requires well-defined, consistent business glossaries to generate accurate recommendations
- Enables improved data discovery through standardized terminology and filtering by terms
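
The review, search, evaluate, refine loop described above can be illustrated with a toy sketch. This is not the SageMaker Catalog agent and does not call Amazon Bedrock. It swaps the language model for plain keyword overlap so the control flow is easy to follow. Every name here (`GlossaryTerm`, `Suggestion`, `suggest_terms`, the `keywords` field, the `min_terms` threshold) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    keywords: set[str]        # hypothetical: words that signal this term
    sensitive: bool = False   # e.g. a PII or PHI classification


@dataclass
class Suggestion:
    term: str
    sensitive: bool
    matched: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Split a name such as 'customer_email' into lowercase word tokens."""
    normalized = text.lower().replace("-", "_").replace(" ", "_")
    return {t for t in normalized.split("_") if t}


def search(glossary: list[GlossaryTerm], query: set[str]):
    """Search step: return (term, overlap) for terms sharing a keyword with the query."""
    return [(t, t.keywords & query) for t in glossary if t.keywords & query]


def suggest_terms(table_name, columns, glossary, min_terms=2):
    # Review context: start narrow (table name), then widen to column names.
    narrow = tokens(table_name)
    wide = narrow.union(*(tokens(c) for c in columns))
    suggestions: dict[str, Suggestion] = {}
    for query in (narrow, wide):
        for term, overlap in search(glossary, query):
            suggestions[term.name] = Suggestion(term.name, term.sensitive, sorted(overlap))
        # Evaluate results: stop once enough terms are found, otherwise refine.
        if len(suggestions) >= min_terms:
            break
    return sorted(suggestions.values(), key=lambda s: s.term)


# Illustrative glossary and table; both are made up for this sketch.
glossary = [
    GlossaryTerm("Customer", {"customer", "client"}),
    GlossaryTerm("PII", {"email", "phone", "ssn"}, sensitive=True),
    GlossaryTerm("Finance", {"invoice", "revenue"}),
]
proposed = suggest_terms("customer_orders", ["order_id", "customer_email"], glossary)
for s in proposed:
    print(s.term, "(sensitive)" if s.sensitive else "", s.matched)
```

In this sketch the table name alone matches only one term, so the loop widens the query to column names and picks up the sensitive classification from `customer_email`. The returned suggestions are proposals only. As in the workflow summarized above, a data producer would still review and accept or modify them before publishing.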
SageMaker Catalog's automated classification reduces metadata inconsistency by standardizing terminology at publication time, improving data governance and discovery without significant workflow changes.