Entity resolution and fuzzy matches in AWS Glue using the Zingg open source library
Big Data Blog
This article shows how to use the Zingg open source library for entity resolution and fuzzy matching in AWS Glue notebooks. It walks step by step through setting up the required files, preparing training data, building a model, and finding matches with Zingg.
Specifically, the article covers:
- Overview of entity resolution and its importance in data integration
- Prerequisites (AWS user/role, S3 bucket, etc.)
- Setting up the Zingg library and configuration files
- Configuring an AWS Glue notebook to use Zingg
- Preparing training data for entity resolution
- Building a Zingg model and finding matches
- Exploring the matches found by the algorithm
- Additional considerations (configuration options, incremental matching, etc.)
- Conclusion highlighting the benefit of integrating third-party libraries with AWS Glue
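Zingg runs on Spark and learns its matching rules from labeled training pairs. The deliberately naive plain-Python sketch below only makes the underlying idea concrete: score record pairs for fuzzy similarity, then group matched pairs into entities. It is not Zingg's algorithm or API. The sample records are hypothetical. The similarity score is difflib's character-level `SequenceMatcher` ratio, the 0.8 threshold is an assumed value, and union-find does the clustering.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record, fields):
    """Lowercase and join the chosen fields so case differences are ignored."""
    return " ".join(str(record[f]).strip().lower() for f in fields)


def similarity(a, b):
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def resolve_entities(records, fields, threshold=0.8):
    """Naive all-pairs fuzzy matching followed by union-find clustering.

    This is an illustration only, not how Zingg scores or clusters records.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    matched_pairs = []
    for left, right in combinations(records, 2):
        score = similarity(normalize(left, fields), normalize(right, fields))
        if score >= threshold:
            matched_pairs.append((left["id"], right["id"], round(score, 3)))
            union(left["id"], right["id"])

    # Collect record ids by cluster root; each cluster is one resolved entity.
    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return matched_pairs, sorted(clusters.values())


# Hypothetical sample records (not data from the article).
records = [
    {"id": 1, "name": "Jonathan Smith", "city": "Seattle"},
    {"id": 2, "name": "Jon Smith", "city": "Seattle"},
    {"id": 3, "name": "Maria Garcia", "city": "Austin"},
    {"id": 4, "name": "Marie Garcia", "city": "Austin"},
]

pairs, clusters = resolve_entities(records, fields=["name", "city"])
print(pairs)
print(clusters)
```

Comparing every pair is quadratic in the number of records, which is one reason tools such as Zingg use blocking to limit which records are compared at all.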