Content Repository for Unstructured Data with Multilingual Semantic Search: Part 2




This article shows how to build multilingual semantic search over a repository of unstructured data while preserving access control, extending the architecture from Part 1.

  • Amazon Textract extracts text from documents in an OCR workflow
  • An Amazon SageMaker endpoint hosts the universal-sentence-encoder-multilingual model, which generates the embeddings
  • Amazon OpenSearch Service indexes the vector embeddings for k-NN search with the HNSW algorithm
  • API Gateway and Lambda functions expose the semantic search queries
  • Access control is enforced using the department claim in the Cognito ID token
  • The solution supports cross-language search across English, German, Spanish, and French
  • The walkthrough covers CDK deployment, SageMaker endpoint setup, and sample data upload
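
To make the indexing and search steps above more concrete, here is a minimal sketch of what an OpenSearch k-NN index definition and query body could look like. It is not taken from the article: the field names (`embedding`, `content`), the `lucene` engine, the `cosinesimil` space type, and the HNSW parameter values are all assumptions chosen for illustration. The dimension of 512 matches the output size of the published universal-sentence-encoder-multilingual model. The functions only build request bodies and do not contact a cluster.

```python
from typing import Any, Sequence

# Output size of universal-sentence-encoder-multilingual embeddings.
EMBEDDING_DIM = 512


def build_index_body(field: str = "embedding", dim: int = EMBEDDING_DIM) -> dict[str, Any]:
    """Return an index definition with k-NN enabled and an HNSW vector field.

    Field names, engine, space type and HNSW parameters are illustrative
    assumptions, not values from the article.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "lucene",
                        "parameters": {"ef_construction": 128, "m": 16},
                    },
                },
                # Extracted document text, stored alongside its embedding.
                "content": {"type": "text"},
            }
        },
    }


def build_knn_query(vector: Sequence[float], k: int = 5, field: str = "embedding") -> dict[str, Any]:
    """Return a search body asking for the k nearest neighbours of `vector`."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}
```

Because the query embedding and the document embeddings come from the same multilingual model, a query in one language can match documents written in another; the index itself stores no language information.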

The solution enables organizations to search unstructured multilingual documents semantically while enforcing department-based access controls through AWS managed services.
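
One way the department-based restriction could be applied at query time is sketched below. This is an assumption-laden illustration, not the article's code: the claim name `custom:department`, the indexed `department` keyword field, and the `embedding` vector field are hypothetical. The event shape follows what API Gateway passes to a Lambda proxy integration behind a Cognito user pool authorizer, where the token claims appear under `requestContext.authorizer.claims`.

```python
from typing import Any, Sequence

# Hypothetical claim name; the real attribute name depends on the user pool setup.
DEPARTMENT_CLAIM = "custom:department"


def department_from_event(event: dict[str, Any]) -> str:
    """Read the caller's department from the Cognito claims in a Lambda event."""
    claims = event.get("requestContext", {}).get("authorizer", {}).get("claims", {})
    department = claims.get(DEPARTMENT_CLAIM)
    if not department:
        # Refuse to search rather than fall back to an unfiltered query.
        raise PermissionError("ID token carries no department claim")
    return department


def build_filtered_query(vector: Sequence[float], department: str, k: int = 5) -> dict[str, Any]:
    """Return a k-NN search body restricted to one department's documents.

    Assumes each indexed document has a keyword field named `department`.
    """
    return {
        "size": k,
        "query": {
            "bool": {
                "filter": [{"term": {"department": department}}],
                "must": [{"knn": {"embedding": {"vector": list(vector), "k": k}}}],
            }
        },
    }
```

Taking the department from the verified token rather than from a request parameter means a caller cannot widen their own access by editing the query. Note that a boolean filter combined with a k-NN clause in this way may return fewer than `k` hits, because the filter can discard some of the nearest neighbours.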



