
How Amazon trains sequential ensemble models at scale with Amazon SageMaker Pipelines

Machine Learning Blog



This article details how Amazon uses Amazon SageMaker Pipelines to train sequential ensemble models for use case identification in Salesforce opportunities, using a multi-layered BERTopic approach.

  • The solution uses three sequential BERTopic models to generate hierarchical topic clustering
  • Each BERTopic model consists of embedding, dimension reduction, clustering, and keyword identification steps
  • Key challenges include data preprocessing, scalable compute, and coordinating multi-layer model training
  • The implementation leverages SageMaker pipeline steps including Processing, Training, Callback, and Model registration
  • The pipeline uses custom Docker images and integrates with AWS services such as Amazon SQS and AWS Lambda for workflow orchestration
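
The keyword identification stage listed above can be illustrated with class-based TF-IDF (c-TF-IDF), which is BERTopic's default way of ranking terms per cluster. The following is a minimal pure-Python sketch, not the article's implementation: it assumes documents have already been assigned to clusters, and the function name `ctfidf_keywords` and the toy data are hypothetical.

```python
import math
from collections import Counter


def ctfidf_keywords(docs_by_cluster, top_n=3):
    """Rank terms per cluster with class-based TF-IDF.

    docs_by_cluster maps a cluster label to the list of documents in it.
    Weight of term t in cluster c: tf(t, c) * log(1 + A / f(t)), where
    tf(t, c) is the term's count in c divided by the word count of c,
    f(t) is the term's count across all clusters, and A is the average
    word count per cluster.
    """
    # Treat each cluster as one joined "class document" and count its terms.
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_by_cluster.items()
    }

    # Term counts over all clusters, and the average cluster size in words.
    total = Counter()
    for cluster_counts in counts.values():
        total.update(cluster_counts)
    avg_words = sum(total.values()) / len(counts)

    keywords = {}
    for label, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        scores = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in cluster_counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[label] = ranked[:top_n]
    return keywords


# Toy clusters (hypothetical data, whitespace tokenization only).
clusters = {
    "storage": ["archive backup data", "backup data lake"],
    "ml": ["train model data", "deploy model endpoint"],
}
keywords = ctfidf_keywords(clusters, top_n=1)
# keywords == {"storage": ["backup"], "ml": ["model"]}
```

Terms that appear in every cluster, such as "data" here, are down-weighted, so each cluster is described by the words that distinguish it from the others.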

The approach automatically identifies use cases from opportunity text, which improves sales analytics and recommendation models.
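
The idea of three sequential models producing a hierarchy can be sketched in a few lines. The summary does not say how one layer's output feeds the next, so this sketch assumes that each layer re-clusters the documents of every cluster found by the layer before it. The function `train_sequential_layers` and the toy layer functions are hypothetical stand-ins for trained BERTopic models.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A layer model maps a list of documents to one cluster label per document.
LayerModel = Callable[[List[str]], List[str]]


def train_sequential_layers(
    docs: Sequence[str], layers: Sequence[LayerModel]
) -> Dict[Tuple[str, ...], List[str]]:
    """Return {hierarchical label path: documents}, refined layer by layer."""
    groups: Dict[Tuple[str, ...], List[str]] = {(): list(docs)}
    for fit_predict in layers:
        next_groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
        # Assumption: each layer is fit separately on every parent cluster.
        for path, members in groups.items():
            labels = fit_predict(members)
            for doc, label in zip(members, labels):
                next_groups[path + (label,)].append(doc)
        groups = dict(next_groups)
    return groups


# Toy stand-ins for the three models (real layers would be BERTopic models).
def by_first_word(docs: List[str]) -> List[str]:
    return [d.split()[0] for d in docs]


def by_second_word(docs: List[str]) -> List[str]:
    return [d.split()[1] for d in docs]


def by_word_count(docs: List[str]) -> List[str]:
    return [f"{len(d.split())}-words" for d in docs]


docs = [
    "storage backup nightly",
    "storage archive",
    "ml training",
    "ml training job",
]
hierarchy = train_sequential_layers(
    docs, [by_first_word, by_second_word, by_word_count]
)
# Each key is a three-level path, e.g. ("ml", "training", "3-words").
```

Keeping each layer behind a plain callable makes the dependency explicit: a layer cannot start until the previous layer has produced its cluster assignments, which is the coordination problem the pipeline has to solve.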





Related articles

  • Nov 27, 2024: Efficiently train models with large sequence lengths using Amazon SageMaker model parallel
  • Oct 28, 2024: Customized model monitoring for near real-time batch inference with Amazon SageMaker
  • Aug 7, 2024: Automate the machine learning model approval process with Amazon SageMaker Model Registry and Amazon SageMaker Pipelines
  • Apr 16, 2024: Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries
