Migrating enterprise ML workloads from Databricks to AWS for large scale ML

Industries Blog

This article details Kargo's migration of enterprise ML workloads from Databricks to AWS, achieving significant improvements in cost, scalability, and operational efficiency.

Replaced Delta Lake ETL with AWS Glue and Apache Iceberg for ACID transactions and schema evolution
Consolidated scattered modeling logic into containerized Python packages deployed via Amazon ECR
Implemented SageMaker Pipelines for end-to-end orchestration with deterministic artifact versioning
Achieved a 40% cost reduction by replacing persistent clusters with serverless AWS Glue and Amazon Athena
Sped up pipeline execution 3-5x by running SageMaker pipelines in parallel
Decoupled real-time inference serving from training using sidecar containers for zero-downtime updates
Standardized observability via Amazon CloudWatch for unified monitoring across all components
Maintained byte-for-byte output parity with original Databricks pipelines for production safety
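
The summary does not say how output parity was verified. As a minimal sketch, assuming both pipelines' outputs have been exported to local directories in a deterministic serialized form, one way to check byte-for-byte parity is to compare SHA-256 digests file by file. The function names and directory layout here are hypothetical, not taken from the article.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_outputs(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def find_parity_mismatches(legacy_dir: Path, migrated_dir: Path) -> list[str]:
    """Return relative paths that differ between the two trees.

    A path is reported when its bytes differ or when it exists on only
    one side. An empty list means the trees are byte-for-byte identical.
    """
    legacy = index_outputs(legacy_dir)      # hypothetical: legacy pipeline export
    migrated = index_outputs(migrated_dir)  # hypothetical: migrated pipeline export
    all_names = legacy.keys() | migrated.keys()
    return sorted(
        name for name in all_names if legacy.get(name) != migrated.get(name)
    )
```

Calling `find_parity_mismatches(Path("legacy_out"), Path("migrated_out"))` on two such exports returns an empty list when they match exactly. A check like this only makes sense when serialization is deterministic; files whose bytes embed writer metadata or timestamps would need a content-level comparison instead.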

The migration demonstrates how thoughtful re-architecture, rather than lift-and-shift, enables scalable ML platforms that support both offline optimization and real-time inference at advertising scale.


