Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

This article demonstrates an end-to-end ML workflow that integrates the Amazon Redshift data warehouse with Amazon SageMaker Feature Store to build ML features at scale.

  • Three options for feature engineering: AWS Glue interactive sessions, SageMaker Processing with Spark, or Data Wrangler
  • RedshiftDatasetDefinition enables simple Redshift connection configuration without maintaining persistent connections
  • SageMaker Feature Store Spark connector allows distributed feature ingestion from Spark DataFrames
  • DatasetBuilder API and Athena queries enable ML dataset creation from feature groups
  • Batch transform provides bulk model inference on S3 data with results stored in S3
  • Amazon Redshift Spectrum joins batch predictions with original Redshift data
  • CloudFormation template automates setup of SageMaker domain, Redshift cluster, and required IAM roles
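
To make the RedshiftDatasetDefinition point concrete, here is a minimal sketch of the input entry a SageMaker Processing job could receive so that the Redshift query result is unloaded to S3 and made available to the job. It builds only the request fragment, in the shape used by the `CreateProcessingJob` API, and calls no AWS service. The function name, the input name, the local path, and every value in the usage example are hypothetical placeholders, not taken from the original post.

```python
import json


def redshift_processing_input(cluster_id, database, db_user, query,
                              cluster_role_arn, output_s3_uri,
                              local_path="/opt/ml/processing/input/redshift"):
    """Build one ProcessingInputs entry that reads from Amazon Redshift.

    Illustrative helper (not part of any AWS SDK). The returned dict
    follows the CreateProcessingJob request shape.
    """
    return {
        "InputName": "redshift-input",  # hypothetical name
        "DatasetDefinition": {
            # Where the unloaded query result appears inside the container.
            "LocalPath": local_path,
            "DataDistributionType": "FullyReplicated",
            "InputMode": "File",
            "RedshiftDatasetDefinition": {
                "ClusterId": cluster_id,
                "Database": database,
                "DbUser": db_user,
                "QueryString": query,
                # Role the cluster uses to unload the result to S3.
                "ClusterRoleArn": cluster_role_arn,
                "OutputS3Uri": output_s3_uri,
                "OutputFormat": "PARQUET",
            },
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    entry = redshift_processing_input(
        cluster_id="example-cluster",
        database="dev",
        db_user="awsuser",
        query="SELECT * FROM example_schema.example_table",
        cluster_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        output_s3_uri="s3://example-bucket/redshift-unload/",
    )
    print(json.dumps(entry, indent=2))
```

The resulting dict would go in the `ProcessingInputs` list of a processing job request; because the query runs when the job starts, no persistent connection has to be kept open.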

The post gives data scientists a complete guide to moving data from Redshift through feature engineering to model training and inference using native AWS services.


