Apache Spark lineage now available in Amazon SageMaker Unified Studio for IDC-based domains

Apache Spark data lineage is now generally available in Amazon SageMaker Unified Studio for IAM Identity Center (IDC)-based domains.

  • Captures schema and transformation lineage from Spark jobs running on Amazon EMR and AWS Glue
  • Supports Amazon EMR on EC2, Amazon EMR Serverless, Amazon EMR on EKS, and AWS Glue job runs
  • Visualizes lineage as interactive graphs in SageMaker Unified Studio
  • Exposes lineage data through APIs for programmatic querying
  • Lets users compare transformations across a Spark job's execution history
  • Available in all AWS Regions where SageMaker Unified Studio is currently offered

Spark data lineage helps users trace the root causes of data issues and understand how transformations affect downstream data across their pipelines.
