
Processing Amazon S3 objects at scale with AWS Step Functions Distributed Map S3 prefix

Compute Blog



The article discusses processing large-scale data files with AWS Step Functions Distributed Map, focusing on a new capability that iterates over the Amazon S3 objects under a prefix and simplifies workflow management.

  • Enables processing of large datasets by running workflow iterations in parallel
  • Introduces prefix-based iteration with `LOAD_AND_FLATTEN` transformation
  • Demonstrates a use case of application log processing and summarization
  • Provides a sample workflow that:
    • Iterates over log files from an S3 prefix
    • Publishes hourly error-count metrics to Amazon CloudWatch
    • Stores metrics in Amazon DynamoDB
    • Invokes a Lambda function for metrics aggregation
  • Supports multiple input types including CSV, JSON, JSONL, and Parquet
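The log-summarization use case above comes down to counting error records per hour in each batch and then merging the partial counts. The following Python sketch shows that logic in isolation; it is not the article's code. It assumes each item the Distributed Map hands to a child workflow is an already-parsed log record with hypothetical `timestamp` and `level` fields, and the `aggregate` helper is a stand-in for the Lambda aggregation step.

```python
from collections import Counter
from datetime import datetime


def count_errors_by_hour(records):
    """Count ERROR-level log records per hour within one batch of items.

    Hypothetical record schema: {"timestamp": "<ISO 8601>", "level": "..."}.
    """
    counts = Counter()
    for record in records:
        if record.get("level") != "ERROR":
            continue
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
        ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        # Truncate to the start of the hour so records fall into hourly buckets.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[hour.isoformat()] += 1
    return counts


def aggregate(partial_counts):
    """Merge per-batch hourly counts into one total (stand-in for the aggregation step)."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    batch_a = [
        {"timestamp": "2024-01-01T10:15:00", "level": "ERROR"},
        {"timestamp": "2024-01-01T10:40:00", "level": "INFO"},
    ]
    batch_b = [
        {"timestamp": "2024-01-01T10:59:59", "level": "ERROR"},
        {"timestamp": "2024-01-01T11:01:00", "level": "ERROR"},
    ]
    print(aggregate([count_errors_by_hour(batch_a), count_errors_by_hour(batch_b)]))
    # {'2024-01-01T10:00:00': 2, '2024-01-01T11:00:00': 1}
```

Keeping the per-batch count separate from the merge mirrors the split between parallel child iterations and a single final aggregation.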

The feature simplifies data processing by removing the need for nested workflows and custom code, making it easier to build dynamic, resilient data pipelines.
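To make the point about nested workflows concrete, here is a minimal Python sketch that builds an Amazon States Language definition with a single Distributed Map state reading every object under an S3 prefix. The field names (`ItemReader`, `ReaderConfig`, `InputType`, `Transformation`, `ItemBatcher`, `ItemProcessor`) and the `PARQUET` enum value reflect our understanding of the Amazon States Language and should be checked against the Step Functions documentation; the bucket, prefix, and `count-errors` Lambda function name are hypothetical placeholders.

```python
import json

# Assumed InputType values, matching the formats listed in the summary.
SUPPORTED_INPUT_TYPES = {"CSV", "JSON", "JSONL", "PARQUET"}


def distributed_map_state(bucket, prefix, input_type="JSONL", batch_size=100):
    """Build a Distributed Map state that reads every object under an S3 prefix.

    LOAD_AND_FLATTEN asks the reader to open each listed object and emit its
    records as individual items, rather than one item per object key.
    """
    if input_type not in SUPPORTED_INPUT_TYPES:
        raise ValueError(f"unsupported input type: {input_type}")
    return {
        "Type": "Map",
        "MaxConcurrency": 100,
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket, "Prefix": prefix},
            "ReaderConfig": {
                "InputType": input_type,
                "Transformation": "LOAD_AND_FLATTEN",
            },
        },
        # Group records so each child execution handles a batch, not one record.
        "ItemBatcher": {"MaxItemsPerBatch": batch_size},
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
            "StartAt": "CountErrors",
            "States": {
                "CountErrors": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    # Hypothetical function name; replace with your own.
                    "Parameters": {"FunctionName": "count-errors", "Payload.$": "$"},
                    "End": True,
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    definition = {
        "StartAt": "ProcessLogs",
        "States": {"ProcessLogs": distributed_map_state("example-log-bucket", "logs/")},
    }
    print(json.dumps(definition, indent=2))
```

Because `LOAD_AND_FLATTEN` makes the reader emit the records inside each object instead of the object keys, one Map state can stand in for a nested design that would otherwise list keys in an outer loop and read records in an inner one.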





