DevOps & Developer Productivity Blog
Amazon has introduced SWE-PolyBench, a multilingual benchmark for evaluating AI coding agents across multiple programming languages and real-world scenarios.
- Covers four programming languages: Java, JavaScript, TypeScript, and Python
- Contains 2,110 curated coding tasks from 21 repositories
- Includes a stratified subset of 500 issues (SWE-PolyBench500) for rapid experimentation
- Introduces new evaluation metrics beyond pass rates, including file-level localization and concrete syntax tree (CST) node-level retrieval
- Aims to assess AI coding agents' ability to navigate and understand complex codebases
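To make the file-level localization metric in the list above more concrete, here is a minimal sketch of one plausible way to score it: compare the set of files an agent modified against the set of files changed in the reference patch, and report recall and precision. The function name `file_localization_scores` and the file paths are hypothetical, and the benchmark's own definition may differ in detail.

```python
def file_localization_scores(predicted_files, gold_files):
    """Return (recall, precision) of predicted files against reference files.

    Assumption: both arguments are iterables of repository-relative paths.
    This is an illustrative definition, not the benchmark's official code.
    """
    predicted = set(predicted_files)
    gold = set(gold_files)
    hits = predicted & gold  # files the agent touched that the reference patch also touched

    # Guard against empty sets to avoid division by zero.
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical example: the agent edited two files, one of which matches the reference patch.
recall, precision = file_localization_scores(
    ["src/app.py", "src/utils.py"],
    ["src/app.py", "src/models.py"],
)
print(recall, precision)
```

A score like this can be informative even when the agent's patch fails the tests, because it shows whether the agent at least found the right part of the codebase.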
Key findings show that current AI coding agents perform best in Python and struggle with complex, multi-file tasks across different programming languages. The benchmark provides a comprehensive framework for evaluating and improving AI-powered software engineering tools.
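A per-language comparison like the one described above amounts to grouping task outcomes by language and computing a pass rate for each group. The sketch below assumes each result is a `(language, passed)` pair; the function name `pass_rate_by_language` and the sample records are made-up illustrations, not benchmark results.

```python
from collections import defaultdict


def pass_rate_by_language(results):
    """Compute the fraction of passed tasks for each language.

    Assumption: `results` is an iterable of (language, passed) pairs,
    where `passed` is a bool. The record format is hypothetical.
    """
    totals = defaultdict(int)
    passed_counts = defaultdict(int)
    for language, passed in results:
        totals[language] += 1
        if passed:
            passed_counts[language] += 1
    return {language: passed_counts[language] / totals[language] for language in totals}


# Toy records for illustration only; these are not real benchmark outcomes.
toy_results = [
    ("Python", True),
    ("Python", False),
    ("Java", False),
    ("TypeScript", True),
]
print(pass_rate_by_language(toy_results))
```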