Amazon introduces SWE-PolyBench, a multilingual benchmark for AI Coding Agents

DevOps & Developer Productivity Blog



Amazon has introduced SWE-PolyBench, a multilingual benchmark for evaluating AI coding agents across several programming languages, using tasks drawn from real repositories.

  • Covers four programming languages: Java, JavaScript, TypeScript, and Python
  • Contains 2,110 curated coding tasks from 21 repositories
  • Includes a stratified subset of 500 issues (SWE-PolyBench500) for rapid experimentation
  • Introduces evaluation metrics beyond pass rate, including file-level localization and retrieval at the level of concrete syntax tree (CST) nodes
  • Aims to assess AI coding agents' ability to navigate and understand complex codebases
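
To make the file-level localization idea concrete, here is a minimal sketch. It assumes the metric compares the set of files an agent modified with the set of files changed in the reference patch, and reports precision and recall over those sets. This summary does not give the benchmark's exact definition, so the function name, the scoring formula, and the file paths below are illustrative assumptions, not SWE-PolyBench's implementation.

```python
from typing import Iterable


def file_localization_scores(
    predicted: Iterable[str], reference: Iterable[str]
) -> dict[str, float]:
    """Score the files an agent touched against the files in the reference patch.

    Hypothetical helper: set-based precision and recall, assumed here as one
    plausible reading of "file-level localization".
    """
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)  # files the agent correctly identified
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return {"precision": precision, "recall": recall}


# Made-up example: the agent edited two of the three files in the reference patch.
scores = file_localization_scores(
    predicted=["src/parser.py", "src/utils.py"],
    reference=["src/parser.py", "src/lexer.py", "src/utils.py"],
)
print(scores)
```

A score of this kind separates "found the right place but wrote the wrong fix" from "never found the right place", which a pass rate alone cannot show. The same set comparison could in principle be applied to CST nodes (for example, the functions or classes that were changed) instead of file paths.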

Key findings show that current AI coding agents perform best in Python and struggle with complex, multi-file tasks across different programming languages. The benchmark provides a comprehensive framework for evaluating and improving AI-powered software engineering tools.



