Build reliable Agentic AI solution with Amazon Bedrock: Learn from Pushpay’s journey on GenAI evaluation

Machine Learning Blog

This article details how Pushpay built a production-ready agentic AI search feature using Amazon Bedrock, focusing on their custom evaluation framework for continuous quality assurance.

Pushpay created AI search enabling ministry staff to query community data using natural language
Initial solution achieved only 60-70% accuracy, and manual evaluation was a bottleneck
Implemented generative AI evaluation framework with golden dataset of 300+ representative queries
Evaluation by domain category revealed performance variations that aggregate metrics had masked
Strategic domain-level rollout achieved 95% accuracy on high-priority categories
Dynamic prompt constructor tailors prompts based on query content and user context
Reduced time-to-insight from 120 seconds to under 4 seconds for users
Used Amazon Bedrock prompt caching to reduce latency and token costs
Key lesson: Build evaluation frameworks early; think beyond aggregate accuracy scores

Pushpay's systematic, data-driven approach to AI agent optimization demonstrates that production readiness requires robust evaluation infrastructure, not just sophisticated prompts.
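The point about aggregate metrics masking per-category variation can be illustrated with a minimal sketch. This is not Pushpay's framework: the category names, the pass/fail results, and the 0.9 rollout threshold below are all hypothetical, chosen only to show how a reasonable-looking overall score can hide a weak category.

```python
from collections import defaultdict

# Hypothetical golden-dataset results: one (domain_category, passed) pair per query.
# Category names and outcomes are invented for illustration.
results = (
    [("people", True)] * 10
    + [("giving", True)] * 9 + [("giving", False)]
    + [("groups", True)] * 5 + [("groups", False)] * 5
)


def overall_accuracy(results):
    """Fraction of all queries that passed, ignoring category."""
    return sum(passed for _, passed in results) / len(results)


def accuracy_by_category(results):
    """Fraction of queries that passed within each domain category."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {category: passes[category] / totals[category] for category in totals}


def categories_ready(results, threshold):
    """Categories whose accuracy meets an assumed rollout threshold."""
    per_category = accuracy_by_category(results)
    return sorted(c for c, acc in per_category.items() if acc >= threshold)


if __name__ == "__main__":
    print("overall:", overall_accuracy(results))
    for category, acc in accuracy_by_category(results).items():
        print(category, acc)
    # 0.9 is an assumed threshold, not a figure from the article.
    print("ready for rollout:", categories_ready(results, threshold=0.9))
```

In this made-up data the overall score looks acceptable while one category passes only half of its queries, which is the kind of gap a domain-level rollout is meant to surface.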



Related articles

Dec 15, 2025
Operationalize generative AI workloads and scale to hundreds of use cases with Amazon Bedrock – Part 1: GenAIOps

Mar 31, 2026
Build reliable AI agents with Amazon Bedrock AgentCore Evaluations

Oct 2, 2024
Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1

Oct 21, 2024
Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2
