How Beekeeper optimized user personalization with Amazon Bedrock
Machine Learning Blog
This article describes how Beekeeper built an automated system on Amazon Bedrock that continuously optimizes LLM and prompt selection for user personalization.
- Beekeeper created a dynamic evaluation system that tests model/prompt combinations and ranks them on a live leaderboard
- The system evaluates output quality using four metrics: compression ratio, presence of action items, hallucination detection, and vector comparison
- A baseline leaderboard was established using synthetic test data with ground-truth annotations
- Manual validation was performed via Amazon Mechanical Turk on statistically significant sample sizes
- User feedback is incorporated through a prompt mutation process that does not affect other users
- Production traffic is split across the top three model/prompt pairs at 50%, 30%, and 20%
- Preliminary results show 13-24% better ratings when aggregated per tenant
- The solution uses Amazon EventBridge, EKS, Lambda, RDS, and Bedrock for orchestration and evaluation
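The leaderboard idea in the points above can be sketched roughly as follows. The summary names the four metrics but does not define them or say how they are combined, so everything here is an assumption for illustration: the `Evaluation` record, the definition of compression ratio as summary length over source length, the target ratio of 0.2, and the equal weighting in `score` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Evaluation:
    """One evaluated model/prompt pair (hypothetical record layout)."""
    model_id: str
    prompt_id: str
    compression_ratio: float   # assumed: summary length / source length
    has_action_items: bool     # did the output contain action items?
    hallucination_free: bool   # did the hallucination check pass?
    vector_similarity: float   # assumed: cosine similarity to ground truth, 0..1


def compression_ratio(source: str, summary: str) -> float:
    """Length of the summary relative to the source (assumed definition)."""
    return len(summary) / len(source) if source else 0.0


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(ev: Evaluation, target_ratio: float = 0.2) -> float:
    """Combine the four metrics with equal weights (an assumption)."""
    # Reward ratios close to the target; clamp at zero for far-off ratios.
    ratio_score = max(0.0, 1.0 - abs(ev.compression_ratio - target_ratio) / target_ratio)
    return (
        0.25 * ratio_score
        + 0.25 * float(ev.has_action_items)
        + 0.25 * float(ev.hallucination_free)
        + 0.25 * ev.vector_similarity
    )


def leaderboard(evaluations: list[Evaluation]) -> list[Evaluation]:
    """Rank model/prompt pairs from best to worst composite score."""
    return sorted(evaluations, key=score, reverse=True)
```

A real system would presumably also factor in cost and latency, which the summary mentions as trade-offs but does not quantify.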
Beekeeper's approach automates LLM selection and prompt optimization, enabling smaller teams to continuously improve results while balancing quality, cost, and latency without manual intervention.
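One simple way to realize the 50/30/20 production split is weighted random selection per request. This is only a sketch: the model and prompt identifiers are placeholders, and the summary does not say how Beekeeper actually routes traffic.

```python
import random
from collections import Counter

# Placeholder leaderboard entries (model_id, prompt_id), best first.
TOP_PAIRS = [
    ("model-a", "prompt-1"),
    ("model-b", "prompt-2"),
    ("model-c", "prompt-3"),
]
# Traffic ratios stated in the summary: 50%, 30%, 20%.
TRAFFIC_WEIGHTS = [0.5, 0.3, 0.2]


def pick_pair(rng: random.Random) -> tuple[str, str]:
    """Choose one model/prompt pair with probability equal to its traffic share."""
    return rng.choices(TOP_PAIRS, weights=TRAFFIC_WEIGHTS, k=1)[0]


def simulate(n_requests: int, seed: int = 0) -> Counter:
    """Count how many of n_requests each pair would serve."""
    rng = random.Random(seed)
    return Counter(pick_pair(rng) for _ in range(n_requests))
```

Calling `simulate(10_000)` should give counts close to 5,000, 3,000, and 2,000 for the three pairs. Keeping the second and third pairs in rotation means lower-ranked candidates keep collecting real ratings, which is one plausible reason for splitting traffic rather than serving only the top pair.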