Improve operational visibility for inference workloads on Amazon Bedrock with new CloudWatch metrics for TTFT and Estimated Quota Consumption

Machine Learning Blog

This article announces two new Amazon CloudWatch metrics for Amazon Bedrock, TimeToFirstToken and EstimatedTPMQuotaUsage, which provide server-side visibility into streaming latency and quota consumption for inference workloads.

- TimeToFirstToken measures, for streaming APIs, the latency from when a request is received to when the first response token is generated.
- EstimatedTPMQuotaUsage tracks the tokens-per-minute (TPM) quota consumed, accounting for burndown multipliers and cache tokens.
- Both metrics are emitted automatically at no cost, with no API changes or opt-in required.
- The metrics are published in the AWS/Bedrock CloudWatch namespace and can be filtered by the ModelId dimension.
- Cross-Region inference profiles are supported, for both geographic and global configurations.
- The metrics enable proactive alarms, SLA baselines, and capacity planning without client-side instrumentation.
- The quota formula varies by throughput type: on-demand usage applies an output-token burndown multiplier, while provisioned throughput applies cache weighting.
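The throughput-dependent quota rule summarized above can be sketched as a small function. This is a minimal sketch under stated assumptions, not Amazon Bedrock's actual formula: the `TokenCounts` type and `estimate_tpm_usage` function are hypothetical, which token counts enter each branch is assumed, and the multiplier and weight values in the example are placeholders rather than published per-model values.

```python
from dataclasses import dataclass


@dataclass
class TokenCounts:
    """Per-request token counts (hypothetical field names)."""
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def estimate_tpm_usage(
    counts: TokenCounts,
    throughput_type: str,
    output_burndown: float = 1.0,
    cache_read_weight: float = 1.0,
) -> float:
    """Hypothetical estimate of the TPM quota consumed by one request."""
    if throughput_type == "on-demand":
        # Assumption: output tokens are scaled by a per-model burndown
        # multiplier, and cache-read tokens are not counted.
        return (counts.input_tokens + counts.cache_write_tokens
                + counts.output_tokens * output_burndown)
    if throughput_type == "provisioned":
        # Assumption: no output burndown; cache-read tokens are weighted instead.
        return (counts.input_tokens + counts.cache_write_tokens
                + counts.cache_read_tokens * cache_read_weight
                + counts.output_tokens)
    raise ValueError(f"unknown throughput type: {throughput_type!r}")


request = TokenCounts(input_tokens=1000, output_tokens=200, cache_read_tokens=500)
# Placeholder multiplier and weight, for illustration only.
on_demand = estimate_tpm_usage(request, "on-demand", output_burndown=5)
provisioned = estimate_tpm_usage(request, "provisioned", cache_read_weight=0.25)
```

Under these placeholder values, a request with 1,000 input tokens and 200 output tokens is estimated at 2,000 quota tokens on demand. This illustrates why output-heavy workloads can approach their TPM quota faster than raw token counts suggest.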

These metrics eliminate the need for custom instrumentation and help teams prevent throttling and performance degradation in production AI workloads.
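To illustrate the proactive alarms mentioned above, the helper below builds keyword arguments for the CloudWatch `PutMetricAlarm` API against the `AWS/Bedrock` namespace and `ModelId` dimension. It is a sketch: the `build_bedrock_alarm` name, the 60-second period, three evaluation periods, the `Sum` statistic for quota usage, the `p90` statistic for latency, the model ID, and the thresholds are all hypothetical choices. Confirm each metric's unit and a suitable statistic in the Amazon Bedrock documentation before relying on them.

```python
def build_bedrock_alarm(
    metric_name: str,
    model_id: str,
    threshold: float,
    statistic: str = "Sum",
    period_seconds: int = 60,
    evaluation_periods: int = 3,
) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm (hypothetical helper)."""
    params = {
        "AlarmName": f"bedrock-{metric_name}-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        # Scope the alarm to a single model through the ModelId dimension.
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: periods with no data (no traffic) should not trigger the alarm.
        "TreatMissingData": "notBreaching",
    }
    # Percentiles such as "p90" go in ExtendedStatistic; standard statistics
    # (Sum, Average, Maximum, ...) go in Statistic.
    if statistic.startswith("p"):
        params["ExtendedStatistic"] = statistic
    else:
        params["Statistic"] = statistic
    return params


# Hypothetical: alert when estimated TPM usage reaches 80% of a 100,000 TPM quota.
quota_alarm = build_bedrock_alarm(
    "EstimatedTPMQuotaUsage", "example-model-id", threshold=0.8 * 100_000
)

# Hypothetical: alert when p90 time to first token exceeds a placeholder threshold
# (check the metric's unit in the Bedrock documentation before choosing a value).
ttft_alarm = build_bedrock_alarm(
    "TimeToFirstToken", "example-model-id", threshold=2000, statistic="p90"
)
```

With AWS credentials configured, each dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**quota_alarm)`. The helper itself does not call AWS, so the alarm definitions can be reviewed or unit-tested offline.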



Related articles

- Sep 18, 2025: Monitor Amazon Bedrock batch inference using Amazon CloudWatch metrics
- Jun 25, 2024: Improve visibility into Amazon Bedrock usage and performance with Amazon CloudWatch
- May 5, 2026: Amazon ElastiCache adds thirteen new Amazon CloudWatch metrics for network capacity planning and engine diagnostics
- Jun 1, 2026: Amazon Bedrock adds Amazon CloudWatch metrics for OpenAI- and Anthropic-compatible APIs
