Planning for failure: How to make generative AI workloads more resilient

Public Sector Blog

This article provides guidance on making generative AI workloads more resilient by focusing on five key categories of resilience:

Redundancy: Eliminating single points of failure using cross-Region inference and decoupling tools from large language models
Sufficient Capacity: Managing service quotas, scaling compute resources, and considering provisioned throughput
Timely Output: Monitoring business-aligned metrics, managing model latency, and implementing smart retry strategies
Correct Output: Using guardrails, validating user input, backing up knowledge bases, and rigorously evaluating model performance
Fault Isolation: Implementing patterns like backoff, token bucket retries, and circuit breakers to minimize system-wide failures
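
The three fault-isolation patterns named above can be combined in one small wrapper. The following is a minimal sketch, not the article's implementation: the names `CircuitBreaker`, `RetryBudget`, and `call_with_resilience`, as well as every threshold and delay value, are hypothetical and chosen only for illustration.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected immediately."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial call through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


class RetryBudget:
    """Token bucket: each retry spends one token; each success refunds a fraction."""

    def __init__(self, capacity=10.0, refund=0.1):
        self.capacity = capacity
        self.tokens = capacity
        self.refund = refund

    def try_spend(self):
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def on_success(self):
        self.tokens = min(self.capacity, self.tokens + self.refund)


def call_with_resilience(fn, breaker, budget, max_attempts=4,
                         base_delay=0.5, max_delay=8.0,
                         sleep=time.sleep, rng=random.random):
    """Call fn() with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:  # assumption: a real caller would catch only transient errors
            breaker.record(success=False)
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not budget.try_spend():
                raise
            # Full jitter: sleep a random time up to min(max_delay, base_delay * 2^attempt).
            sleep(rng() * min(max_delay, base_delay * 2 ** attempt))
        else:
            breaker.record(success=True)
            budget.on_success()
            return result
```

In use, `fn` would be a zero-argument callable wrapping a single model invocation. Sharing one `CircuitBreaker` and one `RetryBudget` across all callers of the same dependency is what keeps a burst of failures from turning into a retry storm: the budget caps how many retries the client issues overall, and the breaker stops calls entirely until the cooldown has passed.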

The key recommendation is to evaluate and improve generative AI workloads continuously so that they stay safe, predictable, and resilient to failure.


Feb 1, 2024: Designing generative AI workloads for resilience
Sep 30, 2025: Build resilient generative AI agents
Nov 18, 2024: Threat modeling your generative AI workload to evaluate security risk
Sep 16, 2024: Methodology for incident response on generative AI workloads


Related articles