Improving System Resilience and Observability: Chaos Engineering with AWS FIS and AWS DLT
This article shows how to use AWS Fault Injection Simulator (AWS FIS) and Distributed Load Testing on AWS (DLT) to improve system resilience and observability. It presents a solution architecture that combines DLT for load generation, AWS FIS for fault injection, and Amazon Managed Grafana for monitoring and visualization.
Specifically, the article covers:
- Using DLT to simulate realistic traffic patterns and load on application services with JMeter scripts
- Creating AWS FIS experiment templates to introduce controlled faults such as instance termination, CPU spikes, and network latency
- Setting up an EC2 instance with InfluxDB and configuring Amazon Managed Grafana with InfluxDB and CloudWatch as data sources
- Importing Grafana dashboards to monitor application and infrastructure metrics during load and chaos tests
- Key resilience, infrastructure, and application-performance metrics to monitor
- Cleaning up resources after testing
- A conclusion on the benefits of combining AWS FIS, DLT, and Grafana for resilience testing
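To make the fault-injection step more concrete, here is a minimal sketch of an AWS FIS experiment template built as a Python dictionary and printed as JSON. It assumes a CPU-stress fault delivered through the `aws:ssm:send-command` action with the AWS-managed `AWSFIS-Run-CPU-Stress` SSM document, targeting half of the running EC2 instances that carry a tag. The role ARN, alarm ARN, Region, tag key `chaos-target`, and durations are all hypothetical placeholders, not values from the article.

```python
import json

# Hypothetical placeholders: substitute the IAM role, CloudWatch alarm, and
# Region from your own account. None of these values come from the article.
ROLE_ARN = "arn:aws:iam::123456789012:role/fis-experiment-role"
STOP_ALARM_ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-error-rate"
REGION = "us-east-1"


def cpu_stress_template(role_arn: str, alarm_arn: str, region: str) -> dict:
    """Build an AWS FIS experiment template that runs a CPU stress test
    on half of the running EC2 instances carrying a (hypothetical) tag."""
    document_arn = f"arn:aws:ssm:{region}::document/AWSFIS-Run-CPU-Stress"
    return {
        "description": "CPU stress on app-tier instances during a load test",
        "roleArn": role_arn,
        # FIS stops the experiment if this CloudWatch alarm enters ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        "targets": {
            "AppInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"chaos-target": "true"},  # hypothetical tag
                "filters": [{"path": "State.Name", "values": ["running"]}],
                "selectionMode": "PERCENT(50)",
            }
        },
        "actions": {
            "CpuStress": {
                "actionId": "aws:ssm:send-command",
                "parameters": {
                    "documentArn": document_arn,
                    # documentParameters must be a JSON-encoded string.
                    "documentParameters": json.dumps(
                        {"DurationSeconds": "300", "InstallDependencies": "True"}
                    ),
                    "duration": "PT5M",  # ISO 8601 duration of the FIS action
                },
                "targets": {"Instances": "AppInstances"},
            }
        },
        "tags": {"Name": "cpu-stress-during-load-test"},
    }


if __name__ == "__main__":
    # Print the template so it can be redirected to a file for the AWS CLI.
    print(json.dumps(cpu_stress_template(ROLE_ARN, STOP_ALARM_ARN, REGION), indent=2))
```

Saved to a file (for example `cpu-stress.json`), the output could be registered with `aws fis create-experiment-template --cli-input-json file://cpu-stress.json` and started with `aws fis start-experiment --experiment-template-id <id>` while a DLT test is applying load. The targeted instances need the SSM Agent running for `aws:ssm:send-command` to reach them.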
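The InfluxDB instance running on EC2 stores time-series points written in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). As a hedged illustration of that format, here is a minimal, dependency-free formatter that follows the escaping rules in the InfluxDB line protocol documentation. The measurement, tag, and field names used with it are hypothetical and are not the schema from the article.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, special: str) -> str:
    """Backslash-escape every character of `special` that appears in `text`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the "i" suffix marks an integer field
    if isinstance(value, float):
        return repr(value)
    # String fields: escape backslashes first, then double quotes, then wrap.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Format one point as a single line of InfluxDB line protocol."""
    if not fields:
        raise ValueError("a point needs at least one field")
    # Measurements escape commas and spaces.
    line = _escape(measurement, ", ")
    # Tag keys/values escape commas, equals signs, and spaces; sorting tags
    # by key is recommended by the InfluxDB docs for write performance.
    for key in sorted(tags):
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision is the default
    return line
```

For example, `to_line_protocol("http_requests", {"service": "checkout", "phase": "chaos"}, {"latency_ms": 123.4, "status": 200})` yields `http_requests,phase=chaos,service=checkout latency_ms=123.4,status=200i`. Lines in this format can be sent to InfluxDB's HTTP write endpoint (`/write` in InfluxDB 1.x, `/api/v2/write` in 2.x), after which Grafana can query them through the InfluxDB data source.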
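One common way to turn resilience metrics into numbers is to compare error rate, tail latency, and throughput in a window just before a fault with the window while the fault is active. The sketch below assumes a hypothetical per-request record (`Sample`) exported from a load test; it is not DLT's own result format, and the nearest-rank percentile it uses is only one of several percentile definitions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """Hypothetical per-request record from a load test."""
    timestamp_s: float  # request completion time, seconds since test start
    latency_ms: float
    success: bool


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    ordered = sorted(values)
    # Multiply before dividing so e.g. 95 * 60 / 100 is exactly 57.0.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def window_stats(samples: List[Sample], start_s: float, end_s: float) -> dict:
    """Error rate, p95 latency, and throughput for samples in [start_s, end_s)."""
    window = [s for s in samples if start_s <= s.timestamp_s < end_s]
    if not window:
        raise ValueError(f"no samples in [{start_s}, {end_s})")
    failures = sum(1 for s in window if not s.success)
    return {
        "requests": len(window),
        "error_rate": failures / len(window),
        "p95_ms": percentile([s.latency_ms for s in window], 95),
        "throughput_rps": len(window) / (end_s - start_s),
    }


def compare_to_baseline(
    samples: List[Sample], fault_start_s: float, fault_end_s: float, baseline_s: float = 60.0
) -> dict:
    """Compare the fault window with the baseline_s seconds just before it."""
    before = window_stats(samples, fault_start_s - baseline_s, fault_start_s)
    during = window_stats(samples, fault_start_s, fault_end_s)
    return {
        "baseline": before,
        "during_fault": during,
        "p95_increase_ms": during["p95_ms"] - before["p95_ms"],
        "error_rate_increase": during["error_rate"] - before["error_rate"],
        "throughput_change_rps": during["throughput_rps"] - before["throughput_rps"],
    }
```

Aligning the windows with the start and end times of an AWS FIS experiment keeps the comparison focused on the injected fault rather than on unrelated traffic changes; the same before/during deltas could be plotted side by side in a Grafana dashboard.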