AWS engineers recently wrote about an open source chaos engineering tool called AWSSSMChaosRunner that they used to test fault injection in Prime Video. The tool is built on AWS Systems Manager, which can execute arbitrary commands on EC2 instances, and the team used it to mitigate latency-related issues.
AWSSSMChaosRunner uses AWS Systems Manager to remotely execute commands against a specific set of EC2 instances. The commands, specified declaratively as a collection, define the set of faults to inject.
Varun Jewalikar, Software Engineer at Prime Video, and Adrian Hornsby, Principal Developer Advocate (Architecture) at AWS, write that typical chaos engineering experiments include simulating resource exhaustion and a failed or slow network. There are countermeasures for such scenarios but "they are rarely adequately tested, as unit or integration tests generally can't validate them with high confidence".
AWS Systems Manager is a tool that performs operational tasks across AWS resources through an agent component called SSM Agent. The agent, which is pre-installed by default on certain Windows and Linux AMIs, has the concept of "Documents", which are similar to runbooks that can be executed. It can also run simple shell scripts, a feature that AWSSSMChaosRunner leverages. The SendCommand API in SSM runs commands across multiple instances, which can be filtered by AWS tags. CloudWatch can be used to view logs from all the instances in a single place.
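To make the SendCommand flow concrete, here is a minimal Python sketch of a tag-filtered request. It is not taken from AWSSSMChaosRunner (which is a separate library); it only illustrates the underlying SSM call through boto3. The helper names, the `Service` tag key, the `checkout` value, and the `uptime` command are hypothetical placeholders, and actually sending the request assumes boto3 is installed and AWS credentials are configured.

```python
def build_send_command_request(tag_key, tag_values, commands, timeout_seconds=60):
    """Build keyword arguments for the SSM SendCommand API.

    Instances are selected by tag rather than by instance ID, and the
    shell commands run through the AWS-RunShellScript document.
    """
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": list(tag_values)}],
        "Parameters": {"commands": list(commands)},
        "TimeoutSeconds": timeout_seconds,
        # Ask SSM to forward command output to CloudWatch Logs.
        "CloudWatchOutputConfig": {"CloudWatchOutputEnabled": True},
    }


def send(request, region_name="us-east-1"):
    """Send the request with boto3 and return the command ID.

    boto3 is a third-party package, imported here so that building and
    inspecting a request does not require it. The region is an assumption.
    """
    import boto3

    ssm = boto3.client("ssm", region_name=region_name)
    response = ssm.send_command(**request)
    return response["Command"]["CommandId"]


# Hypothetical tag and command, for illustration only.
request = build_send_command_request("Service", ["checkout"], ["uptime"])
```

Keeping request construction separate from the network call makes it possible to review exactly which instances and commands an experiment would touch before anything is sent.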
Security aspects, such as creating a user for execution on the EC2 instance, are handled by the agent. Examples of what the chaos runner can do include silently dropping all outgoing TCP traffic on a specific port, introducing network latency on an interface, and hogging CPU. It’s important to note that the currently supported failure injections are at either the infrastructure layer or the AWS service layer.
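As an illustration of how such faults could be described declaratively and turned into shell commands, the following Python sketch pairs each fault with a rollback. It is an assumption-laden example, not the commands AWSSSMChaosRunner actually issues: the `Fault` class and helper functions are hypothetical, the `iptables` and `tc netem` invocations are standard Linux tooling chosen for the example, and the commands are assumed to run with root privileges on the instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """A fault expressed as an inject command and its matching rollback."""
    inject: str
    rollback: str


def drop_outgoing_tcp(port):
    # Silently drop all outgoing TCP traffic to the given destination port.
    rule = f"OUTPUT -p tcp --dport {port} -j DROP"
    return Fault(inject=f"iptables -A {rule}", rollback=f"iptables -D {rule}")


def add_latency(interface, delay_ms):
    # Add a fixed delay to packets leaving the given network interface.
    return Fault(
        inject=f"tc qdisc add dev {interface} root netem delay {delay_ms}ms",
        rollback=f"tc qdisc del dev {interface} root netem",
    )


def to_commands(faults, duration_seconds):
    """Inject every fault, wait, then roll back in reverse order."""
    faults = list(faults)
    commands = [fault.inject for fault in faults]
    commands.append(f"sleep {duration_seconds}")
    commands.extend(fault.rollback for fault in reversed(faults))
    return commands


# Hypothetical experiment: block port 443 and add 200 ms on eth0 for 60 s.
commands = to_commands([drop_outgoing_tcp(443), add_latency("eth0", 200)], 60)
```

The resulting list could be passed as the shell commands of an SSM request. This sketch does not cover the case where execution is interrupted before the rollback commands run, which a real experiment would need to handle.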