Gremlin, an organisation focused on chaos engineering, recently announced the general availability of Status Checks. This new feature automatically validates that systems are healthy and ready before chaos experiments run against them in production. Status Checks can be integrated with CI/CD pipelines and with third-party tools such as PagerDuty, Datadog, New Relic, or any other monitoring tool.

One of the "core aspects" of Gremlin is safety. Speaking about it further, Matthew Fornaciari, CTO and Co-Founder of Gremlin, said,

“Since launch in 2017, we’ve had a big red HALT button that makes it simple for Gremlin users to reactively rollback experiments, should an attack negatively impact the customer experience. Today, companies that have matured are automating more of their experiments with CI/CD, and they need a way to programmatically check the health of their systems and proactively stop an experiment. That’s Status Checks.”

In the past, companies addressed safety concerns by running experiments in staging environments and then applying what they learned to problems in production. Because staging environments do not always mirror production accurately, this approach is often less valuable. Status Checks instead make an API call to a third-party monitoring or alerting endpoint, evaluate the status code, the response time, and the JSON response body, and then return a result (pass or fail) depending on the system's condition. If the system is healthy, Gremlin runs the chaos engineering experiment on it before continuing with another status check.
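The evaluation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Gremlin's implementation or API: the names `CheckResult`, `evaluate_status_check`, and `run_guarded`, the one-second threshold, and the `"status": "ok"` body field are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    reason: str


def evaluate_status_check(
    status_code: int,
    elapsed_seconds: float,
    body_text: str,
    max_seconds: float = 1.0,       # hypothetical response-time threshold
    expected_key: str = "status",   # hypothetical field in the JSON body
    expected_value: str = "ok",     # hypothetical healthy value
) -> CheckResult:
    """Pass only if the status code, response time, and JSON body all look healthy."""
    if not 200 <= status_code < 300:
        return CheckResult(False, f"unexpected status code {status_code}")
    if elapsed_seconds > max_seconds:
        return CheckResult(False, f"response took {elapsed_seconds:.2f}s")
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return CheckResult(False, "response body is not valid JSON")
    if not isinstance(body, dict) or body.get(expected_key) != expected_value:
        return CheckResult(False, f"body field {expected_key!r} is not {expected_value!r}")
    return CheckResult(True, "healthy")


def run_guarded(check: Callable[[], CheckResult], experiment: Callable[[], None]) -> str:
    """Run the experiment only if the check passes, then check again afterwards."""
    before = check()
    if not before.passed:
        return "skipped: " + before.reason
    experiment()
    after = check()
    if not after.passed:
        return "halt: " + after.reason
    return "completed"
```

In a real pipeline, the status code, elapsed time, and body would come from an HTTP request to the monitoring tool's endpoint; they are passed in as plain values here so the pass/fail logic can be read and tested on its own.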

Expounding on the approach, Ana Margarita Medina, chaos engineer at Gremlin, said, “The point of chaos engineering is not to add unnecessary chaos. You want to control the chaos in your system.” Taking this point further, Matt Schillerstrom, product manager at Gremlin, explained to SD Times in an email, “It’s very important to note that Gremlin doesn’t advocate for ‘chaos’; the term chaos engineering can be a little misleading. We advocate for hypothesis-driven testing, in order to tame the chaos. To better understand our systems in order to prevent chaos. It does no one any good to be attacking infrastructure that’s already under stress.”


Gremlin Announces General Availability of Status Checks