Ensuring a Smooth Kubernetes Dockershim Deprecation With Chaos Engineering

Learn what the deprecation of Docker as a container runtime means for you and how to ensure a pain-free transition.

Rethinking How the Industry Approaches Chaos Engineering

Nora Jones focuses on the Before and After phases of developing Chaos Engineering experiments (whether they be gamedays or driven by software) and develops important questions to ask in each of these phases. She digs into some of the Ironies of Automation present with Chaos Engineering today.

Chaos-Mesh-Action: Integrate Chaos Engineering Into Your CI

In this article, I will share how we use chaos-mesh-action, a GitHub Action that integrates Chaos Mesh into the CI process. Chaos-mesh-action is available on GitHub Marketplace, and the source code is on GitHub.

Growing Resilience: Serving Half a Billion Users Monthly at Condé Nast

Crystal Hirschorn outlines how Condé Nast practices Chaos Engineering, where this fits within the already established testing and verification ecosystem, and what emergent practices and tools are on the horizon. Last but not least, she covers how to build up an organization’s true superpower: Human Resilience.

Building an Automated Testing Framework Based on Chaos Mesh and Argo

How we use TiPocket, an automated testing framework, to build a full Chaos Engineering testing loop for TiDB, our distributed database.

An Open Source Chaos Engineering Library from AWS

AWS engineers recently wrote about an open source chaos engineering tool called AWSSSMChaosRunner that they used to test fault injection in Prime Video.

Nora Jones on Resilience Engineering, Mental Models, and Learning

In this podcast, Nora Jones, Co-Founder and CEO at Jeli and co-author of O’Reilly’s “Chaos Engineering: System Resiliency in Practice”, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: chaos engineering and resilience engineering, planning and running effective chaos experiments, and learning from incidents.

Ana Medina on Chaos Engineering, Game Days, and Learning

In this podcast, Ana Medina, senior chaos engineer at Gremlin, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: how enterprise organisations are adopting chaos engineering with the requirements for guardrails and the need for “status checks” to ensure pre-experiment system health; how to run game days or IT fire drills when everyone is working remotely; and why teams should continually invest in learning from past incidents and preparing for inevitable failures within systems.

The Principles of Chaos Engineering

Resilience is not without its challenges. Building microservices that are independent yet work well together is not easy.

Gremlin Announces General Availability of Status Checks

Gremlin recently announced the general availability of Status Checks. This new feature automatically validates that systems are healthy and ready for running chaos experiments in production.

The Principles of Chaos Engineering

Improve the resilience of your system by strengthening the fault tolerance of your infrastructure through Chaos Engineering.