Breaking Serverless on Purpose with Chaos Engineering

Breaking Serverless on Purpose with Chaos Engineering

From higher granularity to expanding attack surface to new failure types, serverless has so many potential points of failure. Chaos testing is one method to finding out where these potential failures are — before they cripple your operations.

As chaos engineering becomes a more mainstream way of proactively seeking out your system’s weaknesses, we see it applied to increasingly complicated circumstances and with teams of all sizes.

One such area is serverless. After all, serverless computing is the language-agnostic, pay-as-you-go way to access backend services. This makes it multitenant, stateless, highly distributed, and heavily reliant on third parties. A heck of a lot can go wrong with so much out of your control.

From higher granularity to expanding attack surface to new failure types, serverless has so many potential points of failure, noted Thundra’s Product Vice President Emrah Samdan at ChaosConf, hosted by GremlinChaos Engineering is one method to finding out where these potential failures are — before they cripple your operations.

What Chaos Engineering Isn’t

If there was an underlying theme of this year’s ChaosConf, it’d be defining just what chaos engineering is. Because, even among expert fire starters, explaining the concept is as much art as it is science.

For Samdan, it’s not about being a glutton for punishment, breaking your system because you feel like it. And it’s not about placing blame.

For him, chaos engineering is all about asking: “What if?”

Samdan said, “You need to ask your system: What if your databases become unreachable? What if your whole region goes down? What if my downstream Lambda times out? Any type of failure can happen in your systems. Chaos engineering answers these questions.”

He says you need to answer these questions to establish what are the acceptable limits of your system. He analogized it to a vaccine, injecting a little bit more resiliency and confidence into your system every time.

“Chaos isn’t a pit. Chaos is a ladder.” — Emrah Samdan, Thundra

How to Get Started with Chaos Engineering

Echoing another message from ChaosConf, Samdan reminds us chaos engineering also isn’t just for giant streaming companies. Anyone can do it and you can get started small. He even recommends avoiding doing it in production at the start.

“You can just start when you are staging. Start small. Start injecting into a relatively new service, but put your tools in and just grow stronger with chaos experiments,” he recommended.

Start by measuring your steady-state — the ups and downs of your system. He recommends using an observability tool to accomplish this.

The typical system-level metrics include:

  • Memory usage
  • 99% latency
  • CPU usage
  • Time to restore service

Samdan says typical business-level metrics include:

  • Apdex score, which, according to New Relic, is a ratio value of the number of satisfied and tolerating requests to the total requests made. Each satisfied request counts as one request, while each tolerating request counts as half a satisfied request.
  • Number of transactions, successful or otherwise.

Set acceptable limits for each of these metrics. Then develop a hypothesis: What happens if this happens? Some examples can be:

  • What if I inject latency of 300 milliseconds on average into every Lambda function in my architecture? SLA promise: My responses will still be in the acceptable latency range.
  • What if my DynamoDB table becomes unreachable? SLA promise: My system will continue performing graceful service degradation.

You can ask big questions, but then only start experimenting on the small parts. Samdan reminds you to only inject failure into a controlled piece of your system, like only injecting latency towards one function, not the entire architecture. You want to maintain that smaller blast radius.

That’s also why you only run one experiment at a time. Then you can continue, injecting latency into two, three, four functions. He says you keep going until something breaks.

development devops serverless profile

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Hire Dedicated DevOps Developers

Hire our Dedicated DevOps Developers who have in-depth skills and expertise to develop an interactive and secure web application. Get custom DevOps solutions for your project.

Hire DevOps Developer

Looking to hire top DevOps developers at affordable prices? **[Hire DevOps Developer](https://hourlydeveloper.io/hire-dedicated-devops-developer/ "Hire DevOps Developer")** from **[HourlyDeveloper.io](https://hourlydeveloper.io/...

How long does it take to develop/build an app?

This article covers A-Z about the mobile and web app development process and answers your question on how long does it take to develop/build an app.

Developer Career Path: To Become a Team Lead or Stay a Developer?

For a developer, becoming a team leader can be a trap or open up opportunities for creating software. Two years ago, when I was a developer, ... by Oleg Sklyarov, Fullstack Developer at Skyeng company

How to Extend your DevOps Strategy For Success in the Cloud?

DevOps and Cloud computing are joined at the hip, now that fact is well appreciated by the organizations that engaged in SaaS cloud and developed applications in the Cloud. During the COVID crisis period, most of the organizations have started using cloud computing services and implementing a cloud-first strategy to establish their remote operations. Similarly, the extended DevOps strategy will make the development process more agile with automated test cases.