Software Testing Is a Zero-Sum Game

One thing I sometimes hear from developers or managers who do not have much experience in Quality Assurance is that we need to test everything. The refrain is “but what if that one case we don’t test breaks?”. This sounds pretty reasonable! But it is the wrong mentality. You cannot test every case, and trying to will hurt your team’s quality rather than help it.

Each test you create consumes from your finite stock of “Test Resources”. I will talk in more detail about what that generic test resource is but think of it as a bucket. Each test you make adds a little water to that bucket. When that bucket overflows, Bad Things™ happen.

Tests Consume Resources

What does it mean then to say, “tests consume resources”?

Minimally, creating and running tests takes time. Since any project needs to be finished at some point, we need to limit the number of tests we run. Another consideration is complexity. The more complex your tests are, the more time they take to maintain, and the more likely they are to fail because of test issues rather than real bugs.

If you run out of those resources, if your bucket overflows, then you will spend more time maintaining your tests than writing new tests or new code. At that point, it becomes likely that the developers using the tests will lose confidence in them and stop maintaining or running them. This can be the beginning of the end of quality control for any application.

What About Automated Tests?

A counterargument is that automated tests have effectively infinite resources. A manual test pass might take a couple of days, and in that time we can run all the automated tests we could ever want. The argument is that the bucket is so big we could never fill it. This is technically true: given several hours or days, you could run every automated test you want for a single application. However, it misunderstands a crucial way in which automated tests differ from manual tests.

In reality, the metaphorical bucket is sometimes even smaller for automated tests than it is for manual tests. Automated tests are expected to run much faster than manual tests. While manual tests can take hours or days, automated tests do not have that luxury.

The acceptable runtime varies by type of test, but UI tests for an app should target a total runtime of a few minutes, and even that can be considered slow. A good non-UI suite should aim to run in seconds.

Anything slower hurts developer productivity in the red-green-refactor cycle and the deploy pipeline. This leads to frustration, and eventually tests being ignored or not written.
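One lightweight way to enforce such a target is to fail the build when the fast suite blows its time budget. Here is a minimal sketch using Python’s built-in `unittest`; the budget value and the placeholder test are invented for illustration:

```python
import time
import unittest

BUDGET_SECONDS = 2.0  # hypothetical budget for the fast, non-UI suite


class FastSuite(unittest.TestCase):
    """Stand-in for a team's fast unit tests."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


start = time.perf_counter()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FastSuite)
result = unittest.TextTestRunner(verbosity=0).run(suite)
elapsed = time.perf_counter() - start

print(f"suite ran in {elapsed:.3f}s (budget {BUDGET_SECONDS}s)")
# Treat a slow suite the same way you would treat a failing one.
assert result.wasSuccessful() and elapsed < BUDGET_SECONDS
```

Run in a CI pipeline, the same idea makes a creeping runtime show up as a broken build instead of as slowly eroding developer patience.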

Automated tests are also harder to maintain because of their complexity. Instead of a plain-text set of instructions that anyone can follow, they must be written by someone with programming knowledge. Automated tests are code, and so they have code problems. While automated tests consume less of your time bucket, they often consume more of your complexity bucket.

So yes, automated tests are much faster than manual tests, but they are also expected to run in a fraction of the time and are harder to maintain. The bucket still fills quickly.

How Do We Fill the Bucket Well?

Let’s examine a case where we are asked to test every permutation of every setting. For the sake of the example, assume we have 3 settings and each setting can have 3 values. To test every combination would take 3 × 3 × 3 = 27 test cases. To cover each pair of values using a pairwise testing strategy would take only 9 cases. What should you consider when someone points out that you might miss an issue in one of the 18 cases you did not test using the pairwise strategy?
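The arithmetic behind this example can be checked in a few lines: 3 settings with 3 values each give 3 × 3 × 3 = 27 exhaustive combinations, while far fewer cases suffice to cover every pair of values. The sketch below uses invented setting names and a simple greedy cover to build the pairwise suite; dedicated pairwise tools can do this more cleverly:

```python
from itertools import combinations, product

# Three hypothetical settings, three values each (names invented for
# illustration).
settings = {
    "theme": ["light", "dark", "high-contrast"],
    "language": ["en", "fr", "de"],
    "layout": ["list", "grid", "compact"],
}
names = list(settings)

# Exhaustive testing: the full Cartesian product.
all_cases = [dict(zip(names, values)) for values in product(*settings.values())]
print(len(all_cases))  # 27


def pairs_of(case):
    """All (setting, value) pairs exercised by one test case."""
    return {
        frozenset([(a, case[a]), (b, case[b])])
        for a, b in combinations(names, 2)
    }


# Every pair that a pairwise suite must hit at least once.
needed = set().union(*(pairs_of(case) for case in all_cases))

# Greedy cover: keep picking the case that covers the most uncovered pairs.
suite = []
while needed:
    best = max(all_cases, key=lambda case: len(pairs_of(case) & needed))
    suite.append(best)
    needed -= pairs_of(best)

print(len(suite))  # far fewer than 27; the optimum for this shape is 9
```

The greedy cover may land a case or two above the optimal 9, but the point stands: every pair is exercised at a fraction of the exhaustive cost.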

You should weigh the likelihood of one of those untested permutations failing against the likelihood of any other test of the app failing. These tests are competing for the same space in your bucket as every other test in your app, and so they need to be compared with every other test.

You already have the pairs covered, so these tests only catch failures that occur with extremely specific combinations of three settings that do not occur with pairs of settings. On top of being relatively unlikely to find issues, such a failure would also affect fewer workflows.

Could you add any other test to a more commonly used and more impactful piece of functionality instead? A different piece of functionality that does not already have 9 tests, or a workflow that affects more than 1/27th of your users, might be better for the overall quality of your app.

In general, there are a few questions from the example above that you can ask. Remember, you should compare each candidate test with any other test you could write.

  1. How much space does this test take in my bucket? A unit test takes less space than a UI end-to-end test, so you can add unit tests more liberally than UI tests.
  2. How impactful is a failure of this test? A test that checks that login works is more important than a test that checks the login page has the right color buttons.
  3. How often would a user encounter a failure that this test would catch? A test for shopping checkout is more important than a test for the “About Us” page.
  4. What is the likelihood that this test catches an issue that another test does not? A test for a workflow that is mostly covered by another test is less valuable than a test for a completely different workflow.
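These four questions can be folded into a rough scoring heuristic. Everything below — the field names, the candidate tests, and the scores — is invented for illustration; the point is the comparison, not the exact numbers:

```python
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    cost: float       # Q1: bucket space consumed (runtime + maintenance)
    impact: float     # Q2: severity if the covered behaviour breaks (0-1)
    frequency: float  # Q3: how often users hit this path (0-1)
    novelty: float    # Q4: chance it catches what no other test does (0-1)

    def value(self) -> float:
        # Expected unique value per unit of bucket consumed.
        return (self.impact * self.frequency * self.novelty) / self.cost


# Hypothetical candidates competing for the same bucket.
candidates = [
    CandidateTest("login happy path (unit)", cost=1, impact=1.0,
                  frequency=0.9, novelty=0.8),
    CandidateTest("login button colour (UI)", cost=5, impact=0.1,
                  frequency=0.9, novelty=0.9),
    CandidateTest("rare settings permutation (UI)", cost=5, impact=0.3,
                  frequency=0.05, novelty=0.1),
]

for t in sorted(candidates, key=lambda t: t.value(), reverse=True):
    print(f"{t.value():.4f}  {t.name}")
```

No team needs a formula this literal, but making the trade-off explicit helps when someone insists a low-value test “must” go in the bucket.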

There are exceptions, of course. Sometimes you need to test all 27 permutations, as they are all vital. Sometimes those tests are so small that the amount of the bucket they take up is acceptable. The important thing is that those tests are not added by default because “we need to test everything”.

#qa #devops

Software Testing Is a Zero-Sum Game - DZone DevOps