The Fortified Testing Philosophy

Mission Statement

Construct your test suite such that it is fortified against internal and external changes to your system (allowing such code changes to happen with minimal changes to your tests), while at the same time ensuring your test suite provides a strong level of confidence in the accuracy of your code.

About This Philosophy

This philosophy's mission statement can be achieved by following a number of good practices and guidelines, each of which is outlined in detail in this document:

Some things to keep in mind about this document:

Definitions

There are, unfortunately, many valid ways to define automated testing-related terminology. For the purpose of this document, the following definitions will be used.

Examples of incidental stability

The following examples demonstrate ways in which you may accrue incidental stability if you are not careful. This list should not be taken as exhaustive; there are many other ways to accrue incidental stability beyond those shown here. If the amount of incidental stability you accrue becomes an issue, you can use techniques such as those found in the following sections to help mitigate the problem.

Incidental stability can happen in the arrange section of your tests. If your codebase writes audit information to a log file, you may find yourself stubbing out the file-write function in a large number of your tests. Each time you stub out the file-write function, you add incidental stability to the "audit information gets written out to a file" behavior. If, later, you need to instead send audit information to an external service via network requests, you may find yourself doing a lot of work to update your tests due to how stable this behavior has become.
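
As an illustration, here's a minimal sketch of how this tends to creep in, assuming a Jest-style test runner and hypothetical `auditLog` and `users` modules:

```ts
import { describe, it, expect, jest } from '@jest/globals';
import * as auditLog from './auditLog'; // hypothetical module that writes audit entries to disk
import { createUser } from './users';   // hypothetical module under test

describe('createUser', () => {
  it('creates a user', async () => {
    // This stub has nothing to do with the behavior under test; it exists only
    // because createUser happens to write an audit entry to a file.
    jest.spyOn(auditLog, 'writeToFile').mockResolvedValue(undefined);

    const user = await createUser({ fullName: 'Sally Jones' });

    expect(user.fullName).toBe('Sally Jones');
  });

  // ...dozens of other tests repeating the same stub. If audit entries later go
  // to a network service instead of a file, every one of them has to change.
});
```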

Incidental stability can happen in the act section of your tests. Say you've written many tests for a "create user" function, and all of these tests explicitly supply a required full name field. Later, requirements change and you need to accept the "first name" and "last name" as separate inputs instead of having them combined into a single "full name" field. Every test that explicitly supplied the full name field will now have to be updated.
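
For illustration, here's a sketch (Jest-style, with a hypothetical `createUser` module) of a test that has nothing to do with names yet still hard-codes the full name field:

```ts
import { it, expect } from '@jest/globals';
import { createUser } from './users'; // hypothetical module under test

it('rejects duplicate email addresses', async () => {
  // This test is about duplicate emails, yet it still spells out fullName.
  // Multiply this across dozens of tests and splitting the field into
  // firstName/lastName becomes a large, mechanical chore.
  await createUser({ fullName: 'Sally Jones', email: 'sally@example.com' });

  await expect(
    createUser({ fullName: 'Sally Smith', email: 'sally@example.com' }),
  ).rejects.toThrow();
});
```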

As a second example of incidental stability in your test's act section, consider the scenario where you have a number of tests verifying that various password policies are enforced when you use the create-user function. These tests all add incidental stability to the fact that you can supply a password during user creation in the first place. Separating the set-password action into its own endpoint would then require a lot of extra work to update those tests as well.

Incidental stability can happen in the assert section of your tests. Perhaps you coded up the handling of 4xx and 5xx errors in your REST API using a single shared helper function that returns consistent headers, such as `Content-Type: application/json`. If many of your tests explicitly assert that the response's Content-Type is application/json, then you're adding a lot of incidental stability to this shared behavior.
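
As a sketch of what this can look like (assuming a supertest-style HTTP helper and a hypothetical `app` under test):

```ts
import { it, expect } from '@jest/globals';
import request from 'supertest';
import { app } from './app'; // hypothetical Express-style application

it('returns 404 for an unknown user', async () => {
  const response = await request(app).get('/users/does-not-exist');

  expect(response.status).toBe(404);
  // Incidental stability: this assertion appears in every error-path test,
  // even though the header comes from one shared error-handling helper.
  expect(response.headers['content-type']).toContain('application/json');
});
```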

Most Code Would Benefit from Automated Tests

[Ensure] your test suite provides a strong level of confidence in the accuracy of your code.

Reaching a "strong level of confidence" means having an automated test for nearly every public behavior of your system. What constitutes a "behavior" is very much open to interpretation and will need to be decided on a case-by-case basis, but here are some guidelines:

There are valid reasons you may choose to not write automated tests for a piece of code:

Strive to keep the amount of code in these non-test-friendly regions minimal. Sometimes there are chunks of logic that can be moved out into locations that are easier to test, such as extracting pure functions that are then put under test, or moving logic out of the non-test-friendly region and into the code that triggers its execution.
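
As a sketch of the extraction idea (module and function names are made up for illustration):

```ts
// parseConfig.ts - the pure part, easy to put under test on its own.
export function parseConfig(raw: string): { port: number; verbose: boolean } {
  const json = JSON.parse(raw);
  return {
    port: Number(json.port ?? 8080),
    verbose: Boolean(json.verbose),
  };
}

// main.ts - the non-test-friendly startup code stays as small as possible
// and simply wires the tested logic up.
import { readFileSync } from 'node:fs';
import { parseConfig } from './parseConfig';

const config = parseConfig(readFileSync('./config.json', 'utf8'));
console.log(`Starting on port ${config.port}`);
```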

Notes on code coverage tools

Configuring a code coverage tool for your project is optional, but it can be a helpful way to make sure you don't have any glaring holes in your test coverage. Coverage tools aren't perfect - they can't tell you whether every behavior that should be tested actually is tested - but they can still be useful in pointing out larger issues.

If you do choose to use a code coverage tool, consider configuring it to require 100% line coverage. This doesn't mean you're actually required to get 100% of your lines of code under test; if you feel that a piece of code should not be tested, simply tell your code coverage tool to turn a blind eye to that piece of code (typically through a special "ignore" comment). When placing a comment like this, consider also adding a comment explaining why you're choosing to omit tests for that piece of code. In other words, you're not really achieving 100% test coverage; what you're actually achieving is "100% of your lines of code are either tested or explicitly marked as not needing to be tested".
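
For example, with an Istanbul-based coverage tool for JavaScript/TypeScript (the exact comment syntax varies from tool to tool), this might look something like the following; `shutdownGracefully` is a hypothetical cleanup function:

```ts
async function shutdownGracefully(): Promise<void> {
  // ...close connections, flush logs, etc. (hypothetical)
}

// Not tested: this handler only runs on an operating-system shutdown signal,
// and exercising it in a unit test isn't worth the added complexity.
/* istanbul ignore next */
process.on('SIGTERM', () => {
  void shutdownGracefully();
});
```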

Unfortunately, not all code coverage tools support exclude-style comments. How to proceed under such limitations is up to you and your team, but you may want to consider skipping the code coverage tool altogether if this feature is not available to you. Requiring any coverage percentage other than 0% or 100% is not a very effective alternative because the feedback loop is much too long: one developer forgetting to test some code may not surface as an issue until much later, when another developer working in that area needs to intentionally leave a few lines of code untested and suddenly trips the threshold.

Additionally, don't hesitate to disable branch coverage completely if you find it to be too much of a burden. While in principle it sounds nice to have automated tests for every single branch, in practice it can take a fair number of extra tests to cover every single optional parameter, boolean logic short-circuit, etc. These types of tests tend to not be overly useful (they don't add a ton of additional confidence that your code is working correctly), and it can be a burden to maintain them all.

Avoid Testing Private Functions

Avoid writing tests for private functions. Figure out how the private function affects the public API, then test the behavior through the public API instead. Remember that this is just a guideline; there are scenarios where it may be preferable to write tests against private functions, such as when the private behaviors are actually more stable than what is seen through the public API.

More context on why you might wish to test a private function directly

Consider a program that gathers information from various types of devices, then combines all of that information to make a guess at what the temperature will be like tomorrow. The private function that averages data from various temperature and wind probes should always give the same output when given the same input, but the public behavior of the overall temperature-estimating function may change often depending on which heuristics are added or removed. In this case, the behavior of the private function is actually more stable than the public API, which means that if you want your tests to stay stable, it is preferable to tie them to the private function (perhaps with a comment next to the private function explaining that it is used by an automated test, as a hint to future maintainers that its API can't be freely refactored without breaking tests).
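
A minimal sketch of what that might look like (names are hypothetical, and the mechanism a test uses to reach a private function varies by language and tooling, so it is left out here):

```ts
// weatherForecaster.ts

// Private helper. NOTE: exercised directly by an automated test, so treat its
// signature as stable - renaming or reshaping it will break tests even though
// it is not part of the public API.
function averageProbeReadings(readings: number[]): number {
  if (readings.length === 0) return NaN;
  return readings.reduce((sum, value) => sum + value, 0) / readings.length;
}

// Public API: its observable behavior shifts whenever heuristics are added or
// removed, which is why tests are tied to the helper above instead.
export function estimateTomorrowsTemperature(probeReadings: number[]): number {
  const baseline = averageProbeReadings(probeReadings);
  return baseline; // ...plus heuristics that change often
}
```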

It's sometimes argued that if you feel the need to test a private function, perhaps it's better to move that private function into a different module and make it public. This is certainly a valid option as well, but it shouldn't be considered the only option. Sometimes you just want to keep the private function private, and that's fine.

Don't Let Your SUT Be Too Small

An unfortunate common pattern is to treat every module (or class) as its own "unit", isolating all modules from each other. Unit testing like this can be extremely unhealthy for your project, and it violates both sides of the mission statement:

  1. It fails to be fortified against internal changes in your system. Almost by definition, we are adding unnecessary stability to every internal module/class of the system, which in turn makes it hard to make any meaningful changes to the system. Every time you want to move logic from one place to another, divide a module in two, join two modules, etc., you will also need to fix the many tests that break as a result of these changes.
  2. It fails to provide a high degree of confidence in your code, as the units being tested are simply too small. There's little value in creating automated tests for overly small pieces of code, since executing those pieces of code by hand whenever you change them is always an option as well. Unit tests work best when they're verifying the behavior of larger components of your system and making sure that all of those small pieces fit together the way they need to.

While it's less common to see this issue with integration tests, the same advice still applies. If the SUTs for your integration tests are overly small, they won't be as effective as they could be.

Don't Let Your SUT Be Too Large

While it's true that individual modules and classes generally don't need to be tested in isolation from each other, that doesn't mean that no isolation should happen at all. For larger applications, it may be valuable to divide up your application into "components", and then test each component in isolation from each other, using test doubles to prevent a SUT from ever passing through multiple components. How big or small a component should be can be a difficult question to answer, but here are some guidelines:

When in doubt, use a larger SUT.

An example of a SUT that is too large

If a large portion of your codebase participates in audit logging, you may want to have tests in place to make sure the correct things get logged when they should. However, if each of these tests is mocking out an API for writing to the file system, then you are going to be in trouble if you ever need to change the way audit logs are saved (e.g. what happens if you need to send them over the internet instead, or if you simply want to throttle the writes, which in turn requires each test to additionally do some timing-based mocking?). While it's true that you want some unit testing around how you write your audit logs out to disk, you don't want every audit-log-related test verifying that audit logs get saved to disk - that puts way too much incidental stability on a single concept. In this scenario, it may be best to shrink the size of your SUT a bit, so most tests only verify that audit-log messages are being sent to some audit-log handler component, and then have a separate set of tests to make sure the audit-log handler behaves as it should (e.g. it actually writes incoming messages out to a log file).
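
A sketch of the shrunken SUT (Jest-style; module names and the injection mechanism are hypothetical):

```ts
import { it, expect, jest } from '@jest/globals';
import { deleteUser } from './users'; // hypothetical module under test

it('records an audit entry when a user is deleted', async () => {
  // Stand-in for the audit-log handler component; no file system involved.
  const auditLog = { record: jest.fn() };

  await deleteUser('user-123', { auditLog });

  expect(auditLog.record).toHaveBeenCalledWith(
    expect.objectContaining({ event: 'user.deleted', userId: 'user-123' }),
  );
});

// A separate, much smaller suite covers the audit-log handler itself
// (e.g. "it writes incoming messages to the log file"). Only those tests need
// to change if the storage mechanism does.
```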

DRY Your Tests When Appropriate

Shrinking your SUT isn't the only way to reduce incidental stability. If many of your tests are adding incidental stability to a single behavior, you may choose to move the dependency on that behavior into a single helper function, and then have all of your tests depend on the helper function instead. If the behavior needs to change, you now only need to update your helper function.
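
Revisiting the audit-logging example from earlier, such a helper might look something like this (Jest-style, hypothetical module names):

```ts
// testHelpers.ts - now the only place that knows audit entries are written to
// a file. If audit logging moves to a network service, only this changes.
import { jest } from '@jest/globals';
import * as auditLog from './auditLog'; // hypothetical module

export function stubAuditLogging(): void {
  jest.spyOn(auditLog, 'writeToFile').mockResolvedValue(undefined);
}
```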

Don't overdo it. A little incidental stability is fine and is often preferable to the added complexity that comes from DRYing your code.

Additional notes on DRY tests and readability

A good way to decide what goes in a shared helper function and what doesn't is to look at the specific behavior you're trying to test, figure out which content in your test is strongly correlated with that behavior, and which content is just "plumbing". Moving the plumbing out into helper functions can help reduce the noise in your test and make its intent more clear; however, you (usually) want to be careful not to also move the essential pieces of your test into helper functions.

For example, if you wish to test a function that fetches data from a REST endpoint, transforms it, and returns it, and you wish to assert that various pieces of information are present in the returned data, make sure it's clear why those assertions are asserting the values they assert. If they assert that a "name" property is set to "Sally", then you probably want to make sure that, when arranging your test, you also use the word "Sally" somewhere, instead of having that abstracted away into your helper functions.

One good way to help reduce plumbing is to avoid the "beforeEach()" hook in your tests. The problem with "beforeEach()" is that it can't be parameterized - it will always run the exact same setup logic for all of your tests. A better alternative is to have a shared "init()" function that each test explicitly calls. If a test needs to customize certain setup behaviors, it can pass the necessary configuration into the "init()" function.
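
A minimal sketch of the pattern (Jest-style; `createApp` is a hypothetical factory for the system under test):

```ts
import { it } from '@jest/globals';
import { createApp } from './app'; // hypothetical factory for the SUT

// Shared, explicit setup. Unlike beforeEach(), every test calls it directly
// and overrides only the pieces of configuration it cares about.
async function init(overrides: Partial<{ emailEnabled: boolean }> = {}) {
  const config = { emailEnabled: true, ...overrides };
  const app = await createApp(config);
  return { app, config };
}

it('skips the welcome email when email sending is disabled', async () => {
  const { app } = await init({ emailEnabled: false });
  // ...act and assert against `app`...
});
```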

Consider Testing Shared Logic Through a Single Public Function

If multiple parts of your public API depend on the same shared behavior, instead of extensively retesting that behavior through every public function that has it, consider picking just one public function to do the extensive testing through. The rest of the public functions can have one or two tests to verify that the shared behavior is present, but they don't all need to do extensive testing of the same behavior.
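
For instance (hypothetical function names, Jest-style), shared email-validation rules might be covered exhaustively through createUser() and only smoke-tested through updateUser():

```ts
import { describe, it, expect } from '@jest/globals';
import { createUser, updateUser } from './users'; // hypothetical module

// Exhaustive coverage of the shared validation rules, exercised through
// createUser() only.
describe('email validation (via createUser)', () => {
  it('rejects an address without an @', async () => {
    await expect(createUser({ email: 'not-an-email' })).rejects.toThrow();
  });
  // ...many more cases...
});

// updateUser() shares the same validation. One test is enough to prove the
// shared behavior is wired up, without re-testing every rule.
it('updateUser applies the shared email validation', async () => {
  await expect(updateUser('user-123', { email: 'not-an-email' })).rejects.toThrow();
});
```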

Some notes on organization

If you organize your tests by creating a separate test file for each public module, it might be worthwhile to break that convention when dealing with these sorts of tests for shared behavior. While it's true that you are testing through a single function, the purpose of these tests isn't to verify the behavior of just that one function; rather, it's to verify a behavior shared by many functions, and it just happens to use one function to do so. For this reason, it may make more sense to move these tests into their own file, named after the behavior they're trying to test.

Test Through Your Isolation Boundaries

Any time you create a test boundary that separates one area of your codebase from another during unit testing, it's a good idea to have a few internally-integrated tests that cross the boundary, just to make sure you don't have any bugs at the boundary itself. It's important not to overdo your boundary testing - such tests tend to require more maintenance effort and break more easily, which is why the boundary was put there in the first place.

Prefer Fakes Over Stubs and Mocks

When it's possible and not too much work, it is preferable to create and use fakes in your tests instead of stubs and mocks. Every time you use a stub, you're tying your test to an internal detail (i.e. how your codebase interacts with the stub and what it expects the stub to return), and if those internal details change, you'll have to update many individual tests. If, instead, you used a fake, the only thing you'd need to update is the fake, and everything else would just work.
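
For illustration, here's what a fake might look like for a hypothetical user-repository dependency:

```ts
// The interface the code under test depends on (hypothetical).
interface UserRepository {
  save(user: { id: string; name: string }): Promise<void>;
  findById(id: string): Promise<{ id: string; name: string } | undefined>;
}

// The fake: a small but genuinely working in-memory implementation that all
// tests can share. If the repository's interface changes, only this class
// needs to be updated - the tests themselves keep working.
class InMemoryUserRepository implements UserRepository {
  private users = new Map<string, { id: string; name: string }>();

  async save(user: { id: string; name: string }): Promise<void> {
    this.users.set(user.id, user);
  }

  async findById(id: string): Promise<{ id: string; name: string } | undefined> {
    return this.users.get(id);
  }
}
```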

Prefer Arranging and Asserting with Public APIs

If you're trying to test the behavior of one public function, that doesn't mean you can't use other public functions to help arrange your test or assert conditions; in fact, it's encouraged. Using public APIs when arranging and asserting moves incidental stability from internal APIs (the APIs of your test doubles) to your more stable external APIs. Testing in this format is also necessary if you wish to make your unit tests double as integration tests, since integration tests rely much more heavily on public APIs.

This isn't a hard rule. There are times when you may want to intentionally add some stability to the internals of your system, for example, you may want to write assertions related to how your data is getting saved, to prevent accidental changes to the shape of the stored data.

Concrete example

If you wish to test a "deleteUser()" function, you could first call your public "createUser()" function to add a user to the system, then your "deleteUser()" function to delete them, and finally a "getUser()" function, to make sure they aren't found anymore.
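
In code, that might look something like this (Jest-style; the exact shapes of these functions, and the assumption that getUser() resolves to undefined for a missing user, are hypothetical):

```ts
import { it, expect } from '@jest/globals';
import { createUser, deleteUser, getUser } from './users'; // hypothetical module

it('deleteUser removes the user', async () => {
  // Arrange and assert entirely through public APIs - no reaching into the
  // database or other internals.
  const user = await createUser({ fullName: 'Sally Jones' });

  await deleteUser(user.id);

  await expect(getUser(user.id)).resolves.toBeUndefined();
});
```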

Don't Over-Assert

Avoid asserting the same behavior in multiple tests - it is redundant and unnecessarily adds incidental stability to that behavior.

Concrete example

If your createUser() function sends out an email on user creation, and your unit test replaces that behavior with a test double, don't make every test for createUser() replace it with a mock (i.e. don't assert that an email got sent out with the correct information in every test). Create a few tests that are dedicated to verifying that email sending works correctly (by asserting that the correct parameters are passed into the mock). The rest of the tests can stub out the email-sending function if needed, but they need not assert that the correct parameters are passed into it; that is already covered.
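
For example, the difference might look something like this (Jest-style, hypothetical module names):

```ts
import { it, expect, jest } from '@jest/globals';
import * as email from './email';     // hypothetical email module
import { createUser } from './users'; // hypothetical module under test

// The dedicated test: the only place that asserts on the email call itself.
it('sends a welcome email to the new user', async () => {
  const sendEmail = jest.spyOn(email, 'sendEmail').mockResolvedValue(undefined);

  await createUser({ fullName: 'Sally Jones', email: 'sally@example.com' });

  expect(sendEmail).toHaveBeenCalledWith(
    expect.objectContaining({ to: 'sally@example.com' }),
  );
});

// Every other createUser() test just silences the side effect and asserts
// nothing about it.
it('stores the new user', async () => {
  jest.spyOn(email, 'sendEmail').mockResolvedValue(undefined); // stub only
  const user = await createUser({ fullName: 'Sally Jones', email: 'sally@example.com' });
  expect(user.fullName).toBe('Sally Jones');
});
```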

Combine Unit and Integration Tests

Many unit tests can be, with a little bit of effort, reused as integration tests. Doing so makes it fairly easy to build up a strong collection of integration tests. The technique works as follows:

Additional Resources