As you will recall from part 4, we left off with a problem. All the unit tests for the relative time formatter v2 passed, but I explained that there was, in fact, a bug in how the day formatter was being called. The bug wasn't caught by the unit tests because of faulty assumptions made while writing the primary formatter "class" and its tests.
The purpose of integration testing
Although I created the aforementioned bug deliberately to show how unit tests alone aren't enough, it's very easy for such a bug to occur naturally during the development cycle. In a more strictly typed language, this exact scenario could have been avoided: had the hour and day formatter stubs been built against common interfaces, the compiler would have flagged the mismatched call. However, that wouldn't address the entire problem. An issue with call arguments is an issue of data flow from one unit to another, and it can also manifest itself in more subtle ways, such as improperly formatted strings or unexpected nulls - bugs that a compiler would not necessarily highlight.
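To make that concrete, here is a minimal sketch of how a call-argument bug can slip past a unit test when the collaborator is stubbed out. The Greeter and nameFormatter names are invented purely for illustration and have nothing to do with the formatter code itself:

    function Greeter(nameFormatter) {
        this.nameFormatter = nameFormatter;
    }

    Greeter.prototype.greet = function (firstName, lastName) {
        // Bug: the arguments are passed in the wrong order. A unit test that
        // stubs out nameFormatter never notices, because the stub returns a
        // canned value no matter what it receives.
        return 'Hello, ' + this.nameFormatter.format(lastName, firstName);
    };

    // Unit test with a stub: the bug is invisible.
    var stub = { format: function () { return 'John Smith'; } };
    new Greeter(stub).greet('John', 'Smith');   // 'Hello, John Smith' - passes

    // Against the real collaborator, the data flow issue surfaces.
    var real = { format: function (first, last) { return first + ' ' + last; } };
    new Greeter(real).greet('John', 'Smith');   // 'Hello, Smith John' - fails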
Integration testing is how we can mitigate data flow issues. Instead of isolating every unit and testing it individually, we instantiate multiple units and test how they function together. This gives us confidence in our code at a higher level: it tells us that an entire section or functional area of the code is working as intended.
The tests
Let's take a look at an integration test for relative time formatter v2.
Relative time formatter v2 integration testing
The implementation portion in the above link is unchanged from the relative time formatter v2 implementation introduced in part 4 - bug included. The tests, however, are quite different. You can see that we're still mocking out the current date, since the need for a stable and reproducible environment still exists, but instead of having separate suites testing each unit individually, there is a single integration suite whose setup function instantiates all three classes and passes the day and hour formatter instances to the primary formatter exactly as a real system would.
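I won't reproduce the linked code here, but the shape of that setup is roughly the following. The constructor names, signatures, clock double, and mocked date are all assumptions made for the sake of the sketch; the linked suite is the authoritative version:

    describe('Relative time formatter v2 (integration)', function () {
        var formatter;

        beforeEach(function () {
            // The clock is still a test double so that "now" is stable and
            // reproducible across test runs. (The date below is arbitrary.)
            var fakeClock = { now: function () { return new Date(2014, 0, 15, 12, 0, 0); } };

            // Real hour and day formatters are wired into the real primary
            // formatter, exactly as a production caller would wire them.
            formatter = new RelativeTimeFormatter(fakeClock,
                                                  new HourFormatter(),
                                                  new DayFormatter());
        });

        // the individual tests go here...
    });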
If you run the tests, you'll see that the first two pass, while the last two fail, thus exposing the bug. Note that, while there were a lot of unit tests for these three classes, there are only four integration tests. This is by design. Integration testing is not intended to cover every possible scenario that could occur in the individual units; its focus is on covering the major interactions between the units. Arguably, the integration suite could have contained only two tests - one for interacting with the hour formatter, and one for interacting with the day formatter. I chose to have two tests per formatter to verify that dates in the past and in the future both work as intended. I could see a possible desire to eventually split the past and future handling into separate classes, so I figured that a couple of extra tests might help.
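Continuing the sketch above with Jasmine-style expectations, the four tests would look something along these lines. The expected strings, the dates, and the format signature are invented for illustration, so don't read them as the real suite's output:

        it('formats a target a few hours in the past', function () {
            expect(formatter.format(new Date(2014, 0, 15, 9, 0, 0))).toBe('3 hours ago');
        });

        it('formats a target a few hours in the future', function () {
            expect(formatter.format(new Date(2014, 0, 15, 15, 0, 0))).toBe('in 3 hours');
        });

        it('formats a target a few days in the past', function () {
            expect(formatter.format(new Date(2014, 0, 12, 12, 0, 0))).toBe('3 days ago');
        });

        it('formats a target a few days in the future', function () {
            expect(formatter.format(new Date(2014, 0, 18, 12, 0, 0))).toBe('in 3 days');
        });

With the bug in place, the two hour tests pass and the two day tests fail, since the bug lives entirely in how the day formatter is called.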
Let's fix the implementation.
Relative time formatter v2 integration testing, with fix
The integration suite in the above link is identical to the one in the previous link, and the only change in the implementation is the call to the day formatter: this.dayFormatter.format(now, target). Now the integration tests pass. However, if you were to run the previously created unit tests against this code, there would be failures in the primary formatter's test suite. Those unit tests would then need to be updated to work with the fixed primary formatter. Since no code was changed in the hour and day formatters, their respective unit tests don't need to be updated.
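To illustrate the kind of update involved, here is a hypothetical spy-based unit test for the primary formatter, written in a Jasmine-like style. None of this is the series' actual suite, and the names and signatures are the same assumptions as in the earlier sketch; the point is that the expectation on the day formatter spy has to describe the corrected call:

    describe('Relative time formatter v2 (unit, after the fix)', function () {
        it('delegates day-scale differences to the day formatter', function () {
            var now = new Date(2014, 0, 15, 12, 0, 0);
            var target = new Date(2014, 0, 13, 12, 0, 0);
            var fakeClock = { now: function () { return now; } };
            var hourFormatter = { format: jasmine.createSpy('hour format') };
            var dayFormatter = { format: jasmine.createSpy('day format').and.returnValue('2 days ago') };

            var formatter = new RelativeTimeFormatter(fakeClock, hourFormatter, dayFormatter);
            var result = formatter.format(target);

            // This is the expectation that previously encoded the faulty
            // assumption and now has to match the corrected call.
            expect(dayFormatter.format).toHaveBeenCalledWith(now, target);
            expect(result).toBe('2 days ago');
        });
    });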
Cascading failures and improper fixes
The interplay between different types of automated software tests is such that a failure in one type (e.g., integration) will sometimes cause failures in other types (e.g., unit) after a fix is applied to correct the first failure. You must always be cautious when facing this situation, and you should pay close attention to the exact kinds of failures you're encountering and fixing. You may discover, for example, that after applying a fix for an integration failure, you end up with a failed unit test covering a very specifically defined scenario. If your fix broke that scenario, you have to determine whether the scenario was incorrect, or whether your fix has now broken something important, in which case you need to reconsider how - and where - to implement your fix.
It's easy to get caught up in fixing one area of code and to not pay much attention to "side effects" of broken tests. If a test failed on an expectation of true for some value that now returns false, the way to make that test pass once again is clear. Unfortunately, simply changing the expectation is not always the correct course of action, as tempting as it may be.
What you must do is examine two things: why there was a test for the value in question, and why that value changed. If the reason for the test isn't obvious and the description of the test or comments around the failed expectation don't yield useful information, then it's possible that the expectation is extraneous and could be removed. On the other hand, it could have been added hastily as part of a bug fix at some point in time. Version control systems can help pinpoint when the expectation was added, and it's often useful to see the entire changeset where this addition happened in order to see it in context and gain a greater understanding of the change as a whole.
If you've determined that the reason for the expectation is valid, you must then figure out why the value has changed. If the change was a direct result of your fix, then perhaps the expectation needs to be updated, and if that's the case, you should make sure this change doesn't negatively impact code downstream and then update the expectation. However, if the expectation is still entirely correct, or if this change will introduce problems in other parts of the system, then your fix is likely improper.
Used together, unit and integration tests are a powerful tool for determining the viability of bug fixes, as described above. In a complex system, an initial attempt at a bug fix may not always result in a viable solution due to unforeseen effects down the line. Writing good integration tests that encompass potentially obscure functionality will give you greater confidence that regressions in that area of the code can be avoided.
Determining boundaries
It's important to define the boundaries in your integration tests before you write them. In the case of the relative time formatter v2, the primary formatter, the hour formatter, and the day formatter are being tested together. Everything else that happens to be a dependency, such as the clock, must still be replaced with a test double.
Determining the boundaries is rarely an exact science, unfortunately. In the above example, all the units that we have form a distinct piece of functionality, the relative time formatter, so the boundaries are pretty obvious. In real-world projects, you'll often find similar sets of units that work together to accomplish a particular task. Those are good candidates for integration testing. In an n-tier application, you'll often want to test the interaction between tiers as well.
Avoid creating integration tests that need to communicate with slow external systems such as databases, remote file systems, and web services. Creating a temporary in-memory database and using it in an integration test is perfectly fine, but if you need to communicate with a real instance of a remote database, you're introducing a complex, relatively slow, and potentially brittle dependency. Integration tests should be reasonably fast, and they should not fail just because some database instance on another server was down for maintenance.
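As a sketch of what the in-memory approach can look like in a Node environment with the sqlite3 module, consider the following. The UserRepository is a made-up unit under test; the only point is that the database it talks to is a disposable in-memory instance rather than a remote server:

    var sqlite3 = require('sqlite3');

    // Hypothetical unit under test: a tiny repository over a users table.
    function UserRepository(db) {
        this.db = db;
    }
    UserRepository.prototype.save = function (name, callback) {
        this.db.run('INSERT INTO users (name) VALUES (?)', [name], callback);
    };
    UserRepository.prototype.findByName = function (name, callback) {
        this.db.get('SELECT name FROM users WHERE name = ?', [name], callback);
    };

    describe('UserRepository (integration)', function () {
        var db, repository;

        beforeEach(function (done) {
            db = new sqlite3.Database(':memory:');   // fresh, throwaway database
            repository = new UserRepository(db);
            db.run('CREATE TABLE users (name TEXT)', done);
        });

        afterEach(function (done) {
            db.close(done);
        });

        it('stores and retrieves a user', function (done) {
            repository.save('alice', function () {
                repository.findByName('alice', function (err, row) {
                    expect(row.name).toBe('alice');
                    done();
                });
            });
        });
    });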
Most of the time you should also avoid creating integration tests that touch the UI. Those tend to get overly complicated and brittle when you try to isolate the set of units under test. It's generally better to leave that to system testing, which will be covered in part 6.