The first and second parts of this blog post series are an overview of automated software testing, with part 1 focusing on the why and part 2 a pretty high-level how. Now, with the third part, let's delve deeper into the how of unit tests specifically.
As I explained in part 2, unit tests shouldn't see the "big picture". They must be purposely designed to test their units and nothing else. It's often tempting to create unit tests that span multiple units in order to get a more realistic representation of state, but you should strive to avoid this. Such tests should be added at the integration level instead.
Let's get interactive
Part of my goal is to explain many of the concepts here with real examples instead of just words, because I think that running a test for yourself and seeing it pass or fail is much more powerful than simply reading about it. To that end, I created an online JavaScript runner/tester, which lets you write JavaScript code and tests and run them immediately in your browser.
I've chosen Jasmine, a popular testing framework, to help show the concepts below. Please take a moment to look at Jasmine's terminology and syntax so you can follow along more easily. You can also keep that page open in another browser window or tab and refer back to it if you're unclear on how a particular Jasmine feature works.
Now let's take a look at a very basic coding and testing example, "hello world".
On the left, you'll see a function that returns the familiar string. This is the code that is to be tested. On the right, you'll see a test suite defined using the describe function, and two tests ("specs" in Jasmine parlance) defined using the it function inside. If you run the tests, you should see something similar to the following in the test output:
Jasmine v2.3.4 started.
Hello World function
should exist
[passed]
should return the expected value
[passed]
Hello World function: finished
Test run complete.
Both tests passed, hooray! You can play around with the helloWorld function and its tests, and run them again to see what happens when failures occur. Jasmine's syntax is intended to assist developers with behavior-driven development (BDD), but that's a topic far outside the scope of this post, so we can gloss over details such as why some things are so verbose. Suffice it to say, natural language constructs are an important aspect. In fact, as you can see from the output above, the suite and spec text can be read together as "Hello World function should exist" and "Hello World function should return the expected value".
Of course, Hello World isn't a real-world example. Not only are there no code-containing units to speak of, there's virtually no functionality, either. This makes for a great example of how to use the tools at a mechanical level, but not any deeper.
A better example
To demonstrate something a bit more advanced, we'll need to start making use of code encapsulation in JavaScript, often implemented using a constructor pattern. This also serves as a decent approximation of classes with public and private methods in languages that support these concepts natively. For the next example, then, let's say we want to create a relative time formatter - something that will take a date object and return, for example, "5 days ago", based on the difference between that date object and now.
Relative time formatter example
As you can see, the formatter's code is a lot more complicated than Hello World, but the tests in this case are pretty simple and straightforward: they just pass in various dates and verify the "correctness" of the output. Notice that I put quotation marks around the word correctness. The reason is that, with unit tests or any other tests for that matter, what you consider to be correct may not match what your users consider to be correct. It may be an obvious statement, but it is nonetheless important to keep in mind that just because the tests are passing doesn't mean that there are no issues.
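If the embedded example isn't available to you, the following is a rough sketch of how such a formatter could be built with the constructor pattern. Only format and getMidnightOfDate are names from the real example; the RelativeTimeFormatter and getDaysAgo names, the exact output strings, and the handling of future dates are assumptions made for illustration.

```javascript
// Constructor pattern: functions declared inside stay private;
// only what is assigned on `this` becomes public.
function RelativeTimeFormatter() {
  var MS_PER_DAY = 24 * 60 * 60 * 1000;

  // Private: drops the time of day so only calendar days are compared.
  function getMidnightOfDate(date) {
    return new Date(date.getFullYear(), date.getMonth(), date.getDate());
  }

  // Private: whole calendar days from `date` to now (negative if future).
  function getDaysAgo(date) {
    var today = getMidnightOfDate(new Date());
    var then = getMidnightOfDate(date);
    // Math.round absorbs the 23- or 25-hour days around DST changes.
    return Math.round((today - then) / MS_PER_DAY);
  }

  // Public: the unit's only entry point.
  this.format = function (date) {
    var days = getDaysAgo(date);
    if (days < 0) { return 'in the future'; } // assumed wording
    if (days === 0) { return 'today'; }
    if (days === 1) { return 'yesterday'; }
    return days + ' days ago';
  };
}

// Usage: build a date five calendar days back and format it.
var formatter = new RelativeTimeFormatter();
var fiveDaysAgo = new Date();
fiveDaysAgo.setDate(fiveDaysAgo.getDate() - 5);
console.log(formatter.format(fiveDaysAgo)); // "5 days ago"
```

Note that the sketch reads the current time with new Date() inside the unit, which is exactly the dependency that makes it awkward to test without some help.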
Test doubles
While the above example is relatively isolated from external dependencies, it does rely on one: the current date/time. Since we can't predict when our tests run, we have to have a way to control what the formatter thinks is the current date/time in order to have meaningful tests. Luckily, Jasmine provides a way to "stop time" at a desired point by replacing the real Date object with a fake one. Faking objects is an example of using test doubles. As the linked article explains, test doubles include mocks, stubs, and fakes. Although Jasmine uses a function named mockDate to accomplish its goal, it's really creating a stub rather than a mock. However, terminology surrounding test doubles isn't universally consistent, so you should expect differences in definitions when reading articles or talking with people about them.
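To make the idea of a stub a little more tangible, here is a hand-rolled sketch of replacing Date with a version that gives a canned answer. This is not how Jasmine implements its clock; withFrozenDate and FrozenDate are hypothetical names invented for this illustration.

```javascript
// Hypothetical helper: runs `callback` while `new Date()` and `Date.now()`
// report a fixed instant, then puts the real Date back.
function withFrozenDate(isoString, callback) {
  var RealDate = Date;
  var frozenTime = new RealDate(isoString).getTime();

  class FrozenDate extends RealDate {
    constructor(...args) {
      if (args.length === 0) {
        super(frozenTime); // "now" always answers with the frozen instant
      } else {
        super(...args);    // explicit dates behave as usual
      }
    }
    static now() {
      return frozenTime;
    }
  }

  globalThis.Date = FrozenDate;
  try {
    return callback();
  } finally {
    globalThis.Date = RealDate; // restore, even if the callback throws
  }
}

// Usage: inside the callback, "now" is whatever we say it is.
var stamp = withFrozenDate('2017-06-15T12:00:00Z', function () {
  return new Date().toISOString();
});
console.log(stamp); // "2017-06-15T12:00:00.000Z"
```

The double only supplies a predetermined value and never checks how it was called, which is why I'd describe it as a stub rather than a mock.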
It's important to know when a test double requires a tear down procedure - and even more important to then implement it. In this case, jasmine.clock().uninstall() is called after each spec run. This is done in order to prevent "leakage" of test doubles from specs that run earlier in the execution order to ones that run later. Such leakage can cause very confusing test results and problems that are often difficult to track down. Since there is only one suite in this example, and all specs rely on the test double, the tear down is not strictly needed here, but it is nevertheless a good practice.
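The install and tear down pattern might look roughly like the sketch below. The jasmine.clock() calls (install, mockDate, uninstall) are Jasmine's own; the isToday function is a hypothetical stand-in for a date-dependent unit, and the guard simply skips the suite where Jasmine isn't loaded.

```javascript
// Hypothetical stand-in for a unit that depends on the current date.
function isToday(date) {
  var now = new Date();
  return date.getFullYear() === now.getFullYear() &&
    date.getMonth() === now.getMonth() &&
    date.getDate() === now.getDate();
}

if (typeof describe === 'function' && typeof jasmine !== 'undefined') {
  describe('isToday', function () {
    beforeEach(function () {
      jasmine.clock().install();
      // Freeze "now" at noon on 15 June 2017, local time.
      jasmine.clock().mockDate(new Date(2017, 5, 15, 12, 0, 0));
    });

    afterEach(function () {
      // Tear down: restore the real Date so later specs are unaffected.
      jasmine.clock().uninstall();
    });

    it('should accept a time earlier on the frozen day', function () {
      expect(isToday(new Date(2017, 5, 15, 8, 30, 0))).toBe(true);
    });

    it('should reject the day before the frozen day', function () {
      expect(isToday(new Date(2017, 5, 14, 23, 59, 59))).toBe(false);
    });
  });
}
```

Putting the uninstall call in afterEach rather than at the end of a spec means it still runs when an expectation in that spec fails.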
Public vs. private
The only public function exposed by the relative time formatter is format, due to it being assigned on this. The other functions in it are private. That means they cannot be tested directly. It can be tempting to expose additional functions publicly in order to test them directly, but you should avoid doing so (unless those functions are actually needed by non-test code). One reason is that, as you are testing a unit, you should only be concerned with the unit's inputs, outputs, and external side-effects (if any). The internal state of the unit shouldn't matter, since that isn't the purpose of unit testing: what you should be testing is the what, not the how. Another reason is, if you start tying a unit's internal functionality to external dependencies (tests in this case), the unit becomes extremely brittle. For example, right now you could rename getMidnightOfDate and the calls to it, or even get rid of the function entirely and duplicate its code in the two places it's called, and no tests should break. If you were to expose this function publicly, you would now be tied to the current implementation, and any change would mean fixing tests that failed for no good reason. Not only that, but other people may start using this exposed function when you hadn't intended it, which effectively prevents you from changing it without costly and time-consuming refactoring.
So, how do you test private functions? The answer is, you test them indirectly. Going back to the example of the getMidnightOfDate function, it is tested by calling format with a date that is 24 hours in the past or older. This indirect testing can be painful to do at times, but when that pain becomes too much, it's a strong indicator that your unit may need to be refactored or broken up into multiple smaller ones. If you're at the point of drawing diagrams just to figure out the precise set of inputs you need to test a certain code path, you should take a step back and ask yourself whether the unit is just too big.
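Here is a small sketch of why testing only through the public function keeps refactoring cheap. Both constructors below are pared-down, hypothetical variants that handle past dates only (FormatterWithHelper, FormatterInlined, and checkFormatter are made-up names): one keeps the private helper, the other inlines it, and the same black-box check passes against either.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Version A: relies on a private helper.
function FormatterWithHelper() {
  function getMidnightOfDate(date) {
    return new Date(date.getFullYear(), date.getMonth(), date.getDate());
  }
  this.format = function (date) {
    var today = getMidnightOfDate(new Date());
    var days = Math.round((today - getMidnightOfDate(date)) / MS_PER_DAY);
    return days === 1 ? 'yesterday' : days + ' days ago';
  };
}

// Version B: helper removed, its code duplicated in both places.
function FormatterInlined() {
  this.format = function (date) {
    var now = new Date();
    var today = new Date(now.getFullYear(), now.getMonth(), now.getDate());
    var then = new Date(date.getFullYear(), date.getMonth(), date.getDate());
    var days = Math.round((today - then) / MS_PER_DAY);
    return days === 1 ? 'yesterday' : days + ' days ago';
  };
}

// Black-box check: touches only the public function's inputs and outputs,
// so the private helper is exercised indirectly (or not at all).
function checkFormatter(Formatter) {
  var formatter = new Formatter();
  var yesterday = new Date();
  yesterday.setDate(yesterday.getDate() - 1);
  var lastWeek = new Date();
  lastWeek.setDate(lastWeek.getDate() - 7);
  return formatter.format(yesterday) === 'yesterday' &&
    formatter.format(lastWeek) === '7 days ago';
}

console.log(checkFormatter(FormatterWithHelper)); // true
console.log(checkFormatter(FormatterInlined));    // true
```

Had the check called getMidnightOfDate directly, version B would have broken it even though the observable behavior is identical.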
Too much vs. not enough
At what point do you say that your unit has enough coverage? (And don't tell me that it's when your code coverage tool says 100%! See part 2 for a refresher on that.) The way to determine the answer can be considered partly art and partly science. You should strive to test most, if not all, of your unit's code paths. However, a meticulous analysis of the possible paths and their tests can be very time-consuming. Sometimes the detailed output of a code coverage tool will be able to point out paths that aren't tested, and you can judge for yourself whether to add tests there.
For this example, I chose to write tests that exercise the major code paths (i.e., the different words for days, a future date, and generic past days), as well as an extreme (365 days) that is still well within the realm of possibility. I did not choose to write a test that, for example, verifies that an exception is thrown if something other than a Date object is passed to format. While that could conceivably happen, I feel that there's no need to check this case because I consider it undefined behavior and, as such, unimportant.
I also didn't test leap years because I expect the browser's JavaScript implementation to handle that for me (so if I were to compare 2017-01-01 and 2016-01-01 the difference should be 366 days). There is generally no need to verify the correctness of the basic frameworks or language features that you're using. While it is possible to encounter bugs in them, this is an exceedingly rare occurrence, and not one you should spend time worrying about or writing tests against.
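If you want to convince yourself of the leap year arithmetic once, a throwaway check like the one below does it. It isn't meant as a test worth keeping; it just confirms the number quoted above using Date.UTC, which sidesteps local daylight-saving shifts.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Months are zero-based, so 0 is January.
var start = Date.UTC(2016, 0, 1);
var end = Date.UTC(2017, 0, 1);

// 2016 is a leap year, so the span includes 29 February.
var daysBetween = (end - start) / MS_PER_DAY;
console.log(daysBetween); // 366
```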
If you write many tests that exercise the same code paths, that makes it difficult to modify the unit in the future, because a lot of tests would break. On the other hand, if your tests don't exercise the important code paths of your unit, you could encounter bugs either in the original implementation or after a modification, if such a modification accidentally changed the result of an untested code path. Striking a balance is key here.
Next time
In part 4, I'll go into more detail on test doubles (and spies) and how to use them to verify a unit's external interactions.