In part 5 we discussed how integration testing is used to improve the quality of software at a higher level than unit testing and when it should - and shouldn't - be used. As I mentioned in that post, integration tests aren't suited for testing functionality that relies on slow external systems like databases and web services or functionality that touches the UI. System testing is most useful in these areas.
While integration testing provides valuable feedback on the interaction between units, system testing provides equally valuable feedback on the functionality of the application as a whole. Recall the pyramid structure from part 2. Just as integration tests see more of the application than unit tests, so do system tests see more than integration tests. Where integration tests can catch issues that unit tests cannot, system tests can catch issues that integration tests cannot: they exercise the interactions between major application components as well as interactions with external services.
Adding system testing to your process
At the most basic level, system testing involves getting your application installed in an environment that mimics production as closely as possible and using an automation tool or framework to access that application in the same way that a real user would. This type of testing is very similar to how an application would be manually tested. However, setting up an appropriate environment for system testing can be challenging.
Where to run system tests
System tests should be run on local development machines, at a minimum, in order to facilitate easy and quick development as well as maintenance of the tests themselves. Ideally, the tests should also run as part of a continuous delivery setup. If they cannot be run on development machines directly because the application itself can't run in such an environment, a virtual machine may help. If specialized hardware is required to run the application, it may be worthwhile to invest in automatable hardware simulation tools that can be set up to respond as the real hardware would in various states.
The test environment in which your application runs must be able to handle multiple executions of your system tests. This means that, if your tests add or delete information in the application, those changes have to be reverted between successive test runs - otherwise, such add or delete operations can fail because of previous modifications to the data. If at all possible, the full expected initial state of the application and its dependencies should be configured during the setup phase of your system tests. This could involve copying a "known good" database file to a configured location, running a local web server to respond to web service calls made by your application, cleaning up an output directory, and other such actions. Avoid executing important steps as part of a teardown procedure because catastrophic failures during test runs may prevent teardown actions from executing.
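As a rough illustration of the setup-phase reset described above, here is a minimal Python sketch. The function name and the three paths are assumptions made for this example; your application will have its own locations and possibly more state to restore.

```python
import shutil
from pathlib import Path


def reset_test_environment(known_good_db: Path, live_db: Path, output_dir: Path) -> None:
    """Put the application's files into a known initial state before a test run.

    All three paths are hypothetical; substitute whatever your application uses.
    """
    # Overwrite whatever the previous run left behind with the pristine copy.
    live_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(known_good_db, live_db)

    # Start from an empty output directory, even if the previous run crashed
    # before it could clean up after itself.
    shutil.rmtree(output_dir, ignore_errors=True)
    output_dir.mkdir(parents=True)
```

Because everything happens during setup, a test run that dies halfway through cannot leave the next run in a bad state.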
If you're testing a network-connected application, then it likely relies on external web services and/or databases. It is sometimes impractical, especially for larger systems, to set up all of the external dependencies locally for system testing. In these situations, having dedicated test environments hosted elsewhere can be very useful. You could then point your local application instance to this test environment and run your system tests against that. You must be careful, though, not to affect the shared environment when running your system tests: after all, you don't want to cause issues for other people or automated test systems by deleting important data that they may be relying on.
To avoid such issues with shared environments while still exercising your application's data modification functionality, it can be useful to set up a kind of test double for the external services that your application needs. This could be as simple as a local database server that is initialized to a known good state before running the system tests. The application can then be pointed at this local database, and you wouldn't have to worry about affecting a shared test environment's database.
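Here is one way this could look, sketched in Python. It uses SQLite, which is a file-based database rather than a server, purely to keep the example self-contained; the table and seed rows are made up for illustration.

```python
import sqlite3
from pathlib import Path

# Hypothetical seed data; a real project would keep its own schema and fixtures.
SEED_SQL = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
"""


def create_local_test_database(db_path: Path) -> None:
    """Recreate the local stand-in database in a known good state."""
    db_path.unlink(missing_ok=True)  # discard changes made by earlier runs
    connection = sqlite3.connect(db_path)
    try:
        connection.executescript(SEED_SQL)
        connection.commit()
    finally:
        connection.close()
```

Calling this before each test run means a test can freely add or delete rows without affecting anyone else, and without affecting the next run.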
While developing and running system tests, you'll likely encounter a number of common issues. Some are tricky to deal with; others are not. The important thing to remember is this: do not accept flaky tests. Tests that occasionally fail for no apparent reason will happen. If a "random" failure occurs once in every 100 runs, it should be looked at, but it's certainly not the end of the world. If, however, a failure occurs every third run, it needs to be addressed quickly. As I mentioned in part 1, people may stop caring about failures if they happen too often, and then your automated testing is all but useless.
One of the most common and most annoying problems with system tests is timing. The test tried clicking a button on the main screen before a modal dialog finished closing. The test didn't wait long enough before checking the value of a text box while the application was updating the screen. The test tried typing into an entry field before the screen finished loading. These are all realistic examples, and they can be rather frustrating.
There is more than one way to address these timing issues. The most obvious is to add a simple delay (or "sleep") to the execution of the problematic test. Unfortunately, this is usually also the worst way, in terms of both stability and performance. A delay of one second, for example, may solve a particular timing failure on your machine, but it may not be enough for someone running the same test on a slower (or busier) machine. Conversely, if you "play it safe" and add a ten-second delay, then you slow down the execution of these tests needlessly for everyone who runs them.
A better way to address timing issues is to have your tests wait for specific events to happen before executing the next step. For example, you could verify that the window handle of the modal dialog no longer points to a valid window before trying to click the button on the main screen. In the case of a textual update happening too slowly, you could either wait for a signal from the application that everything is ready (if such a signal exists) or check for the expected value in a loop with a reasonable timeout. In this case, a timeout of ten seconds isn't nearly as bad as forcing a ten-second wait on everyone: the loop exits as soon as the correct value is seen, and if the correct value is not seen within ten seconds, the test simply fails.
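The polling loop could be sketched like this in Python. The name `wait_until` and its default values are my own choices for this example, not something a particular automation tool provides.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1) -> None:
    """Poll `condition` until it returns True, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return  # exit as soon as the expected state is seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A test would then call something like `wait_until(lambda: text_box.text == "Saved")`, where `text_box` is whatever object your automation tool hands you. On a fast machine this returns almost immediately; on a slow one it waits only as long as it has to.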
Another issue that happens with system tests is the execution of unrealistic steps. This often involves clicking a button that is invisible or otherwise obscured such that a real user would not be able to click it. This usually happens because a test was written to push an event directly to an object via the automation framework, which generally doesn't know whether the object is truly available for the user to interact with. Executing unrealistic steps will sometimes cause failures because the application isn't yet ready to accept whatever input the test is sending it. Other times the application may get into an unpredictable state because a series of events occurs that would normally be impossible for a real user to execute.
These issues can be avoided by simulating input as realistically as possible. For example, instead of sending a click event directly to a button, you could determine the button's coordinates and then send a mouse click event to the main application window at those coordinates and let the application propagate the event down to that button - or to whatever else may be there on top of the button.
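To see why coordinates matter, consider this toy Python model of a window. It is not a real UI framework; `Widget`, `Window`, and `click_at` are invented here only to show how a click sent to a point lands on whatever is topmost at that point.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Widget:
    name: str
    x: int
    y: int
    width: int
    height: int
    clicks: int = 0

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


@dataclass
class Window:
    # Widgets are stored bottom-to-top: later entries are drawn over earlier ones.
    widgets: List[Widget] = field(default_factory=list)

    def click_at(self, px: int, py: int) -> Optional[Widget]:
        """Deliver a click to the topmost widget at the point, as a real mouse click would."""
        for widget in reversed(self.widgets):
            if widget.contains(px, py):
                widget.clicks += 1
                return widget
        return None


save = Widget("save_button", 10, 10, 80, 30)
dialog = Widget("modal_dialog", 0, 0, 200, 100)  # still open, covering the button
window = Window([save, dialog])

hit = window.click_at(*save.center())
# The click lands on the dialog, not the button, so the test fails the same
# way a real user would be blocked instead of silently doing the impossible.
```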
Very often, the underlying cause of these common issues is that the testing or automation tools don't provide a good way to do what you need. This is a systemic problem, and the best way to deal with systemic problems is to create systemic fixes. If your UI automation tool requires three method calls to get the coordinates of an object and send a click event to the application at those coordinates, you should create a single wrapper method for this and encourage everyone to use it. If the automation tool doesn't provide an easy way to wait for a text box to contain an expected value (by polling it until a timeout occurs), create this method yourself. Essentially, if you find that you're writing the same boilerplate test code in different tests, extract it to a helper class or module. This way you can improve the stability and reliability of most, if not all, of your system tests.
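A wrapper for the three-call click might look something like the Python sketch below. The `AutomationTool` interface and its method names are hypothetical stand-ins for whatever your tool actually exposes.

```python
from typing import Protocol, Tuple


class AutomationTool(Protocol):
    """Hypothetical low-level interface; replace with your tool's real calls."""

    def find_element(self, element_id: str) -> object: ...

    def get_bounds(self, element: object) -> Tuple[int, int, int, int]: ...  # x, y, width, height

    def send_click(self, x: int, y: int) -> None: ...


def click_element(tool: AutomationTool, element_id: str) -> None:
    """Click the center of an element by sending a click to the application at its coordinates."""
    element = tool.find_element(element_id)
    x, y, width, height = tool.get_bounds(element)
    tool.send_click(x + width // 2, y + height // 2)
```

With a helper like this in a shared module, a fix or improvement to how clicks are sent reaches every test that uses it.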
In this series of blog posts I've explained the value of automated tests at the unit, integration, and system levels. I've also discussed the types of issues that you'll often encounter with these tests and how to solve - or at least mitigate - many of them. If you implement a robust testing strategy for your application, you will benefit immensely from it: not only will it be easier for you and others to implement changes and fixes without breaking existing functionality, but you will also have increased confidence in the overall stability of your codebase.
Go forth and boldly create!