Isn’t Unit Testing Enough? A Testing Pyramid Intro – The Build Steps


Introduction

So far, we have only been talking about unit testing. Although we aren’t yet ready to dive into the details of the higher levels of testing, it is still good to understand, in broad strokes, what they are and why a layered approach is recommended. It is also worth mentioning that not every team or company will implement all of these layers, and some will implement a similar layer under another name.

The goal of layered testing is to identify issues as early as possible, using the fastest-running and least-brittle tests available, so that we can quickly pinpoint the source of a problem. While we want to increase confidence in our code, if we deferred all testing until we were integrating multiple systems to simulate a user’s experience, we would wait so long for the tests to complete, and find so many bugs at once, that our ability to deliver features and bug fixes to users would be severely hindered. Also, with so many applications potentially in a system-level test’s scope, the cost of resolving those bugs would skyrocket, because multiple teams would be pulled into troubleshooting even minor issues that could have been caught in lower levels of testing.

The Build

The first few layers of testing can and should be part of the build process that runs on developers’ local machines and on the build server, whenever possible. This gives developers fast, early feedback on issues.

Unit Testing

Making sure each small unit (often a single method or a few related methods in a class) works as expected is a solid starting point. Unit tests should run quickly (ideally each measured in milliseconds) and in parallel, which lets us grow a large unit test suite without adding much delay to the build. This is where the majority of our tests will live. Unit testing is also great for getting into all of the little corners and edge cases, so we are likely to have higher code coverage in this layer than in the rest of the pyramid. Isn’t that enough?

No. That one method doesn’t run in isolation. Rather, it is a small piece of a larger puzzle whose pieces together make up a component, an application, and/or a system. At each layer, the communication and interaction between neighboring pieces of logic could cause unwanted behavior for our users.

The build should also enforce a minimum level of unit test code coverage. Unit testing will have the highest code coverage of any of our testing layers.
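To make the unit testing layer concrete, here is a minimal, framework-free sketch of a small unit and the edge cases we might check. The `DiscountCalculator` class and its `applyDiscount` method are hypothetical names invented for illustration, and plain checks stand in for a test framework so the example runs on its own.

```java
// Hypothetical unit under test; not taken from any real project.
final class DiscountCalculator {
    private DiscountCalculator() {}

    /** Returns the price in cents after applying a percentage discount (0 to 100). */
    static long applyDiscount(long priceCents, int percent) {
        if (priceCents < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return priceCents - (priceCents * percent) / 100;
    }
}

// Stand-in for a unit test class; a real project would likely use a test framework.
final class DiscountCalculatorChecks {
    private DiscountCalculatorChecks() {}

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Returns true when the unit rejects the given arguments. */
    static boolean rejects(long priceCents, int percent) {
        try {
            DiscountCalculator.applyDiscount(priceCents, percent);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Typical case plus the boundaries of the valid range.
        check(DiscountCalculator.applyDiscount(1000, 25) == 750, "typical discount");
        check(DiscountCalculator.applyDiscount(1000, 0) == 1000, "no discount");
        check(DiscountCalculator.applyDiscount(1000, 100) == 0, "full discount");
        // Little corners: values just outside the valid range.
        check(rejects(-1, 10), "negative price");
        check(rejects(1000, -1), "percent below range");
        check(rejects(1000, 101), "percent above range");
    }
}
```

Each check touches only one method and needs no I/O, which is what keeps this layer fast enough to run on every build.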

Component Level Testing

There is much debate about what should be considered a component. This could be:

A jar containing:
- Utilities that client code may include as a dependency
- A Java client to make invoking an API easier
- etc.
A microservice application
etc.

Setting aside the exact definition of our component boundaries, during component testing we want to test the logic in that component as a whole unit, in isolation from the other parts of the system. We don’t want to include live calls to back-end APIs or databases. By avoiding live calls, our component tests will run faster: we are not spinning up a database instance, waiting on network latency, or waiting for the back-end to call its own back-ends.
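One way to keep live calls out of a component test is to put the back-end behind an interface and swap in an in-memory stand-in during testing. The sketch below assumes that approach. `CustomerBackend`, `InMemoryCustomerBackend`, and `GreetingService` are hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical port to a back-end; in production it would wrap a live API or database.
interface CustomerBackend {
    Optional<String> findName(String customerId);
}

// In-memory stand-in used only by tests: no network, credentials, or database needed.
final class InMemoryCustomerBackend implements CustomerBackend {
    private final Map<String, String> names = new HashMap<>();

    InMemoryCustomerBackend with(String customerId, String name) {
        names.put(customerId, name);
        return this;
    }

    @Override
    public Optional<String> findName(String customerId) {
        return Optional.ofNullable(names.get(customerId));
    }
}

// Hypothetical component entry point, exercised as a whole through its public method.
final class GreetingService {
    private final CustomerBackend backend;

    GreetingService(CustomerBackend backend) {
        this.backend = backend;
    }

    String greet(String customerId) {
        return backend.findName(customerId)
                .map(name -> "Hello, " + name + "!")
                .orElse("Hello, guest!");
    }
}
```

A component test would then build `new GreetingService(new InMemoryCustomerBackend().with("42", "Ada"))` and assert on what `greet` returns for known and unknown customers, without any live back-end running.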

Ideally, component testing should also be part of the regular build process. By avoiding live back-end calls, we can skip any authentication and secret management (passwords and/or certificates) in the build, which needs to work on both developers’ local machines and a remote build server. This also helps keep our build stable. Otherwise, if a shared password were changed or a certificate used for authentication expired, we would have to update not only the build server but also every developer’s local machine. Having the build unexpectedly break because a password was changed, just when a high-priority bug fix needs to be pushed out ASAP, is not a situation we want to find ourselves in.

If testing an application that exposes APIs, this is a good time to test:

Input validation, including edge cases
Happy-path scenarios when invoking the API
Exception / error handling
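The three kinds of scenarios listed above could look like this in a small sketch. `OrderEndpoint` is a hypothetical API handler, the `IntPredicate` is a stand-in for a back-end stock check, and the numeric results simply mirror common HTTP status conventions. None of this comes from a real framework.

```java
import java.util.function.IntPredicate;

// Hypothetical API handler used only to illustrate the test scenarios.
final class OrderEndpoint {
    private final IntPredicate inStock; // stand-in for a back-end stock check

    OrderEndpoint(IntPredicate inStock) {
        this.inStock = inStock;
    }

    /** Returns an HTTP-like status code for an order of the requested quantity. */
    int handle(String quantityParam) {
        int quantity;
        try {
            quantity = Integer.parseInt(quantityParam);
        } catch (NumberFormatException e) {
            return 400; // input validation: missing or non-numeric value
        }
        if (quantity < 1) {
            return 400; // input validation: out of range
        }
        try {
            return inStock.test(quantity) ? 200 : 409; // happy path, or not enough stock
        } catch (RuntimeException e) {
            return 503; // error handling: the back-end stand-in failed
        }
    }
}
```

A component test could construct the endpoint with `q -> q <= 5` to cover validation and happy-path cases, and with a predicate that throws to confirm that a back-end failure is turned into a controlled error response.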

Also, keep in mind that you can add timeouts to tests, so you can get an early warning if a change causes the code to start responding more slowly than is allowable.
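As a rough sketch of the idea, the helper below reports whether a piece of work finished within a time budget, using only the JDK. `TimeBudget` and `completesWithin` are hypothetical names. If the project uses JUnit 5, its `assertTimeout` assertion or `@Timeout` annotation would normally serve this purpose instead.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; a test framework's built-in timeout support is usually preferable.
final class TimeBudget {
    private TimeBudget() {}

    /** Returns true if the task finished within the limit, false if it ran over. */
    static boolean completesWithin(Duration limit, Runnable task) {
        // Runs the task on the common pool so the caller can stop waiting after the limit.
        CompletableFuture<Void> future = CompletableFuture.runAsync(task);
        try {
            future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Marks the future as cancelled; the underlying task is not interrupted.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }
}
```

A test could then fail when `completesWithin(Duration.ofMillis(200), () -> service.call())` returns false, where `service.call()` stands for whatever operation has a response-time requirement. The budget should leave enough headroom that a busy build server does not cause false alarms.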

Code coverage during component level testing is something we can measure during the build. I would recommend setting a minimum threshold for component test code coverage as one of the quality gates for our build, in addition to requiring that all component tests succeed. However, we likely won’t be able to achieve the same level of code coverage during component testing as we did during unit testing. For example, a null check in an internal method may never see a null value because earlier code already rejected the request when the value was null.
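The null-check situation can be sketched as follows. `UsernameService`, `register`, and `normalize` are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Hypothetical component used to show a branch that component tests cannot reach.
final class UsernameService {
    private UsernameService() {}

    // Component entry point: rejects null before any internal method runs.
    static String register(String username) {
        if (username == null) {
            throw new IllegalArgumentException("username is required");
        }
        return normalize(username);
    }

    // Internal helper with its own defensive null check.
    static String normalize(String username) {
        if (username == null) {
            return ""; // never reached through register(); only a direct call gets here
        }
        return username.trim().toLowerCase(Locale.ROOT);
    }
}
```

A component test that only calls `register` can never execute the `return ""` branch, so that line counts against component-level coverage, while a unit test calling `normalize(null)` directly covers it easily. This is one reason the component-level threshold usually needs to be lower than the unit-level one.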

Static Code Analysis

Static code analysis tools try to find bugs or code smells in the runtime logic of our code. A code smell isn’t necessarily a defect, but rather an indication that something may be wrong in our application’s design or implementation that may cause us pain while maintaining the code or developing future features. These tools may be part of the build directly (e.g., Maven build plugins), or they may be part of separate pipelines that are kicked off when a commit is pushed to a PR or a main branch for the repo.

These types of scans can help developers find bugs they may have overlooked during unit testing, since our tests are only as good as the scenarios and validations that we think to add. The spotbugs-maven-plugin is one example of such a tool.
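As an illustration of the kind of defect this type of tool is designed to catch, consider comparing strings with `==`. This is my own illustrative sketch, not output from any particular scan, and `RoleCheck` is a hypothetical class. SpotBugs documents bug patterns for this kind of comparison, though the exact rule names are not reproduced here.

```java
// Hypothetical class showing a reference comparison that looks correct at a glance.
final class RoleCheck {
    private RoleCheck() {}

    // Bug-prone: == compares object references, not the text of the strings.
    static boolean isAdminBuggy(String role) {
        return role == "admin";
    }

    // Intended behavior: compares contents, and is also safe when role is null.
    static boolean isAdmin(String role) {
        return "admin".equals(role);
    }
}
```

A unit test that passes the literal `"admin"` may well pass against the buggy version, because identical string literals usually share one object. A value built at runtime, such as one parsed from a request, would then fail in production. A static scan can flag the comparison itself, regardless of which inputs our tests happened to use.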

Static Application Security Testing (SAST)

There are many different tools that can be used for SAST scans. Some are open source and free. Others require a commercial license.

This quality gate is very similar to the Static Code Analysis section above. However, a SAST scan focuses on security vulnerabilities.
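SQL injection is a commonly cited example of what a SAST scan looks for. The sketch below is my own illustration rather than a finding from a specific tool. `UserQueries`, the `users` table, and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical data-access helper contrasting a vulnerable and a safer pattern.
final class UserQueries {
    private UserQueries() {}

    // Vulnerable: user input becomes part of the SQL text itself.
    static String unsafeQuery(String userName) {
        return "SELECT id FROM users WHERE name = '" + userName + "'";
    }

    // Safer: the SQL text is fixed and the input is bound as a parameter.
    static boolean userExists(Connection connection, String userName) throws SQLException {
        String sql = "SELECT id FROM users WHERE name = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, userName);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }
}
```

Passing `x' OR '1'='1` to `unsafeQuery` produces a statement whose `WHERE` clause matches every row, which is exactly the kind of data flow from user input into a query string that these scans try to trace.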

As with static code analysis, this could be part of the build itself, or it could be an automated action that is kicked off whenever code is pushed to a PR or a main branch for the repo. Either way, it should be a programmatic quality gate that decides, without user input, whether the pipeline for the given commit should proceed. Can the PR be merged? Can the code be deployed?
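Conceptually, a programmatic quality gate boils down to a function from collected results to a yes/no decision. The sketch below assumes a pipeline that has already gathered test results, a coverage ratio, and scan findings. `QualityGate`, its `Severity` levels, and the rule that any high-severity finding blocks the pipeline are all assumptions made for illustration, not the behavior of a specific CI product.

```java
import java.util.List;

// Hypothetical gate; real pipelines usually express these rules in tool configuration.
final class QualityGate {
    private QualityGate() {}

    enum Severity { LOW, MEDIUM, HIGH }

    /** Decides, without user input, whether the pipeline for a commit may proceed. */
    static boolean shouldProceed(boolean testsPassed,
                                 double lineCoverage,
                                 double minimumCoverage,
                                 List<Severity> scanFindings) {
        boolean coverageOk = lineCoverage >= minimumCoverage;
        // Assumed policy: only HIGH severity findings block the pipeline.
        boolean noBlockingFindings = !scanFindings.contains(Severity.HIGH);
        return testsPassed && coverageOk && noBlockingFindings;
    }
}
```

The point of writing the policy down as code or configuration is that the answers to "can the PR be merged?" and "can the code be deployed?" are the same for every commit and every developer.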

Conditional – Manual Peer Reviews

At the point where your code passes automated quality gates and testing validations, you’re likely ready to solicit peer feedback from other engineers on the team. This isn’t part of the testing pyramid, as the testing pyramid starts over at unit tests once the new/updated code is merged into the main branch(es).

This is often done through a pull request (PR) process when a team shares ownership of a code repository (repo). Since you have automated unit tests and static analysis as part of the Maven build, adapting to peer feedback should be quick and efficient, helping you keep up momentum.

If your tooling allows it, I would also recommend enabling a PR merge check to ensure that the build is successful before PRs can be merged into the main branch(es) in the code repository.

Summary

Before any deployment occurs, we can shift the testing effort left, getting fast feedback to developers as new code is added to a project, by having our build actions include:

Unit Tests
Component Tests
Static Code Scans for:
- Runtime bugs
- Security vulnerabilities
Manual Peer Reviews

Each of these layers is easier to add near the start of a project, so we should design the build’s testing and quality gates as part of spinning up a new project. Existing projects can also be onboarded to these layers and benefit from their early feedback loops. However, when adding them to an existing project, we will also need to consider how to handle the existing code base.

Coding Chica