Recently I released a mutation testing tool for smart contracts on GitHub called Vertigo.

In this series of blog posts, I hope to do the following things:

  • Show you how powerful Mutation Testing is
  • Explain how you can use mutation testing in your SDLC
  • Demonstrate how you can use Vertigo on some example projects

Vertigo is accompanied by a research paper that was presented at CBT’19.

Why Mutation Testing?

In this first post, we will review the concepts, and more importantly, the motivation behind mutation testing.

There is no shortage of stories about security incidents in smart contract systems (the DAO Hack, the Parity Wallet Hack, batchOverflow, …). This high-stakes, high-risk environment has sparked the development of many tools and techniques that help increase the security of smart contract projects (e.g. Mythril, the MythX platform, the K framework, Verisol, etc.).

Many of these tools deserve their place in your development life cycle, improving the security of smart contracts across the spectrum. Another method that is already actively being applied in smart contract development is unit testing. Unit tests can be used to make sure that a program or smart contract performs as expected on a set of concrete inputs. While this does not exclude the presence of bugs, a passing test suite does give a sense of confidence in the correctness and security of a smart contract.

This raises the question: How confident should a passing test suite make you feel? Most development teams use code coverage as a metric to answer this question. As the name suggests, code coverage measures the percentage of lines, statements, branches, etc. that are executed by the tests in a test suite.

Unfortunately, there are some problems with this metric:

  • Firstly, one can write tests that cover a lot of code at once; they can improve the measured code coverage while not actually adding many assurances. Test quality will seem to increase, but in reality, it stays the same.
  • Secondly, unit tests may lack proper assertions. As a result, parts of the code may seem to be well covered, whereas the business logic itself is insufficiently tested.

In short, you should not be using code coverage as a metric for the security or correctness of your smart contracts.
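To make the missing-assertions problem concrete, here is a minimal sketch in Python rather than Solidity, purely so that it can be run on its own. The `withdraw` function, its buggy variant, and both tests are hypothetical and not taken from any real project. Both tests execute every line and branch, but only the one with assertions notices the bug.

```python
def withdraw(balance, amount):
    """Return the new balance after withdrawing `amount` (hypothetical example)."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


def buggy_withdraw(balance, amount):
    """Same function with a deliberate bug: the amount is added, not subtracted."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance + amount


def weak_test(fn):
    # Executes both branches, so line and branch coverage report 100%,
    # but nothing is asserted about the results.
    fn(100, 30)
    try:
        fn(100, 200)
    except ValueError:
        pass


def strong_test(fn):
    # Same coverage as weak_test, but the observable behaviour is checked.
    assert fn(100, 30) == 70
    try:
        fn(100, 200)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an overdraft")
```

Running `weak_test(buggy_withdraw)` passes silently, while `strong_test(buggy_withdraw)` raises an `AssertionError`, even though a coverage tool would rate both tests the same.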

Mutation Testing to the Rescue

Mutation Testing is an approach that can help with the evaluation of a test suite’s quality. It specifically tries to answer the following question:

“How good is this test suite at finding bugs in the smart contracts?”

It does so by generating slightly changed versions of the smart contract called mutants. Each of these mutants represents a potential bug in the smart contracts. For each mutant, we can check whether at least one of the tests fails (this is called “killing a mutant”).

Mutation Operators

The mutants are generated based on mutation strategies, called “Mutation Operators”. A mutation operator implements specific transformation rules that try to introduce faulty behaviour. The following table shows some mutation rules that transform comparison operators into their exact opposite. For example, an equals operator becomes a not-equals operator.

<   =>  >=
>   =>  <=
<=  =>  >
>=  =>  <
!=  =>  ==
==  =>  !=

Vertigo implements a series of these mutation operators targeting different parts of the contract like arithmetic operations, modifiers and comparisons.
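As an illustration of how such an operator could work, here is a minimal Python sketch that applies the comparison rules above to a piece of source text and produces one mutant per operator occurrence. This is not how Vertigo is implemented; the names `NEGATED` and `comparison_mutants` are made up for this example, and the purely textual matching is a simplifying assumption.

```python
import re
from typing import List

# The rules from the table above: each comparison operator and its opposite.
NEGATED = {"<": ">=", ">": "<=", "<=": ">", ">=": "<", "!=": "==", "==": "!="}

# Two-character operators come first so that "<=" is not matched as "<".
# Caveat: plain text matching also hits ">" in "=>" or ">>"; a real tool
# would need to work on a parsed representation of the contract instead.
COMPARISON = re.compile(r"<=|>=|==|!=|<|>")


def comparison_mutants(source: str) -> List[str]:
    """Return one mutated copy of `source` per comparison operator found."""
    mutants = []
    for match in COMPARISON.finditer(source):
        mutated = source[:match.start()] + NEGATED[match.group()] + source[match.end():]
        mutants.append(mutated)
    return mutants
```

Each mutant differs from the original in exactly one place, so a failing test can be traced back to a single change.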

Evaluating Mutation operators

Determining whether a mutant is killed is quite straightforward: you run the test suite on the mutated program and check whether at least one of the tests fails. If so, the mutant is killed; if every test passes, it survives.

However, besides a mutant that survives or is killed, we can end up in two other situations:

  • The first additional class is timed out. We encounter it when a mutation creates an infinite loop or otherwise slows down the execution of the test suite considerably.
  • The second class is errored, which occurs whenever the compilation of the mutant is not successful, or the testing framework encounters some unexpected error (other than a failing test, of course).
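The four outcomes can be sketched as a small Python function. This is an illustration only, not Vertigo's implementation. It assumes a hypothetical test command that exits with code 0 when all tests pass, 1 when at least one test fails, and any other code for unexpected errors; the function name `classify_mutant` and the default timeout are likewise assumptions.

```python
import subprocess
import sys


def classify_mutant(test_command, timeout_s=10.0):
    """Run the test suite against one mutant and classify the outcome.

    Assumed exit-code convention: 0 = all tests passed, 1 = a test failed,
    anything else = compilation or framework error.
    """
    try:
        result = subprocess.run(test_command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"   # e.g. the mutation introduced an infinite loop
    if result.returncode == 0:
        return "survived"  # every test passed, the mutant went unnoticed
    if result.returncode == 1:
        return "killed"    # at least one test failed
    return "error"         # the run broke for some other reason


# Stand-in for a real test run: a tiny Python process with a failing assertion.
FAILING_SUITE = [sys.executable, "-c", "assert 1 + 1 == 3"]
print(classify_mutant(FAILING_SUITE))
```

Choosing the timeout is a trade-off: too short and slow but valid test runs are misclassified, too long and every looping mutant stalls the whole analysis.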

Equivalent mutations

Sometimes a mutated version of a program is equivalent to the original program. Take for example the function max from OpenZeppelin’s contracts/math/Math.sol, and the mutated function mutated_max().

function max(uint256 a, uint256 b) internal pure returns (uint256) {
    return a >= b ? a : b;
}

function mutated_max(uint256 a, uint256 b) internal pure returns (uint256) {
    return a > b ? a : b;
}

Even though there is a syntactic change in the program (“>=” is changed to “>”), it will still give the same correct results for all possible inputs: the two versions only take different branches when a equals b, and in that case both branches return the same value. As a result, this mutant will survive an execution of the test suite. In the evaluation of the performance of a test suite, we prefer to ignore these cases, as there is no actual fault introduced in the tested smart contract.
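A quick way to convince yourself is to compare both versions over a small input domain. The following Python sketch mirrors the two Solidity functions; the helper `find_difference` and the third function `broken_max` are hypothetical additions for contrast. A check over a finite range is only evidence and not a proof of equivalence.

```python
from itertools import product


def original_max(a, b):
    return a if a >= b else b


def mutated_max(a, b):
    # ">=" replaced by ">": differs syntactically, not behaviourally.
    return a if a > b else b


def broken_max(a, b):
    # ">=" replaced by its opposite "<": a genuinely faulty mutant.
    return a if a < b else b


def find_difference(f, g, domain):
    """Return the first input pair on which f and g disagree, or None."""
    for a, b in product(domain, repeat=2):
        if f(a, b) != g(a, b):
            return (a, b)
    return None
```

Here `find_difference(original_max, mutated_max, range(64))` returns `None`, while comparing `original_max` with `broken_max` returns `(0, 1)` as the first disagreeing input.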

These mutants are called “Equivalent Mutations”, and they form one of the main weaknesses of mutation testing. Existing approaches to automatic detection of these equivalent mutations are not perfect, and often a developer will have to review mutation testing results to filter the equivalent mutations.

In conclusion

After generating and testing the mutants and filtering out the equivalent ones, we end up with a number of surviving mutants. Using this number, we can compute the mutation score: the share of the remaining (non-equivalent) mutants that the test suite kills. You can use this mutation score to answer the question from the beginning of this post:

“How good is this test suite at finding bugs in the smart contracts?”

Furthermore, the surviving mutants give you valuable information about which parts of the code are insufficiently tested.
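For reference, the mutation score is conventionally computed as the number of killed mutants divided by the number of non-equivalent mutants. The Python sketch below follows that convention; the function name and parameters are made up for this example, and how timed-out or errored mutants are counted is left open because tools differ on that point.

```python
def mutation_score(killed, total, equivalent=0):
    """Fraction of non-equivalent mutants that the test suite killed.

    Assumes the conventional definition killed / (total - equivalent).
    """
    considered = total - equivalent
    if considered <= 0:
        raise ValueError("no non-equivalent mutants to score")
    if not 0 <= killed <= considered:
        raise ValueError("killed must be between 0 and total - equivalent")
    return killed / considered
```

For example, with 60 generated mutants, 10 of them equivalent and 45 killed, `mutation_score(45, 60, 10)` gives a score of 45 out of 50.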

Thanks for reading this all the way through to the end! In this post, we looked at the main concepts used in mutation testing: mutation operators, killed and surviving mutations, equivalent mutations and the mutation score. In the following post, I will show you how to apply these concepts using Vertigo.


Thanks to tintin and Bernhard Mueller for providing feedback on drafts of this article