Recently I received an email with the following contents:
> I recently started working at a startup. I'm learning lots of cool stuff, including unit testing, but as a math major something kinda nagged at me about them: I don't know if I could write a unit test that I really trust all that much. It's better than manually poking my code to be sure, but I felt there might be a better way. Fast forward to an article posted on ycombinator which gave me the knowledge that there are actually people who prove their code correct. I stumbled upon a blog post you wrote in 2009 and have found a few more from around the same time, but I was wondering if you were familiar with any literature on best practices, application, etc. I have some guesses as to how I personally would start to go about proving code correctness, but I find it useful to have books written by people who implement this practice in their day-to-day work lives. Thank you for your time.
Proving the correctness of a whole system is hard. It's so hard, in fact, that for a large system it's undoable. So should one give up on it altogether? Well... no, not exactly, as there's a different approach: do it in two steps. It's based on the idea that a unit test actually tests at least two things, not one:
- A unit test tests whether the implemented algorithm functions correctly with the specified input.
- A unit test tests whether the implementation of the algorithm is correct.
If a test fails, it can be because there's a bug in the code (e.g. the code uses the wrong index variable), because there's a bug in the algorithm (e.g. it fails to reach the desired outcome for a given input set), or both (and then you're in real trouble ;)). Because a failing test can mean more than one thing, it's good to weed out all possibilities but one: that way, when a test or verification fails, we know it's always for the same reason and never for another. If we, for example, eliminate errors in the algorithm, we can be sure that when a test fails, the implementation of the algorithm is borked, while the algorithm itself is OK.
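To make the two failure causes concrete, here's a small hypothetical sketch (in Python; the post itself names no language) of computing a median. One variant has a flawed algorithm, one has a correct algorithm but a slip in the implementation, and the same unit test fails for both, for different reasons:

```python
def median_algorithm_flaw(values):
    """The *algorithm* is wrong: it only covers odd-length lists,
    so even a flawless implementation fails for even lengths."""
    s = sorted(values)
    return s[len(s) // 2]

def median_code_bug(values):
    """The algorithm (average the two middle elements for even
    lengths) is fine, but the implementation picks a wrong index."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid] + s[mid + 1]) / 2  # should be s[mid - 1] and s[mid]

def median(values):
    """Correct algorithm, correctly implemented."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2
```

A test like `median([1, 2, 3, 4]) == 2.5` fails on both buggy variants, and by looking at the failure alone you can't tell which of the two causes you're dealing with.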
The 'easiest' thing to eliminate from your tests is algorithm correctness testing. I say 'easiest' because formulating an algorithm is often easier than writing it down in executable form: once you have formally worked it out in your head, the code is a projection of that. Proving an algorithm correct is a task ranging from trivial to extremely difficult, and it's part of what's called Formal Verification. Formal verification is, like its name suggests, rather formal, so if you're not that much into math, it can be daunting. So I'll describe a different approach which, in my humble opinion, is rooted in the same ideas as formal verification, albeit without the formal math.
The overall idea is this:
- For a given feature, a specification is written (be it a user story or a thick, 120-page document written by an over-paid consultant) which describes what the feature should do with what input.
- To realize the feature as stated in the specification, algorithms and the data-structures they operate on are designed. No coding is done yet. You'll notice that during these first two steps changes happen often, and they're very easy to apply, much easier than when altering code.
- Using verification techniques, the algorithms are proven to be correct for the expected input.
- Using Program derivation, the algorithms are transformed into executable form. As program derivation can be very formal, a developer often falls back to Design by Contract or a less formal way of transforming an algorithm into executable code.
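A minimal sketch of the last two steps, using Euclid's gcd algorithm as a stand-in (a hypothetical example, not from the post; Python chosen arbitrarily): the proof lives in the invariant, and the code is annotated Design-by-Contract-style with pre-/post-conditions:

```python
def gcd(a, b):
    # Precondition: both inputs are positive integers.
    assert a > 0 and b > 0
    x, y = a, b
    # Invariant: gcd(x, y) == gcd(a, b). Each iteration replaces
    # (x, y) with (y, x mod y), which preserves the gcd, and y
    # strictly decreases, so the loop terminates.
    while y != 0:
        x, y = y, x % y
    # Postcondition: x == gcd(a, b), because gcd(x, 0) == x.
    return x
```

The argument in the comments is the proof of the algorithm; the assertions merely check at runtime that the contract holds for the inputs actually seen.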
The core idea is that if the algorithm is correct, a carefully constructed implementation of it in executable form is correct too, at least for the algorithm part. The implementation can still contain bugs: wrong statements, wrong variable usage, boundaries of used constructs not taken into account, etc.
One crucial point is that testing and proving are both after the same goal: to make you realize whether what you've written does what it should (namely, what the spec says!) or not. Using proving techniques is in many cases less time consuming, as you can reason about all possible situations in one go, while a unit test covers only a subset of situations (the ones implemented in the test), which can result in many, many tests if there are many different situations the code has to handle correctly. On top of that, unit tests without algorithm proving can still fail for two reasons (algorithm error or code error) instead of one (code error).
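To show the contrast, here's a small hypothetical illustration: three unit tests pin down three situations, while a two-line argument covers every input at once:

```python
def clamp(value, low, high):
    # Pre: low <= high.
    return max(low, min(value, high))

# Unit tests cover only the enumerated situations:
assert clamp(5, 0, 10) == 5    # in range
assert clamp(-3, 0, 10) == 0   # below range
assert clamp(42, 0, 10) == 10  # above range

# A proof covers all situations in one go: min(value, high) <= high,
# and max(low, m) >= low for any m; since max(low, m) is either low
# (<= high by the precondition) or m (<= high), the result always
# lies in [low, high].
```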
How I use this in practice
I'm not a mathematician, so I don't use math to prove my algorithms correct. I use pre-/post-conditions per step and formal algorithm descriptions to write down the algorithms. Per feature, I also write a description of what the feature has to do with what input. After all, if there's no description of what a feature should do with what input, there's no way to tell whether the code does what it should, as there's no description of what it should do in the first place.
On a whiteboard I use pre-/post-condition proving of the algorithm and change it if necessary. When I'm satisfied, I write down the algorithm steps as comments in the code and implement the data-structures required, or check whether existing data-structures fit the algorithm. After that I implement each step, either in-place where the comment is, or in its own method/class.
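A sketch of what that looks like in code (a made-up example in Python, not taken from the post): the whiteboard steps become comments, each followed by its in-place implementation, with pre-/post-conditions and the invariant written down where they apply:

```python
def first_occurrence_dedupe(items):
    # Pre: items is a finite sequence; elements are hashable.
    # Step 1: start with an empty 'seen' set and an empty result list.
    seen = set()
    result = []
    # Step 2: walk the sequence; append an item only on its first
    # occurrence. Invariant: 'seen' holds exactly the elements
    # currently in 'result'.
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    # Post: result holds each distinct item exactly once,
    # in order of first occurrence.
    return result
```

When reading the code back later, each block of statements can be checked against the step comment directly above it.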
When I'm done, and the compiler thinks I'm done too, I go back to what I've written and start reading the code again. This should be taken very seriously: humans suck at reading code, so when you read code, it's essential to pay extra attention to what everything does, what the state of things is, etc. If you're not skilled in reading code (developers often 'glance' over code rather than read it), it helps to run the code through a debugger to see what it does at each step.
While reading the code, I match the code of each step with the algorithm step it implements and check whether the pre-/post-conditions are indeed taken into account. In short: did I implement the algorithm step as intended, or did I cut corners? Remember, this verification isn't done to check whether the algorithm works! It's meant to check whether the implementation of the algorithm is correct. We already proved the algorithm correct before we typed a single line of code.
It's important not to get cocky here: it's nice to show off your OO skills and how cleverly you can intertwine those silly loops, but doing so should never ruin the connection between the algorithm description and the code itself: it's very important for a person reading the code to know what bizarre algorithm it implements, so changes can be made accordingly if the algorithm changes after a while.
At this point I have:
- A description of the feature, which is the basis for proving whether the code implementing the feature does what it should.
- A series of algorithm descriptions, which are proven to be correct.
- Code which implements the algorithms for the feature.
- Data-structures which are needed for the algorithms.
And now I run the code to see whether what I read and thought to be correct actually is correct. Think of this as integration testing, as the code is run in-system, together with all the other code. Running the code can mean running a few unit tests, if the feature is a UI-less subsystem or one used later on by other subsystems. The unit tests focus on what the specification describes as valid inputs (which also tells you what the invalid inputs are).
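As an illustration of spec-driven tests (a hypothetical pricing rule, invented for this sketch): the spec defines valid quantities and a discount threshold, so the tests target exactly those boundaries plus one invalid input:

```python
def unit_price(quantity):
    """Hypothetical spec: quantities 1-9 cost 5.00 per unit,
    10 or more cost 4.50; anything below 1 is invalid input."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return 4.50 if quantity >= 10 else 5.00

def test_unit_price():
    # The cases come straight from the spec's input boundaries:
    # smallest valid quantity, both sides of the discount
    # threshold, and one invalid input.
    assert unit_price(1) == 5.00
    assert unit_price(9) == 5.00
    assert unit_price(10) == 4.50
    try:
        unit_price(0)
        assert False, "invalid input must be rejected"
    except ValueError:
        pass
```

The point is that the test cases are derived from the specification, not from the shape of the code.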
"So your code is bug-free?"
In short: not always ;). As all these steps take time, it's sometimes not possible to give every step the time it requires; it has to be done in the time that's available. This leads to corners being cut on things which seem trivial but later turn out to be less trivial. What I see in our code base, however, is that on average the parts are relatively bug-free, and that with a minimum of unit tests. Our runtime test base is only a few thousand unit tests, which is rather small for a code base of about 100K lines. The designer, which is also over 100K lines, has only 50 unit tests for core subsystems and has had only 25 bug fixes in the past year. We achieved this by building on a solid foundation of proven code, namely our Algorithmia library (an open source algorithm and data-structures library).
There are more aspects to making your code bug-free: separation of logic/code, so changing one part doesn't affect everything else; clean code, so it's easy to understand and maintain; logical placement of functionality, so you can easily find things back and see what affects what. Understanding a code base makes changes easier and less error-prone.
There's always a chance that you break something when you make a change. Unit tests can help here (but are no way to be 100% certain!): write integration tests, designed specifically to detect breaking changes. A lot of our runtime unit tests are designed for just that: fail if a subsystem got changed in a way that affects the consumers of that subsystem.
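A small hypothetical sketch of such a consumer-protecting test (the subsystem and its API are invented for this example): the test pins down the contract the consumers rely on, so it fails as soon as a change to the subsystem breaks that contract:

```python
def parse_version(text):
    """Hypothetical subsystem API: consumers depend on receiving
    a (major, minor) tuple of ints."""
    major, minor = text.split(".")
    return int(major), int(minor)

def test_parse_version_contract():
    # Fails if the return shape or element types change, i.e. if
    # the subsystem changed in a way that breaks its consumers.
    result = parse_version("2.7")
    assert isinstance(result, tuple)
    assert result == (2, 7)
    assert all(isinstance(part, int) for part in result)
```

The test doesn't care how `parse_version` works internally; it only guards the observable contract.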
I hope to have given some insight into how to write more solid code, namely by starting from the solid root the code comes from: what should it do? Which algorithms are used, and are they correct? Are the algorithms implemented correctly?
Many algorithms have already been proven, so you can pick these up and implement them right away (if someone hasn't already done that for you). After all, if what you have to implement sucks, your program will never work correctly, even if the code doesn't contain a single programmer error. You can't avoid all mistakes, but it's doable to keep the number of bugs small.