Correctness provability should be the goal, not testability

(edited. s/proof/prove/gc and some sentences which were 'out of whack' have been corrected as well)

Phil Haack today posted an article about Writing Testable Code Is About Managing Complexity. He says:

Of course, when I'm done un-bunching my knickers, I realize that despite all the benefits of unit testable code, the real benefit of testable code is how it helps handle software development's biggest problem since time immemorial, managing complexity.
There are two ways that testable code helps manage complexity.

1. It directly helps manage complexity assuming that you not only write testable code, but also write the unit tests to go along with them. With decent code coverage, you now have a nice suite of regression tests, which helps manage complexity by alerting you to potential bugs introduced during code maintenance in a large project long before they become a problem in production.

2. It indirectly helps manage complexity because in order to write testable code, you have to employ the principle of separation of concerns to really write testable code.

I find that a little backwards. It's not that focusing on testability gives you separation of concerns; it's that applying the principle of separation of concerns gives you code that's easier to test as a bonus. For people who think that's the same thing: it's not. Getting easy testability as a bonus means that testability is not the reason you set up your architecture using the principle of separation of concerns; it's a side effect. The main goal was to get separation of concerns in order to be able to prove the code is correct.

Scientific research has argued that, in the end, separation of concerns is unavoidable if you want solid software which is correct. (Peri Tarr, Harold Ossher, William Harrison, and Stanley M. Sutton Jr. N degrees of separation: multi-dimensional separation of concerns. In Proceedings of the 21st International Conference on Software Engineering, pages 107-119. IEEE Computer Society Press, 1999. (PDF).) Who said Computer Science research is out of touch with reality, eh?

When focusing on testability, one can fall into the trap of believing that the tests prove that your code is correct. That's not true: the set of tests proves only that your code is correct within the solution space defined by the tests. Take for example this simple routine:

for(int i = 0; i < 100; i++)
{
	if(SomeMethod())	// returns a bool; its value can differ per iteration
	{
		// code for the 'true' branch
	}
	else
	{
		// code for the 'false' branch
	}
}

This routine has 2^100 code paths: each of the 100 iterations independently takes one of two branches. It's practically impossible to test them all with unit-tests in a reasonable amount of time. To say that this code is correct, one has to prove the code is correct; a set of tests won't cut it. Code can only be proven to be correct if it is predictable (deterministic). Dijkstra's famous rant against 'goto' statements is mainly based on that: a goto could create unpredictable code and therefore non-provable code.

Proving code correctness
Proving code to be correct isn't easy, but it should be your main focus when writing solid software. Your first step should be to prove that your algorithms are correct. If an algorithm fails to be correct, you can save yourself the trouble of typing the executable form of it (the code representing the algorithm), as it will never result in solid, correct software.

After you have proven your algorithms correct, your second step should be to implement them in executable form, taking extra care with the projection you apply while doing so: you should write source code which reflects the algorithm(s) you designed and have proven to be correct. Writing source code is making a projection of an algorithm onto an executable form: the source code is the algorithm, but in another form, the executable form. This is essential knowledge, as you can use it to prove that the code you wrote to represent the algorithm or algorithms is indeed the projection of the algorithm in executable form. Typically this is done by using pre-/post-conditions, for example checking whether the code indeed traverses from state n to state n+1 if event e occurs, etc. One way to do this is for example by writing other code which tests your code's algorithm implementation, however that creates in effect an infinite loop: to prove that your test is correct, you either have to prove that with conventional tools or with yet another test.
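
To make the pre-/post-condition idea concrete, here is a minimal sketch in C#. Everything in it (the Account type, the Transfer method) is hypothetical; it merely shows one way to state a pre-condition and check a post-condition in code:

using System.Diagnostics;

public class Account
{
	public decimal Balance;
}

public static class Bank
{
	// Algorithm: move 'amount' from source to destination.
	// Pre-condition:  amount > 0 and source.Balance >= amount.
	// Post-condition: the sum of both balances is unchanged.
	public static void Transfer(Account source, Account destination, decimal amount)
	{
		Debug.Assert(amount > 0, "pre: amount must be positive");
		Debug.Assert(source.Balance >= amount, "pre: source must cover the amount");
		decimal totalBefore = source.Balance + destination.Balance;

		source.Balance -= amount;
		destination.Balance += amount;

		Debug.Assert(source.Balance + destination.Balance == totalBefore,
			"post: total balance must be preserved");
	}
}

If the post-condition assert ever fires, you know the projection is wrong, because the algorithm itself (moving an amount between two accounts preserves the total) was already proven correct.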

By applying the principle of separation of concerns, one quickly learns that proving code correctness becomes easier: each algorithm is projected only once, in isolated form, so the end result is deterministic and therefore a candidate for being proven correct.

There's another benefit of taking these steps and sticking to them no matter what: debugging an application becomes easy. As you've already proven that the algorithms are correct, you only have to confirm that the code you wrote is indeed the algorithm in executable form. That doesn't even need a debugger at all: you can do that in the editor, on a whiteboard, on a sheet of paper or whatever floats your boat. If it becomes complex, you can revert to the debugger, but you know what to look for: you know what to expect, as you know the algorithm.

Testing your code can be a way to show that the code obeys a given set of standards and does things as expected. It's a tool to be used to reach the goal called correct software; it's not a goal on its own.

14 Comments

  • > When focusing on testability, one can fall into the trap of believing that the tests proof that your code is correct. That's not true: the set of tests proof that your code is correct in the solution space defined by the tests.

    Finally, someone that understands logic.

    I wish more people understood this.

  • Minor nitpick - and I know that English isn't your first language, so I really hate to do this... The way you are using "proof" as a verb has to do with making bread rise or "proofing" something against water (for example); the verb you want is "to prove"... Sorry, I'll go back to reading the content and not the semantics :-)

  • Mike: thanks for that, I'll update the text. I think that I have the title correct, but for the rest I'll check if I can find any prove/proof mistakes ;)

  • Yes I made the mistake OFTEN :) I was convinced that it was 'to proof', so I wrote that everywhere. From now on I'll use 'to prove'. :)

This community server system is too cumbersome to update all the posts, so you'll have to live with the 'kromme tenen' (Dutch for 'curled toes', i.e. cringing) the mistakes'll give you ;)

  • So... are you saying we shouldn't write test code, or that we should rely on it? How do you approach this problem in your own code?

    Thanks.

  • "So... are you saying we shouldn't write test code, or that we should rely on it?"
    I am saying that you shouldn't see tests as the proof that your software is correct, and that testability isn't a goal, it's a side effect of separation of concerns.

    You should focus on proving that your code is correct, which CAN be done by testing your algorithm implementations with tests, but those tests then have to be proven correct themselves, which is often ignored. You can also prove that a piece of code is correct without writing one line of test code. It's a myth that you need automated tests to get correct code.

    So in short: automated tests can show you that you broke something which worked before. So they can prove your code ISN'T correct. They can't prove your code IS correct. That's a big difference, and often overlooked.

  • I agree. Lately I've seen ugly code caused by trying too hard to make it testable.

  • [Dijkstra's famous rant against 'goto' statements is mainly based on that: a goto could create unpredictable code and therefore non-provable code.]

    Er... no. Goto is totally deterministic, just like every other statement. Dijkstra was worried about complexity, especially when it came to humans trying to comprehend the code, not with determinism.

    As for proving one's algorithms... yea, absolutely. Right after solving the comparatively minor problem of the Sun running out of "steam" in a few billion years.

    Finally: [One way to do this is for example by writing other code which tests your code's algorithm implementation, however that creates in effect an infinite loop: to prove that your test is correct, you either have to prove that with conventional tools or with yet another test.]

    Hmm. The same thing - with the same flaws - can be said about proving that the algorithm works. You have to prove that the proof is correct. Yuck.

    In practice, things are much weirder: how do you prove that a test that says

    Assert(2, MyClass.Add(1, 1));

    is correct? What does that even mean? It compiles? Hell yeah. It matches its specifications? There are none for a test. It says what I meant? Obviously.

    Hmm. I don't think I liked this article that much :)

  • Sameer: I know there was a response, but I decided not to continue this discussion, as the basic idea of proving the correctness of an algorithm is apparently not accepted as something perfectly doable (it's just logic). I mean: when someone wants to sort elements in a list, one picks a sort algorithm. It's easy to prove whether these are correct or not.

    So the discussion then becomes a silly exercise where the 'confidence' meter of some person is apparently the thing to use when deciding if software is correctly doing what it should.

    A simple example shows that what Phil said is pretty meaningless in this regard. Take the following:

    - you are given an object. The object has references to other objects, which have references to other objects, via collections or directly etc. etc.

    - you are given the task of putting the objects in the right order, so that when the data inside the objects is saved to the DB, no FK constraints are violated.

    Say you write code for that: you think about how you would do this, cook up some algorithm, and implement it. You write 10,000 unittests to see if you're done or if it needs more work. They all pass.

    Then a customer calls and says: "I get an FK violation". The customer shows you a complex graph of objects, and indeed: when that graph is used with your code in the 10001st unittest, it fails.

    Edge case? Who knows; there are unlimited graphs to think of, so in theory an unlimited number of unittests is necessary to see if your code is correct.

    However, what does that failing 10001st unittest show you?

    1) Your algorithm sucked, but your implementation of it was correct

    2) Your algorithm is OK, but your implementation of it sucked

    3) Both: your algorithm sucked and so did your implementation

    You can't possibly FIX this, unless you know your code is failing because of 2) and not because of 1).

    So you have to know if your algorithm is correct. Now, if you had implemented a proven algorithm like topological sorting of graphs, you would know your code is failing because of an implementation error, and you should review the code again. Topological sorting is provable by simple logic. The same holds for a lot of other algorithms; in fact a LOT of algorithms have already been proven correct for you, and in the last couple of years a lot of research has been done on proving algorithms.
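
    To illustrate (a minimal sketch, not the LLBLGen Pro implementation; the Entity type and its DependsOn list are hypothetical), here is a depth-first topological sort which produces a save order in which every object comes after the objects it depends on, assuming the dependency graph is acyclic:

    using System.Collections.Generic;

    public class Entity
    {
    	public List<Entity> DependsOn = new List<Entity>();	// FK targets: must be saved first
    }

    public static class SaveOrder
    {
    	// Depth-first topological sort: an entity is appended only after all
    	// entities it depends on, so FK constraints are never violated on save.
    	public static List<Entity> Sort(IEnumerable<Entity> entities)
    	{
    		var ordered = new List<Entity>();
    		var visited = new HashSet<Entity>();
    		foreach(var entity in entities)
    		{
    			Visit(entity, visited, ordered);
    		}
    		return ordered;
    	}

    	private static void Visit(Entity entity, HashSet<Entity> visited, List<Entity> ordered)
    	{
    		if(!visited.Add(entity))
    		{
    			return;	// already handled
    		}
    		foreach(var dependency in entity.DependsOn)
    		{
    			Visit(dependency, visited, ordered);
    		}
    		ordered.Add(entity);	// all its dependencies are already in 'ordered'
    	}
    }

    The correctness argument is simple logic: an entity is appended only after everything it depends on has been appended. If the 10001st unittest still fails against such a proven algorithm, you know the bug is in the projection, not in the algorithm.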

    What I was trying to say in my article above is that if you don't know whether your algorithm is correct, you can forget the rest. The example above shows why: you can't say what's wrong with your code when a unittest fails if you don't know whether your algorithm is the cause, or your implementation of it.

    So do I then have to continue this discussion with someone who is apparently deeply convinced that what people do in the CS field isn't really interesting, and that all that matters is confidence based on a lot of green lights in the unittesting tool? I have respect for Phil Haack, but I think our views on the matter are so incredibly different that discussing this further isn't that useful.

  • "Er... no. Goto is totally deterministic, just like every other statement. Dijkstra was worried about complexity, especially when it came to humans trying to comprehend the code, not with determinism."

    Please re-read Dijkstra's paper:
    http://www.cs.uakron.edu/~collard/cs430/dijkstra68goto.pdf
    In there, he argues that a program execution is traversing a graph. This means that it's predictable where you end up, by just following the graph. However, with goto that's not the case.

  • "Interesting discussion although I do wonder what your point is. Are you talking about some theoretical goal to strive for in the future? I mean proving code-correctness in all but some trivial examples is just not feasible. You agree on that don't you?"
    I'm not posting articles about open-door snake-oil, Stephan.

    "Really? How do you suggest I go about doing that? Let's say I've got a modest program with 3000 lines of code using some third party libraries. Do you have any idea what kind of work would be involved in proving that to be correct? We're not just talking elbow-grease here. We're talking a team of people working fulltime for a looooong time. All for a program you should be able to write by yourself in a couple of weeks. Would your customers pay for that?"
    I hope you've read what I wrote. If you think proving algorithms is undoable, we're done debating the point. It's not about the code, it's about what the code represents: after all, the code is an executable form of an algorithm or set of algorithms, i.e. it's the result of a projection, not the origin of one.

    Let me reverse the question: you're asked to produce a given set of functionality in executable form. You write some code and apparently something triggers you to say: "it's done, it works, let's ship this to the customer".

    Unless that 'something' is actual proof that it works, you are shipping something to the customer which can fail miserably on some occasion.

    A system which can receive an infinite number of possible inputs can and WILL fail if you haven't proven that the system can handle all those inputs. It's not hard, but it is necessary.

    Take our O/R mapper system. Do you think we simply ship it because it passes our unittests? Or do you think we ship it because it passes our code reviews of implementations of our proven algorithms AND it passes our integration tests? Mind you: our system has to deal with every possible input, and there's no room for failure: if our system fails, the application built with it fails!

    ""Your first step should be to prove that your algorithms are correct."

    Good we've now proven that maybe 1% of our program is correct. Maybe we've even gotten our projection correct. (We've been really really careful so it's still correct right? riiiiight)"
    I don't think you understood what I meant by proving algorithms. If you think proving the algorithms of your software means you've proven 1% of your software to be correct, you have no clue what programming is all about. Hint: it has nothing to do with typing code.

    "What about the rest. What about the code to render the ui, reset your teatimer and open the harddrive tray? I don't think they would qualify as alghorithms in any realistic sense, yet they have a huge potential for bugs."
    You think in 'code', but that's not what you should do. 'Code' is the end result; its theoretical basis is somewhere else. If you give me a piece of code you wrote, it effectively has NO MEANING. What's more important is the theory behind that code: why did you write it the way you wrote it? If you're a professional, you'll answer: "because the design dictated that", which comes down to: the algorithms you designed dictated your code that way. If you're an amateur, you'll likely mumble something about wind direction, the temperature of your coffee, etc.

    "I think unit tests are a way to aleviate the uncertainty and unpredictability of the rest. At least that way you can add some constraints that 'll make sure you don't repeat past mistakes. It's nowhere near a proof that your code is correct but I honestly don't know anyone who really thinks that."
    You firmly believe that anyone who unittests is doing that solely to prevent future mistakes? Or do you think they unittest to be sure what they wrote works?

    Did you read the post I wrote above? What if I have a failing unittest? What does that mean, really? You can't answer that if you don't know whether the algorithm is correct or not.

    Next time you ship something to a customer, think about WHY you made the decision to ship it at that moment and not a week before that. Was it because all unittests succeeded? (Which shouldn't be a guideline according to you, because succeeding tests don't prove correctness, right?) Or was it because some 'confidence' or other 'belief-driven development' hogwash rose above a given threshold? ;)

  • So in short: you're saying that most of the code in an application isn't about an algorithm, so proving it correct is pretty useless, as the algorithms make up a very small part of the code, proving them correct is time-consuming, and the rest of the code isn't helped by it anyway?

    I disagree. I find it essential that every randomly picked line of code has a theoretical basis; otherwise it can be removed because it's unnecessary.

    Mind you: even some screen glue code might look dumb and boring, but it is code which represents an algorithm, because it was written in THAT order and not in some random other order. That must have a reason, and that reason is the important thing here.

    I've proven the majority of the runtime code by hand indeed. It's not rocket science. First of all, pre-/post-conditions are essential. Then you have to use algorithms which are already proven to be correct: this increases quality, as the algorithm and its scope are known, so you can just implement it.

    Our experience with the 300,000 lines of code of LLBLGen Pro is that almost all bugs in the code are projection errors, i.e. errors in projecting the algorithm onto code. So, in other words: the algorithm wasn't implemented correctly.

    There's unfortunately no real automated scenario possible at the moment for checking projections onto code (which basically comes down to compiling the algorithm in a DSL to C#), so there will be mistakes in this, as a human has to read the code again and do the checking.

    The main point is that you seem to argue that there's no real solid evidence of what the code should do in the first place. I find that a little strange. I never write code without knowing exactly what it should do; how else can I write the code? I mean: the code is the representation of functionality in executable form, resulting from algorithms projected onto code, where the algorithms are the design of how the functionality should work. That thus means that you can't have code which doesn't have a reason to be there.

    Take a customer-order edit screen and its features. Algorithm: read customer, read orders of customer, display customer fields, display orders, let user select order -> display selected order -> enter edit loop -> exit edit loop -> persist changes (== other algorithm). After you know THAT, you can write it in code.

    If you think that's code without an algorithm, how on earth can you possibly create such a screen which works properly?
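
    As a sketch of what that projection might look like in C# (all types here are hypothetical; it's just the flow above, written down step by step):

    using System.Collections.Generic;

    public class Customer { }
    public class Order { }

    public interface IOrderRepository
    {
    	Customer GetCustomer(int customerId);
    	List<Order> GetOrders(Customer customer);
    	void PersistChanges(Customer customer, List<Order> orders);
    }

    public interface IOrderForm
    {
    	void DisplayCustomer(Customer customer);
    	void DisplayOrders(List<Order> orders);
    	Order WaitForOrderSelection();
    	void DisplayOrder(Order order);
    	void RunEditLoop(Order order);
    }

    public class CustomerOrderController
    {
    	private readonly IOrderRepository _repository;
    	private readonly IOrderForm _form;

    	public CustomerOrderController(IOrderRepository repository, IOrderForm form)
    	{
    		_repository = repository;
    		_form = form;
    	}

    	// The flow above, projected onto code, one step per line.
    	public void EditCustomerOrders(int customerId)
    	{
    		Customer customer = _repository.GetCustomer(customerId);	// read customer
    		List<Order> orders = _repository.GetOrders(customer);	// read orders of customer
    		_form.DisplayCustomer(customer);	// display customer fields
    		_form.DisplayOrders(orders);	// display orders
    		Order selected = _form.WaitForOrderSelection();	// let user select order
    		_form.DisplayOrder(selected);	// display selected order
    		_form.RunEditLoop(selected);	// enter edit loop -> exit edit loop
    		_repository.PersistChanges(customer, orders);	// persist changes (== other algorithm)
    	}
    }

    Note how separating the repository and form concerns is exactly what keeps each step of the flow isolated and checkable against the algorithm.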

    Perhaps you find the software you ship to be good enough if it passes a set of tests and the customer likes the blue color of the screens. However it might be that the software has to produce results which are always correct, and the results are based on data which is entered into the system or calculated based on input, you know the core of your app.

    You don't prove that kind of code? You 'trust' yourself that the code you wrote is correct? I don't think so.

    As we're all human, we indeed need triggers which go off when we make a mistake. But that should be their only reason for existence.

    What annoys me is that saying people should strive for provability of software is apparently a bad thing, because it's impossible, or counterproductive, etc. However, at the same time the alternatives are simply not there: according to test-oriented developers, what IS there to make the quality of software better (i.e. make it work as expected)? More tests? How many?

    Unclear. So people who want to improve the quality of their software are left in the dark with nothing that helps. As if writing software is something completely unscientific, equal to producing a work of art.

  • There's a couple of things that puzzle me here.

    One is: if you've proven that your codebase is correct, why would you still need unit-tests? If you've proven that it produces correct outputs for all inputs, feeding it some inputs and checking the corresponding output would seem to be an exercise in futility. I guess you need them for your projection, which you do by hand again? Isn't that what a lot of bugs are? Errors in translating the recipe in your head to code on screen?

    Secondly, I would love to see all this theoretical stuff translated into a real-world scenario. You say it's not that hard. Prove it! (pun intended :))

    I've done some of this stuff in college, and to produce a mathematically correct proof of a simple 4-line loop we needed more than 2 pages of paper. You can understand my skepticism...

    You say you've done this for a codebase of 300,000 lines, so obviously we're not talking about the same thing. I know, I know: you're proving the original algorithm, but what does that mean eventually? You say pre- and post-conditions are important. With a sorting algorithm I can say, pre: the list may or may not be sorted; post: the list is sorted. Fine, that seems very provable (although by no means easy). But what happens on our order form, where you have all sorts of buttons the user can push? Do I make separate pre- and post-conditions for each button press? It's starting to sound like a unit test... :)

    "Take a customer-order edit screen and its features. Algorithm: read customer, read orders of customer, display customer fields, display orders, let user select order -> display selected order -> enter edit loop -> exit edit loop -> persist changes (== other algorithm). After you know THAT, you can write it in code."


    Yes, let's take this screen. It seems like a very common scenario, and exactly what I'm talking about. What you've listed here is not an algorithm. It's a program flow. One of many, many possible flows. (The user might cancel, enter strange values, open another screen, etc. etc.)
    I would very much (and I really mean this) like to see what your concept of a proof of its correctness would look like. Perhaps we're using different definitions of proof. To clarify: I'm talking about an actual mathematical proof. One that is built upon axioms and one that doesn't reduce this complex real-world scenario to an empty shell.


    "Perhaps you find the software you ship to be good enough if it passes a set of tests and the customer likes the blue color of the screens. However it might be that the software has to produce results which are always correct, and the results are based on data which is entered into the system or calculated based on input, you know the core of your app.

    You don't prove that kind of code? You 'trust' yourself that the code you wrote is correct? I don't think so. "

    No I don't. I believe that anyone who thinks their code is 'correct' is fundamentally wrong.
    And again, except for some very specialized banking applications and maybe a NASA app, I don't know of any other software that has been proven to be correct. If all this stuff is not rocket science, you'd think they would have done it for the Mach kernel or Linux. They haven't, so I guess they're just lazy, and only 'think in code'. PS: my screens are never blue, my customers prefer yellow....


    One final point: you seem to think I'm a huge test-driven design proponent. I'm not. I think of unit-tests as one of a number of tools programmers can use to increase the quality of their products.

    And no, building software in the real world is not entirely scientific. I would like it to be more like that, but this is an area of constant ongoing research. Frankly, coming from someone who seems to hold an active interest in the science of our field, it's kind of puzzling that you treat this as a solved problem.

  • (Ok one more then... :))

    Stephan: OK, I agree: code correctness (not algorithms) via formal analysis of that code is indeed not what I meant; that's why I described the steps to take. You can use Hoare logic, for example, which can be formal, but the idea is that you use logic with pre-/post-conditions, which is something you have to do anyway for code; otherwise writing the code is an undoable exercise.
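
    To give an idea of what that looks like in practice (a minimal sketch; the Sum method is hypothetical, and the annotations follow the Hoare-style {P} code {Q} idea as comments):

    public static class Proofs
    {
    	// Pre-condition P:  values is not null.
    	// Post-condition Q: the returned value equals the sum of all elements.
    	public static int Sum(int[] values)
    	{
    		int sum = 0;
    		int i = 0;
    		// Loop invariant: sum == values[0] + ... + values[i-1]
    		while(i < values.Length)
    		{
    			sum += values[i];
    			i++;
    			// Invariant re-established: sum covers values[0..i-1]
    		}
    		// Here i == values.Length, so by the invariant Q holds.
    		return sum;
    	}
    }

    The invariant plus the loop's exit condition is what proves the post-condition; checking that the code matches those annotations is verifying the projection.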

    The idea is that if you prove your algorithm to be correct and verify that the code written represents that algorithm, the code is then also correct. If the code still misbehaves after such verification, the algorithm isn't correct after all. That's the idea. The weakest link is indeed the projection of the algorithm onto code.

    However, if you don't prove your algorithms, you have nothing to fall back on when a flaw is found in the code. If you HAVE proven the algorithms, you can debug the code knowing what to look for: the code apparently contains a projection error.

    On Dijkstra: in my interpretation of the paper, I concluded that it is indeed about making it impossible to understand why you arrive at the goto label, and therefore not possible to determine if the code is correct or not. As this was several years ago, I should have looked it up before adding that sentence to the article; you're completely right in that case.
