Maintainable software: why you can't live without proper solid documentation

Monday, June 18, 2007

This post is a reply to Jeremy D. Miller's post 'A Train of Thought, June 17, 2007'. It's part of an ongoing discussion about the maintainability of software and what's necessary for having maintainable software. I'm not going to link to every post in the discussion; you can find most of them from Jeremy's post.

Before I continue, I'd like to say that I'm not participating in this discussion to disqualify TDD/Agile as a set of useful methodologies, because I do think they have some solid points everyone can benefit from. I'm also not participating because I'm a waterfall follower, because I'm not. Waterfall is a methodology which can be very beneficial, but it has to suit the project. For example, you really want to use waterfall in software for medical equipment, as you don't want to run the risk of missing a spot because you didn't anticipate that a particular use case would be possible. I don't use waterfall myself, as I'm not in the medical equipment business and I'm also not a consultant paid by the hour. But more on that later on in the article. The post is built up as replies to things Jeremy said in his post, so the blockquotes are quoted from his post.

The summary comes down to this: Documentation describes the what and the why, code describes the how. You need both documentation and code to have the complete overview, not just the code.

Granted, I've got an almost knee jerk reaction to disagree with Frans on almost anything related to software development, but I'd still prefer a much stronger emphasis on the "what" and "how" a system is put together than I would the design documentation.

And why is it that you want to disagree with me about software development so often, Jeremy? Is it because you think I am a true waterfall adept and anti-Agile/TDD? Well, I'm not.

Focussing on the what and how is OK, and I'm not saying that solid, clear, easy to understand code isn't more maintainable than a big steaming pile of spaghetti-crap, but just focussing on getting solid, clear, easy to understand code doesn't necessarily bring you a great maintenance experience: if essential information is missing, you're still doomed. Furthermore, that doesn't mean proper code is necessarily the result of TDD/Agile principles. It's just that your experiences show that Agile/TDD gives good, easy to understand code. Well, good for you. The thing is though: a team of good software engineers which works like a well-oiled machine will very likely create proper code which is easy to understand, regardless of the methodology used. If all of those software engineers move on to other projects, you could have shared as much knowledge inside the team as you like, but it's not going to be available to their successors unless you provide solid, easy to understand documentation about the code.

I don't really care so much "why" it was written that way, only what it is. And by solidly written code I mean code that I can understand by looking at that readily accepts change.

This is the essential part where you make a trivial but costly mistake: the why is of the utmost importance. When you have to make a change to a piece of code, you might be tempted to refactor the code into a form which was rejected earlier, for example because of bad side effects on other parts. If you don't know the why of a given routine, class or structure, you will sooner or later make the mistake of refactoring the code into what wasn't the best option, and you'll find that out the hard way, losing precious time you could have saved.

That's why the 'why' documentation, the documented design decisions, is so important: what were the alternatives, and why were they rejected? This is essential information for maintainability, as a maintainer needs it to refactor the code properly without ending up in a form which was already rejected.
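As an illustration, here is a minimal sketch in Python of what a documented design decision could look like next to the code it explains. Everything in it is hypothetical and invented for this example (the function names, the order-total scenario and the rejected alternative); it is not taken from any project mentioned in this post.

```python
from decimal import Decimal, ROUND_HALF_UP


def order_total(line_amounts):
    """Return the order total, rounded to whole cents.

    What: sums the line amounts and rounds once, at the end.

    Why (hypothetical design note):
      * Rejected alternative: round every line first, then sum. It reads
        simpler, but the rounding errors of the individual lines add up,
        so the total can differ from the exact sum by a cent or more.
      * A maintainer who "simplifies" this routine into that form would
        reintroduce exactly that problem, and nothing in the code itself
        says so.
    """
    exact = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    return exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def order_total_rejected(line_amounts):
    """The rejected alternative, kept here only to show the difference."""
    cent = Decimal("0.01")
    return sum(
        (Decimal(a).quantize(cent, rounding=ROUND_HALF_UP) for a in line_amounts),
        Decimal("0"),
    )
```

Both routines look equally reasonable when you only read the code. For two lines of 0.125 each, the first returns 0.25 and the rejected form returns 0.26. Only the design note tells a maintainer that the second form was already considered and why it was dropped.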

The other element of your remark, about understanding code, shows some lack of understanding of why humans are so bad at writing code: you assume you will understand code when you read it. Well, I have news for you, Jeremy: you will not. Not now, not ever. And not only you, but everyone out there who writes code, and that includes me as well, will not be able to read code and understand it immediately. That's not because you lack experience or knowledge, but because you and I are human. Sure, there will be code snippets we will understand in a heartbeat. However, there's an essential part of understanding code which is missing in a human body: a code interpreter which can understand why at time T element A has the state S and why at time T+t it has state S'. Only with such a code interpreter will you understand what code does in full. As a human lacks such an interpreter, we can only try to understand the code, and we will very easily make mistakes doing so. That's also why there are tools like ReSharper and everybody's friend, the debugger.

Besides, I've never seen a long technical document that was entirely useful. Time and manpower is finite. I'd rather sink more energy and resources into better, cleaner, well-structured code than comprehensive documentation because I think the payoff is higher. To me, one of the biggest advantages of moving from a waterfall shop that produced a lot of intermediate documents to XP shops was that I now get to spend much more time on a project focusing on the design, architecture, and code than I did when I was on the hook for much more documentation. I write fewer documents, but I get to create better code with far better configuration management practices. I call that a net win.

Let me be blunt here: do you hate documentation that much? Do you think having a lot of documentation hurts your project and will make it, oh behold!, look like it is written using waterfall? Code isn't documentation, it's code. Code is the purest form of the functionality you have to provide, as it is the form that actually gets executed; however, it's not the best form to illustrate why the functionality is constructed the way it is. I'll get more into that in a second.

Oh, and besides that: just because you haven't seen a technical document which makes sense doesn't mean having them is effectively useless. I've seen technical documents which did make a lot of sense and were essential to understanding what was going on at such a level that making changes was easy.

You see, Jeremy, the thing is that if a set of features has to be added to a project that has been in production for a while, you really need an overview of where to make the changes and in what form. If your project consists of, say, 400,000 lines of code, it's not a walk in the park to get even the slightest overview of what is located where without reading all of those lines, if there's no documentation of any value. Code is for formulating functionality in an executable form; it's not documentation of any kind. If you think that it is, I really pity the one who has to maintain your code, say, 2 years from now.

As a quick aside, Frans also more or less makes the claim on Sam's blog that TDD doesn't do anything for maintainability, or just wonders what in the world it does do for maintainability.

No, I didn't make that claim; you think I made that claim. What I was trying to say was that TDD/Agile is advocated as a set of methodologies which will make your project the best that can be written. However, the elements for properly maintainable software don't require TDD at all! Furthermore, you seem to suggest that TDD/Agile will give better results no matter what, which isn't guaranteed: whether the results of your software project will be up to par depends on the people in your team and a lot of other factors. TDD/Agile can help, but they aren't a guarantee. They're also not a ticket to maintainable software.

Orthogonality. Codebase's developed with Test Driven Development will almost always exhibit better qualities in regards to cohesion and coupling, the very same qualities that make code easier and safer to change. I know Frans is going to come back and argue that he gets it done with lots and lots of documentation and very careful upfront design. I'm going to respond to that by saying the instant feedback loop from doing detailed design with TDD pushes me in the direction of orthogonality more efficiently and effectively than any form of upfront design. Why is this true? Because you can't use TDD on code that isn't loosely coupled and not easily on code that isn't highly cohesive.

Ah, so now you also know how I design my software, Jeremy? Don't you agree that what you said above is actually pretty stupid? Especially when I say to you that I've used TDD/Agile style development for the last 5 years now? The code base of LLBLGen Pro alone is massive: the designer gathers meta-data which is fed to a code generator stack executed by tasks in a queue which consume templates which produce code which is a specialization of compiled code in the runtime libraries. Meta-data affects generated code, which affects the total class stack in the project, and vice versa. It's pretty complex, if I may say so. Do you think I designed all that up-front in a waterfall-esque way, spent months and months writing document after document and then started writing a lot of code? No way! It's vertically developed, use case after use case. Every feature is seen as a use case or a set of use cases, depending on the feature. First I analyse what the feature embodies, what impact it might have and so on, and if necessary tests are written up front. OK, then I'll do a weird thing: I'll open the design document and write a piece of documentation on how the feature is designed, why particular parts are the way they are and why alternatives won't work. After that, I'll go into the code base and write the code for that feature, run the tests I've written before and update the documentation if what I wrote there turned out to be wrong.

You see, documentation isn't a separate entity from the code written: it describes in a certain DSL (i.e. human-readable and understandable language) what the functionality is all about; the code does so in another DSL (e.g. C#). That's the essential part: you have to provide functionality in an executable form. Code is such a form, but it's arcane for a human to read and understand (or is your code always 100% bug-free when you've written a routine? I seriously doubt it, no-one is that good). Proper documentation which describes what the code realizes is another form of that same functionality. These two aren't separate entities, and you can't write the documentation after you're done writing the code, as you will then document how the code works. Which is nice, but not enough. You need the why part too.

I'm very, very glad that I've written these documentation parts since the beginning of the project back in 2002. You see, I've written the majority of the system, and if someone should know how all the code works, it would be me, right? Well, perhaps I'm not as talented as you, Jeremy, but I'm not able to remember every design decision I made in detail for this massive code base. Also, if something happens to me, I really want to be able to hand it to someone else so s/he can continue my work. With documentation that's written on the spot, you can. When I need to make a change and need to know why a routine is the way it is, I look up the design document element for that part and check why it is the way it is, which alternatives were rejected and why. After 5 years, your own code also becomes legacy code. Do you still maintain code you wrote 2-3 years ago? If so, do you still know why you designed it the way it is designed, and will you always avoid reconsidering alternatives you rejected back then because they wouldn't lead to the right solution? Without proper documentation you can't possibly avoid missteps you probably already made before.

The unit tests are a form of documentation. Reading the unit tests for a class should be a great way to learn how to use any given class. I can think of several cases where someone else's unit tests have made it easier to use a class or API.

Unit tests are tests. They test a given piece of functionality written in code to see if that code indeed represents that functionality. That's a very valuable feature and an essential part of quality assurance. What's missing is that a unit test isn't documenting anything: code isn't documentation, it's code. It describes the same functionality, but in such a different DSL that a human isn't helped by wading through thousands and thousands of unit tests to understand what the API does and why. The why will never be represented by unit tests; the unit tests will only show how you can use a given routine or class in a particular situation. Use tests to see whether what you think is OK actually is OK. Use documentation for documenting what you've written in code. Unit tests also don't reveal why the inner workings of methods and classes are the way they are. They just confirm that they work in the particular case the unit test tests for.
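A small sketch in Python can make this limitation visible. The two functions and the test below are hypothetical, written only for this illustration: the same test passes for both implementations, so it shows how to call the routine and what comes out, but it cannot tell a reader which implementation was chosen or why the other one was not.

```python
import unittest


def unique_in_order(items):
    """Remove duplicates, keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def unique_in_order_alt(items):
    """Alternative with the same behaviour: dicts keep insertion order."""
    return list(dict.fromkeys(items))


class UniqueInOrderTest(unittest.TestCase):
    def test_keeps_first_occurrence(self):
        # The test only checks input and output for one situation.
        for implementation in (unique_in_order, unique_in_order_alt):
            self.assertEqual(implementation([3, 1, 3, 2, 1]), [3, 1, 2])
```

Saved as a module, the test can be run with `python -m unittest`. It confirms behaviour for the case it covers, and that is all it says; the reasoning behind the design has to be written down somewhere else.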

Using unit tests for learning purposes or as documentation is similar to learning how databases work, what relational theory is, what set theory is etc. by looking at a lot of SQL queries. You will only see a lot of SQL; there's no context, and there's no explanation of why the statement is written that way and not in another way. Wouldn't you agree that learning how databases work is better done by reading a book about the theory behind databases, relational theory, set theory and why SQL is a set-oriented language? Then why would it be OK, in the case of the theory behind a piece of software you've written, to fall back on the code which uses it in a limited set of situations?

High levels of unit test coverage gives you so much more ability to change existing code without introducing regression errors. No matter how much upfront analysis and design you try to do, the users will always come with something completely new that you couldn't reasonable anticipate in your initial construction. It's awfully nice to have that immediate safety net of focused unit tests as you make changes to existing code. Documents are passive. Unit tests will shout out when they're broken -- assuming anybody runs them of course. Good unit tests will even tell you exactly where the regression breaks happen.

Unit tests are valuable, there's no disagreement there. However, their name already implies that they're not documentation: they're tests. Documentation also isn't passive. It's active, as it describes in another DSL what functionality is implemented and why it is implemented the way it is. If I may, I'd like to describe documentation as the projection of the functionality to deliver onto human-readable and understandable text, and code as the projection of that same functionality onto machine-executable elements.

This implies that if the functionality changes, documentation and code will change, not just the code, simply because the documentation is the projection result of the same source as the code is.

Or are you suggesting that the code you're writing is actually a result of whatever came up in your mind at that time and some test will tell you if that thought was actually acceptable or not? I doubt it. You're a professional, passionate about computer science, however never forget, Jeremy: so am I.
