Maintainable software: why you can't live without proper solid documentation
This post is a reply to
Jeremy D. Miller's post 'A Train of Thought, June 17, 2007'. It's part of an ongoing discussion about
maintainability of software and what's necessary for having
maintainable software. I'm not going to link to every post
in the discussion; you can find most of them from Jeremy's
post.
Before I continue, I'd like to say that I'm not
participating in this discussion to disqualify TDD/Agile as
a set of useful methodologies because I do think they have
some solid points everyone can benefit from. I'm also not
participating in this discussion because I'm a
waterfall-follower, because I'm not a waterfall follower.
Waterfall is a methodology which could be very beneficial
but it has to suit the project. For example, you really want
to use waterfall in software for some medical equipment, as
you don't want to run the risk of missing a spot because you
didn't anticipate that a particular use case would be possible. I
don't use waterfall myself, as I'm not in the medical
equipment business and I'm also not a consultant paid by
the hour. But more on that later on in the article. The post
is built up as replies to things Jeremy said in his post, so
the blockquotes are quoted from his post.
The summary comes down to this: Documentation describes the
what and the why, code describes the
how. You need both documentation and code to have the
complete overview, not just the code.
Granted, I've got an almost knee jerk reaction to disagree with Frans on almost anything related to software development, but I'd still prefer a much stronger emphasis on the "what" and "how" a system is put together than I would the design documentation.
And why is it that you want to disagree with me about
software development so often, Jeremy? Is it because you
think I am a true waterfall-adept and anti-agile/TDD? Well, I'm not.
Focussing on the what and how is OK, and I'm
not saying that solid, clear, easy to understand code isn't
more maintainable than a big steaming pile of spaghetti-crap,
but just focussing on getting solid clear easy to understand
code doesn't necessarily bring you a great maintenance
experience: if essential information is missing, you're
still doomed. Furthermore, proper code isn't necessarily
the result of TDD/Agile principles. It's just
that in your experience Agile/TDD gives good, easy to
understand code. Well, good for you.
The thing is though: a team of good software engineers
which works like a well-oiled machine will very likely
create proper code which is easy to understand, regardless of the
methodology used. If all of those software engineers move on
to other projects, it doesn't matter how much knowledge was shared
inside the team: that knowledge isn't going to be available to the
successors unless you provide solid, easy to understand
documentation about the code.
I don't really care so much "why" it was written that way, only what it is. And by solidly written code I mean code that I can understand by looking at that readily accepts change.
This is the essential part where you make a trivial but
costly mistake: the why is of the utmost importance. The
reason is that when you have to make a change to a piece of
code, you might be tempted to refactor the code a bit into a
form which was rejected earlier, for example because of bad
side effects for other parts. If you don't know the
why of a given routine, class or structure, you
will sooner or later make the mistake of refactoring the
code so it reflects what wasn't the best option, and you'll
find that out the hard way, losing precious time you could
have saved.
That's why the why documentation is so important. It holds the
documented design decisions: what were the alternatives, and
why were they rejected? This is essential information for
maintainability, as a maintainer needs it to properly
refactor the code into a form which wasn't
already rejected.
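As a minimal sketch of what such a design-decision record could look like when it is kept next to the code, here is a small Python example. Everything in it is hypothetical and only for illustration: the `DesignDecision` and `RejectedAlternative` types, the `OrderCache.lookup` subject and the reasons given are made up, and a plain design document serves the same purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejectedAlternative:
    """An option that was considered and why it was turned down."""
    name: str
    reason: str


@dataclass(frozen=True)
class DesignDecision:
    """The 'why' behind a routine: what was chosen and what was rejected."""
    subject: str
    chosen: str
    rationale: str
    rejected: tuple = ()

    def render(self) -> str:
        # Produce a short human-readable summary of the decision.
        lines = [f"{self.subject}: {self.chosen}", f"  why: {self.rationale}"]
        for alt in self.rejected:
            lines.append(f"  rejected: {alt.name} ({alt.reason})")
        return "\n".join(lines)


# Hypothetical entry; the routine and the reasons are invented for this sketch.
CACHE_DECISION = DesignDecision(
    subject="OrderCache.lookup",
    chosen="rebuild the index on every write",
    rationale="reads outnumber writes and must never see a stale index",
    rejected=(
        RejectedAlternative(
            name="lazy rebuild on first read",
            reason="caused stale reads in the reporting module",
        ),
    ),
)
```

A maintainer who is tempted to switch to the lazy rebuild can call `CACHE_DECISION.render()` and see that this route was already tried and why it was dropped.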
The other element of your remark, about understanding code,
shows some lack of understanding of why humans are so bad
at writing code: you assume you will understand code when
you read it. Well, I have news for you, Jeremy: you will
not. Not now, not ever. And not only you, but everyone out
there who writes code, and that includes me as well, will
not be able to read code and understand it
immediately. That's not because you lack experience or
knowledge, but because you and I are human. Sure, there will
be code snippets we will understand in a heartbeat. However,
there's an essential part of understanding code which is
missing in a human body: a code interpreter which can
understand why at time T element A has the state S and why
at time T+t it has state S'. Only with such a code
interpreter will you understand in full what code does. As a
human lacks such an interpreter, we can only try to
understand the code, and we will very easily make mistakes
doing so. That's also why there are tools like ReSharper and
everybody's friend, the debugger.
Besides, I've never seen a long technical document that was entirely useful. Time and manpower is finite. I'd rather sink more energy and resources into better, cleaner, well-structured code than comprehensive documentation because I think the payoff is higher. To me, one of the biggest advantages of moving from a waterfall shop that produced a lot of intermediate documents to XP shops was that I now get to spend much more time on a project focusing on the design, architecture, and code than I did when I was on the hook for much more documentation. I write fewer documents, but I get to create better code with far better configuration management practices. I call that a net win.
Let me be blunt here: do you hate documentation that much?
Do you think having a lot of documentation hurts your
project and will make it, lo and behold, look like it was written
using waterfall? Code isn't documentation, it's code. Code
is the purest form of the functionality you have
to provide, as it is the form of the functionality that
actually gets executed. However, it's not the best form to
illustrate why the functionality is constructed the way
it is constructed. I'll get more into that in a second.
Oh, and besides that: just because you haven't seen a
technical document which makes sense doesn't mean having
them is effectively useless. I've seen technical documents
which did make a lot of sense and were essential to
understanding what was going on at such a level that making
changes was easy.
You see, Jeremy, the thing is that if a set of features has
to be added to a project that has been in production for a while,
you really need an overview of where to make the changes
and in what form. If your project consists of, say, 400,000
lines of code, it's not a walk in the park to get even the
slightest overview of what is located where without reading all
of those lines, if there's no documentation which is
of any value. Code is for formulating functionality in an
executable form; it's not documentation of any kind. If you
think that it is, I really pity the one who has to maintain
your code, say, 2 years from now.
As a quick aside, Frans also more or less makes the claim on Sam's blog that TDD doesn't do anything for maintainability, or just wonders what in the world it does do for maintainability.
No, I didn't make that claim; you think I made that claim. What I was trying to say was that TDD/Agile is advocated as a set of methodologies which will make your project the best that can be written. However, the elements of properly maintainable software don't require TDD at all! Furthermore, you seem to suggest that TDD/Agile will give better results no matter what, which isn't guaranteed: whether the results of your software project will be up to par depends on the people in your team and a lot of other factors. TDD/Agile can help, but they aren't a guarantee. They're also not a ticket to maintainable software.
Orthogonality. Codebase's developed with Test Driven Development will almost always exhibit better qualities in regards to cohesion and coupling, the very same qualities that make code easier and safer to change. I know Frans is going to come back and argue that he gets it done with lots and lots of documentation and very careful upfront design. I'm going to respond to that by saying the instant feedback loop from doing detailed design with TDD pushes me in the direction of orthogonality more efficiently and effectively than any form of upfront design. Why is this true? Because you can't use TDD on code that isn't loosely coupled and not easily on code that isn't highly cohesive.
Ah, so now you also know how I design my software, Jeremy?
Don't you agree that what you said above is actually pretty
stupid? Especially when I say to you that I've used
TDD/Agile style development for the last 5 years now? The
code base of LLBLGen Pro
alone is massive: the designer gathers meta-data which is
fed to a code generator stack executed by tasks in a queue
which consume templates which produce code which is a
specialization of compiled code in the runtime libraries.
Meta-data affects generated code, which affects the total class
stack in the project, and vice versa. It's pretty complex, if
I may say so. Do you think I designed that all up-front
in a waterfall-esque way, spent months and months writing
document after document and then started writing a lot
of code? No way! It's vertically developed, use case after
use case. Every feature is seen as a use case or set of use
cases, depending on the feature. First I analyse what the
feature embodies, what impact it might have etc., and if
necessary tests are written up front. OK, then I'll do a
weird thing: I'll open the design document and write a
piece of documentation on how the feature is designed, why
particular parts are the way they are and why alternatives
won't work. After that, I'll go into the code base
and write the code for that feature, run the tests I've
written before and update the documentation if what I
wrote there turned out to be wrong.
You see, documentation isn't an entity separate from the code
written: it describes in a certain DSL (i.e. human-readable
and understandable language) what the functionality is all
about; the code will do so in another DSL (e.g. C#). That's
the essential part: you have to provide functionality in an
executable form. Code is such a form, but it's arcane
for a human to read and understand (or is your code always
100% bug-free when you've written a routine? I seriously
doubt it, no-one is that good). Proper documentation
which describes what the code realizes is
another form of that same functionality. These two aren't separate entities and you
can't write the documentation after you're done writing
code, as you will then only document how the code works.
Which is nice, but not enough. You need the why part
too.
I'm very very glad that I've written these documentation
parts since the beginning of the project back in 2002. You
see, I've written the majority of the system and if someone
should know how all code works, it would be me, right? Well,
perhaps I'm not as talented as you, Jeremy, but I'm not able
to remember every design decision I made in detail for this
massive code base. Also, if something happens to me, I
really want to hand it to someone else so s/he can continue
my work. With documentation that's written on the spot, you
can. When I need to make a change and need to know why a
routine is the way it is, I look up the design document
element for that part and check why it is the way it is and
which alternatives were rejected and why. After 5 years, your
own code also becomes legacy code. Do you still maintain
code you wrote 2-3 years ago? If so, do you still know
why you designed it the way it is designed, and will you
always avoid reconsidering alternatives you rejected back
then because they wouldn't lead to the right solution?
Without proper documentation you can't possibly avoid
missteps you probably already made before.
The unit tests are a form of documentation. Reading the unit tests for a class should be a great way to learn how to use any given class. I can think of several cases where someone else's unit tests have made it easier to use a class or API.
Unit tests are tests. They test a given piece of
functionality written in code to see if that code indeed
represents that functionality. That's a very valuable feature and an
essential part of quality assurance. What's missing is
documentation: a unit test isn't documenting anything. Code isn't
documentation, it's code. It describes the same
functionality, but in such a different DSL that a human isn't
helped by wading through thousands and thousands of unit
tests to understand what the API does and why. The
why will never be represented by unit tests; the unit
tests will only show how you can use a given routine or
class in a particular situation. Use tests to verify that what you think is OK
is actually OK. Use documentation for documenting
what you've written in code. Unit tests also don't reveal
why the inner workings of methods / classes are the way they
are. They just confirm that they work in the particular case
the unit test tests for.
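To make that distinction concrete, here is a small Python sketch. The `apply_discount` function, the 50% cap and the reason given for it are all hypothetical, invented only for this illustration. The test shows how to call the routine and confirms the behaviour in one particular case; the reason for the cap lives only in the written documentation.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in cents.

    Why the cap (hypothetical rationale): discounts above 50% were
    rejected because they pushed some products below cost price. An
    uncapped discount was the first design and was dropped for that
    reason.
    """
    capped = min(percent, 50)  # the "how": clamp the percentage to 50
    return price_cents - (price_cents * capped) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_discount_is_capped(self):
        # Confirms that 80% behaves like 50% in this one case.
        # Nothing in this test says why the cap exists.
        self.assertEqual(apply_discount(10_000, 80), 5_000)
```

Remove the docstring and the test still passes, but a maintainer asked to "fix" the cap no longer has any way of knowing that the uncapped version was already tried and rejected.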
Using unit tests for learning purposes or documentation is
similar to learning how databases work, what relational
theory is, what set theory is etc. by looking at a lot of
SQL queries. You will only see a lot of SQL, there's no
context, there's no explanation why the statement is written
that way and not in another way. Wouldn't you agree that
learning how databases work is better done by reading a book
about the theory behind databases, relational theory, set
theory and why SQL is a set-oriented language? Then why
would it be OK, in the case of the theory behind a piece of
software you've written, to fall back on the code
which uses it in a limited set of situations?
High levels of unit test coverage gives you so much more ability to change existing code without introducing regression errors. No matter how much upfront analysis and design you try to do, the users will always come with something completely new that you couldn't reasonable anticipate in your initial construction. It's awfully nice to have that immediate safety net of focused unit tests as you make changes to existing code. Documents are passive. Unit tests will shout out when they're broken -- assuming anybody runs them of course. Good unit tests will even tell you exactly where the regression breaks happen.
Unit tests are valuable, there's no disagreement there.
However their name already implies that they're not
documentation, they're tests. Documentation also isn't
passive. It's active, as it describes in another DSL
what functionality is implemented and why it
is implemented the way it is implemented. If I may, I'd like
to describe documentation as a
projection result of the functionality to deliver onto
human-readable and understandable text, and code as the
projection result of that same functionality to deliver
onto machine-executable elements.
This implies that if the functionality changes,
documentation and code will change, not just the
code, simply because the documentation is the projection
result of the same source as the code is.
Or are you suggesting that the code you're writing is
actually a result of whatever came up in your mind at that
time and some test will tell you if that thought was
actually acceptable or not? I doubt it. You're a
professional, passionate about computer science. However,
never forget, Jeremy: so am I.