How to Shoot Yourself in the Foot with Code Coverage? (Part II)
Honest code coverage
One of the phrases I use quite often is “honest something”. What do I mean by this? First of all, I believe that everything that is worth doing should be done with the highest quality possible. Secondly, I believe in doing things “by the book”, but this shouldn't be confused with blindly following orders or having a very fixed and limited thinking process ;-) The best way to explain this is to give a concrete example:
- “Honest development”. For me this means that before any feature is added to the code base the following general steps happen: 1) after everyone has agreed on the specification, the developer thinks about the design, writes a design document, and extensively reviews it with his/her peers; 2) writes the code, asks his/her peers to review it, and steps through every line of it in the debugger; 3) makes sure that warnings generated by different static source code analysis tools are fixed; 4) develops a number of unit tests to test his/her code; 5) works with his/her buddy tester to make sure that the feature has received a decent amount of testing coverage before it's added to the code base.
- “The other way”. Write the code, make sure it compiles, spend 5-10 minutes testing it, hope that the STE-s will find all the other issues, and add it to the code base.
Everyone can see that using the first approach is much harder - it requires far more engineering discipline, takes more time, and forces the developer to work harder. Microsoft is an exceptional place in the sense that most of the developers follow the majority of the steps given in the first bullet point. I spent 5 years writing code at different software companies before Microsoft and I have to admit that I only used the second approach during this period ;-) It's not something to be proud of, but at least it gave me a very good understanding of how not to do things.
Here's the trouble I'm having - how to define “honest code coverage”? I reduced the problem to the following - “the feature is well-tested and has n percent code coverage”. The sad thing is that it's not very easy to determine when something is well-tested. A number of books and PhD theses have been written about this problem alone. Yes, the usual things can be done: analyzing the bug trends, looking at the number of test cases (which is IMHO as pointless as measuring LOC to determine a programmer's productivity), looking at how many user scenarios are covered, using certain mathematical models to estimate how many bugs are left in the code, using the "green" - "yellow" - "red" - "green" or any other variation of unit testing from Extreme Programming methodologies, etc. Unfortunately, testing in controlled/simulated environments is quite different from product usage IRL. That's why there is all this fascination with “eating your own dog food”, having early deployments to external customers, etc. The best feedback/indication about your product quality is IMHO still the reaction from your customers after you finally decide to ship ;-)
Requiring a certain amount of code coverage for every new feature
One thing I've seen other teams succeed with is requiring a certain amount of code coverage for every new feature before adding it to the code base. IMHO this is quite a reasonable thing to do. It also means that the scheduling of development and test related work items will be noticeably affected. One needs to add some time to the schedule to make sure that developers will have time to write (basic) unit tests, STE-s will have time to write (more complicated) test cases, and both sides will have time to work together and make sure that the right things happen.
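To make the idea of a per-feature requirement concrete, here is a minimal sketch of such a check. Everything in it is an assumption made for illustration: the names FeatureCoverage and meets_threshold, the block counts, and the 75 percent bar are hypothetical and not something any particular team or coverage tool prescribes.

```python
from dataclasses import dataclass


@dataclass
class FeatureCoverage:
    """Block counts for one feature, as reported by some coverage tool (hypothetical)."""
    name: str
    covered_blocks: int
    total_blocks: int

    @property
    def percent(self) -> float:
        # A feature with no instrumented blocks is treated as fully covered.
        if self.total_blocks == 0:
            return 100.0
        return 100.0 * self.covered_blocks / self.total_blocks


def meets_threshold(feature: FeatureCoverage, required_percent: float) -> bool:
    """Return True if the feature reaches the agreed coverage bar."""
    return feature.percent >= required_percent


if __name__ == "__main__":
    # Illustrative numbers only.
    feature = FeatureCoverage("new-feature", covered_blocks=80, total_blocks=100)
    print(feature.name, feature.percent, meets_threshold(feature, 75.0))
```

A check like this only says whether the number was reached; it says nothing about whether the feature is well-tested, which is the harder question discussed above.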
Usually when bug fixes are accepted late in the product cycle they go through a very rigorous process, and one of the things everyone worries about is how well the new code will be tested. Some teams require that code coverage information is also provided for code changes like this. Initially I thought this wasn't very useful in terms of cost vs. benefit, but the last months before shipping the Speech Server changed my mind ;-)
Dealing with non-exercised code paths
The process we currently use in our team is the following:
- Once a week we have a special instrumented build made which contains the information necessary for the code coverage tools to do their thing.
- All the automated test cases are run on this instrumented build and specific metrics about code coverage are published. Up to this point everything happens automatically and practically no human intervention is required.
- Depending on how much time the STE-s have, some people will manually run a number of test cases. As we all know, manual labor is expensive, and therefore we don't do this every week. After completing these steps we arrive at what we declare to be our final code coverage numbers.
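The steps above boil down to combining what the automated run and the manual run each covered. A rough sketch of that bookkeeping follows, assuming each run can be reduced to a set of covered block identifiers; the function name combined_coverage and the identifiers are hypothetical, and real coverage tools have their own ways of merging results.

```python
from typing import Set, Tuple


def combined_coverage(
    all_blocks: Set[str], automated: Set[str], manual: Set[str]
) -> Tuple[float, Set[str]]:
    """Merge two runs and return (percent covered, blocks still not exercised)."""
    # A block counts as covered if either run touched it.
    covered = (automated | manual) & all_blocks
    uncovered = all_blocks - covered
    percent = 100.0 * len(covered) / len(all_blocks) if all_blocks else 100.0
    return percent, uncovered


if __name__ == "__main__":
    # Illustrative block identifiers only.
    blocks = {"a", "b", "c", "d"}
    percent, uncovered = combined_coverage(blocks, {"a", "b"}, {"b", "c"})
    print(percent, sorted(uncovered))
```

The set of uncovered blocks is the input to the next step: deciding which of them are worth writing tests for.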
What happens next is that the IC-s start looking at the blocks of source code which aren't covered by the current set of test cases and start giving estimates of how much time it'll take to write code which will exercise those code paths. After that, this work is prioritized like any other work item.
At the end of last year we took some time and went through all the paths in the source code which weren't exercised by the test automation for our product. Yes, I mean it. Every single function/method which wasn't touched. Every single conditional branch which wasn't executed. I still like to get my hands dirty with “real work”, therefore I did this exercise for two of the binaries in our product. It took me a while and by the end I thought I would go crazy ;-) Finally I had this nice document where for every non-exercised code path I had an estimate of how long it would take to write code to exercise that specific piece of code. It paid off pretty well actually, because every time somebody complained about something related to code coverage, I took this document and said something à la “To exercise this code path we need to do A, B, and C. To do so it'll take us n days. This piece of code will be executed only under conditions D, E, and F. Here's the list of other things this team is currently working on and here are the priorities we're following.” This usually ended the discussion ;-)
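A document like this can be kept as structured records rather than free text. The sketch below is just one possible shape for it; the class UncoveredPath, its fields, and the sample entries are all hypothetical and only mirror the pieces of information mentioned above (what needs to be done, under which conditions the code runs, and the estimate).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UncoveredPath:
    """One non-exercised code path and what it would take to cover it (hypothetical)."""
    location: str            # e.g. function name or file and line
    steps_needed: List[str]  # the "A, B, and C" needed to exercise it
    conditions: List[str]    # the "D, E, and F" under which it executes
    estimate_days: float
    priority: int            # lower number = more important


def prioritized(paths: List[UncoveredPath]) -> List[UncoveredPath]:
    """Order the work like any other work items: by priority, then by cost."""
    return sorted(paths, key=lambda p: (p.priority, p.estimate_days))


def describe(path: UncoveredPath) -> str:
    """Produce the standard answer for one uncovered path."""
    return (
        f"To exercise {path.location} we need to do {', '.join(path.steps_needed)}. "
        f"It will take about {path.estimate_days:g} days. "
        f"This code runs only under conditions {', '.join(path.conditions)}."
    )


if __name__ == "__main__":
    # Illustrative entries only.
    paths = [
        UncoveredPath("retry_handler", ["A", "B"], ["D"], 3, priority=2),
        UncoveredPath("error_branch", ["C"], ["E", "F"], 1, priority=1),
    ]
    for p in prioritized(paths):
        print(describe(p))
```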
The main thing I learned from this experiment for the next milestone/version of the product is that doing this exercise is very useful, and though it takes time I think the end result pays off - you know the details of the code that you're not exercising, why you're not exercising it, and what it'll take to do so. One thing to note is that this exercise should be done quite late in the product cycle, when the code isn't changing so much anymore.