Gunnar Kudrjavets

Paranoia is a virtue

March 2004 - Posts

Punch it in the nose - if it stops bleeding, punch it harder

This quote is actually from one of the internal presentations about solid software design and quality. For me personally it describes exactly the attitude I’m looking for in people working with me - never be happy, never be satisfied, there are always ways to break the product ;-) When you’re working on a server product, most of the things you do are about quality. In the server world quality is everything. Nobody would want to use a database server which needs to be restarted every 6 hours; nobody would like to bet the success of their business on a telephony server which stops accepting calls during peak hours etc. Therefore having people in your team who care about quality, who do everything in the name of quality, and who use product quality as a primary factor while making their decisions, is extremely important.

It's all about the attitude

IMHO it's mainly the attitude that determines which people will be successful in their positions and which ones won't. One of my peers once mentioned that most of our top performers seem to be constantly dissatisfied because they always have ideas about how to do things better, how to be more productive, how to find new categories of bugs, how to write better code etc. After thinking about this I tend to agree with him. Yes, you heard me correctly - I like “unhappy people” ;-) because IMHO they are the ones who are responsible for progress. I usually like to say that we should all celebrate the fact that, for example, Thomas Alva Edison wasn't satisfied with the way things were, otherwise we would still be reading books by candlelight ;-)

STE with the right attitude?

Let me just give you an example of how IMHO good STE-s should think. To be crystal clear, this is my personal opinion:

  • ...
  • Hmm, what's this? A new design document for our cache manager. Cool. Let’s see ... this design can be done much better. Why can’t we use a separate heap to keep all the tree nodes together? Also, these two methods shouldn’t be public.
  • Bingo, I see a new check-in e-mail, apparently some fresh code got checked in. I’ll go and read through every line of new code, I’ll check that every error condition is handled, that there are no memory leaks and no obsolete API-s are used, I’ll run all the static source code analysis tools known to me on this code, I’ll step through every line under the debugger and see what happens.
  • Today is my lucky day - I have some time to play with the debug build. Let’s see if any assertions pop up, let’s see what’s written in the debug output. And if I have more time later, I’ll install the debug version of the operating system and see what happens while running our product.
  • I finally get it - AppVerifier is my best friend! Therefore I'll run applications under AppVerifier very often and I’ll investigate every little problem I see.
  • A fresh piece of documentation. Let’s see, first I’ll check for broken links. Then I’ll run the content through the spell-checker, and then I’ll spend some time thoroughly reading the documentation and reporting every mistake.
  • A new GUI. How wonderful! I’ll go and see if anything is against the “Official Guidelines for User Interface Developers and Designers”. Any misaligned button, any font with the wrong size, any deviation from the guidelines, and I’ll open a bug about it.
  • This new API call I have to test looks pretty cool! It takes the name of a file as a parameter and look, here's a writable buffer. OK, where are my favorite books: “How to Break Software”, “How to Break Software Security”, and “Writing Secure Code”? I’ll apply every attack possible. I'll overflow every buffer and every integer. I'll just break the stuff! (See the sketch right after this list.)
  • ...
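
To make that buffer-breaking bullet concrete, here's a minimal sketch of the kind of hostile test I have in mind. GetFileSummary and everything about it are hypothetical - it just stands in for any API that takes a file name and fills a caller-supplied writable buffer:

#include <windows.h>

// Hypothetical API under test: takes a file name and fills a
// writable buffer of cchBuffer characters supplied by the caller.
HRESULT GetFileSummary(LPCWSTR szFileName, LPWSTR szBuffer, DWORD cchBuffer);

void AttackGetFileSummary()
{
    WCHAR szTiny[2];

    // Lie about the buffer size - does the implementation trust us
    // and happily overflow szTiny?
    GetFileSummary(L"C:\\test.txt", szTiny, 260);

    // NULL pointers and zero-sized buffers.
    GetFileSummary(NULL, szTiny, 2);
    GetFileSummary(L"C:\\test.txt", NULL, 260);
    GetFileSummary(L"C:\\test.txt", szTiny, 0);

    // A file name long enough to overflow any fixed internal buffer.
    WCHAR szLongName[4096];
    for (int i = 0; i < 4095; i++) { szLongName[i] = L'A'; }
    szLongName[4095] = L'\0';
    GetFileSummary(szLongName, szTiny, 2);
}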

To summarize: intellectual destructivity sounds like the right term to use ;-) One thing to watch out for is not to strive for perfection. For example: taking 2 days and opening a bug for every comma in the documentation is a good thing to do, but could we use this time for something more important? Taking an API, generating 12,000 test cases for all the possible combinations of parameters, and starting to implement them would be cool if we had an infinite amount of resources and time; IRL we unfortunately don't. Spending a day from the development schedule to make sure that all member variables are prefixed with m_ will look good stylistically, but could this time be better used for fixing some customer issues? You get the point - always second-guess yourself, always prioritize, always use common sense.

SDE or SDE/T with the right attitude?

Here it gets tricky - after finishing the draft of this post I ended up with two pages of text about this paragraph alone. As this post is already long enough and writing boring and long essays isn't the thing I want to do, I'll morph this paragraph into a separate post by EOW.

Success story

About 7-8 months ago we established a zero tolerance policy when it comes to test case failures. The logic we used was very simple: when a test case fails, it’s either a bug in the product, a bug in the code which exercises the product, or a problem with the hardware or underlying infrastructure. In all these cases a bug needs to be opened in our internal bug tracking system. All the failures must be investigated. Every one of them. Every day. Every time. Half of the people reading this are probably laughing now and thinking - “This is just so basic, why do you even need to worry about this?“ Sounds easy? Actually it's a little bit more complicated. The number of test cases is usually in the thousands, and all of them are executed on a daily basis using all the possible editions of our product and on all the operating systems we support. Sometimes lots of them start to fail and you need to know ASAP what exactly caused these failures.

Anyhow, after lots of pain (which I won't describe) we finally did it. It took an enormous amount of engineering discipline, but it paid off very well. We found a number of bugs in:

  • Different components and libraries we rely on.
  • Development and test tools we use.
  • Our test cases - some cases were outdated, some were not applicable anymore, and some just contained bugs.
  • Our product of course ;-)

A simplified form of our knowledge base may look like this:

Failing test case                   | Bug number
------------------------------------|-----------
Do A and then expect B to happen.   | 12345
Do C and then expect D to happen.   | 67890
...                                 | ...

Simplified test cases vs. bugs table

As a result we noticeably increased the reputation of our team (a step closer to the famous Black Team ;-)). Once we had achieved a solid baseline of data, every day is just a matter of investigating the differences from the previous day's results. I and a number of my peers hopefully sleep better because every day we know exactly where the product quality stands, and if there are any problems, we know exactly what they are. There were a number of similar “little things” we did during the last year, both in the development and test organizations, which took us a step closer to being a solid server team.

Engineering ethics

If you have looked at Microsoft titles then you have noticed the word 'engineer' in them. We have for example titles like 'Distinguished Engineer', 'Software Design Engineer', 'Software Test Engineer' etc. One of the first things I usually try to explain to new people is that computer science and software engineering are two different things and what we do here is engineering. I won't go into the details of why I think so; instead I'll point you to Steve McConnell's book “Professional Software Development (2nd Edition of After the Gold Rush)”. Fortunately the chapter called “Software Engineering, Not Computer Science” is available online for reading. It begins with one of my favorite quotes from Fred Brooks:

A scientist builds in order to learn; an engineer learns in order to build.

Internally we have a number of very useful software engineering guidelines available which describe the best practices and recommended efficient ways to get things done. I constantly refer to them whenever I have an argument with someone about how to accomplish some particular task ;-) But back to the ethics part. The ACM has something called the "Software Engineering Code of Ethics and Professional Practice". I would strongly recommend everyone to read through it at least once. We all assume that when our car comes back from maintenance, the people responsible for the maintenance checked that all the wheel bolts are tight, the brakes are working, the oil level is normal etc. IMHO the same thing applies to software engineering – one should do as much as possible to make sure that customers won’t suffer data loss (a day's worth of work shouldn’t disappear after your text editor crashes), that a buffer overflow which wasn’t fixed because of a lack of development resources doesn't enable a cracker to gain access to customer data, that a deadlock situation which wasn’t investigated doesn't cause some other problems etc. A number of people have told me that I'm being very unrealistic in assuming that all developers want to follow the Code IRL. True, but then again, nobody ever promised that it's easy to achieve quality.

Posted: Mar 31 2004, 11:19 PM by gunnarku | with 2 comment(s)
Would you be willing to use speech based authentication IRL?

One of the frequent questions we’re getting from our customers is what we plan to do about biometrics and speech based authentication in Speech Server. In particular, people are interested in telephony scenarios à la:

...
System: Please state your name.
Caller: Gunnar Kudrjavets.
System: (doing some magic) You’re authenticated. What do you want to do?
Caller: Savings account balance, please.
...

Speech is very different from traditional biometrics based security measures: face, fingerprint, and iris. I’m just curious to know, would you trust something like this? Yes, I know that it depends on the person, but hopefully if enough people respond, I can draw some useful conclusions from this ;-) Google will point you to lots of good links related to the current state of speech based authentication, error percentages, research articles etc.

Posted: Mar 31 2004, 12:36 AM by gunnarku | with 8 comment(s)
How to Shoot Yourself in the Foot with Code Coverage? (Part II)
Honest code coverage

One of the phrases I quite often use is “honest something“. What do I mean by this? First of all I believe that everything that is worth doing should be done with the highest quality possible. Secondly I believe in doing things “by the book“, but this shouldn't be confused with blindly following orders or having a very fixed and limited thinking process ;-) The best way to explain this is to give a concrete example:

  • “Honest development“. For me this means that before any feature is added to the code base the following general steps happen: 1) after everyone has agreed on the specification, the developer thinks about the design, writes a design document, and extensively reviews it with his/her peers; 2) writes the code, asks his/her peers to review it, and steps through every line of it in the debugger; 3) makes sure that warnings generated by different static source code analysis tools are fixed; 4) develops a number of unit tests to test his/her code; 5) works with his/her buddy tester to make sure that the feature has received a decent amount of testing coverage before it's added to the code base.
  • “The other way“. Write the code, make sure it compiles, spend 5-10 minutes testing it, hope that the STE-s will find all the other issues, and add it to the code base.

Everyone can see that using the first approach is much harder - it requires way more engineering discipline, takes more time, and forces the developer to work harder. Microsoft is exceptional in the sense that most of the developers exercise the majority of the steps given in the first bullet point. I spent 5 years writing code in different software companies before Microsoft and I have to admit that I only used the second approach during this period ;-) It's not something to be proud of, but at least it gave me a very good understanding of how not to do things.

Here's the trouble I'm having - how to define “honest code coverage“? I reduced the problem to the following - “the feature is well-tested and has a certain percentage of code coverage“. The sad thing is that it's not very easy to determine when something is well-tested. A number of books and PhD theses have been written about this problem alone. Yes, the usual things can be done: analyzing the bug trends, looking at the number of test cases (which is IMHO as pointless as measuring LOC to determine a programmer's productivity), looking at how many user scenarios are covered, using certain mathematical models to determine how many bugs are left in the code, using the "green" - "yellow" - "red" - "green" or any other variation of unit testing from Extreme Programming methodologies etc. Unfortunately testing in controlled/simulated environments is quite different from product usage IRL. That's the reason for all this fascination with “eating your own dog food“, having early deployments to external customers etc. The best feedback/indication about your product quality is IMHO still the reaction from your customers after you finally decide to ship ;-)

Requiring certain amount of code coverage for every new feature

One thing I've seen other teams succeed with is requiring a certain amount of code coverage for every new feature before adding it to the code base. IMHO this is quite a reasonable thing to do. What it also means is that scheduling of development and test related work items will be noticeably affected. One needs to add some amount of time to the schedule to make sure that developers will have time to write (basic) unit tests, STE-s will have time to write (more complicated) test cases, and both sides will have time to work together and make sure that the right things happen.

Usually when bug fixes are accepted late in the product cycle they go through a very rigorous process, and one of the things everyone is worried about is how well the new code will be tested. Some teams require that for code changes like this the code coverage information is also given. Initially I didn't think of this as being very useful when it comes to cost vs. benefit, but the last months before shipping Speech Server have changed my mind ;-)

Dealing with non-exercised code paths

The process we currently use in our team is the following:

  • Once a week a special instrumented build is made which contains the information necessary for the code coverage tools to do their thing.
  • All the automated test cases are run on this instrumented build and specific metrics about code coverage are published. Up to this point everything happens automatically and practically no human intervention is required.
  • Depending on how much time the STE-s have, some people will manually run a number of test cases. As we all know, manual labor is expensive and therefore we don't do this every week. After completing these steps we arrive at something which we declare as our final code coverage numbers.

What happens next is that IC-s start looking at the blocks of source code which aren't covered by the current set of test cases and start giving estimates about how much time it'll take to write code which will exercise those code paths. After that this work is prioritized like any other work item.

At the end of last year we took some time and went through all the paths in the source code which weren't exercised by the test automation for our product. Yes, I mean it. Every single function/method which wasn't touched. Every single conditional branch which wasn't executed. I still like to get my hands dirty with “real work“, therefore I did this exercise for two of the binaries in our product. It took me a while and by the end I was thinking that I'd go crazy ;-) Finally I had this nice document where for every non-exercised code path I had an estimate of how much it'll take to write code to exercise this specific piece of code. It paid off pretty well actually, because every time somebody complained about something related to code coverage, I took this document and said something à la “To exercise this code path we need to do A, B, and C. To do so it'll take us n days. This piece of code will be executed only under conditions D, E, and F. Here's the list of other things this team is currently working on and here are the priorities we're following.” This usually ended the discussion ;-)

The main thing I learned from this experiment for the next milestone/version of the product is that doing this exercise is very useful, and though it takes time I think the end result pays off - you know the details of the code you're not exercising, why you're not exercising it, and what it'll take to do so. One thing to note is that this exercise should be done quite late in the product cycle when the code isn't changing that much anymore.

Posted: Mar 27 2004, 08:58 PM by gunnarku | with 1 comment(s)
Writing blog entries is no different from managing software projects

Actually most things in life are like managing software projects: you need to understand how much time (or any other resource) you’ll have for the tasks you want to perform, what the priority of these tasks is, in what order they need to be performed etc. To get myself into the habit of writing at least two posts a week I finally decided it’s time to publish the schedule for upcoming posts. Most human beings have different kinds of fears: some people are afraid of the dark, some people are afraid of flying etc. One of the things I’m always afraid of is not meeting any of the dates I promised to meet (feel free to analyze me now) ;-) So, hopefully publicly stating that by date D I’ll post a blog entry on subject S will force my subconscious to schedule my time better. Here are the subjects I plan to write about with the date, title, and short summary:

  • 03/27/2004 - How to Shoot Yourself in the Foot with Code Coverage? (Part II). I'll describe a number of things I would like to see both our development and test organizations do during the next version of our product in regards to code coverage: a) a definition of “honest code coverage“; b) how to make sure that a feature already has a certain amount of code coverage before it's added to the code base; c) how to deal with the code paths which aren't covered during the daily automated test runs etc.
  • 03/31/2004 - Punch it in the nose - if it stops bleeding, punch it harder. This quote is actually from one of the internal presentations about quality, but it just sounds so right when describing the desired attitude ;-) There are a number of things I would like to cover: a) the right attitude towards product quality and how it's different from striving for perfection; b) what are for me the signs of the right attitude in STE-s (Software Test Engineers), SDE-s, and SDE/T-s (Software Design Engineers (Test)); c) some of the success stories we had while working on Speech Server during the last 3 years when people showed the right attitude; d) general rambling about the Software Engineering Code of Ethics and Professional Practice.
  • 04/05/2004 - Why reporting every single bug is so-so-so important. This post will exist mainly because I give a similar speech at least once a day to somebody. We'll see why bothering to take the time to report bugs is extremely important for: a) product quality in general; b) planning the next version; c) doing root cause analysis and looking at the bug trends; d) helping our customers with their problems etc.
  • 04/08/2004 - How to cut your couch into pieces? (Part II). In the second part there won't be any couches cut into pieces, but there'll be one big white couch which we had to remove from the apartment through the second floor balcony.

March/April 2004

One possible negative outcome will be that nobody will read this blog anymore because the excitement à la “I wonder what this crazy guy will write about next time?” is gone. On the other hand the possible positive outcome can be something like - “Today I'll get up earlier because this is the day when I'll hear the story about how Gunnar helped to move a big couch out through the balcony of his friend's second floor apartment.” We'll see. OK, now I need to stop flattering myself and do something useful!

Posted: Mar 26 2004, 04:58 PM by gunnarku | with no comments
"The average software developer reads less than one professional book per year...

... (not including manuals) and subscribes to no professional magazines." This is actually from DeMarco and Lister, Peopleware, 2nd Ed, 1999. I don’t have the book at hand right now to find out what page it was on, but I stumbled across this quote while reading the "Professional Development Handbook" by Construx Software, which is Steve McConnell’s company. If you care about developing yourself or your direct reports / organization then this is IMHO a good document to read.

But back to the title. I remember that the first couple of times I read "Peopleware" this didn’t strike me as a big deal. Possibly I’m currently too spoiled by the intellectual atmosphere and all the smart people surrounding me at Microsoft, but it's hard to believe. Is it really that bad?

Posted: Mar 23 2004, 11:12 AM by gunnarku | with 11 comment(s)
How to Shoot Yourself in the Foot with Code Coverage? (Part I)
Prologue

Usually after every milestone or after every version of some product most groups have something called a postmortem. Encarta Dictionary defines postmortem as "an analysis carried out shortly after the conclusion of an event, especially an unsuccessful one." This doesn’t of course mean that every time you have a postmortem, something went wrong. Quite the opposite: analyzing the positive experiences and writing down the keys to success has the same benefits as learning lessons from negative events. Most people unconsciously hold their personal postmortems daily while thinking about recent events and stuff ;-) This is my code coverage postmortem.

First of all, here are the things the current post isn’t about:

  • It’s not about explaining what code coverage is. I’ll make the assumption that you know the basic principles. There are plenty of articles written about code coverage, most software engineering textbooks touch the subject in one way or another, and there are a number of tools on the market as well.
  • It’s not about how to use code coverage properly. Let’s be honest, I’m in the middle of working out an efficient process myself. And of course it’s always easier to criticize (even myself) than to tell how to do something right. What’s this post about? It’s about the experiences I and my team have had with code coverage during the last year or so. The best paper written about misusing code coverage I’ve found so far is located here. It’s written by Brian Marick and BTW he has lots of other interesting things in his blog which are IMHO definitely worth reading.

First I’ll cover the major incorrect acts and decisions which were made, and in the second part of this post I’ll mention a number of things I plan to improve during the next version of our product. So, hopefully one year from now I can write about the smart decisions made and excellent results achieved ;-) The curious reader may also wonder what code coverage tools we use to get the job done. We used a toolset called "Magellan". The Microsoft Research web site the previous link points to has somewhat more information.

Mistake 1: underestimating the complexity of code coverage itself

When I started to use code coverage tools, the first thing I did was to spend a significant amount of time reading code coverage related articles in the ACM Digital Library, and I have to admit that I still don’t feel very confident talking about code coverage. There are a number of things: function coverage, block coverage, statement coverage, path coverage, condition coverage etc. To efficiently use any of this, time needs to be invested in training people, choosing proper metrics, explaining the metrics to all team members, and picking key things to measure. If I had to start from scratch I would probably plan at least 1-1.5 days for an in-depth workshop for the entire team to go through all the terms, their usage, and their applicability to our particular situation.
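
To illustrate why the distinction between these terms matters, here's a toy example of my own (not from any of the articles): the two test cases Toy(true, false) and Toy(false, true) give 100% statement and branch coverage of this function, yet only two of its four possible paths ever execute:

int Toy(bool a, bool b)
{
    int x = 0;

    if (a)
    {
        x += 1;    // branch 1
    }

    if (b)
    {
        x += 2;    // branch 2
    }

    // Four paths exist (a/b: TT, TF, FT, FF), but the two tests
    // above cover only TF and FT - while still executing every
    // statement and both outcomes of both branches.
    return x;
}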

Mistake 2: falling into the general "bigger numbers = good testing quality" trap

Whenever I read articles where somebody recommends that one should strive for 100% code coverage I want to start screaming ;-) This statement is IMHO in the same equivalence class as stating that absolutely all the bugs in the product need to be fixed. What tends to happen IRL is that there’ll be a number of teams: teams A, B, and C, and we’ll start measuring how much code coverage these teams have for the components they own. What happens next is very natural to human nature - people tend to optimize their work based on how it gets measured, and the competition starts kicking in.

I usually give two examples of how the level of component testing is absolutely not related to how much code coverage you have. First example: assume that I’m one of those people who like to write their own version of strlen(). My personal rationale is that I want an assertion to be fired when a NULL pointer is passed to the function as a parameter. My code may look like this:

UINT YetAnotherStrLen(LPCWSTR szSomething)
{
    // Fire an assertion in debug builds if a NULL pointer is passed in.
    _ASSERTE(0 != szSomething);

    UINT nCount = 0;

    while (L'\0' != *szSomething++)
    {
        nCount++;
    }

    // Note the deliberate mistake: the length is returned modulo 42.
    return nCount % 42;
}

Let's assume that somebody writes a quick unit test calling this function with the parameter "Foo". The result checks out and nobody may even notice that the programmer who wrote this routine made the mistake of returning all the results modulo 42. If somebody measures how much code is covered by this single unit test, there’s a pretty good probability that your favorite code coverage tool will report 100% of the lines covered. Can we make any decision based on that? Of course not. You should use the most powerful weapon you have - common sense. The main point I’m trying to make is that, like any other thing, code coverage can be very easily misused.
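
Here's a minimal sketch of what such a misleading unit test could look like (my illustration, using the function above):

#include <cassert>

void TestYetAnotherStrLen()
{
    // "Foo" has length 3, and 3 % 42 == 3, so the test passes and
    // every line of YetAnotherStrLen() gets executed - 100% line
    // coverage while the modulo-42 bug stays completely hidden.
    assert(3 == YetAnotherStrLen(L"Foo"));
}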

One of the most senior people in our development organization once told me a story about his previous team: they were writing a number of tests to be run before any addition to the code base (we call them check-in tests, your mileage may vary) and one of the developers was given the task of writing a test which would have an execution time of less than 15 minutes and would cover as much code in the product as possible. It took this person about a day and he achieved ~80% block coverage. They did this to have something which would make sure that most of the code is executed and that nothing bad happens during the execution, but nobody ever stated that this test would prove that most of the features work as they’re supposed to. This is an example of doing something and understanding what's done and why it's bad or good.

Mistake 3: trying to achieve more than 80% of code coverage

Based on my experience, getting more than 80% code coverage becomes extremely tricky and there’s very little return on your investment. There’s an urban legend in circulation about one of the software companies where developers were pushed to achieve some unrealistic code coverage number and they ended up removing error handling code…

One of the other examples I like to give people is analyzing the cost vs. benefit of trying to write some amount of unit tests to get all the error handling cases covered. Let’s look at a hypothetical situation: you have a service, something bad happens (let's assume that the service code contains some amount of AI and it detects that security has been compromised ;-)) and you need to stop the service; if stopping the service fails then you would like to log the reason for the failure to the NT Event log. The pseudo-code may look like this:

...

if (fSomethingBadHappened) {
    // Get the handle to the service.
    SC_HANDLE hService = ::OpenService(...);

    if (NULL != hService) {
        // Go ahead and stop the service.
        ...
    } else {
        // Try to open an event log.
        HANDLE hLog = ::OpenEventLog(...);

        if (NULL != hLog) {
            // Go ahead and log the reason for failure.
            ...
        } else {
            // We can't even open the NT Event log.
            // Do something else.
            ...
        }
    }
}

...

Let’s say that we need to write an automated test case making sure that the code in the last error case (we can’t open an event log) is executed. What do we need to do? First of all we need to simulate the condition that makes "something bad happen", then we need to somehow make OpenService() fail, and then we need to make OpenEventLog() fail as well. Usually this can be done using a technique called fault injection. How much time will it take? Well, at least one day if not more: writing the code, testing it, asking somebody else to review the code, checking it in, including the test case in the suite of automated test cases, monitoring the behavior of the test case the next day etc.
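
As an illustration, one common way to do this kind of fault injection is to route the API calls through function pointers that a test can override. This is only a sketch under that assumption, not how our test code actually does it:

#include <windows.h>

// Production code calls through this pointer instead of calling the
// Win32 API directly, which gives the tests a seam to inject
// failures into.
typedef SC_HANDLE (WINAPI *PFN_OPENSERVICE)(SC_HANDLE, LPCWSTR, DWORD);

PFN_OPENSERVICE g_pfnOpenService = ::OpenServiceW;

// Test-only stub which always fails, forcing the error path.
SC_HANDLE WINAPI FailingOpenService(SC_HANDLE, LPCWSTR, DWORD)
{
    ::SetLastError(ERROR_ACCESS_DENIED);
    return NULL;
}

void InjectOpenServiceFailure()
{
    // From now on every OpenService() call in the product code
    // (routed through g_pfnOpenService) will fail.
    g_pfnOpenService = FailingOpenService;
}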

The decision whether or not to write an automated test case isn't any different from deciding whether or not some bug needs to get fixed. IMHO you should not waste resources on achieving another percentage point of code coverage if these resources can be used somewhere where they make more sense. Remember, common sense rules ;-)

To be continued...

Posted: Mar 21 2004, 10:05 PM by gunnarku | with 9 comment(s)
Do you have a personal coffee machine at work?

It's quite late in the Seattle area now and I'm in the process of writing the first draft of a blog entry about how to misuse code coverage. We've learned some lessons during the last year or so in the Speech Server team and it's a good time for some root cause analysis. Of course it's also always easier to criticize something than to do it properly yourself ;-) While analyzing our failures and successes I drink coffee to stay mentally sharp, and suddenly I realize that I actually need to write something about coffee.

How did that old joke go: "software engineers are machines that take coffee and requirements as input and produce lines of code as output"? After having a personal coffee machine at the office for the last 6 months or so, I consider it an investment which pays off very well. Generally there are two main ways to get coffee at Microsoft (I won’t talk about the option of bringing your own coffee to work):

  • "Starbucks". Almost every Microsoft cafeteria has one. IMHO "Starbucks" has quite high quality and wide choice. There are a couple of problems though. Getting "Starbucks" consists of the following steps: walking to the cafeteria, standing in the line, ordering your coffee, waiting for the coffee, and walking back or taking an additional 5-10 minutes to chat with your colleagues. Don’t get me wrong, I don’t have anything against the exercise, I get plenty of that almost every day. The question for me is the time and getting back into zone.
  • Community coffee machine. The algorithm is simple: if somebody has made some coffee then drink it, otherwise brew the coffee yourself. The trick with the community coffee machine is that if person A makes coffee then persons B, C, and D may not like it. It’s too strong or, in most cases, too emaciated. I once ran a series of experiments where I went and tasted the "community coffee" 10 times on random days at random times. Only 2 out of 10 times was the coffee drinkable for me, otherwise it was too weak ;-) But on the other hand, I like strong coffee, so the problem is mainly on my side. Another solution here is to come to work very early and brew the first pot of coffee, or to intercept the moment when another pot needs to be started.

About half a year ago I purchased my own coffee machine and I have to admit that it was a wise thing to do. My choices expanded significantly:

  • I can brew coffee as strong as I want.
  • I can brew any type of coffee I want.
  • I can brew as much coffee as I want. The fellow geek next door needs a shot of caffeine? No problem at all - I can help him out.

To summarize: though I still visit the nearby cafeteria once or twice a week (mainly for social reasons), this experience could serve as good material for a self-help book - "The Coffee Machine Which Changed My Life and Increased My Productivity" ;-) I also have to think about what the next thing to acquire will be. A refrigerator, a microwave oven, or even a cocktail shaker ;-)

Posted: Mar 18 2004, 10:26 PM by gunnarku | with 6 comment(s)
Past-due software projects, can something be done about them? (Part II)

In the first part of this post I described the three most common "different types of scheduling games software engineers play":

  1. Game 1. Everything’s on schedule, but suddenly we need to do some additional work (add a new feature, own an additional area of responsibility, comply with some additional coding standards etc.)
  2. Game 2. Everything’s good in our team, but because of some dependencies or unforeseen events the project will take longer than expected.
  3. Game 3. We tried to do our best while estimating, but made a number of mistakes and now it’s going to take longer than we initially thought.

Here are the ways I try to deal with these different situations. Don’t treat it as some kind of universal strategic guidance. One of my university professors used to say that the term "universal" means that it works equally badly in all different situations ;-)

Game 1

IMHO this is the best situation you can be in. Your team is on track, everything looks fine, and with pretty high probability you’ll meet the dates you promised to meet. From the negotiation point of view you’re in a very good strategic position and you’re doing better than the average peer of yours. Usually there are a few possible outcomes:

  • Your superiors have common sense and they understand that, for example, taking time to review the entire code base and look for new types of security vulnerabilities (integer overflows for example) will take some time, and that doing the review on Saturday at 03:00 AM won’t be the most efficient/productive use of everyone’s time.
  • Your superiors have common sense, but something just needs to be done by a specific date. For example: your product launch date is announced; major customers won’t want your product without a specific new feature; the person who owned the MMC plug-in left the company and somebody needs to take care of his/her bugs by some certain date etc. This shouldn’t be confused with the "it needs to be done because slipping will look bad on my record and I’m willing to sacrifice the well-being of the entire team to meet some date" situation.

On the other side, being on track can also be a bad thing. Nowadays (if not always) software projects meeting deadlines and staying on track are suspicious rather than normal. Let’s look at a hypothetical story about two teams:

There are two development teams: A and B. Both are working in parallel on some COM-based software. Developers in team A decide that smart pointers are a good thing and they’ll be using them consistently across the project (for the sake of the example let’s not start arguing about smart pointers ;-)) Developers in team B decide that Real Programmers know exactly when to call IUnknown::AddRef and IUnknown::Release, and that smart pointers are for losers who don’t know how COM works and what they’re doing. Two months after starting the project both teams are feature complete, but when it comes to bugs, team A has way fewer bugs than team B. Team B is constantly struggling with reference counting issues. As a result of this, team B spends the next two weeks working nights and weekends, team leads order food for the team every day at 06:00 PM, e-mails about successfully fixing something are sent out at 02:47 AM etc. What kind of perception of team B does this generate to the outsider? Mostly the good kind. Team B is committed to shipping the product, they put in heroic efforts, they do whatever it takes to meet the deadlines, and they’re “old school“, yeah. As my manager once told me after I was too cynical during one meeting and later wondered why people complained about it: "perception is reality" ;-) If you’re an insider then you may know that team A succeeded because they made an IMHO better engineering decision, and while team B has been fixing bugs, team A has been doing something else which helps to generate revenue for the company.
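
For readers who don't live in COM land, here's a minimal sketch of the difference in question. IWidget and its DoWork method are hypothetical stand-ins; the point is only the manual Release versus the automatic one:

#include <atlbase.h>

// Hypothetical COM interface, for illustration only.
struct IWidget : public IUnknown
{
    virtual HRESULT STDMETHODCALLTYPE DoWork() = 0;
};
EXTERN_C const IID IID_IWidget;

// Team B style: manual reference counting. Forgetting the Release()
// call (or skipping it on an early return) leaks the interface.
void UseWidgetManually(IUnknown *pUnk)
{
    IWidget *pWidget = NULL;

    if (SUCCEEDED(pUnk->QueryInterface(IID_IWidget, (void**)&pWidget)))
    {
        pWidget->DoWork();
        pWidget->Release();    // easy to forget
    }
}

// Team A style: CComPtr calls Release() automatically when the
// smart pointer goes out of scope, on every code path.
void UseWidgetSmartly(IUnknown *pUnk)
{
    CComPtr<IWidget> spWidget;

    if (SUCCEEDED(pUnk->QueryInterface(IID_IWidget, (void**)&spWidget)))
    {
        spWidget->DoWork();
    }   // <- implicit Release() here
}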

Game 2

Dependencies are generally a very bad thing. Joel on Software contains a nice general article about this. In our product we have been battling with dependencies for years now. Based on my personal experience, the most complicated dependencies are:

  • You are relying on a 3rd-party component. That’s it, the code is out of your control, the design is out of your control, and you don’t know what’s going on inside the component. Getting something fixed isn’t just: "run cdb; look at the source code; figure out the problem; fix it;" Dependencies bad, external dependencies very bad. Probably everyone who has ever used any components for doing SSL/TLS related stuff, generating database reports, having a fancy calendar control etc. knows what I mean.
  • General libraries, base classes, frameworks etc. Everyone is using ATL, CLR, CRT, .NET Framework etc. But what happens if something in these libraries is blocking your product from shipping and there’s no feasible workaround? The good thing about being at Microsoft is that pretty much everything we use is built by some other team ;-) You have a problem with the CLR - the answer is just an e-mail away.
  • Bugs in the OS. Nothing is ever perfect. The same thing as under the previous bullet point applies here. With dependencies and problems like this there’s not much you can do. Yes, the feature can be cut and some hectic workaround can be implemented, but it will still take time and increase the risk of breaking something else.

In my pre-Microsoft life somebody once suggested replacing one 3rd-party component with another because the project dates were quite critical. The suggestion was made about two days before the final date ;-) Everyone with common sense will understand why even a month before shipping you can’t start replacing the building blocks of your product.

Game 3

This is a situation where the only thing to do is to say "guilty as charged". I tend to divide this particular situation into two different categories:

  • You have somebody junior reporting to you who made an estimation mistake. Usually all the estimates are rigorously reviewed and reviewed once again and then reviewed again. IMHO in the case of junior people it’s 50% the manager’s fault as well - insufficient tutoring and reviewing, not assigning a proper mentor etc.
  • You have somebody senior who constantly makes estimation mistakes. Well, in this case you have a problem on your hands.

One can ask why I’m spending so much time talking about this. Mainly because of the similarities to chaos theory. Everyone has heard the term "butterfly effect". Let me give you a hypothetical example: "Let's assume that I ask somebody how long task A will take; the person replies that task A will be done by Monday morning; I ask for confirmation, the person confirms once more; I go and tell my manager that we’ll complete something by Monday morning; my manager passes the good news to somebody in upper management who has a conference call with external partners in three hours; the news is passed to the partners that they’ll get some results by Monday morning; task A takes longer than expected because of the incorrect estimates and we actually complete it by Tuesday EOD; on Thursday two vice presidents (ours and the partners’) talk and one happens to mention the fact that they can’t deliver B on time because A was late and therefore B will be late." You can imagine the rest ;-)

Lectures from the authors of RSA available online (Turing Award 2002)

What do normal people usually do on Sunday evening? I bet anything except reading the recent issue of ACM "Software Engineering Notes" ;-) One cool fact the latest issue mentions is that ACM’s 2002 Turing Award lecture materials are now online. Three presentations from the authors of the RSA encryption algorithm are available:

  • Dr. Leonard M. Adleman on "Pre RSA Days"
  • Dr. Ronald L. Rivest on "Early Days of RSA"
  • Dr. Adi Shamir on "Cryptology: A Status Report"

If you have ever been fascinated by cryptology and related security issues, I would strongly recommend viewing these talks.

One piece of information (which is also mentioned in this talk) many security enthusiasts may not know is that two of the most famous public key encryption methods, the Diffie-Hellman protocol and RSA, weren’t actually first discovered by the people after whom the protocols are named ;-) Both Ross Anderson’s "Security Engineering" and Steven Levy’s "Crypto" give credit to the following GCHQ employees: James Ellis, Malcolm Williamson, and Clifford Cocks. Check these books or Wikipedia for more information and longer explanations.

Posted: Mar 14 2004, 07:46 PM by gunnarku | with no comments
Bastard Code Reviewer from Hell
Prologue

When somebody asks about the relationship between me and Un*x, I have to admit that my alma mater was/is rather a Un*x place than a Microsoft place. This also means that I “grew up professionally” reading all those stories about the Bastard Operator from Hell ;-) Times have changed and now I’m no longer fanatically reading Richard Stevens’s books, and honestly I don’t even remember the last time I wrote any code that used the fork() system call ;-) The frightening image of the BOFH is stuck in my memory though.

Current day

Reading the actual product source code is one of the things I think every developer should constantly be doing. It is IMHO practically the only way to learn good coding habits and design tricks, learn how to avoid pitfalls, increase one's understanding of different error handling strategies, pick up a good programming style, understand how software is “really made” etc. When it comes to code reviews, IMHO they’re one of the most effective ways to teach people how to write good code. I won’t even talk about the aspect of preventing defects. Check your favorite software engineering handbook for references to the actual studies and fact sheets about the usefulness of formal code inspections.

One beautiful evening I took some time to go through the entire source code base of one of our components. Generally we’re in pretty good shape: we have been automatically running PREfast, PREfix, and some other static source code analysis tools on our code base on a weekly basis. FxCop also gets run automatically on our assemblies on a daily basis (the majority of our product code is written in C#). Starting next milestone we’ll also have a certain amount of “code cleanness” as one of the requirements prior to making any changes to the code. As a result of this, when it comes to the usual coding defects - memory leaks, unchecked error codes and parameters, buffer overflows, uninitialized variables etc. - we’re in pretty decent shape. We analyze the warnings regularly and have succeeded in keeping our code base quite clean of warnings during the last couple of years. OK, after this amount of bragging it's now time to shut up and talk about the real problems too ;-)

While reading the code there are still some issues left for the diligent reviewer to rave about. Here’s a limited example list of problems I’ll be closely watching for while the code for the next milestone is being written. Or if I get a life then hopefully somebody else will be doing this ;-) The component I’m referring to is written in unmanaged code, so here are my nitpickings. One of the things I’m quite passionate about is keeping only the code which is necessary to get the job done, i.e., the code is perfect when there’s nothing left to remove ;-) E. W. Dijkstra once said: “My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.” IRL most people usually quote it as: “Every line of code isn't an asset but a liability.”

Passing objects by value instead of passing them by reference

There are lots of places all over the code where both small and large objects are passed around by value. This is no good, and if even Scott Meyers bothers to mention this as one of the ways to make your code more effective, then everyone should take it into account.

HRESULT Foo(CComBSTR strHeader, CComBSTR strMessage)
{
   ...
}

It’s not much work to fix this. Even if you don’t care about const-correctness, just add an ampersand:

HRESULT Foo(const CComBSTR& strHeader, const CComBSTR& strMessage)
{
   ...
}
Pointless temporary variables
HRESULT WrapperForFoo(DWORD dwSomething, const CString& strSomething)
{
    HRESULT hr = S_OK;

    hr = Foo(dwSomething, strSomething, TRUE);

    return hr;
}

HRESULT Foo(DWORD dwSomething, const CString& strSomething, BOOL fSomeFlag)
{
    ...
}

If somebody can tell me why we should use the temporary variable, please let me know. For debugging purposes? Just set a break-point in ‘Foo’. Otherwise we end up having five lines of code instead of one. I won’t even mention that we should probably just inline this wrapper function.
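
For completeness, here's the same wrapper with the temporary variable removed:

HRESULT WrapperForFoo(DWORD dwSomething, const CString& strSomething)
{
    // No temporary needed - return the result of Foo directly.
    return Foo(dwSomething, strSomething, TRUE);
}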

Confusing assertions
BOOL DoSomething(int idSomething)
{
    _ASSERTE(idSomething);

    ...
}

When looking at the code, at least I’m confused. Can ‘idSomething’ also be negative, or only positive? This assertion doesn’t give me any additional information about the assumptions we're making. Change it to:

_ASSERTE(0 < idSomething);
if only positive values are allowed, or add a comment to the code, or make the assertion produce some meaningful message - but do something unambiguous:
// We allow negative identifiers also.
_ASSERTE(0 != idSomething);
My favorite
CFoo::~CFoo()
{
    ...

    if (NULL != m_pBar)
    {
        delete m_pBar;
        m_pBar = NULL;
    }
}

I wrote about this last month. This is just so unreasonable. Why write one line of code when five lines will do ;-) Also, why set the member variable to NULL in the destructor? If somebody knows, please let me know.
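
Since 'delete' on a NULL pointer is a well-defined no-op in C++, the whole thing collapses into a single line:

CFoo::~CFoo()
{
    ...

    // delete handles the NULL case by itself.
    delete m_pBar;
}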

Epilogue

As usual, feel free to ignore my rambling. First of all, a good case can be made that one shouldn't spend so much time worrying about these little things and should do something useful for a change ;-) Some people may view this as another “Vi vs. Emacs“ or “tabs vs. spaces“ discussion and think that hunting down minor problems like this is a waste of time. I tend to kindly disagree. Using the same logic we could say that it doesn't matter if some memory location is off by a couple of bytes, the pointer points close enough to the actual target ;-) Well, I guess that being bothered by issues like this is just a part of my professional cretinism.

"Hello?" I say
"Can you review my code?" he asks
"Sure, why not." I agree pleasantly
"Well, I have this interface ..."
I interrupt him
"Do you always check for the proper buffer size before using strcpy()?"
"No, why somebody will ever call my public interface with the wrong buffer size" he says innocently
They never learn...
"Stay where you are. You're found guilty in violating the Buffer Overflow Prevention Policy. The Code Police is on their way to arrest you..."
Over the phone I hear him running away
