Archives / 2004 / March
  • Would you be willing to use speech-based authentication IRL?

    One of the frequent questions we're getting from our customers is what we plan to do about biometrics and speech-based authentication in Speech Server. In particular, people are interested in telephony scenarios à la:

    ...
    System: Please state your name.
    Caller: Gunnar Kudrjavets.
    System: (doing some magic) You’re authenticated. What do you want to do?
    Caller: Savings account balance, please.
    ...

    Speech is very different from traditional biometrics-based security measures: face, fingerprint, and iris. I'm just curious to know: would you trust something like this? Yes, I know that it depends on the person, but hopefully if enough people respond, I can draw some useful conclusions from this ;-) Google will point you to lots of good links related to the current state of speech-based authentication, error percentages, research articles etc.


  • How to Shoot Yourself in the Foot with Code Coverage? (Part II)

    Honest code coverage

    One of the phrases I quite often use is “honest something”. What do I mean by this? First of all, I believe that everything that is worth doing should be done with the highest quality possible. Secondly, I believe in doing things “by the book”, but this shouldn't be confused with blindly following orders or having a very fixed and limited thinking process ;-) The best way to explain this is to give a concrete example:

    • “Honest development”. For me this means that before any feature is added to the code base, the following general steps happen: 1) after everyone has agreed on the specification, the developer thinks about design, writes a design document, and extensively reviews it with his/her peers; 2) writes code, asks his/her peers to review it, and steps through every line of it in the debugger; 3) makes sure that warnings generated by different static source code analysis tools are fixed; 4) develops a number of unit tests to test his/her code; 5) works with his/her buddy tester to make sure that the feature has received a decent amount of testing coverage before it's added to the code base.
    • “The other way”. Write the code, make sure it compiles, spend 5-10 minutes testing it, hope that STE-s will find all the other issues, and add it to the code base.

    Everyone can see that using the first approach is much harder - it requires way more engineering discipline, takes more time, and forces the developer to work harder. Microsoft is exceptional in the sense that most of the developers exercise the majority of the steps given in the first bullet point. I spent 5 years writing code in different software companies before Microsoft, and I have to admit that I only used the second approach during this period ;-) It's not something to be proud of, but at least it gave me a very good understanding of how not to do things.

    Here's the trouble I'm having - how to define “honest code coverage”? I reduced the problem to the following: “the feature is well-tested and has a certain percentage of code coverage”. The sad thing is that it's not very easy to determine when something is well-tested. A number of books and PhD theses have been written about this problem alone. Yes, the usual things can be done: analyzing the bug trends, looking at the number of test cases (which is IMHO as pointless as measuring LOC to determine a programmer's productivity), looking at how many user scenarios are covered, using certain mathematical models to determine how many bugs are left in the code, using the "green" - "yellow" - "red" - "green" cycle or any other variation of unit testing from Extreme Programming methodologies etc. Unfortunately testing in controlled/simulated environments is quite different from product usage IRL. Hence all the fascination with “eating your own dog food”, having early deployments to external customers etc. The best feedback/indication about your product quality is IMHO still the reaction from your customers after you finally decide to ship ;-)

    Requiring a certain amount of code coverage for every new feature

    One thing I've seen other teams succeed with is requiring a certain amount of code coverage for every new feature before adding it to the code base. IMHO this is quite a reasonable thing to do. What it also means is that scheduling development and test related work items will be noticeably affected. One needs to add some amount of time to the schedule to make sure that developers will have time to write (basic) unit tests, STE-s will have time to write (more complicated) test cases, and both sides will have time to work together and make sure that the right things happen.

    Usually when bug fixes are accepted late in the product cycle, they go through a very rigorous process, and one of the things everyone worries about is how well the new code will be tested. Some of the teams require that for code changes like this the code coverage information is also given. Initially I didn't think this was very useful in terms of cost vs. benefit, but the last months before shipping the Speech Server have changed my mind ;-)

    Dealing with non-exercised code paths

    The process we currently use in our team is the following:

    • Once a week a special instrumented build is made which contains the information necessary for the code coverage tools to do their thing.
    • All the automated test cases are run on this instrumented build and specific metrics about code coverage are published. Up to this point everything happens automatically and practically no human intervention is required.
    • Depending on how much time the STE-s have, some people will manually run a number of test cases. As we all know, manual labor is expensive, therefore we don't do this every week. After completing these steps we achieve something which we declare as our final code coverage numbers.

    What happens next is that IC-s start looking at the blocks of source code which aren't covered by the current set of test cases and start giving estimates of how much time it'll take to write code which will exercise those code paths. After that this work is prioritized like any other work item.

    At the end of last year we took some time and went through all the paths in the source code which weren't exercised by the test automation for our product. Yes, I mean it. Every single function/method which wasn't touched. Every single conditional branch which wasn't executed. I still like to get my hands dirty with “real work”, therefore I did this exercise for two of the binaries in our product. It took me a while and at the end I was thinking that I'd go crazy ;-) Finally I had this nice document where for every non-exercised code path I had an estimate of how long it would take to write code to exercise that specific piece of code. It paid off pretty well actually, because every time somebody complained about something related to code coverage, I took this document and said something à la “To exercise this code path we need to do A, B, and C. To do so it'll take us n days. This piece of code will be executed only under conditions D, E, and F. Here's the list of other things this team is currently working on and here are the priorities we're following.” This usually ended the discussion ;-)

    The main thing I learned from this experiment for the next milestone/version of the product is that doing this exercise is very useful, and though it takes time I think the end result pays off - you know the details of the code you're not exercising, why you're not exercising it, and what it'll take to do so. One thing to note is that this exercise should be done quite late in the product cycle, when the code isn't changing so much anymore.


  • Writing blog entries is no different from managing software projects

    Actually most things in life are like managing software projects: you need to understand how much time (or any other resources) you'll have for the tasks you want to perform, what the priority of these tasks is, in what order they need to be performed etc. To get myself into the habit of writing at least two posts per week, I finally decided it's time to publish the schedule for upcoming posts. Most human beings have different kinds of fears: some people are afraid of the dark, some people are afraid of flying etc. One of the things I'm always afraid of is not meeting any of the dates I promised to meet (feel free to analyze me now) ;-) So, hopefully publicly stating that by date D I'll post a blog entry on subject S will force my subconscious to schedule my time better. Here are the subjects I plan to write about, with the date, short summary, and title:

    • 03/27/2004 - How to Shoot Yourself in the Foot with Code Coverage? (Part II). I'll describe a number of things I would like to see both our development and test organizations doing during the next version of our product in regards to code coverage: a) a definition of “honest code coverage”; b) how to make sure that before a feature is added to the code base it already has a certain amount of code coverage; c) how to deal with the code paths which aren't covered during the daily automated test runs etc.
    • 03/31/2004 - Punch it in the nose - if it stops bleeding, punch it harder. This quote is actually from one of the internal presentations about quality, but it just sounds so right for describing the desired attitude ;-) There are a number of things I would like to cover: a) the right attitude towards product quality and how it differs from striving for perfection; b) what for me are the signs of the right attitude in STE-s (Software Test Engineers), SDE-s, and SDE/T-s (Software Design Engineers (Test)); c) some of the success stories we had while working on the Speech Server during the last 3 years when people showed the right attitude; d) general rambling about the Software Engineering Code of Ethics and Professional Practice.
    • 04/05/2004 - Why reporting every single bug is so-so-so important. This post will exist mainly because I give a similar speech at least once a day to somebody. We'll see why bothering to take the time to report bugs is extremely important for: a) product quality in general; b) planning the next version; c) doing root cause analysis and looking at the bug trends; d) helping our customers with their problems etc.
    • 04/08/2004 - How to cut your couch into pieces? (Part II). In the second part there won't be any couches to cut into pieces, but there'll be one big white couch which we had to remove from the apartment through a second-floor balcony.
    March/April 2004

    One possible negative outcome will be that nobody will read this blog anymore because the excitement à la “I wonder what this crazy guy will write about next time?” is gone. On the other hand, the possible positive outcome can be something like: “Today I'll get up earlier because this is the day when I'll hear the story about how Gunnar was helping to move a big couch out through the balcony of his friend's second-floor apartment.” We'll see. OK, now I need to stop flattering myself and do something useful!


  • "The average software developer reads less than one professional book per year...

    ... (not including manuals) and subscribes to no professional magazines." This is actually from DeMarco and Lister, Peopleware, 2nd Ed., 1999. I don't have the book in my hands right now to find out what page it's on, but I stumbled across this quote while reading the "Professional Development Handbook" by Construx Software, which is Steve McConnell's company. If you care about developing yourself or your direct reports / organization, then this is IMHO a good document to read.

    But back to the title. I remember that the first couple of times I read "Peopleware" it didn't strike me as a big deal. Possibly I'm currently too spoiled by the intellectual atmosphere and all the smart people surrounding me at Microsoft, but it sounds hard to believe. Is it really that bad?


  • How to Shoot Yourself in the Foot with Code Coverage? (Part I)

    Prologue

    Usually after every milestone or after every version of some product, most groups have something called a postmortem. Encarta Dictionary defines postmortem as "an analysis carried out shortly after the conclusion of an event, especially an unsuccessful one." This of course doesn't mean that every time you have a postmortem, something went wrong. Quite the opposite: analyzing the positive experiences and writing down the keys to success has the same benefits as learning lessons from negative events. Most people unconsciously hold their personal postmortems daily, thinking about recent events and stuff ;-) This is my code coverage postmortem.

    First of all, here are the things the current post isn’t about:

    • It's not about explaining what code coverage is. I'll make the assumption that you know the basic principles. There are plenty of articles written about code coverage, most software engineering textbooks touch the subject in one way or another, and there are a number of tools on the market as well.
    • It's not about how to use code coverage properly. Let's be honest, I'm in the middle of working out an efficient process myself. And of course it's always easier to criticize (even myself) than to tell how to do something right.

    What's this about? It's about the experiences my team and I had with code coverage during the last year or so. The best paper about misusing code coverage I've found so far is located here. It's written by Brian Marick and BTW he has lots of other interesting things in his blog which are IMHO definitely worth reading.

    First I'll cover the major incorrect acts and decisions which were made, and in the second part of this post I'll mention a number of things I plan to improve during the next version of our product. So, hopefully one year from now I can write about the smart decisions made and the excellent results achieved ;-) The curious reader may also wonder what code coverage tools we used to get the job done. We used a toolset called "Magellan". The Microsoft Research web site the previous link points to has somewhat more information.

    Mistake 1: underestimating the complexity of code coverage itself

    When I started to use code coverage tools, the first thing I did was spend a significant amount of time reading code coverage related articles in the ACM Digital Library, and I have to admit that I still don't feel very confident talking about code coverage. There are a number of things: function coverage, block coverage, statement coverage, path coverage, condition coverage etc. To efficiently use any of this, time needs to be invested in training people, choosing proper metrics, explaining the metrics to all team members, and picking key things to measure. If I had to start from scratch, I would probably plan at least 1-1.5 days for an in-depth workshop for the entire team to go through all the terms, their usage, and their applicability to our particular situation.
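    To make the terms concrete, here's a toy sketch (the function and the tests are made up purely for illustration):

    // A toy function with two independent branches.
    int Classify(int a, int b)
    {
        int result = 0;

        if (a > 0)      // branch 1
        {
            result += 1;
        }

        if (b > 0)      // branch 2
        {
            result += 2;
        }

        return result;
    }

    The two calls Classify(1, 1) and Classify(-1, -1) together give 100% statement, block, and branch coverage - every line runs and every branch goes both ways - yet they exercise only two of the four possible paths through the function, i.e., 50% path coverage. Same code, same tests, very different numbers depending on which metric you pick.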

    Mistake 2: falling into the general "bigger numbers = good testing quality" trap

    Whenever I read articles where somebody recommends that one should strive for 100% code coverage, I want to start screaming ;-) This statement is IMHO in the same equivalence class as stating that absolutely all the bugs in the product need to be fixed. What tends to happen IRL is that there'll be a number of teams - teams A, B, and C - and we'll start measuring how much code coverage these teams have for the components they own. What happens next is very natural to human nature: people tend to optimize their work based on how it gets measured, and the competition starts kicking in.

    I usually give two examples of how the measured code coverage can be absolutely irrelevant to the level of component testing. First example: assume that I'm one of those people who like to write their own versions of strlen(). My personal rationale is that I want an assertion to be fired when a NULL pointer is passed to the function as a parameter. My code may look like this:

    UINT YetAnotherStrLen(LPCWSTR szSomething)
    {
        _ASSERTE(0 != szSomething);
    
        UINT nCount = 0;
    
        while (L'\0' != *szSomething++)
        {
            nCount++;
        }
    
        return nCount % 42;
    }

    Let's assume that somebody writes a quick unit test calling this function with the parameter "Foo". The result checks out, and nobody may even notice that the programmer who wrote this routine made the mistake of returning all results modulo 42. If somebody measures how much code is covered by this single unit test, there's a pretty good probability that your favorite code coverage tool will report 100% of lines covered. Can we make any decision based on that? Of course not. You should use the most powerful weapon you have - common sense. The main point I'm trying to make is that, like any other thing, code coverage can be very easily misused.
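    To make it concrete, here's a minimal sketch of that quick unit test (assuming YetAnotherStrLen() from the listing above is in scope):

    #include <cassert>
    #include <string>

    int main()
    {
        // "Foo" has length 3 and 3 % 42 == 3, so the test passes while
        // every line of YetAnotherStrLen() gets executed.
        assert(3 == YetAnotherStrLen(L"Foo"));

        // Only a string of 42 characters or more exposes the modulo bug:
        // this assertion fails because the function returns 42 % 42 == 0.
        std::wstring str42(42, L'a');
        assert(42 == YetAnotherStrLen(str42.c_str()));

        return 0;
    }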

    One of the most senior people in our development organization once told me a story about his previous team: they were writing a number of tests to be run before any addition to the code base (we call them check-in tests, your mileage may vary), and one of the developers was given a task to write a test which would have an execution time of less than 15 minutes and would cover as much code in the product as possible. It took this person about a day and he achieved ~80% block coverage. They did this to have something which would make sure that most of the code is executed and that during the execution nothing bad happens, but nobody ever claimed that this test proves that most of the features work as they're supposed to. This is an example of doing something and understanding what's done and why it's bad or good.

    Mistake 3: trying to achieve more than 80% of code coverage

    Based on my experience, getting more than 80% code coverage gets extremely tricky and there's very little in return for your investment. There's an urban legend in circulation about one of the software companies where developers were pushed to achieve some unrealistic code coverage number and they ended up removing error handling code…

    One of the other examples I like to give to people is analyzing the cost vs. benefit of trying to write some amount of unit tests to get all the error handling cases covered. Let's look at a hypothetical situation: you have a service and something bad happens (let's assume that the service code contains some amount of AI and it detects that security has been compromised ;-)), so you need to stop the service; if stopping the service fails, then you would like to log the reason for the failure to the NT Event Log. The pseudo-code may look like this:

    ...
    
    if (fSomethingBadHappened) {
        // Get the handle to the service.
        SC_HANDLE hService = ::OpenService(...);
    
        if (NULL != hService) {
            // Go ahead and stop the service.
            ...
        } else {
            // Try to open an event log.
            HANDLE hLog = ::OpenEventLog(...);
    
            if (NULL != hLog) {
                // Go ahead and log the reason for failure.
                ...
            } else {
                // We can't even open the NT Event log.
                // Do something else.
                ...
            }
        }
    }
    
    ...

    Let's say that we need to write an automated test case to make sure that the code in the last error case (we can't open an event log) is executed. What do we need to do? First of all we need to simulate the condition that makes "something bad happen", then we somehow need to make OpenService() fail, and then we need to make OpenEventLog() fail as well. Usually this can be done using a technique called fault injection; a sketch follows below. How much time will it take? Well, at least one day if not more: writing the code, testing it, asking somebody else to review the code, checking it in, including the test case in the suite of automated test cases, monitoring the behavior of the test case the next day etc.
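    For illustration, here's a minimal sketch of one way to build that kind of seam: the product code calls the APIs through function pointers which default to the real implementations, and the test swaps in failing stubs. (All the names below are made up for this sketch.)

    #include <windows.h>

    // Seams: the product code calls these pointers instead of calling the
    // APIs directly; they default to the real Win32 functions.
    SC_HANDLE (WINAPI *g_pfnOpenService)(SC_HANDLE, LPCWSTR, DWORD) = ::OpenServiceW;
    HANDLE (WINAPI *g_pfnOpenEventLog)(LPCWSTR, LPCWSTR) = ::OpenEventLogW;

    // Test-only stubs which simulate the failures we want to exercise.
    SC_HANDLE WINAPI FailingOpenService(SC_HANDLE, LPCWSTR, DWORD)
    {
        ::SetLastError(ERROR_ACCESS_DENIED);
        return NULL;
    }

    HANDLE WINAPI FailingOpenEventLog(LPCWSTR, LPCWSTR)
    {
        ::SetLastError(ERROR_NOT_ENOUGH_MEMORY);
        return NULL;
    }

    // The test case redirects both seams before triggering the "something
    // bad happened" condition, forcing execution into the last else branch.
    void TestCannotOpenEventLogPath()
    {
        g_pfnOpenService = FailingOpenService;
        g_pfnOpenEventLog = FailingOpenEventLog;

        // ... set fSomethingBadHappened and verify the fallback code runs.
    }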

    The decision whether to write an automated test case isn't any different from deciding whether some bug needs to get fixed. IMHO you should not waste resources to achieve another percentage point of code coverage if those resources can be used somewhere where they make more sense. Remember, common sense rules ;-)

    To be continued...


  • Do you have a personal coffee machine at work?

    It's quite late in the Seattle area now and I'm in the process of writing the first draft of a blog entry about how to misuse code coverage. We've learned some lessons during the last year or so in the Speech Server team and it's a good time for a root cause analysis. Of course it's also always easier to criticize something than to do it properly yourself ;-) While analyzing our failures and successes I drink coffee to stay mentally sharp, and I suddenly realized that I actually need to write something about coffee.

    How did this old joke go: "software engineers are machines that take coffee and requirements as input and produce lines of code as output"? After having a personal coffee machine at the office for the last 6 months or so, I consider it an investment which pays off very well. Generally there are two main ways to get coffee at Microsoft (I won't talk about the option of bringing your own coffee to work):

    • "Starbucks". Almost every Microsoft cafeteria has one. IMHO "Starbucks" has quite high quality and wide choice. There are a couple of problems though. Getting "Starbucks" consists of the following steps: walking to the cafeteria, standing in the line, ordering your coffee, waiting for the coffee, and walking back or taking an additional 5-10 minutes to chat with your colleagues. Don’t get me wrong, I don’t have anything against the exercise, I get plenty of that almost every day. The question for me is the time and getting back into zone.
    • Community coffee machine. Algorithm is simple: if somebody has made some coffee then drink it, otherwise brew the coffee yourself. The trick with the community coffee machine is that if person A makes coffee then persons B, C, and D may not like it. It’s too strong or in most of the cases it’s too emaciated. I ran once a series of experiments where I went and tasted "community coffee" for 10 times during random days at random time. Only at 2 out of 10 times the coffee was drinkable for me, otherwise it was too weak ;-) But on the other hand, I like strong coffee so the problem is mainly on my side. Another solution here is to come to work very early and brew the first pot of coffee or intercept the moment when another pot needs to be started.

    About half a year ago I purchased my own coffee machine, and I have to admit that it was a wise thing to do. My choices expanded significantly:

    • I can brew coffee as strong as I want.
    • I can brew any type of coffee I want.
    • I can brew as much coffee as I want. The fellow geek next door needs a shot of caffeine, no problem at all - I can help him out.

    To summarize: though I still visit the nearby cafeteria once or twice a week (mainly for social reasons), this experience could serve as good material for a self-help book - "The Coffee Machine Which Changed My Life and Increased My Productivity" ;-) I also have to think about what the next thing to acquire will be. A refrigerator, a microwave oven, or even a cocktail shaker? ;-)


  • Past-due software projects, can something be done about them? (Part II)

    In the first part of this post I described the three most common "scheduling games software engineers play":

    1. Game 1. Everything’s on schedule, but suddenly we need to do some additional work (add a new feature, own an additional area of responsibility, comply with some additional coding standards etc.)
    2. Game 2. Everything’s good in our team, but because of some dependencies or unforeseen events the project will take longer than expected.
    3. Game 3. We tried to do our best while estimating, but made a number of mistakes and now it’s going to take longer than we initially thought.

    Here are the ways I try to deal with these different situations. Don't treat it as some kind of universal strategic guidance. One of my university professors used to say that the term "universal" means that it works equally badly in all situations ;-)

    Game 1

    IMHO this is the best situation you can be in. Your team is on track, everything looks fine, and with pretty high probability you'll meet the dates you promised to meet. From the negotiations point of view you're in a very good strategic position and you're doing better than the average peer of yours. Usually there are a few possible outcomes:

    • Your superiors have common sense and they understand that, for example, taking time to review the entire code base and look for a new type of security vulnerability (integer overflows, for example) will take some time, and that doing the review on Saturday at 03:00 AM probably won't be the most efficient/productive use of everyone's time.
    • Your superiors have common sense, but something just needs to be done by a specific date. For example: your product launch date is announced; major customers won't want your product without a specific new feature; the person who owned the MMC plug-in left the company and somebody needs to take care of his/her bugs by some certain date etc. This shouldn't be confused with the "it needs to be done because slipping will look bad on my record and I'm willing to sacrifice the well-being of the entire team to meet some date" situation.

    On the other side, being on track can also be a bad thing. Nowadays (if not always), software projects meeting deadlines and staying on track are more suspicious than normal. Let's look at a hypothetical story about two teams:

    There are two development teams: A and B. Both are working in parallel on some COM-based software. Developers in team A decide that smart pointers are a good thing and they'll be using them consistently across the project (for the sake of the example let's not start arguing about smart pointers ;-)) Developers in team B decide that Real Programmers know exactly when to call IUnknown::AddRef and IUnknown::Release, and smart pointers are for losers who don't know how COM works or what they're doing. Two months after starting the project both teams are feature complete, but when it comes to bugs, team A has way fewer bugs than team B. Team B is constantly struggling with reference counting issues. As a result, team B spends the next two weeks working nights and weekends, team leads order food for the team every day at 06:00 PM, e-mails about successfully fixing something are sent out at 02:47 AM etc. What kind of perception is generated to the outsider about team B? Mostly the good kind. Team B is committed to shipping the product, they put in heroic efforts, they do whatever it takes to meet the deadlines, and they're “old school”, yeah. As my manager once told me after I was too cynical during one meeting and later wondered why people complained about it: "perception is reality" ;-) If you're an insider, then you may know that team A succeeded because they made an IMHO better engineering decision, and while team B has been fixing bugs, they've been doing something else which helps to generate revenue for the company.
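    For reference, here's the kind of difference the two teams signed up for - a sketch using IDispatch just to have a concrete interface to play with:

    #include <atlbase.h>

    // Team B style: manual reference counting. Every code path out of the
    // function must remember to call Release(), or the object leaks.
    void UseInterfaceManually(IUnknown *pUnk)
    {
        IDispatch *pDisp = NULL;
        if (SUCCEEDED(pUnk->QueryInterface(IID_IDispatch, (void**)&pDisp)))
        {
            UINT cTypeInfo = 0;
            pDisp->GetTypeInfoCount(&cTypeInfo);
            pDisp->Release();   // forget this on one code path and you leak
        }
    }

    // Team A style: CComPtr calls Release() automatically when the smart
    // pointer goes out of scope, on every code path.
    void UseInterfaceSmartly(IUnknown *pUnk)
    {
        CComPtr<IDispatch> spDisp;
        if (SUCCEEDED(pUnk->QueryInterface(IID_IDispatch, (void**)&spDisp)))
        {
            UINT cTypeInfo = 0;
            spDisp->GetTypeInfoCount(&cTypeInfo);
        }   // Release() happens here automatically
    }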

    Game 2

    Dependencies are generally a very bad thing. Joel on Software contains a nice general article about this. In our product we have been battling with dependencies for years now. The most complicated dependencies based on my personal experience are:

    • You are relying on a 3rd-party component. That's it: the code is out of your control, the design is out of your control, and you don't know what's going on inside the component. Getting something fixed isn't just: "run cdb; look at the source code; figure out the problem; fix it;" Dependencies bad, external dependencies very bad. Probably everyone who has ever used any components for doing SSL/TLS related stuff, generating database reports, having a fancy calendar control etc. knows what I mean.
    • General libraries, base classes, frameworks etc. Everyone is using ATL, CLR, CRT, .NET Framework etc. But what happens if something in these libraries is blocking the shipping of your product and there's no feasible workaround? The good thing about being at Microsoft is that pretty much everything we use is built by some other team ;-) You have a problem with the CLR - the answer is just an e-mail away.
    • Bugs in the OS. Nothing is ever perfect. The same thing as under the previous bullet point applies here. With dependencies and problems like this there's not much you can do. Yes, the feature can be cut and some hectic workaround can be implemented, but it will still take time and increase the risk of breaking something else.

    In my pre-Microsoft life somebody once suggested replacing one 3rd-party component with another because the project dates were quite critical. The suggestion was made about two days before the final date ;-) Everyone with common sense will understand why, a month before shipping, you can't start replacing the building blocks of your product.

    Game 3

    This is a situation where the only thing to do is to say "guilty as charged". I tend to divide this particular situation into two different categories:

    • You have somebody junior reporting to you who made an estimation mistake. Usually all the estimates are rigorously reviewed and reviewed once again and then reviewed again. IMHO in the case of junior people it's 50% the manager's fault as well - insufficient tutoring, insufficient reviewing, not assigning a proper mentor etc.
    • You have somebody senior who constantly makes estimation mistakes. Well, in this case you have a problem on your hands.

    One can ask why I'm spending so much time talking about this. Mainly because of the similarities to chaos theory. Everyone has heard the term "butterfly effect". Let me give you a hypothetical example: "Let's assume that I ask somebody how much time task A will take; the person replies that task A will be done by Monday morning; I ask for confirmation, the person confirms once more; I go and tell my manager that we'll complete something by Monday morning; my manager passes the good news to somebody in upper management who has a conference call with external partners three hours from now; the news is passed to the partners that they'll get some results by Monday morning; task A takes longer than expected because of the incorrect estimates and we actually complete it by Tuesday EOD; on Thursday two vice presidents (ours and the partners') talk and one happens to mention the fact that they can't deliver B on time because A was late and therefore B will be late." You can imagine the rest ;-)


  • Lectures from the authors of RSA available online (Turing Award 2002)

    What do normal people usually do on a Sunday evening? I bet anything except reading the recent issue of ACM "Software Engineering Notes" ;-) One cool fact the latest issue mentions is that ACM's 2002 Turing Award lecture materials are now online. Three presentations from the authors of the RSA encryption algorithm are available:

    • Dr. Leonard M. Adleman on "Pre RSA Days"
    • Dr. Ronald L. Rivest on "Early Days of RSA"
    • Dr. Adi Shamir on "Cryptology: A Status Report"

    If you have ever been fascinated by cryptology and related security issues, I would strongly recommend viewing these talks.

    One piece of information (which is also mentioned in this talk) many security enthusiasts may not know is that the two most famous public key encryption methods, the Diffie-Hellman protocol and RSA, weren't actually first discovered by the people after whom the protocols are named ;-) Both Ross Anderson's "Security Engineering" and Steven Levy's "Crypto" give credit to the following GCHQ employees: James Ellis, Malcolm Williamson, and Clifford Cocks. Check these books or Wikipedia for more information and longer explanations.


  • Bastard Code Reviewer from Hell

    Prologue

    When somebody asks about the relationship between me and Un*x, I have to admit that my alma mater was/is rather a Un*x place than a Microsoft place. This also means that I “grew up professionally” while reading all those stories about the Bastard Operator from Hell ;-) Times have changed: now I'm not fanatically reading Richard Stevens's books anymore, and honestly I don't even remember the last time I wrote any code with a fork() system call in it ;-) The frightening image of the BOFH is stuck in my memory though.

    Current day

    Reading the actual product source code is one of the things I think every developer should constantly be doing. This is IMHO practically the only way to learn good coding habits and design tricks, learn how to avoid the pitfalls, increase one's understanding of different error handling strategies, pick up a good programming style, understand how software is “really made” etc. When it comes to code reviews, IMHO they're one of the most effective ways to teach people how to write good code. I won't even talk about the aspect of preventing defects. Check your favorite software engineering handbook for references to the actual studies and fact sheets about the usefulness of formal code inspections.

    One beautiful evening I took some time to go through the entire source code base of one of our components. Generally we're in pretty good shape: we have been automatically running PREfast, PREfix, and some other static source code analysis tools on our code base on a weekly basis. FxCop also gets automatically run on our assemblies on a daily basis (the majority of our product code is written in C#). Starting next milestone we'll also have a certain amount of “code cleanness” as one of the requirements prior to making any changes to the code. As a result, when it comes to the usual coding defects - memory leaks, unchecked error codes and parameters, buffer overflows, uninitialized variables etc. - we're in pretty decent shape. We analyze the warnings regularly and have managed to keep our code base quite clean of warnings during the last couple of years. OK, after this amount of bragging it's now time to shut up and talk about the real problems ;-)

    While reading the code there are still some issues left for the diligent reviewer to rave about. Here's a limited example list of problems I'll be closely watching for while the code for the next milestone is being written. Or if I get a life, then hopefully somebody else will be doing this ;-) The component I'm referring to is written in unmanaged code, so here are my nitpickings. One of the things I'm quite passionate about is keeping only the code which is necessary to get the job done, i.e., the code is perfect when there's nothing to remove anymore ;-) E. W. Dijkstra once said: “My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.” IRL most people usually quote it as: “Every line of code isn't an asset but a liability.”

    Passing objects by value instead of passing them by reference

    There are lots of places all over the code where both small and large objects are passed around by value. This is no good, and if even Scott Meyers bothers to mention this as one of the ways to make your code more effective, then everyone should take it into account.

    HRESULT Foo(CComBSTR strHeader, CComBSTR strMessage)
    {
       ...
    }

    It's not much work to fix this. Even if you don't care about const-correctness, just add an ampersand:

    HRESULT Foo(const CComBSTR& strHeader, const CComBSTR& strMessage)
    {
       ...
    }
    
    Pointless temporary variables

    HRESULT WrapperForFoo(DWORD dwSomething, const CString& strSomething)
    {
        HRESULT hr = S_OK;
    
        hr = Foo(dwSomething, strSomething, TRUE);
    
        return hr;
    }
    
    HRESULT Foo(DWORD dwSomething, const CString& strSomething, BOOL fSomeFlag)
    {
        ...
    }

    If somebody can tell me why we should use the temporary variable, please let me know. For debugging purposes? Just set a breakpoint in ‘Foo’. Otherwise we end up having five lines of code instead of one. I won't even mention that we should probably just inline this wrapper function.
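    The fixed version is simply:

    HRESULT WrapperForFoo(DWORD dwSomething, const CString& strSomething)
    {
        // Forward directly - no temporary needed.
        return Foo(dwSomething, strSomething, TRUE);
    }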

    Confusing assertions

    BOOL DoSomething(int idSomething)
    {
        _ASSERTE(idSomething);
    
        ...
    }

    When looking at the code, at least I'm confused. Can ‘idSomething’ also be negative, or only positive? This assertion doesn't give me any additional information about the assumptions we're making. Change it to:

    _ASSERTE(0 < idSomething);

    if only positive values are allowed, or add a comment to the code or make the assertion produce some meaningful message - but do something different:

    // We allow negative identifiers also.
    _ASSERTE(0 != idSomething);
    
    My favorite

    CFoo::~CFoo()
    {
        ...
    
        if (NULL != m_pBar)
        {
            delete m_pBar;
            m_pBar = NULL;
        }
    }

    I wrote about this last month. This is just so unreasonable: delete on a NULL pointer is already a well-defined no-op, so the check buys you nothing. Why write one line of code when five lines will do it ;-) Also, why set the member variable to NULL in the destructor? If somebody knows, please let me know.
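    For comparison, the whole block collapses to:

    CFoo::~CFoo()
    {
        ...

        // delete on a NULL pointer is a well-defined no-op.
        delete m_pBar;
    }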

    Epilogue

    As usual, feel free to ignore my rambling. First of all, a good case can be made that one shouldn't spend so much time worrying about these little things and should do something useful for a change ;-) Some people may view this as another “Vi v. Emacs” or “tabs v. spaces” discussion and think that hunting down minor problems like this is a waste of time. I tend to kindly disagree. Using the same logic we could say that it doesn't matter if some memory location is off by a couple of bytes; the pointer points close enough to the actual target ;-) Well, I guess that being bothered by issues like this is just a part of my professional cretinism.

    "Hello?" I say
    "Can you review my code?" he asks
    "Sure, why not." I agree pleasantly
    "Well, I have this interface ..."
    I interrupt him
    "Do you always check for the proper buffer size before using strcpy()?"
    "No, why somebody will ever call my public interface with the wrong buffer size" he says innocently
    They never learn...
    "Stay where you are. You're found guilty in violating the Buffer Overflow Prevention Policy. The Code Police is on their way to arrest you..."
    Over the phone I hear him running away


  • Past-due software projects, can something be done about them? (Part I)

    Anyone who has ever read the CHAOS report by the Standish Group or participated in at least one software project knows that Murphy's law works - if anything can go wrong, it will ;-) Like Fermat's Last Theorem puzzled mathematicians for centuries, the problem of correctly estimating the time to complete any given software project has been extremely difficult to solve, although most people in the software industry battle with it every day. In spite of the fact that Fermat's Last Theorem has finally been solved and mathematicians are cheering, when it comes to delivering software on time, it's still very hard to see the light at the end of the tunnel. Being also a chronic pessimist (though I'd rather be called a skeptic ;-)), I doubt that we'll find a silver bullet anytime soon. To summarize: when software engineering was established as a separate discipline, things were bad, and they aren't much better now ;-(

    But enough of being negative; let's take a look at what mortals like you and I can do to keep the wheels of the software industry running more smoothly. For myself I've set two simple goals:

    1. Being able at any given moment to have an adequate understanding of where the software project stands and where it's going. Is it on track? Is it late? How late is it? What can be done about this? Who is doing something about this? Does it actually make any sense to strive for our initial dates? Do I see any potential disasters in the near future? This has proven to be extremely useful, mainly to avoid cases like this: you have 6 months for accomplishing something; every single week everyone reports that they're on track; a couple of days before the final deadline one of the team leads says that there's a slight problem and it's going to take at least three weeks to meet the final set of requirements. This is a variation of "I'm done with coding; now I only need to debug, test, fix some bugs, and refactor the code a little bit." Unfortunately situations like this happen more often than anyone wants to admit ;-( Fortunately in my team there's something we call "management by no surprises" and, believe it or not, we don't shoot the messenger for giving realistic status updates. Otherwise I would have been demolished a long time ago (most of the news I deliver starts with the phrase "There seems to be a problem with ...").
    2. Being able to analyze, document, and understand all the issues which caused something to be late. What exactly are the things which caused us to be late? How did they happen? What was the root cause? How can we prevent these things from happening in the future? For me this has proven very helpful for understanding where the actual problems lie and doing something about them. Please don't confuse this with the interrogation-style root cause analysis à la "Now we need to find somebody to point a finger at and claim it's all his/her fault! You, Alice, what do you think, whose fault is it?" I still believe that making any mistake more than one or two times is a sign of incompetence, and one should spend a significant amount of time analyzing one's failures and learning from them.

    Attaining even just these two goals isn't as trivial as it may sound. Being a classical Virgo by birth, I have these strange habits of trying to categorize, classify, and label everything ;-) Here is the list of the major categories of problems I've encountered while being responsible for the completion of different software projects. I actually call them "the different types of scheduling games software engineers play", mainly because of this book:

    1. Game 1. Everything’s on schedule, but suddenly we need to do some additional work (add a new feature, own an additional area of responsibility, comply with some additional coding standards etc).
    2. Game 2. Everything’s good in our team, but because of some dependencies or unforeseen events the project will take longer than expected.
    3. Game 3. We tried to do our best while estimating, but made a number of mistakes and now it’s going to take longer than we initially thought.

    I'll write about my strategies for playing these "games" in the second part of this post, but first it's essential to see what the major possibilities are for dealing with software projects which are late:

    1. Just move the end date. Nobody likes this solution even if it can be justified (market conditions change, the end date isn't as important as it used to be). Moving the end date is generally viewed as a sign of incompetence and an admission of the fact that errare humanum est ;-)
    2. Add more resources (people). Every decent engineer knows Brooks's Law - adding manpower to a late software project makes it later - but there are exceptions to every rule and one should use common sense to determine when it's appropriate or not. I've seen Brooks's Law hold in most cases, but I've also seen cases where adding manpower has been extremely successful. For example: in the case of our product we achieved a reasonable amount of happiness by involving a developer from outside our core team to develop some piece of GUI for us. If a person already knows ATL, C++, and Win32 GUI programming, then writing an isolated GUI isn't such a complicated task and it doesn't require deep knowledge of how the server code actually behaves. Of course this type of “outsourcing” won't work in the case where we would take a person not familiar with our architecture and ask him/her to optimize the way our error handling model is designed ;-)
    3. Make people work harder. You all know what this means. One morning you wake up and understand that there are no more clean clothes at home, the refrigerator is empty, and you don't know what's been happening in the world for the last couple of weeks. I have possibly too much "Marine Corps" mentality in me, but I think that, for the sake of developing one's character and world view, every software engineer should participate in at least one Death March project. Believe me; it'll make you love the time you'll spend on estimating your project completion date next time ;-)
    4. Cut features. Sometimes the end date is extremely important. You need to ship with some other product at the same time. You need to ship as a part of a bigger product. Shipping the product is extremely important to the success of the company overall. Shipping a product is important to get into the market first.
    5. Cancel the project. Refer to the CHAOS report mentioned earlier. This seems to happen quite often in the software industry.
    6. Lower the quality. I usually tend to say that either we spend an hour fixing some problem now or we'll spend a week fixing it later. Everyone in the software industry experiences this probably every day. Yeah, let's not handle these error codes or exceptions because we know this error can't happen, it takes some time to write this code, and if it happens then something is really wrong and our product wouldn't work anyway. Two weeks later one of your key customers tries to install your product; setup terminates and doesn't say anything. Key people are pulled off their projects and spend a day investigating the problem, only to understand that bad things can happen. A fix is made and the new version is deployed to the customer. Time spent: an order of magnitude more than it would have taken for Alice or Bob to write an error check which would have helped the customer quickly identify the problem.

    To be continued...

    P. S. Though I consider watching TV a waste of time, I have to admit that the premiere of “The Shield: Season 3” today was fabulous ;-)


  • Complete Idiot’s Guide to opening bad bugs ;-)

    Unfortunately the Complete Idiot's Guide site is currently being reconstructed, so if you don't know what it is, you'll have to search elsewhere ;-) But first, two major terms to get started:

    • 'Bug' - an unexpected defect, fault, flaw, or imperfection.
    • 'Triage' - the assigning of priority order to projects on the basis of where funds and resources can be best used or are most needed.

    Depending on the phase of the milestone, I spend 1-3 hours daily attending the triage meeting. This time is spent mainly on analyzing and understanding the new bugs, changing the priority and severity of bugs, asking additional clarifying questions of the people who discovered an issue or are fixing it, accepting or declining bug fixes, looking at the bug metrics etc. To summarize: I spend lots of time dealing with bugs every day. After doing this for years, I've reached the conclusion that I'm finally ready to write about how to report really badly written bugs in the bug tracking system ;-) Here's a short list of things you must IMHO do to report a bug which will confuse the triage team and force them to ask additional questions of you. Just pick one thing and it will generally cause the desired effect:

    1. Title. Make sure that your title won't capture the essence of the problem and its importance. Make it two or three sentences long, include contradictory statements, and give as much fuzzy information as possible. This is a good bad title: “When I do X then something seems to go wrong or I receive strange results.” Especially good titles can be identified by the following experiment: more than ten people read the bug title a couple of times and nobody can understand what it's all about.
    2. Steps for reproducing. The best thing to do is not to include them at all. Or specify only two minor steps and then omit important information like the build number or what you actually did. Also don't mention "irrelevant" facts like MS Office being installed on the test machine, missing QFE-s, or the fact that you unplugged the network cable before the problems started to happen.
    3. Customer impact. Avoid in all cases. Just don't think about how it'll affect the customer or any major user scenarios. Additional suggestions include making the bug title look really scary, but forgetting to mention that the customer will be impacted only if all the planets are aligned, there are more than three people riding bicycles outside the building, and your neighbor's coffee machine is running when the bug occurs.
    4. Priority. This is a good one also. A cool thing you can do is make the title look like something really bad is happening and then set the priority to the lowest value possible. Believe me; this will stun everyone present at the triage meeting. The opposite is also true. A good bad bug is finding a missing comma in the documentation and assigning the highest priority possible to it.
    5. Additional information. In the case when, for example, an AV (access violation) happens, make sure not to get any stack traces. Avoid attaching a debugger to the process, loading symbols, getting a minidump, and alerting the relevant developer. The coolest thing to do is to make a screenshot of the AV and include it as the only piece of information in the bug. Don't even try to look at the source code or do the initial diagnosis yourself.
    6. Rare performance and stress bugs. When something which usually happens once a month occurs on your workstation, don't store any logs, don't take any notes about what happened, and don't keep the machine around for somebody to investigate the root cause. Rebooting the machine is definitely the best thing to do, or even reinstalling the operating system (just in case).
    7. Misspellings. Every good bad bug must have a decent amount of misspellings. Make sure that the bug description, title, and steps for reproducing include lots of misspellings. This speller thing is overrated anyway.
    8. Description. Copy and paste hundreds of lines of log file or NT Event Log events into the bug description field. There's never enough of a good thing. A good approach is also writing an entire essay about the bug. The description should be long enough to make sure that it'll take at least 10 minutes of reading to get through it.
    9. Blocking issues. When you have an issue which will potentially block something or someone for days or cause the team to miss deadlines, mention this little fact somewhere nobody will ever notice.

    These are my personal Top 9 issues. Everyone is welcome to propose an additional tip to achieve Top 10 ;-)

    But to get back from my trip to the sarcastic side, let's look at why reporting good bugs is extremely important. The main issue here IMHO is time, and everyone knows that time = money, especially in the software industry. Wasted time can reveal itself under various circumstances:

    • If there are 15 people attending the triage meeting and because of a poorly written bug we spend 10 minutes discussing the issue, the total amount of wasted time is 2.5 hours. Convert these 2.5 hours to money and you'll see. Now think about how many poorly written bugs you triage every day …
    • A developer tries to reproduce the problem for a couple of hours and fails. He/she goes back to the tester only to discover that you have to have this “other application” running at the same time to make the bug reproduce.
    • It can get even worse. Rebooting the machine where an extremely rare timing-related issue occurred will manifest itself two months from now when one of the key customers encounters the same issue in the production environment and an engineer needs to be flown to the customer site to troubleshoot the problem.
    • Finally, everyone's favorite. Not bothering to investigate the root cause of the problems leaves one critical buffer overflow unnoticed. Three weeks later MSRC (Microsoft Security Response Center) issues another bulletin ...
    • ...

    One of my favorite software engineering gurus, Frederick Brooks, once said: “How does a project get to be a year late? ... One day at a time.”


  • Another pitfall to avoid while using assertions

    Yesterday evening I was archiving my old e-mails, deleting obsolete ones, and squeezing any useful information out of them. While doing this I encountered one e-mail describing a non-trivial pitfall with assertions. To summarize, one of our developers wrote code which looks something like this:

    DWORD SomeFunction(...)
    {
        ...
    
        if (NULL == ::SomeWin32APICall(...))
        {
            // Let's assert in debug builds and make some noise.
            _ASSERTE(!"This can’t happen unless something is very wrong.");
    
            // Return the last error code value to caller.
            return ::GetLastError();
        }
    
        ...
    }
    

    At first look this seems like perfectly legitimate and reasonable code, but let's not forget what GetLastError() does ;-) Per MSDN: “The GetLastError function retrieves the calling thread's last-error code value.” Is the last error code returned here still the one generated by “SomeWin32APICall()”? No. _ASSERTE does lots of stuff internally, including calling some other Win32 functions, so the original error code is lost unless you store it away somewhere.

    Quite easy to verify:

    #include <crtdbg.h>
    #include <cstdlib>
    #include <iostream>
    #include <windows.h>
     
    int main()
    {
    HANDLE hFile = ::CreateFile(TEXT("/foo/bar/baz"),
                        GENERIC_READ,
                        FILE_SHARE_READ,
                        NULL,
                        OPEN_EXISTING,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL);
                
        DWORD dwOrig = ::GetLastError();
        
        if (INVALID_HANDLE_VALUE == hFile)
        {
            std::cout << "::GetLastError() returned: " << dwOrig << std::endl;
            _ASSERTE(!"INVALID_HANDLE_VALUE == hFile");
    
            DWORD dwLatest = ::GetLastError();
            std::cout << "::GetLastError() returned: " << dwLatest << std::endl;
        }
    
        ...
                
        return EXIT_SUCCESS;
    }

    If we're talking about how to prevent this issue, then one could probably write a simple awk/Perl/YourFavoriteScriptingLanguage script to search for this pattern. Discovering even one mistake like this will probably pay for the cost of writing the script itself ;-)
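    And the fix itself is cheap - the same function as at the top of this post, with the error code captured before the assertion can clobber it:

    DWORD SomeFunction(...)
    {
        ...

        if (NULL == ::SomeWin32APICall(...))
        {
            // Capture the last error code *before* _ASSERTE runs and
            // calls other Win32 functions which may overwrite it.
            const DWORD dwLastError = ::GetLastError();

            // Let's assert in debug builds and make some noise.
            _ASSERTE(!"This can't happen unless something is very wrong.");

            // Return the preserved error code value to the caller.
            return dwLastError;
        }

        ...
    }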


  • Honestly estimating time available for software development/testing tasks

    Prologue

    Microsoft Speech Server 2004 is set to launch and I'm looking back on what it was like to work on this product. For the last 2.5 years my full-time job has been to carry out the duties of a PHB. In addition to writing code now being a guilty pleasure for me ;-), this also means that I spend a fair amount of time dealing with things like:

    • Giving the estimates for certain tasks,
    • Changing these estimates based on some events,
    • Justifying my estimates to anyone who asks ;-),
    • Providing constant updates to my superiors about the status of the project, and
    • Comparing my initial estimates with the actual time spent and trying to understand what I missed while doing my initial analysis. This is usually the phase where I beat my head against the wall and yell: “Why didn't I think about this earlier? Bad Gunnar, bad Gunnar!”

    When I'm asked to estimate anything, one of the first things I usually do is try to figure out how much actual working time I really have. I know it sounds a little bit naïve. This isn't a complicated problem, just count the days in the calendar, right? A month is a month of development. Two months are two months of development time and so on. Well, actually it's more complex than it sounds. Software estimation in general is a tricky beast. I've spent a pretty noticeable portion of my life working nights and weekends because of unrealistic schedules or the estimation mistakes I made because of my own incompetence. Edward Yourdon's “Death March” is a book I try to reread every four-five months or so to motivate myself to spend enough time just thinking before doing anything ;-)

    Honest estimates

    There’s something I call “honest estimates”, i.e., realistic estimates of how much time there really is in the schedule for getting the real work done, e.g., how much time a developer really has to design a solution, write code, debug something, fix bugs, etc. I don’t use anything fancy (ever heard of COCOMO II?), just common sense to figure out the amount of available resources. Is there anything new I have to say about this subject? No, just some pragmatic recommendations which have proven successful in my immediate workgroup. Bertrand Russell once said: “Most people would rather die than think: many do.” The first thing I negotiate with my superiors is what the expectations are with regard to working hours. I usually try to negotiate towards an 8h per day, five days a week schedule. Organizational culture may vary, but to be able to estimate anything at all correctly, you need sign-off from the people you report to that they agree with your assumptions. Working 40h a week may not always be the case; sometimes 60h is expected, sometimes 80h, etc. It depends. The main thing to take away is that everyone in your organization should be clear about what the expectations related to working hours are.

    Enough theory, let’s get down to business. To better understand how I do my basic estimations, let’s look at a hypothetical development team consisting of Alice, Bob, and Eve (crypto people, sound familiar?). Let’s say that we need to estimate how much time each of these individuals will have to do real work during March 2004. I start out by writing down how many calendar days we have in the month and how many of them are weekdays:

                          Alice    Bob    Eve
    Calendar days            31     31     31
    Minus weekends           23     23     23
    Version 1

    A simple thing, but I just wrote down the assumption that nobody will work on the weekends. The next thing I do is check the following: a) Do we have any official holidays in March? Let’s assume that March 12th is, for example, an Elbonian holiday and the entire team gets a day off. b) Does anyone have any vacation plans? Let’s also assume for the sake of the exercise that Eve wants to take a week off and visit her parents (from 03/15 till 03/19). This gives us the following:

                          Alice    Bob    Eve
    Calendar days            31     31     31
    Minus weekends           23     23     23
    Minus holidays           22     22     22
    Minus vacation time      22     22     17
    Version 2

    What else is time not spent working? Sometimes people take professional training sessions to learn new skills. For example, Bob may decide to take a 2-day training course (from 03/08 till 03/09) about how to write secure C# code. As his manager, I fully support this idea ;-) I may also notice that the entire division is scheduled to go to the movies on March 25th as a morale event. That’s time not spent writing product code. Let’s adjust our estimates once more:

                          Alice    Bob    Eve
    Calendar days            31     31     31
    Minus weekends           23     23     23
    Minus holidays           22     22     22
    Minus vacation time      22     22     17
    Minus training           22     20     17
    Minus morale events      21     19     16
    Version 3

    Now I start thinking about how many meetings we actually have in our team. Usually there are: a) a weekly 1:1 session between an individual contributor and his/her manager (let’s say it takes 1h) on Monday; b) a weekly team meeting (another hour) on Tuesday; c) a bi-weekly meeting for the bigger workgroup (yet another hour) on Friday. For Alice this means five 1:1-s, five weekly team meetings, and three workgroup meetings – in total 5×1 + 5×1 + 3×1 = 13 hours of time spent not working in her office. For Bob (whose training days cover a Monday and a Tuesday) it means four 1:1-s, four weekly meetings, and three workgroup meetings – in total 4×1 + 4×1 + 3×1 = 11 hours of time spent not working in his office. For Eve (whose vacation week takes out one of each kind) it means four 1:1-s, four weekly meetings, and two workgroup meetings – in total 4×1 + 4×1 + 2×1 = 10 hours of time spent not working in her office. Let’s say that none of these meetings takes the full hour and round the time down to one working day lost in meetings and preparation for meetings. This gives us the following:

                          Alice    Bob    Eve
    Calendar days            31     31     31
    Minus weekends           23     23     23
    Minus holidays           22     22     22
    Minus vacation time      22     22     17
    Minus training           22     20     17
    Minus morale events      21     19     16
    Minus meetings           20     18     15
    Version 4
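
    Mechanizing this bookkeeping is trivial. Below is a minimal C++ sketch with the numbers from the example hard-coded (the TeamMember struct and its field names are just my illustration; in real life the data would come from a calendar, and the meeting hours would first be rounded down to days as described above):

    #include <cstdlib>
    #include <iostream>
    #include <string>

    // One per-person column of the tables above. All values are in
    // working days; meeting time has already been rounded down to a
    // whole day.
    struct TeamMember
    {
        std::string name;
        int weekdays;   // calendar days minus weekends
        int holidays;   // official holidays falling on weekdays
        int vacation;
        int training;
        int morale;
        int meetings;

        int AvailableDays() const
        {
            return weekdays - holidays - vacation - training
                   - morale - meetings;
        }
    };

    int main()
    {
        const TeamMember team[] =
        {
            // name    weekdays holidays vacation training morale meetings
            { "Alice", 23,      1,       0,       0,       1,     1 },
            { "Bob",   23,      1,       0,       2,       1,     1 },
            { "Eve",   23,      1,       5,       0,       1,     1 }
        };

        for (std::size_t i = 0; i < sizeof(team) / sizeof(team[0]); ++i)
        {
            std::cout << team[i].name << ": "
                      << team[i].AvailableDays()
                      << " working days available" << std::endl;
        }

        return EXIT_SUCCESS;
    }

    Running it prints the same 20, 18, and 15 days that the Version 4 table shows.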

    This is where I usually stop and say that the precision achieved is good enough for me. There are a number of other things which I could theoretically take into account:

    • Business trips.
    • Possible power outages in the building.
    • Sick days.
    • Time off because of personal reasons.
    • Unplanned meetings.
    • ...

    I’ve taken the approach of counting these events as non-working time when something actually happens and then readjusting my schedule accordingly. This has proven to be good enough. Summa summarum, I now have something I can take to my superiors and say: “Actually we don’t have a full month – Alice has 20 weekdays available, Bob 18, and Eve only 15. Therefore I can only get this much done.” It’s kind of hard to argue against these basic facts. Fortunately I’ve been extremely lucky in the sense that, for some strange reason, the people I work for have agreed with my style of estimation (which is just common sense and nothing else). A skeptical reader may wonder whether this approach works in real life. The track record of my team shows that we haven’t missed a single date during the last two years, i.e., everything we promised to accomplish by a certain date, we did. There were cases when we were forced to readjust the dates for reasons not directly under our control, but we were always able to give the relevant parties a heads-up months ahead of time.

    Problems with this approach

    The first time I went to a review meeting and presented my estimates, I was very scared ;-) Imagine telling a number of influential people that you will actually get done half as much as everybody expects you to. The best way to overcome the tension is being able to justify every damn thing you say and to offer alternatives. For example:

    • If we cut Bob’s training, we can get feature A implemented, but Bob probably won’t be very happy about it.
    • We can assume that during March the team will work 60h a week, but this will have an impact on morale and work-life balance.
    • We can cut the bi-weekly workgroup meeting and therefore have more time for debugging.

    Epilogue

    This was just scratching the surface when it comes to software project management ;-) My personal bookshelf holds lots of scary-looking books covering related subjects in great detail, and I don’t want to retell them. My main goal is to write some posts about the things which seem to be working for my team and are simple enough to cover in a blog.

    Read more...