Punch it in the nose - if it stops bleeding, punch it harder
This quote is actually from one of our internal presentations about solid software design and quality. For me it describes exactly the attitude I’m looking for in people working with me - never be happy, never be satisfied, there are always ways to break the product ;-) When you’re working on a server product, most of the things you do are about quality. In the server world quality is everything. Nobody would want to use a database server which needs to be restarted every 6 hours; nobody would like to bet the success of their business on a telephony server which stops accepting calls during peak hours etc. Therefore it is extremely important to have people on your team who care about quality, who do everything in the name of quality, and who use product quality as the primary factor when making their decisions.
It's all about the attitude
This is correct, IMHO it's mainly attitude that determines which people will be successful in their positions and which ones won't. One of my peers once mentioned that most of our top performers seem to be constantly dissatisfied because they always have ideas about how to do things better, how to be more productive, how to find new categories of bugs, how to write better code etc. After thinking about this I tend to agree with him. Yes, you heard me correctly - I like “unhappy people” ;-) because IMHO they are the ones who are responsible for progress. I tend to say that we should all celebrate the fact that, for example, Thomas Alva Edison wasn't satisfied with the way things were, otherwise we would still be reading books by candlelight ;-)
STE with the right attitude?
Let me just give you an example of how IMHO good STE-s should think. To be crystal clear, this is my personal opinion:
- ...
- Hmm, what's this? New design document for our cache manager. Cool. Let’s see ... this design can be done much better. Why can’t we use separate heap to keep all the tree nodes together? Also, these two methods shouldn’t be public.
- Bingo, I see new check-in e-mail, apparently some fresh code got checked in. I’ll go and read through every line of new code, I’ll check that every error condition is handled, there are no memory leaks, no obsolete API-s are used, I’ll run all the static source code analysis tools known to me on this code, I’ll step through every line under the debugger and see what happens.
- Today is my lucky day - I have some time to play with the debug build. Let’s see if any assertions pop up, let’s see what’s written in the debug output. And if I have more time later then I’ll install the debug version of the operating system and see what happens while running our product.
- I finally get it - AppVerifier is my best friend! Therefore I'll run applications under AppVerifier very often and I’ll investigate every little problem I see.
- Fresh piece of documentation. Let’s see, first I’ll check for broken links. Then I’ll run the content through the spell-checker, and then I’ll spend some time thoroughly reading the documentation and reporting every mistake.
- New GUI. How wonderful! I’ll go and see if anything is against the “Official Guidelines for User Interface Developers and Designers”. Any misaligned button, any font with the wrong size, any deviation from the guidelines and I’ll open a bug about it.
- This new API call I have to test looks pretty cool! It takes the name of a file as a parameter and look, here's one writable buffer. OK, where are my favorite books: “How to Break Software”, “How to Break Software Security”, and “Writing Secure Code”? I’ll apply every attack possible. I'll overflow every buffer and every integer. I'll just break the stuff!
- ...
To summarize: intellectual destructivity sounds like the right term to use ;-) One thing to watch out for is striving for perfection. For example: taking 2 days and opening a bug for every comma in the documentation is a good thing to do, but could we use this time for something more important? Taking an API, generating 12,000 test cases for all the possible combinations of parameters, and starting to implement them would be cool if we had an infinite amount of resources and time; IRL we unfortunately can't do this. Spending a day of the development schedule making sure that all member variables are prefixed with m_ will look good stylistically, but could this time be better used for fixing some customer issues? You get the point - always second-guess yourself, always prioritize, always use common sense.
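To make the "12,000 test cases" point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the four parameters, their names, and their values are invented, chosen only so that the full combination count comes out to 12,000. It compares the exhaustive count with a much smaller "each-choice" suite in which every value of every parameter is used at least once. Each-choice is just one possible way to prioritize, not necessarily what any particular team does.

```python
from math import prod

# Hypothetical parameter domains for an imaginary API; names and values are made up.
PARAMS = {
    "path": [f"path_{i}" for i in range(10)],
    "mode": [f"mode_{i}" for i in range(10)],
    "buffer_size": [2 ** i for i in range(10)],
    "flags": list(range(12)),
}


def exhaustive_count(params):
    """Number of test cases needed to cover every combination of values."""
    return prod(len(values) for values in params.values())


def each_choice_cases(params):
    """Yield a small suite in which every value of every parameter appears at least once."""
    names = list(params)
    longest = max(len(params[name]) for name in names)
    for i in range(longest):
        # Shorter value lists wrap around so that the longest list is fully covered.
        yield {name: params[name][i % len(params[name])] for name in names}


if __name__ == "__main__":
    print("exhaustive:", exhaustive_count(PARAMS))
    print("each-choice:", len(list(each_choice_cases(PARAMS))))
```

The small suite obviously misses interactions between parameters; the point is only that the size of the exhaustive suite is a product while the size of the prioritized one is a maximum.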
SDE or SDE/T with the right attitude?
Here it gets tricky - after finishing the draft of this post I ended up with two pages of text about this paragraph alone. As this post is already long enough and writing long, boring essays isn't what I want to do, I'll morph this paragraph into a separate post by EOW.
Success story
About 7-8 months ago we established a zero tolerance policy for test case failures. The logic was very simple: when a test case fails, it’s either a bug in the product, a bug in the code which exercises the product, or a problem with the hardware or underlying infrastructure. In all of these cases a bug needs to be opened in our internal bug tracking system. All the failures must be investigated. Every one of them. Every day. Every time. Half of the people reading this are probably laughing now and thinking - “This is just so basic, why do you even need to worry about this?” Sounds easy? Actually it's a little bit more complicated. The number of test cases is usually in the thousands, and all of them are executed daily against every edition of our product and on every operating system we support. Sometimes lots of them start to fail at once, and you need to know ASAP what exactly caused these failures.
Anyhow, after lots of pain (which I won't describe) we finally did it. It took an enormous amount of engineering discipline, but it paid off very well. We found a number of bugs in:
- Different components and libraries we rely on.
- Development and test tools we use.
- Our test cases - some cases were outdated, some were not applicable anymore, and some just contained bugs.
- Our product of course ;-)
Simplified form of our knowledge base may look like this:
| Failing test case | Bug number |
| --- | --- |
| Do A and then expect B to happen. | 12345 |
| Do C and then expect D to happen. | 67890 |
| ... | ... |
Simplified test cases vs. bugs table
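As a rough illustration of how such a knowledge base could support the zero tolerance rule, here is a minimal Python sketch. It is not our internal tooling: the `triage` function, the dictionary layout, and the "Do E ..." test case are assumptions made up for this example, and only the two table rows above come from the post. The idea is simply that every failing test case must map to a bug number, and anything that doesn't is flagged as needing a new bug.

```python
# Knowledge base: failing test case -> bug number (the two rows from the table above).
KNOWN_FAILURES = {
    "Do A and then expect B to happen.": 12345,
    "Do C and then expect D to happen.": 67890,
}


def triage(failed_tests, known_failures):
    """Split failures into those already tracked by a bug and those still needing one."""
    tracked = {t: known_failures[t] for t in failed_tests if t in known_failures}
    untracked = sorted(t for t in failed_tests if t not in known_failures)
    return tracked, untracked


if __name__ == "__main__":
    todays_failures = [
        "Do A and then expect B to happen.",
        "Do E and then expect F to happen.",  # hypothetical new failure
    ]
    tracked, untracked = triage(todays_failures, KNOWN_FAILURES)
    for test, bug in tracked.items():
        print(f"tracked by bug {bug}: {test}")
    for test in untracked:
        print(f"NEEDS A BUG: {test}")
```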
As a result we noticeably increased the reputation of our team (a step closer to the famous Black Team ;-)). Once we had a solid baseline, each day's work became just investigating the differences from the previous day's results. A number of my peers and I hopefully sleep better because every day we know exactly where the product quality stands, and if there are any problems then we know exactly what they are. There were a number of similar “little things” we did during the last year in both the development and test organizations which took us a step closer to being a solid server team.
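The "investigate only the differences from the previous day" idea can be sketched in a few lines of Python. This is a simplified illustration under assumptions: each run is represented as a set of failing test case names, and the `diff_runs` function and the sample test names are hypothetical, not taken from our real infrastructure.

```python
def diff_runs(previous_failures, current_failures):
    """Compare two runs, each given as a set of failing test case names."""
    previous, current = set(previous_failures), set(current_failures)
    return {
        "new": sorted(current - previous),           # started failing today: investigate first
        "fixed": sorted(previous - current),         # stopped failing: verify and close the bug
        "still_failing": sorted(previous & current)  # already tracked from earlier days
    }


if __name__ == "__main__":
    yesterday = {"test_a", "test_c"}  # hypothetical test names
    today = {"test_c", "test_e"}
    for category, tests in diff_runs(yesterday, today).items():
        print(category, tests)
```

With a stable baseline, the "new" bucket is usually small, which is what makes investigating every single failure every day realistic.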
Engineering ethics
If you have looked at Microsoft job titles, you will have noticed the word 'engineer' in them. We have for example titles like 'Distinguished Engineer', 'Software Design Engineer', 'Software Test Engineer' etc. One of the first things I usually try to explain to new people is that computer science and software engineering are two different things, and what we do here is engineering. I won't go into the details of why I think so; instead I'll point you to Steve McConnell's book “Professional Software Development (2nd Edition of After the Gold Rush)”. Fortunately Steve McConnell has made the chapter called “Software Engineering, Not Computer Science” available online. It begins with one of my favorite quotes from Fred Brooks:
A scientist builds in order to learn; an engineer learns in order to build.
Internally we have a number of very useful software engineering guidelines which describe best practices and recommended, efficient ways to get things done. I constantly refer to them whenever I have an argument with someone about how to accomplish some particular task ;-) But back to the ethics part. ACM has something called "Software Engineering Code of Ethics and Professional Practice". I would strongly recommend that everyone read through it at least once. We all assume that when our car comes back from maintenance, the people responsible have checked that all the wheel bolts are tight, the brakes are working, the oil level is normal etc. The same thing applies IMHO to software engineering - one should do as much as possible to make sure that customers won’t suffer: a day's worth of work shouldn’t disappear because a text editor crashes, a buffer overflow shouldn’t stay unfixed for lack of development resources and let a cracker gain access to customer data, and a deadlock shouldn’t go uninvestigated and cause other problems later. A number of people have told me that I'm being very unrealistic in assuming that all developers want to follow the Code IRL. True, but then again, nobody ever promised that quality is easy to achieve.