- Story size was difficult to estimate and kept crossing iteration boundaries.
- The programmers really didn't see the value of estimating stuff they weren't familiar with, and in our planning poker sessions they gave out lots of ? and 100 cards.
- We were spending too much time researching how long it would take to fix a bug that we weren't going to work on for several weeks.
After reading posts by David Anderson and Amit Rathore I started to examine why we were estimating, how much time we were spending at it, the psychic cost, and the results. The results were not encouraging.
Not surprisingly the reason we were estimating was to help the business prioritize stories. If stories X and Y both had the same value to the customer, but story X cost 10x more than story Y (where cost in my world = time) then the business had more information to base their priorities on.
The next question was how much time and effort we were putting into estimates. It turned out that we had two extremes. We either spent 2-3 minutes estimating a story or spent several hours researching a defect (another name for a type of story), with not much in between.
The psychic cost seemed to be directly related to the time spent estimating. The quick estimating (which used planning poker) often had a background rumble of "I don't really know", which caused some resentment when the estimators eventually had to pick a number. For the more detailed research, the task switching had a level of frustration built in, but by far the biggest frustration was that once the detailed estimate was created, the work itself was almost complete. Instead of finishing it, they had to task switch back to the planned tasks, resulting in more time wasted.
Our results weren't spectacular. The stories seemed to get either 100 points or less than 5. The new, unknown work was 100 and the previously researched defects were less than 5. I'm not a big believer in tracking actuals, but it was clear that the 100-point stories were not all taking about the same amount of time. The defects were typically completed in about the time estimated.
Given this information, one could easily conclude that we just needed to do more research so that the new features could be estimated more accurately. But wait a minute, why are we doing estimates in the first place? Are we estimating to create an accurate budget, or to price a bid? In our case we are not. This isn't to say that nobody needs to do these things, but we don't. Rather, we are estimating to provide prioritization information.
I started to wonder if we were trying to be more precise in our estimates than we needed. Enter the concept of t-shirt sizing. Each story could be estimated as a small, medium or large story.
After a few discussions we determined that our smallest stories (mostly defects) really took no less than 3 days from start to finish. So I rounded up to a week and called that a "Small" story. The next boundary to identify was the "Large" story. We decided to call anything that takes more than 1 month a large story. So anything that was between 1 week and 1 month was a "Medium".
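To make the boundaries concrete, here is a minimal sketch of the bucketing in Python. It is an illustration, not something we actually ran: the function name `tshirt_size`, the `Size` enum, and the conversions of 5 working days to a week and 20 working days to a month are all assumptions made for the example.

```python
from enum import Enum


class Size(Enum):
    SMALL = "Small"
    MEDIUM = "Medium"
    LARGE = "Large"


# Assumed conversions (not from the original process):
# a week is 5 working days, a month is 20 working days.
WEEK_DAYS = 5
MONTH_DAYS = 20


def tshirt_size(gut_feel_days: float) -> Size:
    """Map a rough gut-feel duration in working days to a t-shirt size."""
    if gut_feel_days <= WEEK_DAYS:
        # Up to a week, including the ~3 day minimum for defects.
        return Size.SMALL
    if gut_feel_days <= MONTH_DAYS:
        # More than a week, but no more than a month.
        return Size.MEDIUM
    # Anything that takes more than a month.
    return Size.LARGE


if __name__ == "__main__":
    for days in (3, 10, 30):
        print(days, "days ->", tshirt_size(days).value)
```

The point of the sketch is that the input can be very rough: any guess that lands on the right side of the week and month boundaries produces the same answer.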
I chose the week and month boundaries because they are easy to communicate and part of our everyday language. Remember that the estimates are going to be used by non-technical managers for prioritizing so units like ideal days and gummi bears just get in the way.
Because we now have 3 simple possible outputs from the estimating process I didn't want to spend lots of time coming up with small, medium or large. Accordingly I set the upper limit on research time to 30 seconds. Why 30 seconds? Because for the most part 30 seconds is enough time to develop a gut feel and a gut feel is close enough for the intended use.