Key to Duarte’s argument, and something I didn’t originally appreciate from his initial Tweets, is this: he is not saying story points are rubbish and you should forget about them. What he is saying is that it is simpler, and equally accurate, to just count the stories as atomic items. This is equivalent to saying “All stories are 1 point” and having done with it.
For a mature team with a good relationship with its stakeholders I could see this working well. However, for less mature teams (who have difficulty agreeing among themselves), or a team with bully-boy stakeholders (or bully-boy anyone else, for that matter), I think being able to put a higher point score on the card serves as a useful warning mechanism.
Duarte says in his blog “the best predictor of the future is your past performance!” On this I couldn’t agree more. He then poses three questions - and answers - which I think are worth reviewing.
Q1: Is there sufficient difference between what Story Points and ‘number of items’ measure to say that they don’t measure the same thing?
Here he finds there is a close correlation. I’m not surprised; in fact I would expect this to be the case. Teams are encouraged to write small stories, and Scrum almost mandates this because work should be completely done by the end of a sprint. In effect there is an upper bound on the size of a story.
Actually, I’m not so keen on this rule. I allow work to be carried from iteration to iteration but I only allow points to be scored when the work is done. Thus I encourage stories to be completed in an iteration but I don’t mandate it. One of the exercises I do with teams on my courses actually sets out to illustrate this point.
At the very least I would expect teams to settle implicitly on an “average story size”. Notice also that the correlation holds whether all stories are of size 1 or of size 2, 3 or any other number: it’s a correlation between two series of numbers.
However, given all this, Duarte has a point: if your stories are clustered around an average size then you might as well count the stories.
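To make the correlation question concrete, here is a minimal sketch that computes the Pearson correlation between stories completed and points completed per sprint. The sprint figures are invented purely for illustration; they are not Duarte’s data, and the `pearson` helper is my own, written out so it runs on any recent Python.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 items)")
    mx, my = mean(xs), mean(ys)
    # Covariance numerator and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if either series is constant


# Hypothetical per-sprint history, invented for illustration only.
stories_done = [8, 10, 7, 12, 9, 11]
points_done = [21, 27, 18, 31, 24, 28]

r = pearson(stories_done, points_done)
print(f"correlation between stories and points: {r:.3f}")
```

With stories clustered around an average size, as in this made-up history, the coefficient comes out close to 1, which is the situation in which counting stories tells you almost everything points would.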
Q2: Which one of the two metrics is more stable? And what does that mean?
Duarte’s analysis says that both stories and story points have similar standard deviation. Thus they are of similar stability. Since these two are closely correlated this isn’t a surprise. In fact, given the correlation, it would be a surprise if one was notably more stable.
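Points and story counts live on different scales, so one way to compare their stability, assuming that is what “similar” should mean here, is to divide each standard deviation by its mean (the coefficient of variation). This is a sketch of that comparison on invented sprint data; I don’t know whether Duarte normalised his figures this way.

```python
from statistics import mean, stdev


def coefficient_of_variation(series):
    """Sample standard deviation divided by the mean: a scale-free measure of spread."""
    return stdev(series) / mean(series)


# Hypothetical per-sprint history, invented for illustration only.
stories_done = [8, 10, 7, 12, 9, 11]
points_done = [21, 27, 18, 31, 24, 28]

for name, series in (("stories", stories_done), ("points", points_done)):
    print(f"{name}: stdev={stdev(series):.2f}, cv={coefficient_of_variation(series):.3f}")
```

The raw standard deviations differ because points are bigger numbers, but the relative spreads come out almost identical, as you would expect when the two series are closely correlated.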
Q3: Are both metrics close enough so that measuring one (number of items) is equivalent to measuring the other (Story Points)?
Duarte’s data suggests the two metrics measure the same thing. Again, if they are closely correlated this is exactly what you would expect. You can write out the equation:
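Story Points ~= (Correlation Co-efficient) x (Number of Stories)
(The ~= means approximately equal. Strictly speaking the multiplier is the slope of the relationship, in other words the average number of points per story, rather than the statistical correlation coefficient, which only tells you how tight the relationship is.)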
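As a worked example of the equation, here is a sketch that estimates the multiplier as total points divided by total stories (the average points per story) and then predicts each sprint’s points from its story count. Using the ratio of totals is my own simple choice, not something Duarte prescribes, and the sprint data is invented.

```python
# Hypothetical per-sprint history, invented for illustration only.
stories_done = [8, 10, 7, 12, 9, 11]
points_done = [21, 27, 18, 31, 24, 28]


def points_per_story(stories, points):
    """Average points per completed story: the multiplier in Points ~= k x Stories."""
    return sum(points) / sum(stories)


k = points_per_story(stories_done, points_done)
print(f"k = {k:.2f} points per story")
for s, p in zip(stories_done, points_done):
    predicted = k * s
    print(f"{s:2d} stories -> predicted {predicted:5.1f} points, actual {p}")
```

If the predictions track the actual points closely, knowing the story count gives you the points total for free, which is the heart of Duarte’s argument.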
With this out of the way, Duarte moves on to consider Mike Cohn’s claims for story points.
Claim 1: The use of Story points allows us to change our mind whenever we have new information about a story
Duarte says: Story Points offer no advantage over just simply counting the number of items left to be Done.
I agree here. I’ve long encouraged teams to move away from story pointing work in the distant future. Yes, I encourage them to story point some stories in the backlog (say a few months’ work) and to story point tasks for the next iteration. But for stuff that is “out there”, or has just arisen, my advice is usually: just assign it your average story point value.
In other words, assume your average story point value is your Correlation Co-efficient. When the work gets close, estimate it traditionally; you might find the value changes. When it gets really close, break it down into tasks.
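Here is a minimal sketch of the “just assign it your average” advice: stories that have been pointed count at their estimate, and unpointed ones count at the team’s average. The backlog items and the average of 2.6 points are hypothetical, chosen only to illustrate the idea.

```python
from typing import Optional

# Hypothetical backlog: near-term stories are pointed, distant ones are not (None).
backlog: list[tuple[str, Optional[float]]] = [
    ("login page", 3),
    ("password reset", 2),
    ("audit report", 5),
    ("partner API", None),
    ("mobile app", None),
]


def backlog_total(items, average_points):
    """Sum pointed items at their estimate; count each unpointed item at the average."""
    total = 0.0
    for _name, points in items:
        total += points if points is not None else average_points
    return total


AVERAGE_POINTS = 2.6  # hypothetical, e.g. historical points divided by stories completed
print(f"estimated backlog: {backlog_total(backlog, AVERAGE_POINTS):.1f} points")
```

When an unpointed story moves closer and gets a real estimate, you simply replace its `None`; the total shifts without anyone having to re-estimate the whole backlog.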
Claim 2: The use of Story points works for both epics and smaller stories
Duarte says: there is no significant added information by classifying a story in a 100 SP category
Again I agree. To be honest I’m not a fan of Epics and while some of the teams I work with use them I often encourage teams to dump them. To me an epic is just a collection of stories around a theme.
Actually, Cohn and Duarte are not at odds here. Cohn doesn’t (seem to) make any additional claims; it’s just a scaling question.
Claim 3: The use of Story points doesn’t take a lot of time
Duarte says: In fact, as anybody that has tried a nontrivial project knows it can take days of work to estimate the initial backlog for a reasonable size project.
Here I have issues with both Cohn and Duarte.
If you are estimating stories then it does take time. Fast as it is, even planning poker takes time. However there is also a lot of design and requirements discussion going on in that activity. Therefore I don’t see this as a problem. In fact I see it as an important learning exercise.
True, on a non-trivial project it will take time to estimate a large backlog. But a) that is valuable learning and b) I won’t try. I’d either estimate it in chunks or I’d apply an average estimate to work which wasn’t going to happen any time soon (see Claim 1).
I deliberately delay estimation as long as possible to allow more information to arrive and because work will change. It might be changed out of all recognition or it might go away completely.
Claim 4: The use of Story points provides useful information about our progress and the work remaining
Duarte says: This claim holds true if, and only if you have estimated all of your stories in the Backlog and go through the same process for each new story added to the Backlog.
Again I agree. However, I doubt the usefulness of the concept of “work remaining”. It’s only work remaining if you think you have a lump of work to do. In my experience work is always negotiable; it’s just that people don’t want to negotiate until they accept that they won’t get everything.
One of my clients has gone through the very expensive exercise of estimating all the work they might do. Earlier this year they realised they had 3000 points to do by Christmas, but capacity to do less than 1000. This brought home the fact that they couldn’t do everything, something many people on the project had long known or suspected. The company are still working through this issue, but at least they are having the discussion now, in March and April, not in September and October.
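The arithmetic behind that realisation is simple enough to sketch. The 3000-point figure comes from the anecdote; the velocity of 45 points per sprint and the 20 remaining sprints are hypothetical values chosen to land just under the 1000-point capacity mentioned above.

```python
def capacity_check(demand_points, velocity, sprints_remaining):
    """Compare the estimated backlog with what the team can deliver at its current velocity."""
    capacity = velocity * sprints_remaining
    shortfall = max(0, demand_points - capacity)
    deliverable_fraction = min(1.0, capacity / demand_points)
    return capacity, shortfall, deliverable_fraction


# 3000 points is from the anecdote; velocity and sprint count are hypothetical.
capacity, shortfall, fraction = capacity_check(3000, velocity=45, sprints_remaining=20)
print(f"capacity {capacity} points, shortfall {shortfall} points, "
      f"about {fraction:.0%} of the backlog fits")
```

Once the numbers show that less than a third of the backlog can fit, the conversation naturally turns from “how do we do it all?” to “what do we do first?”.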
Claim 5: The use of Story points is tolerant of imprecision in the estimates
Duarte says: there's no data [in Cohn’s book] to justify the belief that Story Points do this better than merely counting the number of Stories Done. In fact, we can argue that counting the number of stories is even more tolerant of imprecisions
Again I agree. But then, if there is a high correlation between story points and stories then this is self-evident. And again, as I said before: we need to work with aggregates and averages.
Claim 6: Story points can be used to plan releases
Duarte says: Fair enough. On the other hand we can use any estimation technique to do this, so how would Story Points [be better than counting the number of stories]
Again, agreement, and given the correlation it’s self-evident.
Duarte goes on to give a worked example in which a project does not achieve the desired velocity and gets cancelled. His example makes no use of story points, simply stories. To be honest, I’m missing something here. True, the stories in his example have no points, but I don’t see where that makes a difference. What he describes is exactly the way I would play the scenario, although I would have story points in the mix.
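To illustrate why the choice hardly matters for release planning when the metrics are correlated, this sketch forecasts the number of sprints needed to finish a hypothetical remaining backlog, once by story count and once by story points, using average historical throughput. All the figures are invented, and averaging past sprints is only one simple forecasting approach.

```python
from math import ceil
from statistics import mean

# Hypothetical per-sprint history, invented for illustration only.
stories_done = [8, 10, 7, 12, 9, 11]
points_done = [21, 27, 18, 31, 24, 28]


def sprints_needed(remaining, history):
    """Forecast whole sprints needed to finish `remaining` work at the average past rate."""
    return ceil(remaining / mean(history))


remaining_stories = 40  # hypothetical remaining backlog, counted in stories
remaining_points = 104  # the same hypothetical backlog, in points
print("by story count: ", sprints_needed(remaining_stories, stories_done))
print("by story points:", sprints_needed(remaining_points, points_done))
```

Both forecasts land on the same number of sprints here, which is what you would expect when points per story are roughly constant.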
His conclusion: “Don't estimate the size of a story further than this: when doing Backlog Grooming or Sprint Planning just ask: can this Story be completed in a Sprint by one person? If not, break the story down!”
This is interesting because while Duarte is working at the story level this pretty closely models the way I advise teams to work. I always tell teams:
- “I’d like a story to be small, to fit in one iteration but that isn’t always the way.”
- “In my experience stories need to be broken down, both so they can get done but also so you get flow and as part of a design exercise.”
When a team have a feel for points I will have them put points (the same units, just bigger) on Blues. For Blues that won’t be done for a while I’m happy to assign averages, and for Blues that are further out I’m happy to leave them unpointed.
So, thank you for reading my analysis and staying with me all the way.
My conclusion? I think I agree with Duarte but I don’t agree with him. I think I actually disagree with Mike Cohn but I’d have to go back and look at what he says himself.
I think the way I estimate with teams, the way I used to do it when I ran teams, and the way I teach clients to estimate is more different than I perhaps appreciated. My method grew from my interpretation of Kent Beck’s XP planning game and velocity. However, I now think my approach has drifted from this: experience of seeing what works and what doesn’t has refined it.
The approach I’ve ended up with has similarities with what Duarte describes but is also different.
Despite many authors’ attempts to describe Scrum/XP planning and estimating, I still find myriad minor variations. Some improve things, some don’t. Until this moment I’ve always believed my approach differed only in minor ways. Now I’m thinking....
And what of Duarte’s “harmful” claim? Although he uses the word in the title, I don’t see any discussion of the harm story points cause.
Story points might be pointless, but do they do any harm? I don’t really see it. They may be a waste of time, and there might be more effective ways of doing the same thing, but that’s not the same as harmful. Story points might mislead, but there is little evidence here, so I’ll reserve judgement on that one.
To be continued....