It unfortunately seems that there are many more ways to get software development metrics “wrong” than there are ways to get them “right,” so be forewarned that much of my advice here will be about what not to do rather than what to do. Nonetheless, here are a baker’s dozen worth of tips on how to avoid the worst metrics pitfalls and produce constructive numbers.
Don’t Confuse Product Development with Manufacturing. While there are similarities, product development processes are in some ways different from manufacturing processes: most fundamentally, the goal of product development is to develop something new (to create meaningful variation), while the goal of manufacturing is to build the same thing over and over again (to minimize or eliminate variation). If you approach metrics for software development expecting the process to be perfectly repeatable and mechanical, then you are very likely to end up looking for love in all the wrong places.
Fudging the Numbers. Robert D. Austin, in a revelatory little book titled Measuring and Managing Performance in Organizations, points out that there is a difference between informational measures — data collected to help analyze a situation — and motivational measures — data collected and reported in order to influence people’s behavior or motivate them to achieve higher levels of performance. He also points out that this difference is especially important if the people whose performance is being measured are the same people being asked to report the data — a frequent situation in software development. Austin talks about the fallacy of the “dashboard” analogy.
One frequent analogy casts the manager in the role of an airplane pilot guided by organizational measures that are like cockpit instruments.
Mechanistic and organic analogies are flawed because they are too simplistic. Kaplan and Norton’s cockpit analogy would be more accurate if it included a multitude of tiny gremlins controlling wing flaps, fuel flow, and so on of a plane being buffeted by winds and generally struggling against nature, but with the gremlins always controlling information flow back to the cockpit instruments, for fear that the pilot might find gremlin replacements. It would not be surprising if airplanes guided this way occasionally flew into mountainsides when they seemed to be progressing smoothly toward their destinations.
In other words, if you devise a motivational metric (and don’t kid yourself — all metrics sponsored or endorsed by management are inherently motivational) that is dependent on accurate data entry and reporting from the people being “motivated,” don’t be surprised or shocked when they are subtly or not-so-subtly influenced to fudge the numbers to make themselves look good.
Maintain Balance. Austin also points out that, if you’re not careful, you may end up with unbalanced measures that push your practitioners too far in one direction or another. Let’s say, for example, that you launch a new Quality program with the laudable goal of achieving zero post-release defects of any kind or severity. Sounds like a perfect world, right? If that is your only metric, though, then you are likely to find over time that you have achieved this goal at the expense of customer satisfaction, productivity and cycle time, since some of the practical ways to achieve zero defects are to become excessively risk-adverse, to test obsessively, and to haggle with customers over whether a particular problem should be classified as a defect or an enhancement. Whoops — probably not what you intended.
Measure Up. Austin also advises — and Mary and Tom Poppendieck echo his recommendations — that measurements be taken at a level above the level at which individuals can be directly held accountable for the numbers. Although this may seem counter-intuitive, or unfair, the beauty of this approach is that it encourages practitioners to collaborate to improve the performance of an overall system, rather than quickly figuring out the easiest way to improve their own numbers to make themselves look good.
Avoid a Metrics Monument. If you build an expensive, inflexible metrics collection mechanism and/or repository, then you will tend to continue using it long after it has outlived whatever original usefulness it may have once had, and it will ultimately become a shrine in which you have thoroughly entombed your once lofty ambitions for your metrics program. However tempting it may sound, and even if some well-meaning executive supplies you with funds to build such a thing, avoid this particular siren call.
Natural Data Collection. When information needs to be collected from practitioners, it is generally best to derive this data from sources already being populated as a natural result of process execution. The alternative — to ask people to maintain special spreadsheets, to have them copy and transform data to feed into some special metrics repository — often generates resentment from practitioners, and provides many opportunities for loss of data integrity.
The Medium is the Message. Carefully consider the messages — intended or unintended — that may be sent by your choice of metrics. If you measure lines of code produced per day, for example, you may be inadvertently sending the message that you are less interested in the quality of the work being performed than in its sheer quantity. If you measure customer satisfaction, on the other hand, you will be sending the message — both to your developers and your customers — that you are really concerned about how well your customers are pleased by your development efforts — and sending this sort of message can be as important as the value you derive from the numbers you ultimately produce from the exercise.
Improvement Goals without Plans. Some leaders seem to believe that they can induce their organizations to improve just by setting incremental improvement goals year over year, without any plans on how those goals will be achieved. While these sorts of approaches may appear to have some success in the short-term, they rarely achieve any sort of meaningful, lasting improvements.
Grain of Salt. There are a number of industry sources of software development metrics expertise available. They all have valuable information and opinions to offer, but I would recommend you take their advice with a grain of salt. It is in the nature of professional experts to over-sell their wares and to give you canned answers to complex problems.
Performance to Plan. Since one of the biggest problems in software development is taking much more time and/or money to deliver a solution than was originally planned, there is a natural tendency to obsess about cost and schedule performance, and to over-react to the slightest deviations from budget or schedule. In most cases, however, the things that cause slight deviations are not the same things that cause catastrophic overruns, so this compulsion to flag slight deviations as “condition red” situations is generally misplaced.
Jim Highsmith has persuasively argued that project sponsors should establish acceptable constraints for cost and schedule performance — rather than targets of perfect conformance to plan — and then allow projects to vary from perfection without getting overly concerned. In most cases, he argues, a 5 - 10% deviation would be acceptable to the business, so long as they obtain the business benefits they are looking for.
The negative consequences of trying to drive plan conformance to zero are twofold: you encourage people to fudge the numbers, in order to avoid negative attention from management, and you send the message that plan conformance is more important than customer satisfaction or delivery of business value, focusing the team on less productive goals than you would like — both of the outcomes that Austin warns us to avoid.
Tell a Story. Numbers by themselves have no meaning. Any good story has a beginning, a middle and an end, and your metrics should be part of such a progression. You had a problem/opportunity, you took certain actions, and you had these sorts of results. Even if your numbers show an improvement, without understanding the why and how of the improvement, it may be hard to make any good use of the numbers.
To this end, be sure to collect anecdotal evidence as well as the raw numbers. Real stories from actual practitioners in the field can help to provide background and context for your numbers, and help to fill in the “inside story” of what’s actually going on out there.
Different Strokes for Different Folks. To some extent, you will have different sets of stakeholders who want to see different metrics for different purposes. While a metrics purist might want to try to rationalize and unify all these into some sort of cohesive metrics program, it is often better to allow these different sets to each be developed and maintained with some independence. Consider the following variations on the metrics theme:
Annual Executive Goals. These tend to get set yearly as part of some sort of executive goal-setting exercise, and often change from one year to the next. Don’t over-think these, and try to accept them for what they are: communications mechanisms through which leadership identifies areas of particular importance, and temporary motivational measures intended to nudge organizational behavior in a certain direction.
Team Performance Measures. These allow a team to gauge its own improvement over time. Team velocity is a great example often used by agile teams. The most important factors here are that the team feel ownership of the metric, and that it be aligned with organizational priorities.
Organizational Performance Measures. These allow a larger organization to determine if they are getting better over time. These are tricky, especially in IT organizations where many different teams, types of work, and customer sets may all get lumped together into a single number.
CMMI Measures. This is a null set, and a bogus category. You should not have any special metrics that exist only to please a CMMI appraiser or consultant, or to help “pass” a CMMI (aka SCAMPI) appraisal. (And I say this, not out of any disrespect for the CMMI, but rather out of a deep respect for its underlying intentions.)
Three Key Metrics. The Poppendiecks argue that you should really have metrics in three different areas: customer satisfaction, business value delivered, and the overall cycle time from the initiation of a request to its completion and delivery to the customer (from “concept to cash,” as they describe it). If you have specific metrics in each of these three areas, they you will tend to have a balanced set of metrics that will drive substantive, long-term improvements in development practices.
And isn’t that what it’s really all about?
September 24, 2009
Next: Developmental Levels