The Velocity Metric That Quietly Broke Our Client
No story points were harmed in the making of this post. They were already dead.
A founder we work with pulled us into a meeting last summer. He had a chart on his screen. Story points completed per sprint, going back two years. The line went up and to the right.
"Look at this," he said. "We're 40% more productive than we were eighteen months ago. So why does it feel like everything is falling apart?"
We looked at the chart. Then we looked at his bug tracker. Then at his last six months of customer churn.
The chart was lying. Not on purpose. It was measuring the wrong thing.
How a number becomes gospel
This client started tracking velocity in early 2024. Reasonable thing to do. Most teams running anything Scrum-shaped track it. Their VP of Engineering set it up, presented it at the monthly board meeting, and the board liked it. Number go up, board happy.
Once a number gets shown to a board, it stops being a number. It becomes a target.
The VP left six months later. The number stayed. His replacement inherited the dashboard, the board expectation, and the unspoken rule that velocity must keep climbing.
Here's what nobody on that team would say out loud: every engineer knew exactly how to make the number go up. Estimate generously. Break work into smaller tickets. Skip the cleanup task. Don't volunteer for the gnarly bug because it'll wreck your sprint average.
The chart kept climbing. Everyone knew what was happening. Nobody could stop it because the board was watching.
What we found when we actually looked
We spent two days going through their codebase and their tickets. Not the dashboard. The actual work.
Test suite took 47 minutes to run. Eighteen months earlier it took 9 minutes. Nobody had time to fix it because fixing it wasn't worth any story points.
Three of their five core services were running on a Node version that had hit end of life eight months prior. Upgrade ticket existed. Been in the backlog over a year. Eleven story points. Never picked up. Eleven points spent on three small features moved the chart exactly as far as eleven points spent not breaking production, and the features were far less likely to wreck someone's sprint.
Bug count was up 3x year over year. But bug tickets were getting estimated at one or two points each, while features got estimated at five or eight. So the team was technically "doing more" while shipping a worse product to fewer happy customers.
The number on the chart and the health of the product weren't just unrelated. They were moving in opposite directions.
Why we couldn't just tell them to stop
We told the founder: kill the metric.
He looked at us like we'd suggested he set the office on fire.
"The board sees this number every month. I can't just stop showing it to them. What do I replace it with?"
That's the part nobody talks about when they tell you a metric is bad. You can't remove a metric that an executive audience has been trained to expect. You have to replace it with something they trust more. And if you don't have that ready, you're stuck.
Same trap we've watched senior engineers fall into when they argue against velocity tracking. They have the right argument. No replacement language. Nothing changes and the number keeps going up while the product keeps getting worse.
We had to build the replacement before we could touch the original.
What we replaced it with
We didn't pick one number. We picked four. None of them are perfect. All of them are harder to game than story points.
First: time from "bug reported" to "bug closed in production" for the worst 10% of bugs. Not the average. The average lies. The long tail is where the pain lives. How long your worst bugs take to close tells you whether your team is actually shipping quality work. If they take three months, it isn't.
Second: number of changes shipped per week that touched code older than one year. We stole this from a habit we'd built internally. If nobody is touching old code, your team is either avoiding it or doesn't understand it. Both are bad. A healthy team cycles back through old code, making small improvements, deleting things, refactoring as they go.
Third: customer-reported issues per active user per month. Not raw bug count. Raw bug count goes up when you ship more features or sign up more users. Issues per user normalizes for growth. If this number climbs while your user base climbs, you have a problem that velocity will never surface.
Fourth: engineer-reported confidence in the codebase. Asked once a month. Single question: "How confident are you in the code you're shipping this month, on a scale of 1 to 5?" Anonymous. We tally it and show the trend. When confidence drops, something is wrong, and it's usually visible weeks before any other metric catches it.
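For anyone who wants to try the first and third numbers on their own data, here is a rough sketch, not the client's actual tooling. It assumes you can export each bug as a pair of timestamps (reported, closed in production) and that you already have a monthly count of customer-reported issues and active users. The function names and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def worst_decile_days_to_close(bugs):
    """Mean days from report to production fix for the slowest 10% of bugs.

    `bugs` is a list of (reported_at, closed_in_prod_at) datetime pairs.
    Returns None when there are no closed bugs to measure.
    """
    durations = sorted(
        ((closed - reported).total_seconds() / 86400 for reported, closed in bugs),
        reverse=True,  # slowest first
    )
    if not durations:
        return None
    # Ceiling of 10% of the count, so a small sample still yields at least one bug.
    tail_size = max(1, (len(durations) + 9) // 10)
    tail = durations[:tail_size]
    return sum(tail) / len(tail)


def issues_per_active_user(issue_count, active_users):
    """Customer-reported issues in a month divided by that month's active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return issue_count / active_users


# Hypothetical sample: nine bugs closed in 2 days, one closed in 90 days.
start = datetime(2025, 1, 1)
sample_bugs = [(start, start + timedelta(days=2)) for _ in range(9)]
sample_bugs.append((start, start + timedelta(days=90)))

average_days = sum((c - r).days for r, c in sample_bugs) / len(sample_bugs)
print("average days to close:", average_days)
print("worst 10% days to close:", worst_decile_days_to_close(sample_bugs))
print("issues per active user:", issues_per_active_user(30, 1000))
```

In the made-up sample the average comes out under 11 days while the worst-10% figure is 90. That gap is the whole reason to report the tail instead of the mean.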
We presented these to the board alongside the old velocity chart for three months. Then we dropped velocity. Nobody on the board asked where it went.
The part where we caught ourselves
We're not writing this from some safe outside-expert vantage point.
About four months into this client engagement, we noticed something uncomfortable about our own tracking at Norveon. We'd been counting PRs merged per week as a rough pulse on team output. Not for the board, not for anyone external, just for ourselves.
Going through the client's history made us look at our own number.
Two engineers consistently merged the most PRs. They were also the ones most likely to ship a small fix that broke something else two weeks later. One engineer merged the fewest PRs. He was also the engineer whose code never came back as a bug, who other people trusted to review their work, and who had quietly fixed three of the worst parts of our codebase over the previous year.
Our internal number was rewarding the wrong people. Not in any high-stakes way. Nobody was getting fired or promoted based on it. But it shaped how we talked about each other in retros, and the people who deserved more credit were getting less.
We killed it. Replaced it with nothing for two months while we figured out what we actually wanted to know. Now we don't track PR count at all. We do a monthly thing where each engineer shares one thing they shipped they're proud of, and one thing in the codebase they wish they'd had time to fix. Lower fidelity. More honest.
The part that's hard to accept
A bad metric is worse than no metric.
No metric means you have to use judgment, talk to people, look at the actual work. That's slow and it doesn't fit on a slide. A bad metric means you have a number, the number feels objective, and the number is quietly steering you toward decisions that hurt you.
Story points per sprint isn't the only offender. We've seen test coverage percentage do the same thing. Lines of code. Deployment frequency. Even uptime when measured alone. Any number that's easy to game and easy to present will eventually be gamed and presented.
The test we use now, before we agree to track anything: if we optimized hard for this number for six months, would the product actually be better? Or would we just be better at producing the number?
If we can't answer that honestly, we don't track it.
Where that client is now
They're still running the four metrics. Time to close for their worst 10% of bugs dropped about 60% over eight months. Customer-reported issues per user dropped too. The confidence number was the slowest to recover. Took almost a year before it climbed back to where it had been before the velocity-chasing era.
That delay tells you something. The trust a team has in its own work takes years to build and gets damaged in months. No output metric warns you about this in real time. By the time the dashboard catches it, you've already lost the people who would have fixed it.
We're still helping them. They're not done.
Currently helping another client unwind a "deployment frequency" obsession. Ask us in six months whether the new dashboard survived contact with the board.


