A method for measuring analytical work

Our only job should be to make people more decisive.

Nov 05, 2021

To be, or not to be, that is the question. If only this analysis did a better job of helping me decide.

One of the great ironies of the analytics industry is its utter inability to measure itself. Despite “defining key metrics” for ambiguous business processes being a key responsibility in nearly every analytics role, we enthusiastically reject doing it for ourselves. Our work has been, and frustratingly remains, unmeasurable.

Of course, we still have to pass judgement on what’s good and what isn’t. Teams have to make the case for bigger budgets; performance reviews have to be written; we have to decide when a project is ready to ship. But without a generally accepted way to determine the quality of a team, individual analyst, or piece of work, we fall back on informal—and flawed—proxies.

In most cases, we simply rely on the opinions of “experts.” Analysis is good if good analysts say it’s good (which, obviously, raises the question of what makes a good analyst). This makes us little more than art critics: Even if we attempt to apply loose rubrics when assessing someone else’s work, our emotions often get to vote last.

When we want to be more rigorous, we try to judge analysis by its effect. Did it lead to a good outcome? But this is also applied selectively. While positive results are celebrated, we excuse bad results if we determine that the underlying analysis was sound (as determined by the experts, of course). These outcomes are merely bad luck.1

Worse still, we often can’t measure the result at all. Suppose, for example, a company is trying to decide if they should open a sales office in London or Tokyo. The analyst recommends Tokyo, and the office is a mild success. There’s no way to know if the decision was good or not, because there’s no way to assess the counterfactual. Maddeningly, the best method we have is the original analysis itself, which, almost tautologically, will of course confirm that Tokyo was the right choice.

This paradox—that analysis can only be measured in circular ways—has real effects that go beyond stressing out armchair philosophers with Substacks. It provides cover for bias. It promotes nepotistic favoritism of supposedly good analysts’ analysis that makes our industry more insular. And it makes it incredibly difficult for young, aspiring analysts to grow and improve.

On this final point, consider a junior analyst joining a company without an established data practice. There are no “experts” to provide feedback. There are few public examples of corporate analysis that they can emulate. They can’t easily see the impact their work has on business outcomes. How can they measure themselves? On what scale can they weigh their work? How does an aspiring dancer, growing up in Bomont in 1984, with nobody to imitate and no TikTok to watch, know if they’re good?

I’ve long felt trapped by this problem, but recently stumbled into a potential way out: Analysts should judge their work by how quickly people make decisions with it.

He who hesitates is lost

The moment an analyst is asked a question, a timer starts. When a decision gets made based on that question, the timer stops. Analysts’ singular goal should be to minimize the hours and days on that timer.

That’s the only metric we should measure ourselves on—straight up, without qualification or caveat. The lower the number is across all the decisions we’re involved in, the better we’re performing.
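A minimal sketch of that timer in Python, for anyone who wants to see it written down. The `Decision` record, its field names, and the use of the median as the summary are assumptions made for illustration; the proposal only says the number should be low across all decisions, not how to aggregate it.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Decision:
    """One question an analyst was asked, and when it was resolved (hypothetical record)."""
    question: str
    asked_at: datetime    # the timer starts when the question is asked
    decided_at: datetime  # the timer stops when the decision is made

    @property
    def hours_to_decision(self) -> float:
        # Elapsed time on the timer, in hours.
        return (self.decided_at - self.asked_at).total_seconds() / 3600


def median_hours_to_decision(decisions: list[Decision]) -> float:
    # Assumed aggregation: the median across all decisions (lower is better).
    return median([d.hours_to_decision for d in decisions])


decisions = [
    Decision("London or Tokyo?", datetime(2021, 11, 1, 9), datetime(2021, 11, 4, 9)),
    Decision("Events or Facebook ads?", datetime(2021, 11, 1, 9), datetime(2021, 11, 2, 9)),
    Decision("Which leads to call first?", datetime(2021, 11, 1, 9), datetime(2021, 11, 1, 15)),
]
print(median_hours_to_decision(decisions))
```

The median is used so that a single decision that drags on for months does not swamp the summary; that is a design choice in this sketch, not part of the proposal itself.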

First and foremost, if the point of analysis is to help people make decisions, this measures that goal most directly. While we think of analytical experts as the best judges of our work, decision makers—the executives or business stakeholders who have the most context about a problem and are most invested in solving it—are better juries. They may not know the nuances of analytical or statistical law, but they’re the people we have to convince. Just as any decisive argument in court is a good one, convincing analysis is, almost by definition, quality analysis. And the faster the jury delivers a verdict—regardless of what the judge thinks about it—the better the case.

The inverse is also true. Sloppy work—analysis that uses bad data, conclusions built on flawed reasoning, recommendations with major omissions or oversights—delays decisions because decision makers often see these holes and ask for work to be revised or redone. If the jury finds a lawyer ineffective, they are; if decision makers find us ineffective, we are.

Beyond measuring the right goal, focusing on the time it takes to make a decision has several other benefits, both in how it measures our work and in the incentives it creates for us as analysts.

First, it forces us to understand the problem we’ve been asked to solve. By tying ourselves to a decision, this measure discounts dashboards searching for a problem, or undirected exploratory analysis prompted by people who are “just wondering” something. We can’t minimize the time it takes to make decisions without knowing which decisions are being made.

Second, it encourages us to see problems as others do rather than as we do. We have to understand the intuitions that people use to make decisions, and the operational context in which they make them. For example, if we’re trying to help the sales team rank leads, we can’t persuade sales reps to call engineers working in finance if they’ve constantly been frustrated by prospects in the banking industry and they know most engineers don’t answer their phones. Analysis will only be convincing if it respects those priors.

Third, it provides a useful counterbalance against analytical excess. One of the challenges for analysts is to figure out when something is done. How many threads do you pull on? How rigorously do you tie off loose ends? This metric provides an answer: You take the time that’s necessary to give someone the confidence to make a decision, and no more.

Fourth, minimizing this time puts a clear—and appropriate—emphasis on the value of communication. Analysts have a tendency to riddle our reports with caveats and hedged language, or to include every detail of our research in our decks.2 But someone has to work through these nervous suggestions and wordy decks. If we measure ourselves by how quickly people cut through that noise, we’d do a better job of cleaning it up ourselves.

This also encourages us to pay more attention to how our work looks. It’s easy to discount aesthetics as shallow vanity, but people trust things that look good.

Finally, even if we don't directly measure the time it takes to make a decision—it’s hard to know when, exactly, a decision was made, and some analyses affect many decisions—we can at least estimate it, or sense when things are taking a long time. And compared to something as squishy as “was the outcome good?,” it’s practically hard science.

Alternative facts

A skeptic might notice a small, politician-shaped hole in this proposal. If analysts see their job as helping others make a decision quickly, doesn’t this push them to be advocates for their opinions rather than the truth? Or, in extreme cases, wouldn’t this encourage analysts to omit results that complicate a decision?

I disagree with the premise of the first objection. Data teams aren’t faceless filing cabinets of static facts; analysts should have an opinion. While that opinion should trace its genesis to a different origin than those of other people (we rarely have a horse in the race, and we have to form our ideas on data rather than direct experience), that doesn’t make it any less of an opinion.

To an executive, feigned analytical neutrality isn’t fair; it’s frustrating. Analysts have explored territory others haven’t; expeditions through data show us things others don’t see. We develop gut feelings through these efforts, feelings that are more often rooted in real things we can’t quite describe than the meaningless psychic wanderings of our entrails. Withholding our opinions buries these feelings, conceding (incorrectly, I believe) that the only evidence that matters in making a decision is that which we can articulate with a chart.

Moreover, no matter how hard we try, we can never be truly neutral. Our analysis will always be colored by small biases. In the earlier example about Tokyo and London, for instance, we may have initially favored Tokyo. Regardless of the reason—sales in Asia have always been higher; we liked the Olympics; we hated The Crown—this bias likely makes us more skeptical of results that lean toward London. Without even noticing we’re doing it, we might dig into those conclusions more, present them with more qualifications, and accept results that favor Tokyo with less diligence. We’re better off acknowledging our opinion than pretending that it doesn’t exist.

As for the second concern, it’s true that analysts, if incentivized to get people to make decisions quickly, might present misleading results and omit confounding information. But—so what?

To play out the different scenarios, suppose the marketing team came to me and asked me to recommend if they should spend more money on events or Facebook ads. My immediate preference is for Facebook.3 What if, rather than giving both options a balanced look, I just told the part of the story that makes the case for Facebook ads? Suppose I omitted promising evidence in favor of spending the money on events, and dressed up the case for Facebook as best I could.

In one world, the marketing team comes away unconvinced. They sense my tilt, start asking a lot of questions about events, and send me back to redo the work. This slows down the time to a decision, scoring my analysis as poor. Moreover, the hill is steeper next time, because the marketing team would start to look at my work with more skepticism or suspicion.

In a second scenario, I do a better job of presenting my case, and the team is convinced. In this case, the Facebook ads pan out; it’s money well spent. Though it’s tempting to tar this as a bad outcome, it led to a good decision that was made quickly. Is a slower path to the same destination really better? If someone consistently gives me good advice, I don’t care if it comes from rigorous analysis or from Miss Cleo; I just want the advice.

In the final scenario, I convince the team but my recommendation doesn’t work. The ads are a bust. When we try to understand what happened, people would either find the things I omitted or catch me with my finger on the scale. In either case, my reputation takes a hit. The next time the marketing team asks me for help—if they ask at all—I’ll get grilled.

That points to the final value of the speed metric: To whatever extent it creates bad incentives in the short term, it counteracts them by creating the right incentives over the longer term. If I want people to make a decision quickly tomorrow, I might be dishonest; if I want them to make fast decisions next year, I can only risk politicking for my opinion when I’m very confident it’s the right one. Which I’d argue is actually the correct behavior: If you know the right path forward, do whatever it takes—cheat, lie, it doesn’t matter—to convince me to take it.

This reputational dimension also provides a useful nudge for analysts wondering where they can take their careers next. The second fastest way for us to influence a decision is for people to take our recommendations on their face, no questions asked. But the fastest way to drive a decision is to make it yourself. That, I think, is how analysts go from being advisors to executives: Build such a reputation for making convincing arguments that people simply hand the decision off to you.4

And that journey starts with a simple step: Ignore all the ambiguity around measuring analytical quality and ROI, and do whatever it takes to make others more decisive.

An aside about my writing process

People periodically ask me where these posts come from. Am I working through a big backlog of rants that I’ve collected over the last ten years? Or do I write them in a frenzied panic every Thursday, with no idea what I’m going to talk about next week?

The answer is a bit of both: ten years in any industry will make you salty and opinionated, but those opinions are messy, and I don’t have an editorial plan from week to week. The eureka moment for most posts comes from reading things other people wrote, or having conversations with people in the industry. Those moments provide the spark by organizing loose ideas that are banging around in my head into something much more coherent. In that way, most of my posts are less original songs and more mixtapes of other people’s ideas, recut with too many words, too many analogies, and too many decades-old (or, in today’s case, centuries-old) cultural references.

This post is a particularly extreme example of this sort of plagiarism. Last Friday, I had a very interesting conversation with Boris Jabes, the founder and CEO of Census,5 about today’s topic. Unlike other posts, which often contain short samples from a variety of conversations, this one was ripped from a single track. If you liked it, give most of the credit to Boris.

1. Several weeks ago, the Braves and Brewers were playing in the first round of the MLB playoffs. In the eighth inning of Game 4, the score was tied and the Braves were coming up to hit. The Brewers sent out pitcher Josh Hader, their hard-throwing closer who strikes out nearly half the batters he faces and is one of the best lefties in the game. Hader came into the game in part to face Freddie Freeman, the reigning NL MVP. Freeman, a left-handed hitter, puts up huge numbers against right-handers and is a slightly better-than-average hitter against lefties.

It didn’t work out for the Brewers. Freeman hit a home run; three outs later, the Braves won the game, won the series, and sent the Brewers home. Still, despite the catastrophic result, few people would say that the Brewers’ decision to put in Hader was wrong. Sometimes, good decisions just don’t pan out. And therein lies the problem. How, if not by its result, do we determine the decision was good?

2. Optimistically, we do this because it’s truthful. The world is complex, and the “potential relationships” that “suggest” a “marginal effect” for the “particular subsets” of the population “with reliable data” reflect that. More cynically, we do this because sharing everything we know provides us with cover, leaving it up to the reader to navigate that complexity. But in neither case do these details help make a decision.

3. Because, obviously, it’s full of so much spontaneous fun. Plug me into your totally human matrix, Mark.

4. If a bartender recommends a stock, I’d want to hear a lot more about it before buying it. If Warren Buffett recommends a stock, I’d hand him my money and tell him to do whatever he wants with it.

5. I’ve included this link to save you from having to figure out what to Google to find a data company named Census (it’s a great product, by the way).

Nathaniel

Nov 23, 2021

Cool insight! Gen. Patton came to a similar conclusion. I might have to start running a timer for myself now. While this is definitely a great way to reframe self-evaluation, or maybe compare individuals, do you think that this would be a good metric for a company to measure its analytics team's performance? Do you think this could introduce something like response time quotas that could get decoupled from realistic expectations, as sales quotas often do?

ASF

Nov 22, 2021

One issue that I can think of is each team's data maturity. For example, if I have a business that is just starting to analyze its data and doesn't have much reporting, its questions tend to be simpler, and there are tons of things we can optimize (because nobody was looking). As business complexity grows, a lot of the simpler decisions are automated (dashboards / data cubes / etc...) and the questions tend to be more and more complex.

If I measure the time by measuring how long the data team are taking to reply to a question, I could be measuring the increase in business complexity instead of measuring the data team performance. Also, how can I measure the efficiency provided by questions that are automated via dashboards/alerts/algorithms?
