Data's big whiff
How to escape our dashboard rat race, learn from data, and love the job again.
Fool me once, shame on you. Fool me twice, shame on me. Fool me a thousand times, and I might be a data scientist, answering the same ad hoc questions I answered a month ago, wondering why I’m still not working on more interesting projects despite building more dashboards than a Ford factory and writing enough documentation to land a golf cart on the moon.
For many data teams, it’s Groundhog Day: We build dashboards and self-serve tools to free us from answering mundane questions, with the promise of working on strategic initiatives as soon as we pay our operational dues. But that day never comes. No matter how much we build, it’s never enough. We keep making “investments for the future,” but we’re stuck in the present, outnumbered by repetitive ad hoc questions that we forget as fast as we answer.
Are we doomed to this Sisyphean struggle forever? No—but we’re looking in the wrong place for solutions.
Where the wild things are
Analytics teams can split their work into two rough buckets. The first, which includes making dashboards, is about building systems for people to answer questions. It includes maintaining business intelligence tools for “business users” (even the manic ones), creating dashboards for executives, and generally supporting some notion of self-serve analysis around an organization.
When we create these tools, we try to do it systematically. We define governed data models for our tools to read from; we monitor the pipelines that feed these models so that we can reliably detect anomalies; we curate discovery platforms to help people find relevant data and dashboards. We’ve even created a role, the analytics engineer, to own and maintain this system.
Though we’re not necessarily good at everything in this bucket and a lot of people now dispute its actual value, it certainly feels valuable. A big reason for that is, in a word, marketing: BI and its associated applications are sold as platforms. They're infrastructure on top of which more things get built. To build this system is to invest in organizing your data, with the promise of a more productive and data-driven future as the payoff.
This makes it an easy sell, to both ourselves and to those who give us our budgets. Rather than having to make the case for building a dashboard or report, we can argue for expanding the system. We’re not building individual houses; no, we’re laying down the roads and pipes that make unincorporated land a city; we’re civilizing the uncivilized. For all of us—and especially control-hungry execs—that has a lot of emotional appeal. So we expand; we build; we “invest.”
The other half of our jobs is doing analysis directly. This work is most commonly referred to as ad hoc analysis,[1] though some people call it advanced analytics, or decision science, or just “answering questions.” This is, presumably, what we want to do rather than build the dashboards we complain about; we build self-serve tools, we say, so that we can focus on this type of work. Looker sells this promise directly: “Looker helps to streamline processes to save valuable time, freeing up data scientists to focus on the more rewarding aspects of their job.”
We prefer this work in part because it’s less tedious than adding the 1,000th filter to a dashboard, and in part because this is the work that actually matters. Ad hoc analysis is meant to support ad hoc decisions. These decisions are, almost by definition, the most important decisions companies make—they’re the ones you only get to make once. Jeff Bezos’ famous one-way doors are the stuff of ad hoc analysis, not a BI report or self-serve dashboard.
And yet, there’s no officially recognized system for storing these conclusions or finding what’s been done before. This work rarely lives inside the civilized walls of the self-serve systems we invest so much time into building. Yes, it’s sometimes built on top of the foundational elements that underpin BI tools, like governed dbt models. But its final products, the materials that contain analyses and their associated recommendations, are often scattered around analysts’ computers, buried in emails and Slack posts, and built on top of ungoverned queries and Python notebooks that blend development work with final recommendations.[2]
To put it more bluntly, the most valuable—and most fun!—things we create as analysts live in the wilderness, while we carefully curate systems to support trashboards that their builders hate building and their users don’t use.
The result is devastating, and it goes beyond blowing up data scientists. Dashboards and BI reports are operational tools for making immediate decisions. Their usefulness is fleeting; they don’t provide lasting value. We need car dashboards to drive, but we’re not better drivers today because we used our dashboard yesterday. Nothing we learn from them is durable.
There’s a dismal irony in this. The BI systems we build don’t accrue value. Ad hoc analysis—the things we learn, the things that actually make us smarter, like an analysis on the actions we could take to improve our driving—does, but we have no system for capturing it. We invest in the ephemeral, and throw away the enduring.
Better dashboards, better deck chairs
I’m not the only one frustrated by this, and sometimes, other people boil over too. But so far, most solutions, which focus on dashboards and BI, are incomplete improvements. No matter what we do within the system—no matter what new form factor we give dashboards, or how much observation and governance we layer on top of them—we’re still addressing a tangential problem. The real issue is happening outside of the system: ad hoc work is both essential and essentially ungoverned. No amount of “reimagining the deck chair” will keep our Titanic afloat.
To escape our dashboard rat race—and more importantly, to help our organizations actually learn and build durable knowledge—we should start by organizing our ad hoc analysis.
Unfortunately, there aren’t any clear answers yet on how to best do this. At best, we’ve produced a swirl of indirect solutions. Older frameworks like Airbnb’s Knowledge Repo, though probably too clunky and research-oriented for most organizations, could serve as a starting point (shoulders, if you will) for more accessible methods. Curation tools like Select Star and Workstream don’t quite govern ad hoc work, though they could organize it. And we could draw lessons from other disciplines, including the processes support teams use to build knowledge bases and the mechanics that power social platforms like Reddit.
I have a few other thoughts about this, which I’ll save for a later post. In the meantime, I’d propose a simpler—and potentially even more impactful—campaign as a starting point: We stop calling ad hoc analysis ad hoc analysis.
The word makes the work
In 1929, while conducting experiments in the Canary Islands, German psychologist Wolfgang Köhler discovered that he could show people nonsense shapes and words, and people would consistently pair the same shapes with the same words.
Adam Alter, referencing the study in a 2013 New Yorker article about the power of names, proposed a linguistic Heisenberg principle: “As soon as you label a concept, you change how people perceive it.” This idea goes beyond the usual claims that giving something a name makes it concrete; it suggests the name can alter the thing itself.
In this light, it’s little surprise that ad hoc analysis is undervalued and often discarded: its name tells us to discard it.
For a class of work that serves as the intellectual underpinning behind a company’s most important decisions, the term “ad hoc” carries a lot of bad connotations. It suggests that the work is temporary, like scratch work meant to sketch out an idea and then be thrown away in favor of something more permanent. If you’re not steeped in the language of an analyst and don’t know our codes, you’d be forgiven for not seeing the need to invest in ad hoc analysis and its retention. Imagine an engineer saying they’d like to prioritize ad hoc projects, or a marketer running an ad hoc campaign. These things feel like offshoots, tangential to the primary goal. No wonder we never seem to find the time to do it.
It gets worse. Not only does the term “ad hoc” undermine its value, but—per Köhler—it changes the work itself. By naming it something temporary, we make it temporary. By calling it something that exists outside of a system, we place it outside of the system. We excuse our lack of organization because it’s defined by its disorganization.
These effects feed on each other. Because ad hoc work is seen as disposable, we don’t invest in making it durable. Because we don’t invest in making it durable, it’s assumed to be disposable. To break this cycle, we need to rebrand it.
A new name should accomplish three goals. First, we need to highlight that this work is important. “Strategic analysis,” for example, makes clear that it’s not a side project, but the most important part of our job.
Second, we need to place this work inside of a system, not outside of it. A system that organizes some results and loses others isn’t organized; ungoverned work anywhere undermines governance everywhere. “Core analysis” can’t be an appendage.
Finally, we need to demonstrate that this type of work isn’t meant to be discarded, but should be accumulated. After a year of uncovering new results, we should be organizationally smarter and more informed.[4] “Analytical research” isn’t meant to be thrown out, but cataloged, organized, and expanded.
Personally, I don’t love any of these names (I’m not a marketer; I’m a cheap growth hacker). But surely we can do better than ad hoc analysis. We came up with big data; we came up with data science; we came up with decision science, and data mesh, and deep learning. As a field, we aren’t afraid of dressing up twenty-year-old technologies with buzzy puffery that’s simultaneously vague and inaccessible, better suited for a Star Trek set than a company standup.[5] But when it mattered most, we got cold feet, and opted for understated Latin legalese.
So, by all means, let’s stop building dashboards. But first, to convince everyone—including ourselves—of the value of what we should be building instead, let’s come up with some better technobabble.
[1] Gleb groups analytics workflows into three buckets: Self-serve analytics, dashboards, and ad hoc analysis. I’d argue that there’s not much distinction between the first two categories.
[2] And yes, in Mode.
[3] We’re a doctor who is unconcerned by doing harm. We’re a civil rights lawyer who refuses to judge people by the content of their character. We’re a priest who does unto others without a moment’s thought about how we’d have them do unto us.
[4] This points to one of the most damning questions we can ask about how we currently manage ad hoc analysis. How much smarter is your organization because of the analysis you’ve already done? How many future decisions will be better because of the analysis you did in the past? Analysis should strive to not just answer one question, but to also give us a head start on the next question.
[5] “We swept the data mesh, Captain. The readings found anomalies consistent with new life forms, and the Starfleet’s decision science computer says it's safe to investigate. But the deep learning scans, sir - they found nothing new. In fact, the neural networks are identical to scans from ten years ago. Sir, I think - I think we’ve been here before.”