If data is a product, what is production?

We can’t build something unless we know what it is.

Sep 09, 2022

An “ad hoc” traffic pattern in San Francisco.

We did it, y’all—the “data as a product” hype cycle is officially complete. Over the last few years, the idea rocketed through the maturity curve: it started somewhere unknown, made its way onto Medium, and graduated to a popular talk at a popular conference before eventually coalescing into a seminal blog post. People wrote explainers about it; other people wrote explainers to summarize the explainers. Vendors created guides with unique, accurate page titles and many frequently-searched keywords. The critics published contrarian takes. And now, in the inevitable closing phase, the once-novel proposal—that data teams should think like product teams—jumped the final corporate shark: McKinsey wrote a report about it.1

Still, I’m glad the idea had its moment. As useful as it is for data teams to learn from engineering departments, they’re an imperfect guide. Sourcing inspiration from product teams (and design teams, and support teams) surely makes for a better collection of best practices than getting all of our ideas from one place.

But before the concept of data as a product retires to its final sterile form—co-opted for a Teradata billboard on the 101 or for the vision statements of a dozen YC applications, presumably—I’d like to slip one last thought in under the deadline: If data is a product, we should have a better definition of production.

Everything, everywhere, all at once

When companies create a product like a website, or a car, or a movie, it’s usually pretty straightforward to define what’s in production and what isn’t. It’s the website that people on the internet use; it’s the mass-produced car that people drive off the lot; it’s the sequence of pictures and sounds that people see when they’re in the theater. Though various things might get worked on that don’t make the final cut—features that bomb in internal testing, clay concept cars that never make it past the dramatic shot in a commercial of Cadillac reinventing itself from the road up, movie scenes that get edited out—there’s a clear line between development and production.

This makes sense. Tech companies have to support and update the products they ship, and need to be thoughtful about which features are worth that investment. A movie would never hold together if the plot had to include every idea from the writers’ room. Production has to be a protected space, to make it feasible to build, functional to use, and affordable to maintain.

Data teams, unfortunately, haven’t developed the same habit. Production is a fuzzy concept, with a wide and blurry boundary between it and development. Some things are obviously in production, such as core dashboards that are used by an executive team and customer-facing features, like Spotify’s algorithmic playlists.2 And some things obviously never make it out of development, like the hundreds of one-off queries that I write in the course of trying to remember how various tables are supposed to be joined together. But a huge percentage of a data team’s work sits somewhere in a muddy middle. We share one-off reports to answer one-off questions. We create new dashboards for a product launch, or to fill an urgent demand from an executive. We copy existing dashboards to make new versions that tweak some calculation for a specific customer or marketing campaign. We build data apps that solve narrow problems, like reconciling revenue figures for a financial audit. We ship report after report around the business, each of them addressing something specific, none of them meant to last forever.

When I send a report like this to someone, there’s an implicit contract attached to it: It works now, for the thing I said it would work for. It is, at that moment, in production. But that contract has an undefined scope and an uncertain expiration date. Will it work in the future? For how long? Can the report be extended to answer related questions? People don’t usually ask these questions, and data teams don’t usually volunteer answers.3

As a result, production becomes an expansive, nebulous Frankenstein.4 An analyst sends a report to an account manager who’s putting together a monthly business review for an important prospect; the account manager finds it useful, bookmarks it, and keeps returning to it well beyond its intended (but unstated) “best before” date. A designer finds an old analysis on customer segments and uses it to make decisions about their upcoming user study. An operations analyst builds their own reporting on a few tables they find in Tableau, and never tells the data team. A product manager creates a series of new user engagement metrics to assess a recent release, and starts regularly reporting on them to the executive team.5

And everyone eventually ends up frustrated. Data teams lose control over what they’re expected to maintain, and can’t make changes without upsetting at least one side of the seven-dimensional Rubik’s cube of canonical dashboards and lingering ad hoc reports.6 Everyone else loses track of what they trust. And the problem compounds, because data teams can’t fix production if they’re never sure what’s in it; all we can do is add to it.

So what’s a data team to do? Think like a product team, and start defining production.

Commit more, less often

I’ll take a hard line here: Data teams should be explicit and comprehensive in defining what’s in production. Whenever we create something with an external edge—a dashboard, an explorable dataset, an operational pipeline with a user or system on the other end—we should make a choice: Is this a production asset? If it is, it should be marked, recorded, and supported until decided otherwise.

Admittedly, this would be somewhat onerous, and could make people reluctant to declare something as a production thing—but that’s the point. A company with a few dashboards and a handful of key metrics can focus on what’s important; a company with hundreds can’t focus on anything. A data team that supports a small collection of production reports can keep them fresh and work on other projects; a data team with reports everywhere can’t do either.

This doesn’t mean we shouldn’t answer questions quickly, or turn around one-off dashboards to explore tangential curiosities. We should—speed is a big part of what makes analysts valuable. But we should be realistic about what happens to things after we make them: They expire. We stop supporting them. They should probably self-destruct. We might as well be honest about this, and neither confuse people who stumble on them six months in the future, nor burden ourselves with wondering if we need to update them when we make some change to the tables they sit on.

In exchange for this peace of mind, we get a new responsibility: We actually have to support the smaller set of things that we officially designate as being in production. We have to stand behind them, just as a product team stands behind features they ship. When some data source or dbt model changes, we have to make sure these things still work. And when we want to stop supporting something, we have to actually deprecate it, rather than letting it drift into slow decay.7
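One way to picture the bookkeeping this implies is the rough Python sketch below. It is an illustration under stated assumptions, not a description of any real tool: the `Registry` class, the 30-day shelf life for ad hoc work, and the asset and model names (`exec_kpi_dashboard`, `qbr_one_off_report`, `fct_orders`) are all hypothetical. The idea is only that every shared asset gets recorded as either production or ad hoc, that ad hoc work carries an expiration date, and that an upstream change only obliges us to check the production set.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Asset:
    name: str
    depends_on: frozenset        # upstream tables or dbt models (hypothetical names)
    production: bool             # True = we have committed to supporting it
    expires_on: Optional[date]   # None for production; a date for ad hoc work


class Registry:
    """Hypothetical in-memory record of everything shared outside the data team."""

    def __init__(self, adhoc_ttl_days=30):
        # Assumption: ad hoc work gets a fixed shelf life; 30 days is arbitrary.
        self.adhoc_ttl = timedelta(days=adhoc_ttl_days)
        self.assets = {}

    def publish(self, name, depends_on, production, today):
        # Production assets never expire on their own; ad hoc ones always do.
        expires_on = None if production else today + self.adhoc_ttl
        self.assets[name] = Asset(name, frozenset(depends_on), production, expires_on)

    def must_verify(self, changed_model):
        # When an upstream model changes, only production assets need checking.
        return sorted(
            a.name for a in self.assets.values()
            if a.production and changed_model in a.depends_on
        )

    def expired(self, today):
        # Ad hoc assets past their date are candidates for self-destruction.
        return sorted(
            a.name for a in self.assets.values()
            if a.expires_on is not None and a.expires_on <= today
        )

    def deprecate(self, name, today):
        # Deprecation is an explicit decision: the asset leaves production
        # and starts the same countdown as any ad hoc report.
        asset = self.assets[name]
        asset.production = False
        asset.expires_on = today + self.adhoc_ttl


registry = Registry()
launch_day = date(2022, 9, 9)
registry.publish("exec_kpi_dashboard", {"fct_orders"}, production=True, today=launch_day)
registry.publish("qbr_one_off_report", {"fct_orders"}, production=False, today=launch_day)

print(registry.must_verify("fct_orders"))      # only the production dashboard
print(registry.expired(date(2022, 12, 1)))     # only the one-off report
```

The design choice worth noticing is that `deprecate` is the only way out of production. Nothing drifts out by neglect, and nothing drifts in by being bookmarked.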

Of course, data teams already do this; it’s just not usually that explicit. Though that may seem like a minor distinction, the guarantee makes all the difference. Imagine a ski resort with vaguely defined runs: Resorts would never quite know which slopes people expect them to keep clear, and skiers could never be sure if they were somewhere safe or dangerous.8 This would be chaos, for everyone. To paraphrase a favorite data cliché, that which is defined is maintained.9

De facto production

Without a clear definition of production, data teams risk getting caught in another trap: They can lose control of their own roadmaps.

For most data teams, production is a de facto state. Something gets created; it gets used; it gets maintained out of necessity; it is, for all intents and purposes, now in production. Though data teams can push the reports and dashboards they think are most valuable, they can still get caught chasing the behaviors of their users. Our runs become the paths people ski, not the runs we want to groom.

For example, suppose a data team has created a set of user engagement metrics that provide, in their view, a good summary of how people are using a product. A financial analyst who’s building an account health model asks for an adjusted set of metrics that they think will be useful predictors of customer churn. As a data team, we’ve got a choice: We can say no, and shoot down a potentially valuable project. Or we can say yes, and create the new metrics. However, without a firm line around what we consider production, these metrics could easily drift from a one-time exploration into something that we’re expected to regularly support. The new report could also undercut our existing engagement metrics, and confuse people about which ones they should be focused on.

Product teams can handle these sorts of “feature requests” pretty easily. They tell their customers that they always love to hear from them, that their feedback will be taken into consideration, and in the meantime—and maybe indefinitely—to do their best with the product as it is. And then the product team makes a choice: Can they not only build the feature, but also maintain it and support future updates? Does the feature fit into the vision of what they want the product to be? If it does, they build it; if it doesn’t, they don’t.

If production for data teams were better delineated, we could take the same approach. We could decide up front if the financial analyst’s request fit into our roadmap or not. If it didn’t, we could still help them without accidentally signing up for a long-term maintenance commitment. We could still create new metrics without threatening the primacy of our existing ones. We could stay true to our vision—these should be our key metrics, these are our core dashboards—without having to say no to anything that might overlap with it. And if new reports become so valuable that we want to elevate them to production assets, we could—but the choice would be ours.

Always be shipping

Data teams and product teams do differ in one very fundamental way, though. Product teams are built to ship stuff. Yes, they sometimes have research arms, and yes, people build prototypes that are meant to be thrown away, but all of this is in service of eventually delivering something to production.

Data teams shouldn’t have this mindset. Lots of the work we do—answering one-off questions, sharing self-destructing dashboards, creating new metrics for a financial analyst’s churn model—should never make it to production. We should keep production narrow and exclusive, and be content going days and weeks at a time without ever shipping anything to it.

How do we develop that approach, and build processes that support this way of working? The answer to that question, I suspect, is something we can’t borrow from another department. It’s one we’re going to figure out for ourselves.

1. The boomers are trying to stay hip.

2. TIL that you can see your “top tracks this month” and my ego did not need that.

3. A report is like Velveeta. It won’t go bad right away, and it definitely won’t be good forever. But how long does it last? It depends on a lot of confusing factors, it’s hard to tell when it does go bad, and most of the ingredients are artificial.

4. I know what you pedants want this footnote to say about Frankenstein, and I won’t do it.

5. To be clear, these aren’t complaints about the people using dashboards or creating things. How are they supposed to do anything else? If we tell someone they can trust it today, why should they assume they can’t trust it tomorrow? It’s a problem with the world they work in, not with what they’re trying to do in it.

6. Unless that data team employs my former colleague Nan Ma, who actually solved one of these things.

7. My my, hey hey, it’s better to deprecate than fade away.

8. Either way, you’re getting eaten by the Abominable Snowman.

9. Given what he was measuring and trying to improve, we should probably come up with another favorite cliché.

Nick E.

Sep 9, 2022

Yes Benn! We haven't worked together at a company.. but as a former CFO working with IT/developers on the reports/dashboards/queries that you have described above, it is like you just described 10 years of our separate/together joint life! 🤣😭🤘

We all wonder how many decisions at a company are being informed by inaccurate reports from 3 years ago built for a 1 off purpose... Ugg!

Nick Zervoudis

Oct 14, 2022

As I was reading through this post (which I really enjoyed, thank you for writing it!) I was trying to relate the scenarios you describe to my current and past work. Upon reflection, it seems that the model described is more applicable to reactive data teams (which is many if not most of them!).

If the data team is the answer-producing black box that business stakeholders come to every time they need an answer, the observations you make about things being one-off or in that muddled we-built-it-as-a-one-off-but-people-want-it-updated area make total sense. But creating products (and taking them into production) should be more proactive and collaborative than that, or at least aspire to be, no?

In my last team, we had a very explicit set of products we set out to build. They weren't in production from day one, but the vision was one that was the joint product between the commercial leadership team and our department. Instead of waiting for different business teams to come with their queries (almost all of which would end up being answered with one-off projects), we set out to create reusable, extensible data assets that standard models could be built on (and that were then also used for some one-off projects, or given to local teams who wanted to do one-off work). What was production grade* (reliable, refreshable, quality-assured etc.) was the underlying asset (standard data models, standard ML models), which was where the team spent most of their time.

Of course, many of the challenges you describe crept in anyway, and the products weren't perfectly... product-like from the start (I wrote about this last month, if you're interested: https://bit.ly/productisation), but they definitely weren't a continuous and recurring pain.

Going from reactive to proactive is hard, don't get me wrong. It requires business partners to change how they work, and it requires analytics translators and data product managers to occupy the chasm between data and business successfully, and proactively identify and explore areas where these sorts of product opportunities might exist.

I started off writing this comment thinking I'd be outlining where I disagree with you, or at least where I don't think the observations you've made apply. Thinking it through, what I think I've done above is actually just fill in some blanks: If a data team is stricter about defining production, and as a result commits more but less often, then it has the luxury (and probably the prerequisites too) to be much more proactive.
