Or, more accurately, the front is splitting into a thousand tiny pieces, dumping 20,000 tons of crude oil into our corporate environments.
In this case, our enormous faceless frigate is the modern data stack.1 Over the last decade, the data industry has been building a giant ship, now worth hundreds of billions of dollars, for ingesting, storing, transforming, and shipping data to every corner of every company in the world. Three-fourths of our boat has taken a clear shape around emerging architectural principles. We scrapped legacy ETL processes and replaced them with ELT; we’ve agreed to centralize our data in cloud warehouses that speak ordinary SQL (and have come to terms with Snowflake taking a tithe on everything we do); we do transformations in the warehouse, in SQL, and are starting to debate whether we should define metrics in similar ways.
Though far from universal, these approaches are at least normal.2 If a data leader pitches this design to their CEO, they can find hundreds of analyst reports, blog posts, and customer testimonials to back them up. Nobody gets fired for buying I(ngestion), B(ig cloud data warehouses), and M(odels in SQL).
But the front quarter of the ship, the analytics and BI tools for data consumption, is a very different story. There are no defaults. There are no generally accepted standards. There’s barely a shared understanding of what companies should be doing with the data they have, much less how they should do it.
Instead, the front of the data stack is represented by an explosion of tools, all tacking in slightly different directions. There’s traditional BI; there’s modern BI; there’s headless BI; there’s open-source BI; there’s Bitcoin-based BI. There are notebooks for analysis, notebooks for SQL, notebooks for collaboration, notebooks for apps, and apps for notebooks. There are data visualization tools, data visualizations for notebooks, and notebooks for data visualizations. There are SQL editors for teams, SQL editors for people who don’t want to write SQL, and SQL editors for Snowflake customers. There are collaborative workspaces, and tools that combine lots of things together. There are spreadsheets we can’t get rid of and spreadsheets replacing the spreadsheets we can’t get rid of; there are rebuilt spreadsheets; there are spreadsheets, but BI. And more of everything is coming.
All of these tools ostensibly do the same thing: they help people analyze data, and help companies make sense of that analysis.3 That raises two obvious questions. First, do we need so many nuanced options, or will the front of the ship, much like the rest of the boat behind it, settle on a narrower consensus? And second, if it does, what will it look like?
The case for choice
A few years ago, I was talking with a handful of people at a data meetup in San Francisco. One person was sharing his frustrations about his current team. “We hired a couple of data scientists to solve hard problems like building ML models,” he said, “but they’ve mostly only had time to answer business questions like analysts.”
As someone else was quick to point out, his premise was wrong: Helping people use data effectively is the hard problem. Unlike purely technical work, or the work of moving data around the lower levels of the data stack, solving business problems requires that data cross the chasm from computer to person. The fragmentation of the analytics space may simply be a reflection of the boring truth that people are different, we understand things in our own ways, and we’ll never have a standard API into people’s heads.
Moreover, different companies use data in lots of different ways. The consumption layer is the interface to those use cases, and it may need to be as varied as they are.
To continue our long history of food analogies, every business needs to cook different things with its data. Our kitchens are only so big, and don’t have the space or budget for every appliance and piece of cookware from Sur La Table. We’ve got to choose if we want a juicer or an Instant Pot; if we want a tortilla press or an immersion blender; if we want a conical burr grinder, a digital gram scale, a gooseneck pour-over kettle, and a Chemex carafe, or if we’re happy making coffee with a can of instant Folgers and the hot water from a garden hose that’s been sitting out in the sun.
The best choices aren't universal. We all have different problems, and different aptitudes and preferences for how to solve them. What architecture is best? How should we compose our kitchens? It, as the classic line goes, depends.
Still, kitchens have some standards. No matter what we’re cooking, we all need a sharp chef’s knife, a stock pot, and a couple of saucepans. Even if the consumption layer’s details remain stubbornly variable, surely, surely, we’ll eventually agree on a few essentials.
The choices we face
In fairness, people have been trying to make sense of how we consume data for decades, and these aren’t exactly novel questions. But in fairness to being fair, a lot of the more recent conversations about the consumption layer have been dominated by voices who have a very big stake in which perspective prevails.4
People who are incentivized to say that we shouldn’t consume data through dashboards say we shouldn’t use dashboards. People who are incentivized to say that analytical applications are different from self-serve BI tools say that analytical applications should be different from self-serve BI tools. People who are incentivized to say that collaborative, document-inspired experiences are best say that collaborative, document-inspired experiences are best. People who are incentivized to say that legacy BI is dead declare legacy BI dead.
It’s not that these arguments are wrong (though they can’t all be right). Nor is there anything wrong with people making them; presumably, we created these incentives for ourselves because we believed them first. But this turns the conversation about the consumption layer into a campaign, and filters opinions about architecture through specific products. That conflates feature sets with more fundamental questions that haven’t yet been sorted out.
In the spirit of trying to figure out what exactly we’re all doing here, and of stepping back to talk about which kitchen appliances we need rather than the brands behind them, these questions are still worth asking.
Specialized, or a suite?
Few people would disagree that different jobs call for different interfaces. Spreadsheets, notebooks, dashboards, exploratory visualizations—they all have their place, just as docs and slides have their place in office productivity apps.
The interesting question is how they fit together. Should analytics tools exist as completely separate products, like the old desktop Office suite? Should they all live under one integrated roof while remaining generally distinct, like Google Apps (I mean Google Apps for Work, I mean G Suite, I mean Google Workspace)? Or should the lines between them be fully blurred, as they are in Notion and Coda?
For analysts, or for everyone?
No modern analytics tool would dare not be collaborative. But collaborative among which groups of people? Specifically, should analysts and data scientists primarily live in an advanced tool, and the rest of the business live in a BI and reporting tool, with people occasionally interloping between the two? Or should everyone always gather together in one spot, whether they’re there to look at a dashboard of ad spend or to do a strategic investigation of why search ads are suddenly outperforming social ads?5
This question is complicated by domain-specific apps, from traditional tools like Google Analytics to whatever operational tools the future cooks up. In a potential world where everyone lives in their functional apps, do we even need a tool for generic dashboards?
Who’s an analyst, anyway?
Analysts’ and data scientists’ roles are getting compressed from both sides. Analytics engineers are eating into the upstream edge of their work, designing data models and configuring business logic that analysts used to be responsible for. And quantitatively savvy business experts are squeezing the downstream boundary, self-serving (in theory, at least) answers without analysts needing to intervene.
The latter case raises foundational questions about how non-analysts should consume data. Should they work in environments with high walls and protected paths, limited but precisely governed? Or should people be encouraged to gradually venture off the trail?
Looker and Tableau provide useful examples of this dimension’s two poles. Though both sell to a general, “code-free” business audience, they clearly see that audience differently. Looker emphasizes governance and control; it’s BI with enough padding that nobody can hurt themselves. Tableau has a steeper learning curve and can be more easily misused, but, for the folks who invest in learning it, can stretch much further.
Separate, interoperable, or embedded?
We can also imagine more extreme reconfigurations of the consumption layer. Rather than every tool building its own cut of a notebook, or visualization engine, or SQL client (not to mention the content management systems, admin tools, and application cruft that every modern SaaS product needs), vendors could simply provide composable pieces that get glued together elsewhere. This consumption layer could look like WordPress: An open platform where everyone chooses their favorite plugins. Will there be a day when modularity and interoperability (true interoperability, not just APIs shouting at each other) come for consumption too, just as they have for other parts of the stack?
Technology trends are often cyclical, swinging back and forth between centralized and decentralized, between bundled and unbundled. In the analytics space, however, the pendulum has lost its period.6 That’s not necessarily a bad thing; innovation emerges from disorder. But it seems inevitable that the industry will eventually pull itself back in line, potentially corralled by community consensus or, more harshly, yanked together by an inescapable economic gravity. Either way, someday, our boat will get its face.
omg you must be soooo surprised
And actually, it turns out, really do have a minimum crew requirement of one.
I forced a bot to read the marketing sites of a thousand data companies and write its own. Here is the first page: Collaboration and team collaboration help data teams collaborate! Announcing Notebook, the democracy for data teams. Stack modern data with AI-powered analytics at the speed of time. We’re humbly backed by $800 million in seed funding to remake the Snowflake data lake. Join our growing Slack community; no credit card required.
Disclaimer: I have a very big stake in which perspective prevails.
Because, hoo boy, are they ever.
That music, though.
Could we really consider all of these consumption modes described here as different data IDEs? They are really just different interfaces to the data, and different tools give you slightly different angles. But where the convergence will actually take place is at the next step: how it's delivered to its audience.
That seems like the place where no one is even fighting yet, because we just assume it's going to end up in Google Docs or Slides or Slack or Confluence or some other static place. And I think it's much bigger than just data; it's about the forum where organizations communicate and strategize.
Tnx, as always, for the great article. It definitely resonates. Re the comment on "Helping people use data effectively is the hard problem", I think that in many cases you can abstract it one additional step: people don't know what questions to ask. Why is this relevant? Because when people do see data, they try to rationalize it based on their modus operandi (e.g., one healthcare provider we work with said that they almost never change patient orders, but once they were confronted with the data, the line changed to "oh, those are meaningless or common changes"). Don't rock the KPIs, or, to paraphrase another famous TLA, NIH (Not Ingested Here)...