All the conferences are happening now,1 and it’s 2022 again:
Last week, Salesforce bought Informatica, a data management service, for $8 billion.2
Also last week, Hex, an analytics and BI company, raised $70 million.
On the same day, dbt Labs launched an entire rewrite of dbt, a VS Code extension, a new brand, and a new name.
To kick off its annual conference this week, Snowflake, a database company, bought Crunchy Data, a hosted Postgres service, for $250 million.
A few days before that, Databricks, another database company, continued its tradition of frontrunning Snowflake’s annual conference by doing the same thing that Snowflake is about to do, but bigger. Databricks bought Neon, another hosted Postgres service, for $1 billion.
At the same conference, Snowflake launched a war on everyone:
ETL providers like Fivetran: Snowflake Openflow will let customers write data directly into Snowflake “through pre-built and extensible connectors.”
BI products and SQL chatbots: “Snowflake Intelligence offers business users and data professionals a unified conversational experience…to ask natural language questions and instantly uncover actionable insights.”3
Observability vendors: Snowflake Trail will “monitor, troubleshoot, and debug” data pipelines.
And data modeling tools like dbt: Snowflake semantic views and Snowflake dbt Projects now store “all semantic model information natively,” to reduce context switching and streamline pipeline development.4
The beleaguered data industry, back from the dead.
A programming note: Ok, so, it is apparently sometimes a thing, here on substack dot com, for people to meet in person. Is it a good thing? Doubtful. Will it go well if this blog does it? Unlikely. Will anyone’s experience of reading this be enhanced by hanging out with me? Heavens no. But yOu CaN jUsT dO tHiNgS, as they say, so why not, let’s see what happens.
I’m going to be in SF next week. The best thing about SF, by a mile, is its tiki bars (if it comes frozen and in a fun cup—Dippin’ Dots, coffee, eight different atomic rums—I want it). So, we’re gonna try to do a little thing at a tiki bar.
When: Thursday, June 12, starting at 5pm PT.
Where: Last Rites. Maybe. Last Rites is not big, and I have no idea how many people will show up. If there is some giant mixup and more than, like, four of you mistake this event for something fun, then we might have to migrate to Zeitgeist, which is 1) much bigger, 2) a ten minute walk away, and 3) will probably be overrun with tech people who actually know what they’re talking about.
Will there be an open bar tab? No.
What else? If you do think you might want to come, please fill out this form, which will help with the whole “where should we actually do this?” question.
If it moves, how will I know? I’ll post changes to the Substack chat thing, which is now more locked down because of our fun little spammer moment. If you decide to come, I’d recommend checking that shortly before heading over. I’ll also try to email people who fill out the form, but that might not be possible for last minute changes. So the chat is the best bet.
Let’s see how this goes. Probably terrible. But you never know; sometimes you find unexpected things at SF data meetups.
My brief history of the brief history of the modern data stack goes something like this:
Back in the early 2010s, we all got very excited about data. Specifically, because of some famous success stories—Facebook, Nate Silver, the Oakland A’s, hedge funds—we began to believe that, if you collect the right data and analyze it just so, you can use it to predict the future.
This became the mantra of modern industry: You must be data-driven. Analytics is urgent. It’s data or death.
An entire ecosystem of technology companies—contemporaneously known as the modern data stack, but whatever, call it what you want—spun themselves up to make this work easier. We got far better at collecting data, at moving it around, at cleaning it up, and at making charts of it.
Importantly, the point of all of this was, overwhelmingly, for reporting and analysis. The earliest reference I can find to the term “modern data stack” is a podcast that calls the whole thing the modern data analytics stack. A couple months later, Fivetran said in a 2018 fundraising announcement that they’re “poised to rapidly expand our critical role in the modern data stack” by “enabling companies to leverage powerful analytics and business intelligence tools.” And a bit later, Chartio asked, “So why build a modern data stack?” Their answer:
[To allow] anyone — analysts and non-analysts alike — to simply drag and drop their way to powerful dashboards.
More recently, in his eulogy for the term, dbt Labs’ Tristan Handy suggested rebranding the whole apparatus as the analytics stack:
Today, I’m swearing off using the term “modern data stack” and I think you probably should too. …
dbt still does data transformation. Fivetran still does data ingestion. Looker still does BI. Each of these products (and more) are all leading players in the analytics stack. …
We help people do analytics. Our products are bought from analytics budget lines. Analytics is both a profession and a source of business value creation.
Calling our ecosystem the “modern data stack” is continually fighting the last war. But the cloud has won; all data companies are now cloud data companies. Let’s move on. Analytics is how I plan on speaking about and thinking about our industry moving forwards—not some microcosm of “analytics companies founded in the post-cloud era.”…
It also grounds us all more firmly in the history of the analytics space.
Yes, right. Most of the technical infrastructure and data tooling that people built over the last decade was meant to ultimately find its way into a report, sit behind some analysis, or support some decision. Or, more stylistically, the data industry’s first layer of means was a bunch of databases and pipelines, its second layer was dashboards,5 and its ends were decisions. That was why all of this stuff existed.6
So was it a success? Eh. The technology worked, and we can all make a lot more dashboards with a lot less headache. But we didn’t get that much better at making decisions. Analytics teams haven’t become corporate oracles, and data is barely winning its war against anecdotes.
Part of the problem, as we talked about a few weeks ago, is that the data itself might not be that useful. The Oakland A’s can find hidden patterns in a hundred years of baseball statistics; a B2B software company may not be able to find the same magic in a few hundred sales calls. The other part is that analysis is hard. Even if those calls can predict the future, you have to be immensely clever to use that crystal ball.
In particular, if you sell a BI tool, there’s a story that you hear all the time: The data team spent a few months setting up a database and feeding a bunch of data sources into it. They got a dbt project up and running, and put all that data in neat little tables. They bought a BI tool—a self-serve BI tool—and shipped a bunch of fancy reports, complete with filters and drag-and-drop charts and Explores, so that people could “simply drag and drop their way to powerful dashboards.”
And then nobody does it. The dashboards go unfiltered; nobody drags; nothing is dropped; the only exploring anyone does is exporting to Excel. People “self-serve” by just looking at whatever the dashboard says when it first loads. There are a thousand paths they could take—so many insights, so much knowledge, just a few clicks away—and yet they choose to stand still.
This happens, I think, because the fundamental theory of self-serve BI is flawed. The challenge with data exploration—and often, with analysis as a whole—is not that people don’t have the ability to manipulate data; it’s that they don’t know what they’re looking for. We kept building faster tools and more accessible interfaces, but that’s not what anyone really needed. Instead, they need direction.
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where—” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
Lewis Carroll, Alice’s Adventures in Wonderland
Roughly, this is what undid a lot of the promise of the analytics stack. The technology behind it was useful, but the people who were supposed to be its ultimate users—analysts and self-servers alike—didn’t really know where to go once they had it. The hopeful analytics engineer, undone by the wayward analyst.
Or, to extrapolate that a bit further, in trying to help people make better decisions, the data industry might’ve been working towards the wrong goal. There are whiffs of crypto in analytics: The blockchain was a technology in search of a problem; all of our databases and pipelines and charting tools are technologies in search of a solvable problem.7
For example, consider the “operational analytics” hype cycle, or the year we spent talking about rebuilding tools like Salesforce on top of databases like Snowflake. One way to interpret these fads is that they were the data industry innovating, expanding, disrupting. Having made so much progress on the home front—making decisions—we were out to conquer more.
But another way to interpret them is that we were lost. We built a massive engine to support an analytical experience that simply didn’t work nearly as well as we hoped, and we were looking for other things to do. If fast pipelines and fancy databases and composable data stacks didn’t ultimately lead to better decisions, maybe we could use them to automate more marketing emails? Make a better CRM? We needed something.
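To make that concrete, the whole “operational analytics” pitch was roughly this in miniature: query a modeled table out of the warehouse, and push the rows into some operational tool. (A sketch only; the account, table, and CRM endpoint below are all made up.)

```python
# A toy reverse-ETL sync: warehouse rows out, CRM records in.
# Everything here (account, table, endpoint) is hypothetical.
import requests
import snowflake.connector

conn = snowflake.connector.connect(
    account="acme-xy12345",  # made-up account
    user="etl_user",
    password="...",          # elided on purpose
    database="ANALYTICS",
    schema="MARTS",
)

cur = conn.cursor()
cur.execute("select email, lifetime_value from customer_scores")

for email, ltv in cur.fetchall():
    # A real sync would batch, retry, and diff; this just posts each row.
    requests.post(
        "https://crm.example.com/api/contacts",  # hypothetical endpoint
        json={"email": email, "lifetime_value": ltv},
        timeout=10,
    )
```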
We’ve found the next attempt, it seems. Because, while all the big announcements from the last couple weeks are giving 2022,8 they’re very distinctly 2025:
“Informatica’s rich metadata,” Salesforce says, “will empower AI agents to interpret, connect, and act on enterprise data with meaningful context.”
Hex’s “primary and sole focus as a company is re-(re-)defining analytics workflows now in the AI era,” and they raised the money to “double down on the new era of agentic analytics.”
dbt Labs’ new features are “all purpose-built to help our customers and users tackle the next wave of analytics and AI.”
Crunchy Data will be relaunched as Snowflake Postgres, to “empower developers with familiar tools designed to build production-ready AI agents and apps.”
Databricks will use Neon “to deliver serverless Postgres for developers and AI agents.”
“With Snowflake Openflow, we’re redefining what open, extensible, and managed data integration looks like, so our customers can quickly build AI-powered apps and agents without leaving their data behind.” Snowflake’s semantic views will “support both traditional BI metadata…as well as metadata that is important for high-quality AI-powered analytics experiences.” Snowflake Trail will “debug and optimize your generative AI agents and apps.” Snowflake Marketplace, which also got updates this week, “adds agentic products and AI-ready data” so that “enterprises can now enrich their AI apps and agents with real-time news and content.”
Sure, some of this is marketing fluff and buzzword clickbait, but the pivot appears mostly real: The data stack isn’t for analysts; it’s for agents. Whereas Snowflake used to talk up how they were using AI to make a better analytical database, the focus has mostly inverted,9 and Snowflake is now a tool to make better AIs. As Sam Altman said in a teaser for his keynote at Snowflake Summit, “data is the backbone of AI innovation, and the way we harness data will be essential to driving the next wave of AI breakthroughs.” And the Crunchy Data (and Neon) acquisitions seem to be explicitly for that purpose: They are databases designed to be created by AI agents and power AI apps.
Though I have no idea what happens next—maybe it’s just another trend; maybe not—it certainly seems possible that the gravity of the whole industry shifts away from analysis and analytics, and towards AI infrastructure. Pipeline companies source data so that it can be mashed into agents’ prompts; dbt becomes an MCP server for business logic; BI tools start assuming all of their users are robots. It’s not a rewrite of the data stack, but a new purpose for it.
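(If “dbt becomes an MCP server for business logic” sounds abstract, a toy version is easy to imagine. This sketch uses the Model Context Protocol’s Python SDK; the metric registry is a hypothetical stand-in for a real semantic layer, not anything dbt actually ships.)

```python
# A toy MCP server that hands governed metric definitions to an agent.
# The registry below is a hypothetical stand-in for a real semantic layer.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("business-logic")

METRICS = {
    "revenue": "sum(amount) from fct_orders where status = 'complete'",
    "active_users": "count(distinct user_id) from fct_events, last 28 days",
}

@mcp.tool()
def define_metric(name: str) -> str:
    """Return the canonical definition of a metric, so the agent quotes
    governed business logic instead of inventing its own SQL."""
    return METRICS.get(name, f"No governed definition for {name!r}.")

if __name__ == "__main__":
    mcp.run()
```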
But that’s technology, I suppose. Sometimes, the first version isn’t quite right. The technical layers can be good stuff, but the people who were supposed to use them get lost. And so, there are twists and turns, messes and mistakes. The work of the whole industry may not take us where we imagined it would at that first Snowflake circus in Las Vegas three years ago, but it may yet take us somewhere. That just depends a good deal on where you want to get to.
Well, no, this is what’s happening now, and it feels a little dumb to talk about data conferences in the middle of a food fight between the person running the United States government and Donald Trump. Which, you know, neither here nor there, but again: If the person who owns Grok starts dictating how Grok responds to political topics, is that an unauthorized modification to Grok’s system prompts?
The ghost of Frank Slootman eventually comes for all of us.
Maybe they were dashboards embedded in some other application, or weren’t called dashboards but “data apps” or “PowerPoint decks,” but they were stylistically dashboards: Charts and tables of numbers that people looked at and said, “that’s good,” or “that’s bad,” or “I have no idea what that means.”
For example, from McKinsey, in 2018: “The fundamental objective in collecting, analyzing, and deploying data is to make better decisions;” and Sequoia, in 2019: “data analytics is invaluable not only for counting numbers, building dashboards and shipping products, but for helping to define goals, roadmaps, and strategies. Arguably this is the highest leverage provided by an analytics team.”
To be clear, it’s not that the data industry didn’t have any direction. The ecosystem broadly knew where it wanted to go: Towards democratized data access, or universally accessible analysis, or self-serve BI, or whatever. The problem is that people didn’t know how to use that data once they got it. And if they (we?) didn’t have direction, then the whole industry ends up sort of lost, because it’s serving a product to a customer that doesn’t know what to do with it.
We gotta talk about this whole “giving” thing. Look, I love a good analogy, and any turn of phrase that encourages a fun and unexpected one—“Aaron Rodgers is giving Uncle Rico;” “Trump v. Elon is giving Drake v. Drake”—can’t be all bad. But the whole construction seems to be less about that, and more about avoiding accountability or commitment. Don’t say, “Aaron Rodgers is like Uncle Rico,” because then people might disagree! Don’t say “Aaron Rodgers reminds me of Uncle Rico,” because now you’re a part of that sentence! That’s too much exposure! Too much risk! But Aaron Rodgers is giving Uncle Rico—safe, detached! An opinion with no author! An observation with no observer!
It’s giving passive voice.
(Also, wait, what? What????)
In fairness, Snowflake did release some updates that use AI to make a better database, like aggregation functions for text. Which I am obligated to say are pretty cool, as far as these things go.