By most traditional measures, dbt Labs isn’t the biggest company in the data ecosystem. It doesn’t make the most money; it doesn’t have the most employees; it has fewer followers on Twitter than the community account for Qlik developers.
But, ever since publishing their vision for a better analytical workflow, dbt Labs has been the most influential. They advocated for analysts to borrow ideas from engineers; today’s data teams almost universally aspire to do so.1 They created a product; we swarmed it with integrations. They championed a new role; in four years, other companies speed ran2 it through the entire hype cycle, from launch to peak to disillusionment to iteration to a two-hundred-year-old toothpaste company opening an analytics engineering position in suburban New Jersey.3 dbt Labs built a thriving ecosystem of users and evangelists around a Slack workspace; now every company and upstart data ideology has its own Slack community with the same channels and fewer members. Time and time again, amid the landscape’s chaotic explosion, dbt Labs has been our philosophical pacesetter, the drum major to our violent noise.
Congeniality
They also set another, less widely acknowledged tone: They made neutrality popular. dbt Labs found a spot in the market where they could be everyone’s friend and nobody’s direct competitor. Seeing their commercial and community success, other companies tried to follow the same beat. They marketed themselves as grateful stewards of passionate groups of users, and chipper partners to a neighborly coalition of vendors. Save their rip-and-replace landing pages that can only be unearthed by the right Google search run by the right demographic, most data startups are polite to their competitors, and effusive about everyone else. Browse their sites, and you’ll find countless blog posts announcing exciting new partnerships, infinitely scrolling pages of integrations, and diagrams of “all the data tools you love” swirling around a single unifying nucleus. You must choose their product, they say, but the rest is up to you. We are all God’s children, and we cannot possibly choose a favorite.4
This cooperative and congenial demeanor was echoed in the industry’s embrace of modularity. This was itself a core tenet of dbt Labs’ founding viewpoint, though it was initially proposed as a way to avoid convoluted data pipelines and repeated query logic. Over the last seven years, the principle expanded to include tools and technologies as well. The best data ecosystem, we said and now almost reflexively assume, is one that’s full of interoperable parts. Customers should be able to pick and choose from an à la carte menu of tools, with each pairing nicely with all the others. If standards emerge—if we offer a prix fixe option—they should be chosen democratically, via the dollar-denominated votes of data teams.
It’s a pleasing vision, and an unlikely one. dbt Labs’ position as Switzerland was earned, not simply chosen. They found a hole in the market that was overwhelmingly complementary with the rest of the industry, and quickly built a popular product that discouraged head-on competition. Furthermore, there’s a practical problem with too much modularity. Vendors can’t reasonably maintain a bunch of bilateral relationships with one another. Without that, the data stack, assembled from a dozen tools produced by the same number of startups, will remain frustratingly disjointed and incongruous.
This is a steep price for customers to pay for optionality and protection from vendor lock-in. Because choice and flexibility are means to an end: The ability to choose better products. People are content to lock themselves into the Apple ecosystem because they have faith in what Apple builds.5 For a neutral, modular, interchangeable data stack to be similarly popular, the final product—the full stack, as a cohesive unit—has to be just as good.
We’re not there yet, and the proof is the products. The latest wave of data companies was created to solve the problems caused by the first wave. Companies like 5x and Mozart6 promise to hammer down these rough edges for you by offering the entire modern data stack as a managed service (aka, the Managed Data Stack). The interest in data meshes and data contracts is born out of related frustrations, and a desire to enforce some degree of consistency between, among other things, disconnected products and services.
As others have said, over time, this dynamic could forge better products at lower prices. It might look like a mess now; that’s just democracy in action.
But that was then. Today, time may be a luxury the industry no longer has. Buyers are getting impatient, and the data stack can remain fragmented longer than all its pieces can stay solvent.
Collision
Sooner or later, Jerome Powell comes for us all. As the tut-tutting armchair economists who fancy themselves as enlightened because they listen to the All-In Podcast now love to say, kids these days got addicted to a zero-interest rate environment. Paper unicorns that look promising in that climate turn questionable in “normal times,” and become outright fairy tales in SaaS’ nuclear winter.
When customers stop rubber stamping contract increases of 65 percent every year, startups and huge public companies alike have to look for alternative sources of revenue. Companies that previously made efforts to space the floor, to claim neutrality, and to avoid direct competition will have to offer more products to more people, and to chase nearby markets—all while customers tighten their budgets, like the walls closing in on a JezzBall board. More collisions are inevitable.
The common prediction is that our Gordian Knot—the need for the modern data stack to be cohesive, and dozens of startups being squeezed together by a narrowing market7—gets cut by consolidation. The big fish look for acquisition targets when valuations are down. One by one, companies get assimilated by the Borg, bundled into a faceless mothership hovering over Seattle, and tacked on as a loss-leader to sell seats for a CRM or compute bills for a database. Jerome Powell, it turns out, is just the intermezzo; sooner or later, Frank Slootman comes around.
Though I believe there are benefits to this type of consolidation, it’d still be a disappointing end to the modern data stack. Big buyers often slowly gut their acquisitions, like Salesforce recently did to Tableau and Google did to Looker. The souls of beloved businesses go away, and get replaced by new fonts and an “A Megacorp company” watermark under their logo.
What if there’s another way?
Conglomeration
In Silicon Valley, we often think of tech acquisitions as means for building a better product. We talk about how this software might integrate with that service, or how a new technology can modernize a legacy vendor. Acquisitions are about synergy, shared roadmaps, and teleprompter excitement for the next chapter of what two teams can build together.
In other industries, acquisitions are about balance sheets. They are a means for making more money. Berkshire Hathaway doesn’t own an insurance company and a jewelry brand because they’re complementary; it owns them because they both make Berkshire Hathaway money. And if Berkshire Hathaway can promote them together (via, for example, a bizarre infomercial from the world’s fifth richest man), that’s a bonus.8
But there are some companies that take a middle road. They buy products that are adjacent to one another—not to use them as tuck-ins and tack-ons to a core service, but to build a balanced catalog of equals. Over the last decade, Adobe, which started primarily as a creative design suite, bought Omniture, EchoSign, Magento, Marketo, Workfront, Frame.io, and Figma. Atlassian built itself into a $100 billion company (for a minute) through a similar march of acquisitions.
In both cases—and unlike a company like Salesforce—neither Adobe nor Atlassian has a clear flagship product around which everything else is primarily a funnel. If customers buy one product, they aren’t obligated to buy others; each is built to stand on its own. Moreover, the shared corporate umbrella provides two benefits that fully independent businesses can’t. First, it makes the buying process convenient. If a team has Photoshop, they can buy Omniture, Magento, and Marketo without having to talk to a new vendor. Second, even if each product can operate independently, they don’t have to. Atlassian customers can provision users and manage billing from a single administrative portal. And Jira has integrations with dozens of tools, but it fits best with Confluence. Instead of having to rely on community standards and loose partnership agreements to make one product compatible with the other, Atlassian can simply make it so.
As the data industry settles, I believe there could be—and probably should be—a couple similar conglomerates that focus on data services. By bringing various pieces of the stack together under a single roof, these businesses could sand down the rough edges between an ETL product and the observability service that’s supposed to monitor it, or a visualization tool and the data discovery platform that catalogs it. Permissions and access controls could be managed centrally and directly. Charges could be rolled up into a single unified bill. Metadata standards could be mandated, not negotiated. And if customers wanted to use a different vendor for a particular piece of the stack, no problem—it won’t be quite as seamless as staying in the family, but won’t be any worse than the experience today. [ Update: Software conglomerates would also be good counterbalances to cloud or warehouse providers who primarily sell compute. The former is incentivized to make great tools. The latter wants to drive more compute—which won’t necessarily push companies to make the best software. Shoutout to Erik Bernhardsson for piecing together this dynamic. ]
Could it happen? Maybe—I have no idea how these sorts of holding companies actually come together. Are they facilitated through financial intermediaries, like Thoma-Bravo-subsidiary Qlik’s acquisition of Thoma-Bravo-subsidiary Talend? Or do they emerge organically, starting with vendors forming exclusive partnerships, and the industry’s informal neutrality doctrine getting replaced by quid pro quo alliances? I don’t know. But if it’s the path that helps companies keep their character while also building more cohesive experiences for their customers, the modern data stack might be better as the modern data conglomerate.
1. Or do they? Though it’s not uncommon for people to advocate for data teams to differ from engineering teams on the edges, I’ve never heard an argument that working like engineers is directionally wrong. But surely there are some teams out there that outright reject this.
3. Shoutout to Colgate for being a very unexpected 217 years old. Also shoutout to Colgate, a company that makes toothpaste, for making their mission (click “Mission” in the top nav bar) to be straight-up “changing the world.”
4. God, of course, is low interest rates and the shoddy diligence of thirsty growth equity investors.
6. I’m a personal investor in Mozart.
7. Jerome! Shut down all the garbage compactors on detention level! SHUT DOWN ALL THE GARBAGE MASHERS ON THE DETENTION LEVEL!
8. While we’re here, we need to talk about this website.