How Snowflake fails

Sep 2, 2022

Climate change comes for us all.

37 Comments

The rise and fall of Lotus Notes in the late '90s/early '00s offers an enduring cautionary tale about the dangers of doing too many things and being caught out in what we might call a "disaggregation wave" - the point in the tech cycle where vendors have bundled too many things together and the mood changes to favor componentized approaches. Notes started in the mid-90s as an interesting and fairly unique app/database platform which combined a semi-structured document DB with a formula language-based programming environment. It enabled a class of document-centric business apps (from discussion boards to expense approval workflows) that were damned useful, and spawned its own third-party app vendor and custom developer ecosystem. It was, and I say this without any shame, pretty cool.

However, as the late '90s wore on, a couple of things happened: email became incredibly important, and so did the web. So Lotus bundled email into Notes, and also added an HTTP server and JVM to the Notes server, renaming it Domino (don't ask me why, I don't know either). These were all obvious enhancements to make at the time, but they had two consequences. First, they put Notes in the crosshairs of Microsoft, which was trying to get folks to adopt Exchange; but more relevantly for this discussion, they put Domino in competition with the (then) very disaggregated tech stack around web applications. Companies building their cool web startups in 1999 were not about to deploy a whole bunch of Domino servers - they were bolting together Apache HTTP servers with some kind of SQL backend and spinning up their own server-side code execution tech. The bundling of stuff inside Domino just didn't sell; and it didn't help that Domino didn't do any of these new things tremendously well. By the early/mid-2000s, these twin forces had more or less killed off Notes.

So every time I see a software company starting to bundle more and more into their platform, I do wonder what that tipping point might be - the point at which, almost out of perversity, customers say, "no, I don't want to buy this all-singing all-dancing solution, because I want the satisfaction of putting it together myself".

Thanks for sharing all this - I hadn't heard this story. It makes me think of the question about Snowflake in a slightly different way, which is: can they convince people that what they're building is one product, or will people see it as a bundle?

The whole data cloud thing is new, so it's possible they successfully market it as a new product that 1) you need and 2) you can't unbundle, any more than you can unbundle an iPhone.

Or, they don't, and we all see it as some confusing amalgamation that we can build a better version of ourselves.

Not so long ago I worked at a company whose data warehouse ran on a dedicated four-node SQL Server cluster hosted on AWS that cost north of a quarter of a million dollars per year in Microsoft licensing alone, before we had run a single query. When people tell me they think Snowflake is expensive I ask "relative to what?" and then walk away laughing.

Sep 4, 2022 (edited)

I would say the answer is open source. I know that folks will immediately jump in to say that I’m not thinking about the costs of managing the service yourself, that you need to pay large salaries, figure out security, management, etc. I agree that if you’re a small player, you should use Snowflake or some other managed solution. Don’t over-engineer when costs are an issue.

Nonetheless, do people really think that in the age of Terraform and IaC it is impossible to manage an OLAP database? Hundreds of companies are managing their own k8s, Airflow instances, and production databases based on OSS, all of which are considerably more complex to get right than an analytical database. All of this runs in AWS or GCP.

Ernest Prabhakar

I agree. I remember when web application servers were expensive $50,000 enterprise products. Now they are all open source. I don’t see any good reason why “data application servers” won’t follow that trend over the next decade.

I think this depends on what Snowflake ends up selling. If it's just a box for storing and processing data, this probably is the direction things go, which is more or less the point the margin squeeze argument was meant to make. If you can run it yourself on bare AWS without too much headache, the premium you'll pay someone else to manage it isn't very big.

However, if Snowflake successfully markets themselves as a new product, with all of the various services that they offer as representing something different from a database (ie, it's not a warehouse, it's a data cloud), suddenly the cost of managing all of that gets really high, and open source seems less appealing.

The other question I'd have on this - why are other applications not open source? Why do we assume that databases will go the way of infrastructure, and not of applications, for which open source products are a tiny fraction of what people use?

Ernest Prabhakar

I think my premise is that databases are a (component of) application servers, not actually “applications.” My rule of thumb is that only humans pay for value, and thus eventually systems evolve to where value is captured by those who maximize joy at the point of interaction (see: fashion), ie the enablers of user experience.

But people pay a whole lot for AWS. So the argument is less that they only pay for applications, but that there are certain types of products/applications/services/whatever that people *don't* want to pay for. And I'm not sure what those are.

Ernest Prabhakar

Ah, I was talking about Rate vs Quantity. There is a lot of money to be made in selling a commodity, but only at scale. My argument is that compute will also become commoditized, and only personalized user experiences will command high margins.

"Last fall, Erik Bernhardsson made the case that AWS and other cloud providers might be happy to sell core compute services like EC2, and let other vendors—Snowflake, for example—do the hard work of building, marketing, and distributing applications on top of it."

I don't agree with this argument. When both margins and the market are high, the incentives to stay on the sidelines and only provide the hardware layer are not there. AWS consistently copies available open source competitors in the data products layer and so does GCP.

---

I do believe that there is a lot of room for innovation by new entrants, but they have to follow the advice from Frank Slootman himself: start amping up sales when you check the majority of the boxes around product features. See https://stassajin.medium.com/review-of-amp-it-up-f433ae2bbb3e.

Eeeh, I'd say that AWS's strategy is mostly about lock-in to AWS core services, though. The point, I suspect, isn't to make money directly from turning some open source thing into an AWS service; it's to ensure that people who want to use that service use it on AWS. (Plus, there's probably some kind of arms race in that, where if GCP can come along and say we've got push-button versions of this and that open source thing, AWS needs to have the same thing. So they pile on these services, but they're still all loss-ish leaders for EC2.)

Yeah the consumption model can be a nasty surprise when the bill shows up, but CXOs will figure that out and put guardrails in place. Snowflake's genius is its data cloud & marketplace model, which is easy for citizen data scientists and business people to understand and spin up in ways they could not do with earlier generation data warehouses and analytics tools. My guess is that most businesses leverage only a fraction of the data that goes through their systems. So, what is Snowflake's ROI? If it costs 2X but you get 4X value, then the business case looks good. I recently wrote a blog post on the last days of the legacy data warehouse. The question is, how quickly can IT teams get their data from an old DW into a new data cloud? It's not fast or easy.

As a slight aside (and as a half criticism of the data industry as a whole), I'm somewhat skeptical of the argument that we only use a fraction of our data. It's a common argument, and in some sense it's true - there's certainly more theoretical potential than what we get from it. But is it realistically extractable? I don't know. It's a convenient argument to make if you're selling a data product, but I'm not sure if it's right.

Hi Benn, where would you put the data catalog category among products that are easy to buy? I have seen inflated evaluation criteria for data catalogs, driven by demands for a Confluence cum social cum data governance cum observability cum integrations cum consumption layer cum data discovery cum search tool. Some of the blame lies with vendors pitching their offering as a platform for everything. Is there a way to even be called innovative in this slugfest?

Eh, I think the problem is there's not that much value in a catalog that's just a catalog. It's not as "live" as docs generated by something like dbt; it doesn't tell you what's happening like observability; it doesn't tell you how it's used like a BI/discovery tool. That means you've got to take over some adjacency to be useful, especially if you've raised venture money and have to build a big business.

I have an interesting anecdote about dbt along the same lines. We love the product, but we think dbt's pricing and value communication are a bit of a hassle. For one info-sec-crazy enterprise use case, the options were either to build (host your own) or to take the enterprise offering. But its pricing is confusing (50K in AWS Marketplace costs for 10 licenses a year vs 50 dollars monthly for standard). dbt is still not a product so much as a way of working. The I-wanna-avoid-sales, cum price approvals, cum Databricks-heavy base made us use Databricks' Delta Live Tables (a jumbled half of what dbt and Great Expectations do together, with Python added to the mix) instead of dbt.
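To make the gap between the two price points quoted above concrete, here is a rough back-of-the-envelope sketch. It assumes the 50-dollar monthly figure is per seat, which the comment does not state, so treat the ratio as illustrative only.

```python
# Figures are the ones quoted in the comment; nothing here comes from a price list.
ENTERPRISE_TOTAL_PER_YEAR = 50_000   # USD per year for the marketplace listing
ENTERPRISE_SEATS = 10                # licenses covered by that listing
STANDARD_PER_SEAT_PER_MONTH = 50     # USD; ASSUMPTION: billed per seat


def enterprise_cost_per_seat_per_year() -> float:
    """Annual cost of one seat on the enterprise listing."""
    return ENTERPRISE_TOTAL_PER_YEAR / ENTERPRISE_SEATS


def standard_cost_per_seat_per_year() -> int:
    """Annual cost of one seat on the standard plan (under the per-seat assumption)."""
    return STANDARD_PER_SEAT_PER_MONTH * 12


def price_jump() -> float:
    """How many times more a seat costs on enterprise than on standard."""
    return enterprise_cost_per_seat_per_year() / standard_cost_per_seat_per_year()
```

Under that assumption a seat goes from 600 dollars a year to 5,000 dollars a year, a jump of a bit more than eight times with nothing in between.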

Eh, that seems somewhat like the natural evolution of a product like dbt, where product adoption came first, and the business model came second. I suspect that'll change over time, and evolve into something that smooths out these kinks. Speaking from experience, it takes a fair amount of time to develop good pricing models where you don't end up with weird discontinuities between plans, or things like that.

😄 true. the natural evolution of product interrupted by funding sprees and investor enthusiasm

Hot take: the second pillar is maybe neater as a feature than as a standalone tool. Might be an answer to your perplexing first footnote

eg the recently repurposed https://cloud.google.com/dataform

A feature to what though? (fyi, an idea like this is in my bad timeline for dbt, but curious how you see it)

Ernest Prabhakar

More than possible. I’m pretty sure I can merge this with CloudFormation somehow to do exactly that in YAML

https://github.com/TheSwanFactory/fridaay

Ernest Prabhakar

> This principle—be the warehouse for the modern data stack—could be extended to more fundamental characteristics of the database

Dang it, you went and told everyone my new business model: becoming the thin integration layer over commoditized compute and storage, that makes it trivial for end users to build their own SaaS-like apps.

Fortunately, as with all great ideas, I am confident nobody will believe you (or me).

Something like this _seems_ possible...

https://twitter.com/bennstancil/status/1553145045896814592

Hmm. Your article is full of inaccuracies. Reading it, you seem to assume that most large companies, which employ a lot of data experts, don't know anything about Snowflake. That these same smart people would recommend that their companies, which probably pay lots of money to maintain data warehouses in expensive data centers with an army of sysadmins, DevOps engineers, and 24/7 support teams, migrate TB or PB of data to Snowflake without doing some due diligence. That Snowflake, when welcoming new customers, would not spend the time and money to make sure the client is happy.

There used to be this misconception that Apple was more expensive and Macs were more expensive. People didn't understand what Apple was building, that design and well-integrated hardware and software were keys. Same with Snowflake.

I would respond: You get what you pay for.

If you want to manage a data warehouse with Excel and some CSV files, go for it. If you want to be able to centralize your data in one place, have unlimited storage and scale in milliseconds, get access to their "Data marketplace," which allows you to enhance your datasets by joining your data with external "studied and aggregated" data without importing/exporting anything, use Python or SQL, share your data warehouse with another department with a few clicks, and monitor your resource usage, then I would go with Snowflake.

The problem is not the software itself. It's the people who need to learn how to integrate it and use it. I've set it up for many clients who use dbt and Fivetran and pay less than $100/month for Snowflake. Is that expensive?

Snowflake will be expensive, just as AWS RDS or Aurora will be costly, if you over-provision your database server and abuse it with poor queries or database architecture.

I understand that companies got excited about giving analysts access to Snowflake without fully understanding the ramifications of a "free-for-all warehouse unit access". It is exactly like AWS cloud services. Your AWS bill will go through the roof if you don't control who can do what. Cloud is elastic. Just make sure your team only pulls a little on the rubber band.

This is...basically what I said?

"It’s a fallacy to assume that high prices that are created by rising demand will drive people away from Snowflake....If you hammer Snowflake with queries, they can try to make those queries cheaper to process. But at some point, if you use it a ton, they’ll bill you a ton."

Upper case is the SQL standard default unless the name is qualified in double quotes. This is standard in most RDBMSs, SQL Server being a notable exception in not following the SQL standard.

It's a bit more nuanced than that. Though most DBs default to being case-insensitive (as does Snowflake), when you query its information schema, case-insensitive objects are returned in upper case, with no indication as to whether or not they're actually case-sensitive (which they can be). So if you want to use the information schema to power things like autocomplete, you have to use exactly what's returned to you - i.e., almost always upper case - even though you could downcase it.

Not sure I follow, unless you are expecting only one object with that spelling.

SELECT table_name FROM TABLES WHERE UPPER(TABLE_NAME) LIKE 'FOO%';

returns

FOO

Foo

So it's a kind of specific thing:

Say you want to get tables from Snowflake to populate an autocomplete function (like we do at Mode). You can get the names from the information schema, but the names are returned in all caps. For most tables, they aren't case-sensitive, so you could autocomplete the names in lower case, and it'd still work. However, they aren't guaranteed to be case-insensitive, so you can't downcase everything and know for sure that it'd work.

For example, if the name of the table in the information schema is "FOO," it's *probably* case insensitive and "foo" will work. But it's possible that someone named the table "FOO" all in upper case, which would return the same name in the information schema table. But for that table, "SELECT * FROM foo" won't work.

It's called "qualification"; if you want mixed case, you have to enclose the name in double quotes; same for columns. This is standard SQL. SQL Server does not follow the standard.

SnowBuilder (pat. pend.) always qualifies object names and column names.

DBVis handles this just fine.

create table foo (id numeric);

create table "foo"(id numeric);

select * from f

Gives both tables in auto-completion.
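The folding rule behind this can be sketched in a few lines of Python. This is only an illustration, assuming the SQL-standard behavior (which Snowflake also documents) that unquoted identifiers fold to upper case while double-quoted ones are stored exactly as written. The function names are made up for the example, and reserved words and other edge cases are ignored.

```python
import re

# A stored name can be typed without quotes only if it is already all upper case
# and uses plain identifier characters (a conservative, hypothetical check).
_SAFE_UNQUOTED = re.compile(r"[A-Z_][A-Z0-9_]*")


def stored_name(identifier: str) -> str:
    """Name the catalog would store for an identifier as written in DDL."""
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        # Quoted: kept verbatim, with doubled quotes unescaped.
        return identifier[1:-1].replace('""', '"')
    # Unquoted: folded to upper case.
    return identifier.upper()


def completion_for(stored: str) -> str:
    """Text an autocomplete could insert so the reference resolves to `stored`."""
    if _SAFE_UNQUOTED.fullmatch(stored):
        # Unquoted lower case folds back to the stored upper-case name.
        return stored.lower()
    # Anything else has to be quoted to survive folding.
    return '"' + stored.replace('"', '""') + '"'


def complete(prefix: str, catalog: list[str]) -> list[str]:
    """Case-insensitive prefix match over stored names, quoting only where needed."""
    folded = prefix.upper()
    return [completion_for(name) for name in catalog if name.upper().startswith(folded)]
```

For the two tables created above, the catalog would hold `FOO` and `foo`, and `complete("f", ["FOO", "foo"])` returns `foo` for the first and `"foo"` for the second.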

Right, I understand that's how you set it up or query it. My point was just that 1) I don't want to write queries with quotes, 2) I want to autocomplete table names to lowercase, but 3) you can't do that because Snowflake doesn't tell you if a table is called FOO (case-insensitive) or "FOO" (case-sensitive).

Sep 3, 2022 (edited)

The relational database is the most overleveraged, overexposed services-sector tool of all time, and the claim that a relational database with hand-tuned schema and queries beats automated, machine-generated schema and queries is one of the biggest shell games of all time - it's gone on for years.

I'd argue it defines IT and most operational roles these days.

The relational database and relational data schema were developed in the early 1970s, and then proliferated, in order to service and capture a share of the transactions created by the US economy's shift to a full-on services and debt-instrument-based economy.

Some really interesting reading if you want to get really heady.

IBM's memoriam of the founder of the relational database:

https://www.ibm.com/ibm/history/exhibits/builders/builders_codd.html

WTF Happened in 1971:

https://wtfhappenedin1971.com/

What is really funny about the IBM memoriam is that the same arguments were held in the early 1970's about query optimization, what to store, etc. Sounds like a lot of the discourse of the last 2-3 months around Snowflake, among others...

Remember, the Unix epoch (timestamp 0) is 00:00:00 UTC on 1 January 1970.

---

The entire concept of the relational database is that it developed to support the debt-serviced, equities-laddered, decline-of-Bretton-Woods services growth of services businesses servicing other services businesses after the early 1970's - when the US and Western economy as a whole shifted away from a manufacturing economy to a full on services economy with complex financing.

As such the database and any derivative products are overexposed to the services market as a whole.

When you have services companies parading as tech companies, and getting valued like it with cash thrown at them for equities, we end up in the state we are in today, which is databases servicing other databases made by unprofitable companies servicing other unprofitable companies, with add ons to fan cloud compute that are other services businesses making point solutions off other databases under the hood of their data products.

The database has always been rent-seeking, and on the cloud that is turned up to 11.

If the argument is that things that are heavily used by services businesses are all a giant pyramid scheme, couldn't you make this argument about anything used by most of today's corporations? In other words, why are databases (or the data industry) uniquely "overleveraged"?

Comment removed

I'd put that money on Snowflake Cloud? If nothing else, Snowflake started there. That's already the game they play. They need to expand their product offering, but not fundamentally alter it or their entire GTM motion. Oracle has to change everything to go from what they sell today to being a true cloud business.
