Discussion about this post

Michelangelo D'Agostino

I'll age myself by saying that I remember Impala, and I was also at the first Databricks Strata tutorial circa 2014. I started using both Databricks and Snowflake relatively early on, in 2016/2017. And while I love, love, LOVE your posts, I think there's one thing you get wrong here. Before roughly 2019, Databricks wasn't at all "a big, fast database that you can write SQL and Python against." Yes, you could query tables with SQL, but all the underlying work you had to do with S3 and cluster management made it feel a lot more like Hadoop than Redshift or Snowflake. So much so that my DS/ML teams used Databricks because we liked Python, but it was totally infeasible to make our Analytics team use it instead of Snowflake.

That all changed in 2020 when Databricks released Delta and very slowly integrated it into their product offering. Delta is basically OSS Snowflake, and since then, Databricks and Snowflake have been slowly converging. Finally, in the last year or so, Delta feels a lot like Snowflake (with a nice UI, simplified SQL clusters like Snowflake warehouses, etc.). So it really is a big, fast database that you can program with Python, Scala, and SQL.

Approaching from the other direction, Snowflake has tried to open itself up to Python with Snowpark, where they essentially copied the Spark API, but as far as I can tell it's mostly just marketing hype. I don't think Snowpark Python is even generally available yet.

So I agree: you're totally right about how Databricks should be marketing itself now. But I think their tech couldn't back that up before the last year or two... Not that that usually stops the marketing people. But maybe, as reluctant academics, they had a bit more shame?

Ian Thomas

As someone who has been in the data industry for a long time, and who spent the years between 2012 and about 2018 feeling vaguely stupid much of the time because I couldn't mentally stitch together the myriad Big Data technologies that were constantly emerging, merging and disappearing, I find this post extremely soothing. Perhaps there is some kind of entropic law of data tech that dictates that, eventually, all data tech becomes databases?

