bro, your survey is missing some key people and/or an "other" field that isn't the monkeys; what good is bad data?!
The monkeys / other!
(The list was from the HBO site for the show, which seemed like the most fair way to do it? I was surprised the guy from Seasons 1 and 2 wasn't there, but I can't be putting my own suspicions on the scale https://www.hbo.com/the-white-lotus/cast-and-crew)
That link is organized by “introduced in” !!! There’s foul play afoot
But Belinda!
>In other words, the whole warehouse-native idea might’ve been a bad one, but not entirely. If the world gradually coalesces around common data storage frameworks, people might not rebuild applications on top of databases but on top of data formats. A BI tool could run a DuckDB engine and connect it directly to SAP’s Iceberg bucket. A marketing tool could read from Stripe’s bucket. The next AI BDR could read from a CRM’s bucket. It’s not quite one customer list to rule them all, but it’s a bit closer.
Before warehouse-native was a big thing in MDS, this was already in place with Snowflake Data Sharing (and, to some degree, Databricks Delta Sharing). The only thing that's changed now is the popularity of Iceberg and open table formats in general.
Warehouse-native became popular because it helped reduce COGS for the service provider - "we'll use your storage and compute instead of maintaining ours"; however, there were some financial disadvantages to this approach as well (i.e., less revenue). This model also aligned quite well with data platform sales reps' incentives.
From what I can tell in the SAP PR (not the DBX one), all SAP is doing is building their "new" embedded analytics tool on top of Databricks and then allowing "bi-directional" sharing of data between joint customer accounts. Companies have been doing this for years on Snowflake, just not explicitly with Iceberg.
There's a problem with this approach, though, if it's not implemented correctly. For it to work at scale, SAP will need compute and storage in effectively every region their customers are in; otherwise egress costs (for both parties) get in the way. It's absolutely possible, but if I'm a customer, I'm not buying it if data is being queried from a different CSP or region.
Yeah, I'm with you that "sharing data via files" isn't exactly a revolutionary idea. People have done this before with basic FTP, or CSVs, or whatever. Some standard framework is what makes it work though, to the extent that it could (which I'm not at all sure it can).
I think that's right about SAP, but even if it is, that's sort of the point? If they are building something underneath Databricks, they're doing it by using a storage format that other engines could use too. Sure, they have all the UI hooks with Databricks, but a giant bucket of Iceberg data could just as easily be hooked up to Snowflake or anything else. Which is why it seems more interesting than just "we are sharing your SAP data with Snowflake through their old sharing features" - because that version supports one engine, and this version supports lots of them. (Or, cynically, that's also the other reason they might've done it with Delta: Delta is compatible with other data formats, but Snowflake et al. aren't compatible with Delta, so it has the appearances of being open but isn't quite.)
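(And to be concrete about the "any engine can read the bucket" part, the DuckDB version of that is already close to a one-liner. A rough sketch - the bucket path and table name are made up, and you'd still need to wire up S3 credentials separately:)

```python
import duckdb

con = duckdb.connect()

# The iceberg extension reads Iceberg metadata + Parquet files directly;
# httpfs handles the s3:// path. Credentials (a DuckDB secret or env config)
# are omitted here.
con.execute("INSTALL iceberg; LOAD iceberg;")
con.execute("INSTALL httpfs; LOAD httpfs;")

# Hypothetical path to an Iceberg table sitting in an SAP-managed bucket.
rows = con.execute("""
    SELECT *
    FROM iceberg_scan('s3://some-sap-bucket/warehouse/sales_orders')
    LIMIT 10
""").fetchall()
print(rows)
```

No warehouse in the middle - the "BI tool" here is just whatever process is running that script.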
One of the blog posts said that they can deploy it across different clouds, so I'm assuming they will likely support other regions too. But also, to your point, that complicates any sort of cross-application sharing.
Databricks' entire marketing strategy is "the appearances of being open but isn't quite."
Databricks' UniForm feature means that they could use either Delta or Iceberg (or both), but I guess they'd use their own Delta format to lock customers in. Delta is 'open' but has more/better features if used through Databricks' own tooling.
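(If I'm remembering the docs right, turning UniForm on is just a couple of table properties, so the Delta table also exposes Iceberg metadata that other engines can read. A rough sketch from inside a Databricks notebook - the catalog/schema/table names are made up:)

```python
# Sketch only: `spark` is the session a Databricks notebook provides.
# The TBLPROPERTIES below are the UniForm switches as I recall them from
# the docs; the table itself is hypothetical.
spark.sql("""
    CREATE TABLE main.crm.accounts (
        account_id BIGINT,
        name       STRING
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```

Which is sort of the point about "open but not quite": the write path stays Delta-and-Databricks, and Iceberg readers get a view of it.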
Fits a bit into the story that SaaS might get less attractive due to a GenAI-driven decrease in software development cost, e.g. https://www.forbes.com/sites/josipamajic/2024/09/30/the-end-of-the-saas-era-rethinking-softwares-role-in-business/
At some point, integration and customisation might become more expensive than developing it yourself (with some prompting). So why not then build it directly on top of your own data stack, which itself sits on top of (LLM-friendly, well-documented) open-source data formats?