Writing directly to a DW (or filestore) is something the D365 ecosystem has done for some time now. At first it seemed like a bananas proposition. "Why would you do that FOR me!? What's the catch?". In reality - at least in the beginning - the win was more for MSFT than for the consumer imho. It meant that systems never built to report or host giant API queries could go about their merry way in the cloud and let any knucklehead-ery happen on someone else's time in someone else's sandbox. After some time it's come to be a really handy tool (that someone else maintains!) that allows a first-drop of data that can be cleaned, or not, combined/crunched, or not, or pushed downstream to bigger fancier systems.
Writing directly to the warehouse like this isn't an obvious winner to me, but it definitely seems like something that more companies should try. That said, I guess it's probably an expensive thing to build, especially for the first time. Which maybe explains why it happens in the MSFT ecosystem, where there's probably more consistency across tools.
I actually _just_ learned that (at least one of) the reasons MSFT killed the DW 'Data Export Service' (which was free) was because it was a huge lift to maintain the pipelines/CDC on the MSFT side for free. The move to lighter file-only copy pipelines ('Synapse Link for Dataverse' or something similar) gives the user less - but is still a _really_ nice start to the 'lakehouse' long-term reporting/data strategy discussion. In the end - I think for big platforms like SFDC and D365, getting reporting/integration/data users _out_ of the OLTP DB is the real win long-term. If nothing else it gives a between-the-lines suggestion for how to keep paid-platform/OLTP setups 'lean' in terms of what data you KEEP online and what data you archive and keep long-term on your own.
Yeah, that also points to an interesting in-between that companies could provide, where you can export data in easy ways, but they don't manage writing it to the database. I suspect that makes it a lot simpler to build, and databases could probably create receiving pipelines on the other end (something like Snowpipe for Snowflake) that make connecting the two straightforward.
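To make the "receiving pipeline" idea concrete, here's a toy sketch of the Snowpipe-style pattern: the vendor drops files into a landing area, and the warehouse side auto-ingests anything it hasn't seen yet. A local directory and sqlite stand in for the cloud stage and the warehouse here; Snowpipe's real interface is SQL (`CREATE PIPE` with auto-ingest notifications), and all names below are made up for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def load_new_files(landing_dir, conn, seen):
    """Snowpipe-style auto-ingest: copy any not-yet-loaded file
    from the landing area into the warehouse table."""
    loaded = []
    for f in sorted(Path(landing_dir).glob("*.csv")):
        if f.name in seen:
            continue  # idempotent: each file is loaded exactly once
        with f.open() as fh:
            rows = [(r["id"], r["amount"]) for r in csv.DictReader(fh)]
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
        seen.add(f.name)
        loaded.append(f.name)
    conn.commit()
    return loaded

# demo: the SaaS vendor "drops" a file, the pipe picks it up
landing = Path(tempfile.mkdtemp())
(landing / "orders_0001.csv").write_text("id,amount\n1,9.99\n2,4.50\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id TEXT, amount TEXT)")
seen = set()
print(load_new_files(landing, conn, seen))  # first pass loads the file
print(load_new_files(landing, conn, seen))  # second pass is a no-op
```

The point of the pattern is that the SaaS vendor only has to write files; the tracking of what's been loaded lives entirely on the warehouse side, which is what makes it cheap for the vendor to offer.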
I would love to see a breakdown of data volumes and sources. To your point, we probably only need the big SaaS providers to follow in Salesforce's footsteps—connecting directly to DWs—to turn the market. You also have companies like Capital One building Slingshot (https://www.capitalone.com/software/solutions/), which deepens the DTC connection.
We're already seeing the DTC movement for operational DBs. See Google's beta Datastream product: https://cloud.google.com/datastream-for-bigquery
Yeah, my instinct is that most usage comes from a handful of sources, but there are so many SaaS tools now that I'm not convinced that's true. The tail could be really, really long at this point.
I just checked, and it looks like they support ~160 connectors. The big providers like SFDC, SAP, Segment, Adobe, Facebook, GA, etc. could chip away at their profits by going DTC, but that won't be a nail in the coffin.
I think the bigger issue not addressed is that the current stack involves: Ingestion -> Transformation -> DW -> BI Tools / Reverse ETL.
Platforms are emerging that combine ingestion, transformation, and reverse ETL by using bi-directional streaming + stream processing. They connect to the operational DBs of these SaaS products.
The drawbacks are...
1) People are still intimidated by streaming data
2) Building pipelines in these tools is not as easy
However, if these gaps get solved, teams will be comparing Fivetran + Census to companies like Confluent.
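The collapse of those stages into one loop can be sketched in a few lines: each CDC-style change event is ingested, transformed in-flight, landed in the warehouse, and pushed back out to the SaaS target in a single pass, with no batch hops in between. Plain dicts stand in for the change stream, the warehouse, and the CRM here; real platforms (Confluent, Materialize, etc.) would do this over Kafka topics and SQL, and every name below is hypothetical.

```python
from collections import deque

def run_stream(changes, warehouse, saas_target):
    """Toy single-loop pipeline: ingestion, transformation, warehouse
    write, and reverse ETL happen per-event, not per-batch."""
    stream = deque(changes)
    while stream:
        event = stream.popleft()
        # transform step: normalize the record while it's in flight
        row = {"id": event["id"], "email": event["email"].lower()}
        warehouse[row["id"]] = row             # ingestion -> DW
        saas_target[row["id"]] = row["email"]  # reverse ETL, same loop

warehouse, crm = {}, {}
run_stream(
    [{"id": 1, "email": "Ada@Example.com"},
     {"id": 2, "email": "Grace@Example.com"}],
    warehouse, crm,
)
print(crm[1])  # the cleaned value is already back in the "CRM"
```

The contrast with the batch stack is that there's no separate Fivetran sync, dbt run, and Census sync to schedule; the same event drives all three steps.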
If I had to bet on what unseats Fivetran, it's the abstractions on streaming that finally make it commonplace. You can see this now with Materialize looking to make the experience of using streaming vs. Postgres identical: https://datastackshow.com/podcast/the-future-of-streaming-data-with-stripe-deephaven-materialize-and-benthos/
I agree that we'll eventually get to a point where streaming becomes the norm. But it seems to me that if the winner of the space is an ETL vendor that figures out streaming, Fivetran's in the lead to be that company. Sure, they don't really do streaming now, but they've got a lead in cash and distribution.
Good insight! I love the analysis very much!
BTW, is there any chance that Fivetran will develop its own data warehouse or database to strengthen its ecosystem moat?