The most brilliant article ever. Consider me a disciple.
I do have a quibble with the title though. You are (rightly!) calling for pull-based DAG execution instead of push-based execution (as in dbt run). The DAG itself is still invaluable; the problem is the push (vs. pull) orchestration model.
Sounds like you are proposing a declarative approach to data orchestration, similar to how Kubernetes approaches container orchestration: you declare a desired state and the system figures out how to get there.
I am definitely on board with this. Intuitively I’ve felt this way for quite some time, but had to read your brilliance to articulate it.
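To make the pull idea concrete, here's a toy sketch of demand-driven DAG execution (all names here are hypothetical illustrations, not any real tool's API): instead of pushing runs downstream from the sources, you ask for a target and the system recursively materializes only the stale upstreams that target needs.

```python
# A minimal sketch of pull-based ("demand-driven") DAG execution.
# `deps`, `fresh`, and `ensure` are made-up names for illustration.

deps = {
    "revenue_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["raw_orders"],
    "customers_clean": ["raw_customers"],
    "raw_orders": [],
    "raw_customers": [],
}

fresh = {"raw_orders", "raw_customers"}  # nodes already up to date
built = []                               # records the build order

def ensure(target):
    """Pull model: build a node only if it's stale, upstreams first."""
    if target in fresh:
        return
    for upstream in deps[target]:
        ensure(upstream)
    built.append(target)
    fresh.add(target)

ensure("revenue_report")
print(built)  # upstreams first, and only the ones this target needs
```

The point of the sketch: nothing runs because a schedule fired; things run because a consumer asked for `revenue_report` by some deadline, which is the working-backwards framing from the post.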
I wish more people in data engineering had bash and C skill sets, or at least more Java. Plenty of patterns there for some transfer learning.
I still haven’t used dbt, but I’m pretty sure the previous engineer (whose Oracle code base I now own) wrote his own dbt with just SQL and bash. With a lot more steps… but still. Impressive.
I like the idea of working backwards, and the analogy of how a passenger only cares about the departure/arrival time makes a lot of sense! I think this framework works great for scheduled, regular jobs.
But out of curiosity Benn, how did/would your reverse orchestration system deal with ad-hoc data dumps from production databases?
Excellent article, as always, telling the perfect story. I fully agree and wrote about the same but called it "The Shift From Data Pipelines to Data Products". If anyone is interested, I believe it goes in-depth into what you wish for: https://airbyte.com/blog/data-orchestration-trends.
Down with the DAG
I have some good news. A declarative pipeline approach like the one you’re describing already exists. It’s very popular and has a lot of features.
https://youtu.be/pRGNxIz6GzU