Discussion about this post

Michael

I am a pretendgineer (although I like to think I do a good job lol).

The way we work at my company is that a pretendgineer will build something useful (but not "optimized" or "efficient" or whatever) and then once data engineering sees that other people find it valuable, they will formally build a model themselves.

That seems to be a decent way to get over data engineering's hesitancy about taking on projects they don't totally understand and/or see the demand for, while not making the business beholden to convincing a couple people that something is a good idea.

Definitely not a perfect system, but I think it's good to have a bias toward building.

Tom Waterman

Great post as always Benn!

I think you’re exactly right. We don’t have a Django or Ruby on Rails equivalent. We only have a … HTTPServer?

Modeling techniques like DataVault, Kimball, ActivitySchema aren’t it. They only describe the structure of the tables at the start or end of the pipeline. We need something that describes how to organize the code in the middle. And then a library that takes all the boilerplate out of doing it that way.

I hope that a DHH kind of white knight comes along and figures this all out. I’m honestly not holding my breath though - I think data pipelines are just inherently messy and imposing too much structure just doesn’t work. With a web app you’re always going to load your model before rendering a view; with a data pipeline there are no guarantees about the “best” order of operations.

I would love someone to prove me wrong though. The lack of a shared design framework is the #1 reason that large dbt projects invariably regress into shanty towns IMHO.
