6 Comments

"Well, it’s sorta like that. dbt models reference each in a much cruder and more brittle way than proper ...functions. They can't take arbitrary inputs, and you can only reuse outputs, not the functions themselves. For example, you can’t create a function for adding sessions to an event stream, and then reuse that same transformational logic on several different tables"

You may be alluding to this (or not) but your article seems to be in the same direction as my own thoughts in that there is an unnatural bifurcation for evaluating data at runtime (in apps, orchestrators etc.) and data at rest ( in sql engines etc) and the resulting various ways we model it. I think if we lost that distinction we'd be better off addressing the messiness you describe.

Expand full comment
Jul 28, 2023·edited Jul 28, 2023Liked by Benn Stancil

"A high-quality dataset is one that is consistent with the business concept it represents... those business concepts are often fluid."

Great point.

Data correctness is extraordinarily difficult, and most of the time, not worth it. Do you need 100% accurate data to make decisions? What sort of decisions are you making that hinge on a 2.4% conversion rate versus 2.6%?

Of course, you don't want to mistake a 2.4% conversion rate for 240%. We want data to be "mostly correct", but not "perfectly correct" because this is an unrealistic goal that wastes tremendous time and effort. A single unaccounted-for record will jeopardize this utopia.

In practice, people want correctness for the most important figures, just like they want clear definitions for the most important business concepts. For the other 90%, people want things to be "mostly right" and "mostly clear", but some ambiguity is inevitable. As we all know, there is a velocity-quality tradeoff, and if we vigorously pursue the elusive quality at all costs, we will correspondingly sacrifice velocity.

In my experience, business teams generally prefer fast and approximate over slow and perfect. (It is doubly painful when the "slow and perfect" team ends up producing mistakes as well, just like everyone does.)

Expand full comment
Jul 28, 2023Liked by Benn Stancil

Check out sqlmesh and sqlglot. Similar concept and already have fairly large community including usage at some large companies: https://sqlmesh.com/ (I'm not affiliated)

Expand full comment