The case for better business models.
I spoke to Lloyd a week ago about dbt models that could express their logic as Malloy! Way ahead of you :P
I have zero confidence that this will actually happen in some kind of near-term, but I say it just because I agree with you. I don't think dbt-as-a-framework should insist that you author your business logic in SQL, it's just that this was the best place to get started. Imperative languages like Python, languages-that-compile-to-SQL--all of these are reasonable ways to express business logic. It's just important that they can all execute in the cloud data platform and not require data movement. Ultimately it would be great if dbt-as-a-framework could understand the syntax of each of these expressions and use that knowledge to not just operate on your business logic as text (as it does today) but know exactly what you are expressing.
Lots and LOTS of work to do.
I do think that dbt's concept of a metric will allow you to do exactly what you're describing with the sales funnel though. We do not currently define a metric of type "funnel" but if you could describe how that should work we could--or you could--implement this (the metrics definitions themselves are all in user-space and all Apache 2.0). Define what a sales funnel is, configure as many of this type of metric as you please.
Great post Benn and thanks for the shoutout. There is a lot of power in the relational model that isn’t accessible via SQL. This is what inspired us at RelationalAI to work on a relational language and system to help people get further than they can get with SQL alone without having to give up the relational model and all its goodness. It’s also exciting to see Tristan write that dbt-as-a-framework has aspirations to support languages like Malloy. Malloy is a great example of a relational language that is not SQL. You can find many other cool and mind-blowing examples of relational languages at Jamie Brandon’s conference on HYTRADBOI (have you tried rubbing a database on it?).
If you think about it, relational logic is extremely complicated when you introduce time into it, and similarly where SQL is wanting is when you need to to metrics with a sequence element.
I think you're often better off trying to define states (as opposed to recording events) in some other type of language too. Jinja doesn't seem like a good solution there.
I love all of these posts. One thing that I think hasn't been discussed here yet, as far as I can see, is row-based versus set-based processing. Row-based processing is perhaps a bit more human intuitive (i.e. "Okay let's look at this case/thing, now this next case/thing, now this next case/thing") but it doesn't scale theoretically or practically. It's kind of like comparing deductive (row-based) to inductive (set-based) logic, although the metaphor isn't perfect. As programmers, which in one way is being applied mathematicians, if we are doing our jobs well, we are always moving towards the general case using inductive logic.
SQL in its basic form is mostly about set-based logic. And that's because relational algebra, which underlies SQL, is a branch of math that is about sets. If you force it, a la PL/SQL, cursors, or whatever other method, you can get it to do row-based processing, but that's usually a bad bargain for someone who isn't comfortable with set-based processing.
This is why, in practice, it often seems a struggle to get folks coming from a non-analytics-oriented software development background to transition to working efficiently with databases - because, in practice, other software development paradigms seem to naturally conform themselves to row-based processing. That might be just an artifact of the way we teach or describe it - I'm not saying it's inherent itself to, say, Python. I can't count the number of times I've spoken to someone and said "OK, having access to an API is nice, but now can I call the API in such a way that it gives me ALL the data rather than having to go row by row or a few records at a time, because I want to analyze the whole thing at one go?" This seems like such an intuitive request to me, but somehow that conversation regularly seems to be difficult and painful. And "painful" not necessarily from the practical concern of "it would be too costly to let you have all the data or expensive on the system, etc." but more like "Why in the world would you ever want more than a few rows at a time?"
There is probably some relationship between a language being declarative and being set-based but I'm not sure it's exactly the same thing.
Also, as others have noted here, SQL in some cases is declarative (within a given query). But.... it is very easy to accidentally code a series of SQL statements and realize that one has built the equivalent of a set of imperative transformations. Maybe not "imperative" in the sense of "read this line from a file", "replace all the cases of the letter 'a' with 'z'", "write a line", but imperative in the sense of "I as a human being can look across all of these intermediate, temporary query steps and see ultimately that there is meaningful output to this, but looking at it step by step, I'm giving very specific details to the computer regarding how to go about it that the computer can't easily optimize the various aspects of." dbt can help with that but it's not a 100% solution.
I've been working in this space for 40+years. Problem is you do need a new language and ecosystem. You also need an AI engine. And there's the rub. The thing about the semantic layer is you can't program it all. You need something that understands how to convert your specification into working code and it can't be done statically. Record by record as process the query.
So here's what I do. https://unibase.zenucom.com. It's been in service for decades building complex high performance applications - ie it's not a proposal or a nice idea - it's a real hard working semantic database.
There's a data dictionary that defines tables, fields, calculations (that might include results from other tables). It also defines relationships between tables.
There's an AI engine that can understand all that so that when a field from a table is requested it can return the stored data, or do a series of calculations and data retrievals to give an answer.
No SQL, just layouts that get interpreted against the dictionary.
It's a big system, as you'd expect after all these years, but the important thing is the semantic layer and AI engine that makes it happen.
SQL is not non-procedural as every statement is an instruction to do something. Unibase is non-procedural as everything is a specification until a program is run to produce an output from the specification. This is not splitting hairs because you don't know how the specification will be interpreted until something tries to use it.
I really like your entries, among other things because you're aware of Microsoft's prior art. Many blogs from startups in the MDS space seem to ignore Microsoft - I'm not sure if it's out of ignorance, fear, hipster coolness, or contempt, but in any case it's weird to pretend one of the biggest BI/DB players doesn't exist.
To the point of this specific post: during the past decade, Microsoft has moved away from multidimensional cubes in SSAS and focused on tabular, putting the emphasis on Power BI and the DAX language (also available in Excel by the way). I have mixed feelings about DAX but you can't argue with the fact it's widely adopted. Yes, in the end it generates SQL queries but comes with syntax and patterns to use inactive relationships when needed, compute time intelligence, and other things you'd expect for analytical/aggregate purposes. But that language is proprietary and I'll be watching with interest whether Malloy gets any traction. DAX has a steep learning curve and it's a bit distressing to tie one's career to a single vendor, even one as big as Microsoft.
Long live sql. It forever to get good at it and everyone attacks sql simply because they want to be 'thought leaders'. Get out of here with you terrible fly by night language. If you are going to solve problems just solve problems - leave sql alone.