LLMs shouldn’t write SQL

Feb 23, 2024

There's no direct path from a business question to a useful query.

23 Comments

Feb 24, 2024

I like how the English speaking world came up with Legalese because English is so ambiguous, and now a whole bunch of people pretend that they can program computers with natural language. 😅

Benn Stancil

Feb 25, 2024

Yeah really, if you ever want to lose all faith that English means anything, talk to your corporate lawyer.

John Wessel

Mar 2, 2024

After reading the first half of this, I was going to make the point that humans who had “natural language” at their disposal chose to use SQL instead of natural language to interact with data... But you pretty much made that point already.

Secondly - everyone think back to your analyst days... When was the last time you got a coherent English paragraph as a request for an analysis/dashboard/report? Mine were always the result of hand-waving conversations, with torn-off sheets from yellow legal pads covered in unreadable diagrams and handwriting. So if AI can deal with that - sounds awesome!

Benoit Pimpaud

Feb 26, 2024

Can't agree more with you, Benn. It resonates a lot with recent writing (https://medium.pimpaudben.fr/sql-is-not-designed-for-analytics-079fc97b139c): SQL wasn't designed for analytics in the first place.

It's still not mature enough, but I definitely think that the idea behind Malloy (aka "data is not rectangular") is something not to overlook.

Any views on Malloy on your side?

Benoit Pimpaud

Feb 26, 2024

Also, I encourage anyone to read this fabulous post about English as a language and its relationship to AI: https://orbistertius.substack.com/p/english-is-a-terrible-programming

Benn Stancil

Feb 27, 2024

Ehhhhh. I think it's clever, but I've never found it particularly intuitive. My critique of it is basically the same as what I'd say about OLAP cubes (https://benn.substack.com/p/ghosts-in-the-data-stack): It hides the raw thing that you're working with, and I find that very challenging to reason about. Everything is a kind of abstraction on top of the underlying tables. And that's the fundamental problem for me - I want these languages to operate like functions on top of data, where I can understand the input, what the function does, and the output. Malloy (and every semantic abstraction that hides the underlying tables) doesn't give you that.

Benoit Pimpaud

Feb 28, 2024

Hmm, I see your point, but working "the old way" with Spark/Scala, which lets you see inputs/function/outputs (or at least doesn't hide too much), was quite terrible... (or maybe not 🤔?)

True, it was probably more of a human/management problem than a language issue.

Finding the right abstraction for this problem sounds like a never-ending optimization function... SQL/Malloy/etc. seem to be too much abstraction, while Scala/Spark is too complicated (not complex).

I'm not a big fan of it, but it sounds like the middle ground here is Python, or more accurately Polars-like frameworks.

Benn Stancil

Mar 4, 2024

Python seems like it's probably the right way of doing this tbh, but Pandas has always felt extremely unintuitive to me. I feel like it does everything in a way that's the exact opposite of my instinct for how I expect it to work.

Benoit Pimpaud

Mar 4, 2024

Haha, can't agree more about Pandas... chaining methods improved things a bit at some point, but I feel the same way you do...

I got a comment on a recent post about how the tidyverse in R answers some of these pains... As a former R user, it's true that dplyr and ggplot have very good semantics, and they were really pleasant to use. Maybe something to dive deep into soon 👍

Benn Stancil

Mar 5, 2024

I used R a good bit waaay back in the day (like 2007), but basically haven't touched it since then. But every time I see snippets of it, it seems a lot more natural than pandas. But my experience is so shallow there, who knows.

Chris Coke

Feb 26, 2024

Tell your brother to check out Change Research for a modern polling / data analytics firm that works with state legislature candidates all over the country (disclosure: I used to run data science there)

Benn Stancil

Feb 26, 2024 (edited)

Dope, I will let him know. Thanks for sharing!

Mike DeCarlo

Feb 24, 2024

I'm still upset the name "draggy-droppy BI interface" didn't make it into the Mode product. I fought so hard for that.

Benn Stancil

Feb 24, 2024

What could have been. That was the missing piece all along.

Alex

Feb 24, 2024 (edited)

Stephen Wolfram's point has always been that a computational language (read: Mathematica, of course) cannot be displaced by natural language. Computational language differs from natural language precisely insofar as it is constrained and unambiguous: one thing cannot mean another thing. There is no "but I actually meant this." The logic and reasoning of computational language is reproducible - you can always follow the same inputs and get the same outputs.

Natural language is not as precise nor as constrained. Words have "fuzzy borders." Interpretation is ambiguous and not even reproducible person-to-person (i.e. people will have different interpretations of the same words).

For example, if I asked ChatGPT for "the daily average temperature over the last month", am I asking for the average temperature each day over the month (line chart), or an average of that line (a number)? What if I wrote "the average daily temperature over the last month" instead? Not clear! But in SQL of course it is very clear.
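To make that ambiguity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `readings` table, its columns, and the sample values are all hypothetical, and the "last month" date filter is left out for brevity. Each reading of the English phrase becomes a visibly different query; the third query is an extra reading added for illustration (a plain average over every raw measurement), which is not the same number as the average of the daily averages.

```python
import sqlite3

# Hypothetical table: several temperature readings per day (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2024-02-01", 10.0),
        ("2024-02-01", 14.0),
        ("2024-02-02", 20.0),
        ("2024-02-02", 22.0),
        ("2024-02-02", 24.0),
    ],
)

# Reading 1: one average per day (the line chart).
daily_avg = conn.execute(
    "SELECT day, AVG(temp_c) FROM readings GROUP BY day ORDER BY day"
).fetchall()

# Reading 2: a single number, the average of those daily averages.
avg_of_daily = conn.execute(
    """
    SELECT AVG(day_avg)
    FROM (SELECT AVG(temp_c) AS day_avg FROM readings GROUP BY day)
    """
).fetchone()[0]

# Reading 3 (extra, for illustration): the average over all raw readings.
overall = conn.execute("SELECT AVG(temp_c) FROM readings").fetchone()[0]

print(daily_avg)     # two rows, one per day
print(avg_of_daily)  # average of the two daily averages
print(overall)       # average of all five readings
```

With this sample data the second and third queries return different numbers, because the days have different numbers of readings. The English request does not say which one is wanted; each query does.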

However, just like with other people, we can have a discourse in natural language to arrive at computational language. This works from business people to programmers, so it is possible that one day it works from business people to ChatGPT as well.

Benn Stancil

Feb 24, 2024

Yeah, I'm sure there are lots of much more formal (and well-researched) ways to describe this. (Any time I find myself on any Wikipedia page related to linguistics, I'm always shocked by how far that rabbit hole seems to go.)

The weird thing is, not only does this all seem pretty well understood, it seems pretty intuitive? Like, it's a pretty obvious point and something that most everybody would say, yes, of course, that makes sense. And yet, we seem to be constantly ignoring it in how we're building some of these AI things. Which, maybe it's because of your last point, and we all assume that AI can "just figure it out." Which, I dunno, I guess maybe one day it can?

Joaquin Roibal

Feb 23, 2024

Benn,

I absolutely love your thoughts and fresh takes regarding Data, AI, all things "Tech". A few points stuck out to me about today's blog that I would like to comment on:

-I've never learned SQL, but I do know Python very well (especially data frames, data analysis, etc). Your description of SQL as a cockroach curiously makes me want to learn it now, haha. Thought that was such a great descriptor. Your knowledge and passion for SQL also make me want to learn it, if for no other reason than to see what I am missing.

-Strangely enough (and especially after admitting my lack of experience with SQL) I think that what AI does best, better than any other task I have found, is the technical translation from natural language (plain English) into workable code. For example, I spent about 3 months writing an ML algo stock options trading bot based upon a random forest algorithm, and last night I decided to "try again," rebuilding an ML algo stock options bot from scratch using AI, asking the AI for specific recommendations and to write the code itself. "We" decided on a Gradient Regression, which works perfectly and which I used today for the first time on Apple stock (worked great). In this vein, I've recently heard about the lack of COBOL users, legacy code, etc., and I bet that AI is able to fill in as a COBOL programmer for legacy code and fix issues that many people are unaware of (a bit of my own side project). So I find it very...strange (?) that AI is unable to work with SQL, and I know that AI is powerful at data analysis tasks. Again, this is another reason I am curious to explore SQL a bit further.

Anyways just wanted to say thanks for your blog, writing, and thoughts. You have a new perspective on something that I think has too many thinkers thinking in the same direction ("AI WILL REVOLUTIONIZE THE WORLD"), but yours is a new and refreshing take. One idea that I loved during the data camp podcast was your idea of "teleportation" in technology, that instead of going one direction quickly (which is the way I always thought of it as), we are "teleporting" to an entirely new, and unexpected area. I will definitely be listening to that podcast a few more times!

Benn Stancil

Feb 24, 2024

Thanks! I appreciate that, and glad you like it.

- On SQL, yeah, it's definitely useful (and I've always found it much easier than Python or R to be honest). But that's one of the interesting things about it; it requires a different way of thinking about problems than those languages. It's hard to describe. I think it's because it's so table-oriented, you start to think of every data problem around tables and joins rather than functions. I'm not sure if that's good or bad, to be honest.

- On AI working with SQL, I think we're actually saying the same thing? I think that AI can help as the sort of assistant you're describing, where you're basically pair programming with it. In that case, you wouldn't be giving it a business problem and saying write code that solves this thing for me; you'd be saying, I want to write a program that's shaped like this, can you fill in the pieces? It's still a kind of natural-language-to-code flow; it's just that you're telling it how you want the code to work. That seems to work pretty well (for SQL and for software engineering with things like copilot). What *doesn't* seem to work is asking it some business question - why are sales down, or whatever - and having it figure out what to do from there.

Thanks again for the note, and glad you like the posts (and the podcast)!

Chris Stanley

Feb 23, 2024

this assumes that we have business questions that AI hasn't already answered for us!

Benn Stancil

Feb 24, 2024

True. One day we won't see owls, they will simply be rendered for us on our Apple Vision Pros.

Kyle P Rasku

Feb 23, 2024

Yes there is, but in order for it to be reliable and high-performance, it cannot rely on LLMs alone. Liquify Analytics is building it. It’s coming along great.

Kyle P Rasku

Feb 23, 2024

Meaning, yes there is a path from business question to SQL - it just isn’t what “everyone” currently thinks it is.

Benn Stancil

Feb 24, 2024

Agreed. It's not that there's not a path from question to query that runs through AI, but I think there's got to be something else in between those two steps other than just AI.
