I like how the English-speaking world came up with Legalese because English is so ambiguous, and now a whole bunch of people pretend that they can program computers with natural language. 😅
Yeah really, if you ever want to lose all faith that English means anything, talk to your corporate lawyer.
After reading the first half of this I was going to make the point that humans who had “natural language” at their disposal chose to use SQL instead of natural language to interact with data… But you pretty much made that point already.
Secondly - everyone think back to your analyst days… When was the last time you got a coherent English paragraph as a request for an analysis/dashboard/report? Mine were always the result of hand-waving conversations, with torn-off sheets from yellow legal pads covered in unreadable diagrams and handwriting. So if AI can deal with that - sounds awesome!
Can't agree more with you, Benn. This resonates a lot with something I wrote recently (https://medium.pimpaudben.fr/sql-is-not-designed-for-analytics-079fc97b139c): SQL wasn't designed for analytics in the first place.
It's still not mature enough, but I definitely think that the idea behind Malloy (aka "data is not rectangular") is something not to overlook.
Any views on Malloy on your side?
I'd also encourage anyone to read this fabulous post on English as a language and its relationship to AI: https://orbistertius.substack.com/p/english-is-a-terrible-programming
Ehhhhh. I think it's clever, but I've never found it particularly intuitive. My critique of it is basically the same as what I'd say about OLAP cubes (https://benn.substack.com/p/ghosts-in-the-data-stack): It hides the raw thing that you're working with, and I find that very challenging to reason about. Everything is a kind of abstraction on top of the underlying tables. And that's the fundamental problem for me - I want these languages to operate like functions on top of data, where I can understand the input, what the function does, and the output. Malloy (and every semantic abstraction that hides the underlying tables) doesn't give you that.
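For what it's worth, here is a toy sketch in plain Python of what "functions on top of data" could look like. The `orders` rows and both function names are made up for illustration; this is not how Malloy, SQL, or any particular tool works, just a picture of input, function, and output all being visible.

```python
from collections import defaultdict

# Hypothetical input table, written out as plain rows.
orders = [
    {"region": "east", "status": "completed", "amount": 10.0},
    {"region": "east", "status": "refunded", "amount": 5.0},
    {"region": "west", "status": "completed", "amount": 7.5},
    {"region": "east", "status": "completed", "amount": 2.5},
]


def completed_only(rows):
    """Rows in, rows out: keep only the completed orders."""
    return [row for row in rows if row["status"] == "completed"]


def total_by_region(rows):
    """Rows in, one total per region out."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Each step produces an intermediate result you can inspect directly.
completed = completed_only(orders)
totals = total_by_region(completed)

print(completed)
print(totals)
```

Nothing is hidden between the raw rows and the final totals, which is roughly the property being asked for here.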
Hmm, I see your point; but working "the old way" with Spark/Scala, which lets you see the inputs/function/outputs (or at least doesn't hide too much), was quite terrible... (or maybe not 🤔?)
True, it was probably more of a human/management problem than a language issue.
Finding the right abstraction for this problem sounds like a never-ending optimization problem... SQL/Malloy/etc. seem to be too much abstraction, while Scala/Spark is too complicated (not complex).
While I'm not a big fan of it, it sounds like the middle ground here is Python; or, more accurately, Polars-like frameworks.
Python seems like it's probably the right way of doing this tbh, but Pandas has always felt extremely unintuitive to me. I feel like it does everything in a way that's the exact opposite of my instinct for how I expect it to work.
Ahah, can't agree more about Pandas... method chaining improved the thinking a bit at some point, but I feel the same as you...
I got a comment on a recent post about how the tidyverse in R addresses some of these pains... As a former R user, I can say that dplyr and ggplot have very good semantics and were really pleasant to use. Maybe something to deep dive on soon 👍
I used R a good bit waaay back in the day (like 2007), but basically haven't touched it since then. But every time I see snippets of it, it seems a lot more natural than pandas. But my experience is so shallow there, who knows.
Tell your brother to check out Change Research for a modern polling / data analytics firm that works with state legislature candidates all over the country (disclosure: I used to run data science there)
Dope, I will let him know. Thanks for sharing!
I'm still upset the name "draggy-droppy BI interface" didn't make it into the Mode product. I fought so hard for that.
What could have been. That was the missing piece all along.
Stephen Wolfram's point has always been that a computational language (read: Mathematica, of course) cannot be displaced by natural language. Computational language differs from natural language precisely insofar as it is constrained and unambiguous: one thing cannot mean another thing. There is no "but I actually meant this." The logic and reasoning of computational language is reproducible - you can always follow the same inputs and get the same outputs.
Natural language is neither as precise nor as constrained. Words have "fuzzy borders." Interpretation is ambiguous and not even reproducible person-to-person (i.e. people will have different interpretations of the same words).
For example, if I asked ChatGPT for "the daily average temperature over the last month", am I asking for the average temperature each day over the month (line chart), or an average of that line (a number)? What if I wrote "the average daily temperature over the last month" instead? Not clear! But in SQL of course it is very clear.
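To make the two readings concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `readings` table, its columns, and the sample values are all invented for illustration; the only point is that each reading has to be written as a different, unambiguous query.

```python
import sqlite3

# Hypothetical table and values, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2023-06-01", 10.0),
        ("2023-06-01", 20.0),
        ("2023-06-02", 30.0),
    ],
)

# Reading 1: one average per day (what you would plot as a line chart).
per_day = conn.execute(
    "SELECT day, AVG(temp_c) FROM readings GROUP BY day ORDER BY day"
).fetchall()

# Reading 2: a single number, the average of those daily averages.
overall = conn.execute(
    "SELECT AVG(daily_avg) FROM "
    "(SELECT AVG(temp_c) AS daily_avg FROM readings GROUP BY day)"
).fetchone()[0]

print(per_day)   # a list of (day, average) rows
print(overall)   # a single number
conn.close()
```

The same English phrase maps to two queries with differently shaped results, and the query text itself leaves no doubt about which one was asked for.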
However, just like with other people, we can have a discourse in natural language to arrive at computational language. This works from business people to programmers, so it is possible that one day it works from business people to ChatGPT as well.
Yeah, I'm sure there are lots of much more formal (and well-researched) ways to describe this. (Any time I find myself on any Wikipedia page related to linguistics, I'm always shocked by how far that rabbit hole seems to go.)
The weird thing is, not only does this all seem pretty well understood, it seems pretty intuitive? Like, it's a pretty obvious point and something that most everybody would say, yes, of course, that makes sense. And yet, we seem to be constantly ignoring it in how we're building some of these AI things. Which, maybe it's because of your last point, and we all assume that AI can "just figure it out." Which, I dunno, I guess maybe one day it can?
Benn,
I absolutely love your thoughts and fresh takes regarding Data, AI, all things "Tech". A few points stuck out to me about today's blog that I would like to comment on:
-I've never learned SQL, but I do know Python very well (especially data frames, data analysis, etc). Your description of SQL as a cockroach curiously makes me want to learn it now, haha. Thought that was such a great descriptor. Your knowledge of and passion for SQL also make me want to learn it, if for no other reason than to see what I am missing.
-Strangely enough (and especially after admitting my lack of experience with SQL) I think that what AI does best--better than any other task I have found--is the technical translation from natural language (plain English) into workable code. For example, I spent about 3 months writing an ML stock options trading bot based on a random forest algorithm. Last night I decided to "try again" and rebuild an ML stock options bot from scratch using AI, asking it for specific recommendations and to write the code itself. "We" decided on a Gradient Regression, which works perfectly; I used it today for the first time on Apple stock (worked great). In this vein, I've recently heard about the lack of COBOL users, legacy code, etc., and I bet that AI is able to fill in as a COBOL programmer for legacy code / fix issues that many people are unaware of (a bit of my own side project). So I find it very...strange (?) that AI is unable to work with SQL, and I know that AI is powerful at data analysis tasks--again, this is another reason I am curious to explore SQL a bit further.
Anyways, just wanted to say thanks for your blog, writing, and thoughts. I think too many thinkers are thinking in the same direction ("AI WILL REVOLUTIONIZE THE WORLD"), and yours is a new and refreshing take. One idea that I loved from the DataCamp podcast was your idea of "teleportation" in technology: that instead of going quickly in one direction (which is how I always thought of it), we are "teleporting" to an entirely new and unexpected area. I will definitely be listening to that podcast a few more times!
Thanks! I appreciate that, and glad you like it.
- On SQL, yeah, it's definitely useful (and I've always found it much easier than Python or R to be honest). But that's one of the interesting things about it; it requires a different way of thinking about problems than those languages. It's hard to describe. I think it's because it's so table-oriented, you start to think of every data problem around tables and joins rather than functions. I'm not sure if that's good or bad, to be honest.
- On AI working with SQL, I think we're actually saying the same thing? I think that AI can help as the sort of assistant you're describing, where you're basically pair programming with it. In that case, you wouldn't be giving it a business problem and saying write code that solves this thing for me; you'd be saying, I want to write a program that's shaped like this, can you fill in the pieces? It's still a kind of natural-language-to-code flow; it's just that you're telling it how you want the code to work. That seems to work pretty well (for SQL and for software engineering with things like copilot). What *doesn't* seem to work is asking it some business question - why are sales down, or whatever - and having it figure out what to do from there.
Thanks again for the note, and glad you like the posts (and the podcast)!
this assumes that we have business questions that AI hasn't already answered for us!
True. One day we won't see owls, they will simply be rendered for us on our Apple Vision Pros.
Yes there is, but in order for it to be reliable and high-performance, it cannot rely on LLMs alone. Liquify Analytics is building it. It’s coming along great.
Meaning, yes there is a path from business question to SQL - it just isn’t what “everyone” currently thinks it is.
Agreed. It's not that there's not a path from question to query that runs through AI, but I think there's got to be something else in between those two steps other than just AI.