33 Comments
Laurie Stark

Data analysis is one of the only things I'm not comfortable trusting AI to do, which makes sense since I'm a Data Person. I have found ways to use AI to speed up and enhance the data analysis process, like feeding it little bits and having it just do that bit. But handing it a whole dataset and saying "go to town, tell me what's up?" That way lies madness, or at least nonsense. But then, I imagine that's how many people feel in other disciplines about using AI for what they do.

Benn Stancil

yeah, I'm sure. I certainly feel that with anything related to writing this blog. But, at the same time, it can feel so effective in other ways, it's hard not to think, surely there's some way to feel that here too?

David Andersen

"We need the opposite: Something that turns an arbitrary query into an accessible diagram. I don’t want dropdowns to generate a complex query. I want to ask a robot to write me any query I can think of, and a picture and some words that tell me how the big computer did numbers."

Yes, this would be wonderful. I've thought about this over the years - especially when handed someone else's complex SQL and I need to understand what it's doing.

Benn Stancil

I kind of have no idea what this would look like though? It's like, if someone spent an hour putting together an explanation of a query, I bet it'd be pretty good. But I'm not that sure what they'd do?

Travis Marceau

We built it at dbt and everybody misunderstood it except me and Drew: https://docs.getdbt.com/docs/cloud/canvas

Leadership listened to a bunch of tired execs who said they wanted clicks not code, instead of reading the tea leaves.

It was never about clicking to edit pipelines.

It should've always been marketed as validating what you just vibe coded.

Benn Stancil

Yeah, that's the temptation for every BI and BI-like thing - you can always sell the drag-and-drop creation thing (or you could've; I guess they now want chatbots).

Devil Pistons

Does this mean I can leave SQL early?

Benn Stancil

if 50 years after it got created is early, for sure.

Devil Pistons

Thanks. See you on Monday.

Marco Roy

For devs/apps, databases are just tools. But for data, they are our canvas. And then we curate them in museums called "dashboards".

Yoni Leitersdorf

As someone who has been "writing" a ton of code through Cursor, I can tell you, I didn't read a single line of code. I don't care. It works like I want - great. It doesn't - then I tell it to fix it.

But as you say, there's a lot of nuance and precision in the world of analytics. So you need to build ways to test it. Just like we have unit tests and integration tests in code, we have evals in AI, and benchmarking in data: https://www.linkedin.com/feed/update/urn:li:ugcPost:7419821539610619905/

So I'm of the opinion that we should not be reading all this code AI generates. It's a fool's errand. Instead, we should be codifying how we want to test its work, and then teach AI to test its work. You can even use different models, if that makes you feel better.

Marco Roy

We can create unit tests & integration tests for data pipelines/models too, but they only test the functionality, not the "specs".

You validate the specs of an app by clicking around. But for data, the spec is basically the data structure (and the tests validate that the structure contains the right data). We can't just let AI drive the entire trip. Someone needs to say "this is what I want". That can either be done in a GUI tool, or with something like a CREATE TABLE statement.

I mean... I guess you could describe the table to an AI, but that seems pretty redundant? And then you need to validate it anyway. It's the same list of columns whether we give it to an AI or write it in a YAML file.
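(As a rough illustration of that, and not anyone's actual tooling: the same column list you'd put in a CREATE TABLE statement or a YAML file, written once as a spec and checked against the real table. The table name, columns, and the pandas stand-in are all made up.)

    import pandas as pd

    # A made-up spec: the same column list as a CREATE TABLE or YAML file,
    # just expressed as a dict of column -> expected dtype.
    ORDERS_SPEC = {
        "order_id": "int64",
        "customer_id": "int64",
        "ordered_at": "datetime64[ns]",
        "amount": "float64",
    }

    def validate_spec(df: pd.DataFrame, spec: dict) -> list[str]:
        """Return a list of mismatches between the actual table and the spec."""
        problems = []
        for column, expected_type in spec.items():
            if column not in df.columns:
                problems.append(f"missing column: {column}")
            elif str(df[column].dtype) != expected_type:
                problems.append(f"{column}: expected {expected_type}, got {df[column].dtype}")
        for column in df.columns:
            if column not in spec:
                problems.append(f"unexpected column: {column}")
        return problems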

Yoni Leitersdorf

True... can you codify the tests though? If you sat down and wanted to write VERY VERY clear instructions on how to test the results, could you?

Marco Roy

Yeah, it's all input-output. Like any other test:

- Provide the input

- Provide the expected output

- Compare with the actual output

And AI can do most of that (especially with clear instructions and samples).

But it's not possible to do any of this without a spec/structure. Otherwise, what are we supposed to test? Where does the output go? Are we supposed to let AI just guess? Maybe that's enough for some, if they have no standards or expectations.
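(A minimal sketch of that input / expected output / actual output loop, with pandas standing in for the warehouse; the orders data and the groupby step are invented for illustration.)

    import pandas as pd

    # Input: a tiny, hand-written sample of the source data.
    raw_orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "status": ["complete", "refunded", "complete"],
        "amount": [10.0, 25.0, 40.0],
    })

    # Expected output: what the model *should* produce for that input.
    expected = pd.DataFrame({
        "status": ["complete", "refunded"],
        "revenue": [50.0, 25.0],
    })

    # Actual output: run the transformation under test (a stand-in here
    # for whatever SQL or pipeline step the AI generated).
    actual = raw_orders.groupby("status", as_index=False).agg(revenue=("amount", "sum"))

    # Compare: fails loudly if the actual output drifts from the expected one.
    pd.testing.assert_frame_equal(
        actual.sort_values("status").reset_index(drop=True),
        expected.sort_values("status").reset_index(drop=True),
    )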

Yoni Leitersdorf

That spec is crucial. What I'm seeing is that in the AI world, it's the definition of the work, and the definition of expected results, that are crucial. The stuff in the middle is done by AI.

Humans, historically, haven't been amazing at building ironclad definitions. We need to get much better. We can use AI to help build the definitions and specs, but humans must guide that and review them. It's not like reviewing code... these specs are in human-readable language and format.

Marco Roy

And it always requires at least a few iterations. With or without AI, it's almost impossible to describe something perfectly the first time.

That's why "demo early & often" is so important.

And when working with data, I almost never know exactly what I want until I profile the upstream/source data and play with it a bit, so that's another consideration (unless an existing profile is already readily available, or if I know the data well already).

And then it's always a good practice to profile the production output as well, as an extra form of validation / smoke test (to be looked at by a human, because AI won't necessarily be able to tell what is good or not, nor can we create an "expected profile" for real production data).

Yoni Leitersdorf

Agree 100%. Means that we humans are not disposable yet! :)

Wojciech Wasilak

Malloy looks very promising for working with data and AI (https://docs.malloydata.dev/documentation/). Easy to read and understand, combines both code and metadata in the same file, compiles to sql. Anyone tried to use it with Claude?

Benn Stancil

Yeah, malloy has always felt very technically capable to me, but I've never found it very intuitive to read? and that feels like the thing that's needed most here - how do you make a messy query legible? malloy maybe could, but it seems like you've got to climb a pretty steep learning curve first.

Performative Bafflement

On the "data query interpreter," it actually seems pretty feasible, even vibe codeable. Essentially you want a human-quality-judged interpretation of what is reasonable or important.

There's got to be decent data that indicates the usual sticking points, and areas where a small change vastly changes the outcomes - in Common Crawl / Stack Overflow / internally in Snowflake or similar places - and that gives you a reasonable / important first pass.

You could build in a "sensitivity analysis" where it toggles changes in columns or filters and shows how sensitive the overall query is to things, and the range of sensitivity (ie beyond which the results are null or uniform).

You can do anomaly detection and surface any big changes for manual verification, so they can judge if it was just Covid or something, or if there's a problem with the query or data.

You could build summary statistics / displays that select from the sensitivities so you can gauge if they're realistic portrayals for this particular data or if there's some big problem.

I think it has pretty decent potential as an open source python library.
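(A rough sketch of the sensitivity-toggle part in Python; run_query, the base query, and the filters are all hypothetical.)

    # Hypothetical: run_query() executes SQL against your warehouse and
    # returns a single metric value; the query and filters are made up.
    base_query = "SELECT SUM(amount) FROM orders WHERE {filters}"
    filters = {
        "status": "status = 'complete'",
        "recent": "ordered_at >= '2024-01-01'",
        "no_tests": "customer_id NOT IN (SELECT id FROM test_accounts)",
    }

    def sensitivity_report(run_query, base_query, filters):
        """Toggle each filter off in turn and report how much the metric moves."""
        baseline = run_query(base_query.format(filters=" AND ".join(filters.values())))
        report = {}
        for name in filters:
            kept = [clause for key, clause in filters.items() if key != name]
            value = run_query(base_query.format(filters=" AND ".join(kept) or "TRUE"))
            report[name] = {
                "without_filter": value,
                "pct_change": None if baseline == 0 else (value - baseline) / baseline,
            }
        return report

    # Usage: report = sensitivity_report(run_query, base_query, filters)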

Benn Stancil

I'm not a huge believer in the anomaly detection and summary statistics type stuff, because I think that's pretty hard to tune to get right. It misses bad mistakes; it catches lots of errors that are just messy real world data. (just as tons of BI companies have died on the drag and drop editor hill, tons of data pipeline companies have died on the anomaly detection hill). But I'm with you that it seems like you could create some sort of mapping type thing that looks for common errors, and sort of walks through an explain type plan to determine how each table and column in a query gets manipulated, and lets you visually see joins, if there are many to many joins, etc, etc. I'm not sure quite how that would work, but seems like there could be something to it.
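(One possible shape for that, sketched with sqlglot: parse an arbitrary query and list the tables and joins it touches. The query itself is made up, and this is a starting point rather than a finished tool.)

    import sqlglot
    from sqlglot import exp

    query = """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    LEFT JOIN refunds r ON r.order_id = o.id
    GROUP BY c.region
    """

    parsed = sqlglot.parse_one(query)

    # Every table the query touches.
    for table in parsed.find_all(exp.Table):
        print("table:", table.name)

    # Every join, printed back as readable SQL so you can eyeball the keys.
    for join in parsed.find_all(exp.Join):
        print("join:", join.sql())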

Performative Bafflement

> I'm not a huge believer in the anomaly detection and summary statistics type stuff, because I think that's pretty hard to tune to get right.

Ah, interesting - my experience with anomaly detection is in data science pipelines, where you pull source queries, clean variables, interpolate and create second-order variables, do anomaly detection, look for variable drift, model, check for model drift, etc.

It was pretty high value there, but I don't have much experience in the BI space, so I'll take your word for it. I can see why it might be true in theory, because anomaly detection in a limited domain with fixed scope is probably much easier and much more accurate than anomaly detection over the baffling array of use cases and data sets businesses-in-general will throw at it.
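(For reference, the kind of drift check I mean, sketched in Python; the column names and threshold are made up.)

    import pandas as pd

    def drift_report(reference: pd.Series, current: pd.Series, threshold: float = 0.2):
        """Flag a variable whose distribution has shifted vs. a reference window."""
        ref_mean, cur_mean = reference.mean(), current.mean()
        ref_std = reference.std() or 1.0
        shift = abs(cur_mean - ref_mean) / ref_std  # shift in reference std units
        return {"shift": shift, "drifted": shift > threshold}

    # Usage: compare a recent window of a feature against the training window.
    # drift_report(train_df["avg_order_value"], recent_df["avg_order_value"])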

> But I'm with you that it seems like you could create some sort of mapping type thing that looks for common errors, and sort of walks through an explain type plan to determine how each table and column in a query gets manipulated, and lets you visually see joins, if there are many to many joins, etc, etc. I'm not sure quite how that would work, but seems like there could be something to it.

Yeah, sounds like an annotated Snowflake query profiling with a few extra bits bolted on. Which I think works fine for people with some experience, but if this is LLM / vibe coding driven, I think you'd need some additional layer of simplification. Maybe the LLM makes a short video talking about the various decisions and flows and important things as it goes through the profiling, including which things are getting filtered out, which steps filter the most, etc.

Benn Stancil

Ok, yeah, that's fair for capital-D Data Science type work, where volumes are really high and a lot of what you're looking for is small or unexpected wiggles. For BI or analytics type problems, there are always tons of "anomalies," but they're often just businesses being naturally uneven (a holiday; a SaaS business that sells three new contracts a week; etc). In those cases, statistical significance is often more of a waste of time, because the differences may be real but not material. You're looking for things where you can immediately say, oh, wow, yes, that is big, rather than things that are "real" but undetectable with the naked eye.

And on the profiling thing, it's almost like asking it to construct this query in something Excel-like. Show it to me as tables and formulas and some simple explain this / explain this more buttons. Again, I'm not sure if that actually works, but seems like it maybe kinda could?

David Andersen

If Wes McKinney is right, and we're so confident that AI is going to do everything we want, might as well write machine code.

Marco Roy

AI needs abstractions and encapsulation to make sense of things just as much as we do.

David Andersen

What I'm saying is: if the hypothesis is "AI is going to generate code and humans don't need to read/review it" then there is no reason for it to generate Python or Rust or C or even Assembly.

Marco Roy

Higher order languages are abstractions on top of machine language, and are 100% necessary.

Unless you enjoy wasting tokens and making AI less effective.

That's why frontmatter is so important: it basically provides an abstraction/encapsulation for tools without exposing the entire thing in the main context (which is the equivalent of human working memory). That's basically the point of functions: being able to invoke a bunch of code without having to read or copy it all.

Just like humans, AI can reason better about names & verbs than ones & zeroes. And less is better than more. Conciseness benefits everyone (including AI), not only humans.

AI is not a computer; it is a verbal interface between humans and computers (with a layer of algorithmic intelligence). It is basically a programming language, but one which accepts just about any input (i.e. prompts), rather than the very strict syntax of the programming languages we are used to. It is basically a layer which sits on top of all the other layers. There is no point in digging all the way back to the bottom.

AI is basically Python (or whatever), except that people can type whatever they want, and indentation doesn't matter. But obviously, that comes with caveats (the most significant of which is ambiguity, which is the obvious consequence of abandoning strictness).

And tools like Claude Code are yet another layer on top of the AI layer, primarily to help handle/mitigate all of the ambiguity. Thus, it is largely programmed with prompts rather than code.

Benn Stancil

I've said this before (as have some real engineers: https://benn.substack.com/p/the-ads-are-coming#:~:text=how%20much%20faster%20could%20these%20models%20work%20if%20they%20wrote%20code%20for%20themselves%3F), but I think there's a bit of an in-between there that eventually happens. LLMs probably don't need to write machine code, but you could certainly imagine them writing code that isn't like what we write.

Like, if they're essentially transpilers between English and code, maybe it's hard to transpile English to assembly, because English and assembly don't really interoperate well. The semantic distance could be too far to make sense.

But you could imagine closing that gap by putting different sorts of abstractions in a language that 1) makes it more interoperable with English but 2) is more performant than something like Python, because it doesn't need to be interoperable with *humans.* LLMs have far more memory than we do; they don't get tired of reading stuff; they don't fat finger things. But they make mistakes that we don't. And if a lot of code is designed, very approximately, to be ergonomic to our mental abilities, it makes sense to me that we'd eventually start building in languages that are more ergonomic to the model's abilities.

Marco Roy

It all comes down to primitives. It's pretty hard to get around for loops and if statements, and some way to declare & call functions (and maybe classes).

Every programming language has its own way to do it (or even multiple ways), but they essentially all do the same thing (with some subtle differences).

And generally, it is the most succinct way to express computing concepts, unless you create some kind of DSL for much more generalized cases (like REST APIs, perhaps). And the most basic DSL is to create functions (so instead of constantly looping over files, you could just call read_files(), or something like that).
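(In that spirit, a made-up read_files() that hides the loop behind one named verb.)

    from pathlib import Path

    def read_files(directory: str, pattern: str = "*.sql") -> dict[str, str]:
        """Hide the loop-over-files boilerplate behind a single name."""
        return {path.name: path.read_text() for path in Path(directory).glob(pattern)}

    # Instead of rewriting the glob/open/read loop everywhere (and making the
    # model re-read it every time), callers just say:
    # models = read_files("models/")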

So basically, I don't expect AI to change programming languages in any very significant or meaningful way. Nor should we want it to.

The industrial revolution primarily revolutionized textile/clothing (among many other things), but most clothes are still sewn together (but by machines instead of humans). The process hasn't changed; only accelerated.

Just because we now have AI-driven cars doesn't mean that we don't need roads anymore. Or even road signs, because AI cars still need to account for each other — unless we ditch all of the road signs in favor of networking, but then what happens when a human wants to drive?

Benn Stancil

I won't pretend to know enough about how all of this works to have any terribly informed opinions here, though I guess I do find it difficult to imagine languages don't change some. Even if it's just annoyances that people don't want to deal with and shortcuts that help us be lazy, those things don't annoy computers and computers aren't lazy. So, why allow it?

And on the humans wanting to drive, I think we'll just lose the ability, honestly. Not entirely, but if the black box is fast enough and mostly reliable enough, it seems like there are gonna be a lot of people who are perfectly happy never bothering to learn (which may or may not be bad, but that's a different question than what seems like will happen).