38 Comments
Yoni Leitersdorf:

First of all, let's be honest. Today, August 1st, 2025, at best, AI is 85% accurate in these things. I know it because we've been building this for 18 months with some of the smartest people I know. I also know it, because I look at our competitors, and they're not ahead in any way (and sometimes behind) on the accuracy piece.

So, while vibe coding, and vibe analytics, are nice and great, let's keep in mind the accuracy issue.

There are solutions to this:

1. Human in the loop - either the customer's humans (their analysts), or your humans (startup hires humans to do it).

2. Show your CoT (aka Explainability), so that the user who views the output can at least understand the logic that generated it.

3. Users begin accepting things that are 85% accurate... because, after all, no human is much better, right?

All of those solutions basically rely on humans making decisions. Which, at the moment, is the best we have in this space. One day, we'll be Waymo.

But until then, we're driver-assist with a bit of higher level self-drive.

Benn Stancil:

Yeah, I think that's maybe the best counterargument here, which is basically, "people are wrong too and how do you know?" And I don't really have a good answer to that, other than, man, it *feels* different?

I think the Waymo example might be a decent analogy actually. Like, yes, Waymo seems to have a much lower accident rate than humans, which is great. But there's something unnerving about also knowing that Waymo can generally break in much more irrational and catastrophic ways. Putting aside malicious intent, there's sort of a common sense ceiling for mistakes that people will make. Computers don't really have that; they can glitch away their common sense, and then who knows what might happen (eg, they buy tungsten cubes). I think until that problem gets solved (which maybe it's close? I don't know), it'll always feel a little on edge.

Yoni Leitersdorf:

Yep, that's why you have the emergency button in the Waymo (or limit the budget for the Claude experiment). Fail-safes and guardrails. Like any good computer software.

Benn Stancil:

But can those truly exist? Waymos drive around without passengers in them. And limiting the budget for Claude could prevent one particular form of catastrophe, but it doesn't stop it from still doing nonsensical things (eg, the actual tungsten cube thing wasn't catastrophic; it was just stupid).

Like, I guess I'd put it this way: One reason normal computers are useful is because you can precisely program them to do (and not do) things. They're contained, like trains, because they have to stay on whatever rails you build. One reason people are useful is because they can do anything. They can offroad. They can adjust to stuff that they haven't been taught how to handle. And it works, because most people have common sense and a desire for self-preservation and shame and all of these things that keep them from doing insane stuff. I'm in a coffee shop right now and the range of things that anyone here *could* do is enormous, but I trust that they won't because they all have some governor in them that keeps them from doing it.

But AI is kind of this middle thing, that 1) is valuable because it can offroad too, but 2) doesn't really have that self-governing mechanism. There are versions of it, and lots of AI safety people are trying to build it, but safety and common sense are different things. And in a way, programming common sense guardrails is a form of putting the car back on the train tracks, which seems like it undoes a lot of the usefulness of the thing?

Yoni Leitersdorf:

It's a big dilemma. As you say - we used to have separation, we automated controlled things, and we built strong governance mechanisms (laws, morals, etc) into the less controlled things.

Still, we have a lot of people who fail with the governance (murderers, DUIs, criminals in general) and we have ways to deal with that too. Those mechanisms aren't perfect, far from it, but there's some balance now where a human in many parts of the world feels generally safe from these things.

We need to stagger how much agency we give the AI... analytics is a more "safe" path here. Giving it weapons and creating Skynet, probably less so?

Benn Stancil:

True, but even without getting in to the philosophical stuff, it feels like stuff starts to get weird when you have any sort of semi-autonomous machine that's making decisions of sorts. Like this clip - it's not about robot morality or consciousness; it's more just like, so much of society wasn't built with this type of "being" in mind. https://www.tiktok.com/@wallstreetjournal/video/7520245966818053390?lang=en

Yoni Leitersdorf:

Hilarious clip!! Never thought about it that way.

In one of my Waymo rides, it picked me up near the Exploratorium in SF, from a spot where a ton of pedestrians were crossing in front of it. I got in my ride, and for several minutes, watched the Waymo wait for people to allow it to pass.

No one did.

A few minutes later, the Waymo car started slowly inching forward, which eventually caused the pedestrians to stop and let it through.

I believe it wasn't the Waymo algorithm doing it.

I think that the car flagged a remote human operator for help.

Though I may be wrong, and the algorithm is amazing.

Bryan Corder:

We always hold machines to higher standards than humans. Humans are horrible at forecasting, but if Bob in Finance is 60% accurate and an algorithm is 85% accurate, Bob keeps his job (for now) while the algorithm is called trash and deleted.

Benn Stancil:

Agreed, but I think that's because of the common sense / catastrophic error thing? I don't think we penalize computers when they make errors more than we penalize humans when they do, but I do think we penalize computers *a lot* for making errors that humans wouldn't make. "A person would clearly never do this" is a hard objection to overcome, I think.

Alex Petralia:

I definitely think there is a way to know whether the work is right. It all comes down to "consensus from multiple independent observers". If you have only a single source of truth, then you're right, you cannot verify. But almost always there are other sources: contracts, emails, etc. You must, and can, always tie back to the source.

I wrote about this more here: https://alexpetralia.com/2025/07/01/the-left-and-the-right-hands-of-data/

Benn Stancil:

I think that's true, as a philosophical matter. But how do you confirm that if a robot gets a number first? If you do it yourself, I guess that's kind of useful, because two sets of eyes (of sorts) are better than one, but it's hardly automating anything. And the other alternative is to get several other robots to do it, which...I guess might work? Though at least for a while, I'd have some concerns that their errors are correlated (all LLMs are similar-ish, so they could fail in similar ways). And in examples like the cap table thing, the formats of the outputs aren't all *that* structured, so it's not necessarily that straightforward to even compare answers. Though I don't know, I guess that's the future - validation wrappers around ChatGPT wrappers.
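As a rough illustration of what that kind of validation wrapper might look like, here is a minimal sketch of a consensus check across several independently run models. The model names and the ask_model helper are hypothetical stand-ins, and the exact-match comparison glosses over the formatting problem mentioned above:

```python
from collections import Counter

def ask_model(model: str, question: str) -> str:
    """Hypothetical helper: send the question to one model and return its answer."""
    raise NotImplementedError  # stand-in for whatever model API you actually call

def consensus_answer(question: str, models: list[str], min_agreement: int = 2) -> str | None:
    """Ask several models independently and only accept an answer enough of them agree on."""
    answers = [ask_model(m, question) for m in models]
    best, count = Counter(answers).most_common(1)[0]  # naive exact-match comparison
    if count >= min_agreement:
        return best
    return None  # no consensus: hand it back to a human

# e.g. consensus_answer("How many fully diluted shares are outstanding?",
#                       ["model-a", "model-b", "model-c"])
```

If the models' errors are correlated, of course, agreement proves less than it seems.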

Alex Petralia:

I completely agree that it's not straightforward to compare answers - sometimes you pull a count of orders from the database and there's no easy way in the frontend to corroborate this. I would say this is an "untethered" answer. The analyst shrugs and says "Just trust me I guess..?" (until everyone finds out an ETL job failed and it was not correct). Despite it being not straightforward, I think analysts have a responsibility to at least attempt to corroborate their figures using alternative sources.

Benn Stancil:

Yeah, agreed, but that feels like the problem? Like, if I have to corroborate the answer, can we actually automate anything? It helps, maybe, to have something give you the first pass and all of that, but it puts a pretty hard cap on how useful it is.

DJ:

The chart examples reminded me that Stuxnet used false dashboard reporting to keep the Iranians from knowing the centrifuges had been slowed down. Maybe that’s how AI gets us in the end.

It’s fitting that Michael Saylor is running the Bitcoin ponzi. I’m old enough to remember their 2000 Super Bowl ad, which was used to prop up that ponzi.

Benn Stancil:

It doesn't seem that far off? In some of the safety tests, I'm pretty sure they've managed to get models to try to pursue some goal surreptitiously, while trying to hide what they're doing. So it's not that far off from creating a fake dashboard.

And that 2000 ad, wow. "The power of intelligence e-business," they say.

Keyboard Sisyphus:

To me, key roles of data analysis include, in no particular order but numbered for easier reference:

1. discover and communicate useful knowledge about how a business works

2. relieve the anxiety of executives facing uncertain decisions

3. spread out reputational risk for uncertain decisions

A (good) AI can conceivably do #1, which will lead to #2. But I think AI tools will always struggle with #3 because decision makers will be held responsible for the mistakes of their tools in ways that they are not held responsible for mistakes of their staff. "Shit rolls downhill", as they say, but if you automate your underlings you're left alone at the bottom.

Benn Stancil:

Do you think that really holds though? Right now, I'd agree with that, because AI is known to get stuff wrong; it's still new; even though people say to use it, there's a bit of a sense that you're supposed to use it at your own risk. But I wonder if that'll always be the case. Not necessarily in a way where people won't be held responsible for what their robots do, but in a way where we're kind of forgiving of it.

I could see it being sort of analogous to "following the data," actually. If you make a decision and it goes bust, but you can say, "well, the numbers said this and so I thought it'd be good," people kind of shrug and say, "true, that was what the numbers said, sometimes it just doesn't work out, I guess." I wonder if the same thing happens here, where you can say the robots told you to do it, and people say, "the robots are usually pretty good, so I can't be too upset that you listened to the robots."

Keyboard Sisyphus:

I could see that outcome, yes. But I could also see increasing reliability leading to decreasing forgiveness: "why didn't you guard against that failure mode? Don't you know how to prompt??" If driving a car is any indicator, people operating in uncertain environments usually think they are more skilled than the average person.

Ultimately companies are people. In a cooperative culture, forgiveness may reign, while in a competitive culture, AI may present an attack surface for rivals. I certainly know which I'd prefer!

Benn Stancil:

I guess that's true; if it's so easy to use, there probably is a lot more pressure/expectation to use it right.

Benn Stancil:

I've always wondered how these sorts of checker loops work, where one thing does a thing and another thing evaluates the thing. Do they potentially spiral each other out of control, where they "yes and" themselves into oblivion? Do they become like a corporate committee that can't do anything but the most basic and safest stuff? Do they just work?
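(For what it's worth, the usual shape of those loops is something like the sketch below: a generator proposes, a checker critiques, and a hard cap on rounds keeps the pair from spiraling indefinitely. The generate and critique callables are hypothetical stand-ins for whatever models or rules would actually fill those roles.)

```python
def checked_generation(task: str, generate, critique, max_rounds: int = 3):
    """Generator/checker loop with a hard iteration cap so the pair can't spiral forever."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)   # propose (or revise) an answer
        feedback = critique(task, draft)   # evaluate the draft against the task
        if feedback is None:               # the checker found nothing to object to
            return draft
    return None  # still failing after max_rounds: escalate to a human instead of looping
```

Whether that converges, stalls into committee-style caution, or "yes, ands" itself into oblivion depends almost entirely on how strict the critique step is.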

Marco Roy:

I guess time will tell. But establishing & following a plan does seem to help coding agents tremendously.

nathan t:

In analytics you have something called the ‘semantic layer’. If you let an AI leverage that semantic layer, you have pretty much full explainability on how a chart was created.

You could show the end user ‘hey, this spotter/julius/whatever AI thing used metric X from —here— to create this chart’

or you could say ‘hey, it did not use an existing metric; instead it combined these measures and dimensions with this formula, so if you wanna be 100% sure, please have it double-checked by your data colleague’
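A rough sketch of what surfacing that provenance could look like, with a made-up metric registry and field names (this is an illustration of the idea, not any particular tool's API):

```python
# Hypothetical registry of governed metrics exposed by the semantic layer.
SEMANTIC_LAYER = {
    "active_users": "count(distinct user_id) where last_seen >= current_date - 30",
}

def explain_chart(metric_name: str, formula: str | None = None) -> str:
    """Tell the end user whether a chart came from a governed metric or an ad-hoc formula."""
    if metric_name in SEMANTIC_LAYER:
        return (f"This chart uses the governed metric '{metric_name}', "
                f"defined as: {SEMANTIC_LAYER[metric_name]}")
    return (f"No governed metric named '{metric_name}' exists; the AI combined measures "
            f"and dimensions with the ad-hoc formula: {formula}. "
            "Please have a data colleague double-check it.")
```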

Benn Stancil:

I think that only partially works though. For things that exist in the semantic layer, sure, it's mostly reliable. Though not entirely - BI tools have used this sort of structure forever, and it's still plenty possible for someone to misinterpret some chart in it. A semantic layer "query" (defined by something that sort of looks like code, or by a drag-and-drop UI, or even natural language) can still be fairly confusing.

But to me, the bigger issue is there will always be things (lots of things?) that can't be answered through those layers. Like, can we automate mechanical reporting? Maybe? Probably? Can we automate analysis, which is the off-roading that inevitably happens when the reporting doesn't answer the question you want? That seems much harder.

Keyboard Sisyphus:

Personally what I am most excited for with automated analysis is that it could be economical, for the first time, to explore methodological uncertainty a la Andrew Gelman's "garden of forking paths". We'll get to argue more about high level assumptions and experimental design and less about p value thresholds. Those kind of decisions draw on deep domain knowledge -- exactly the kind of context that is hardest to shove in an LLM. In a high-AI scenario, experienced analysts may find their salvation in those kind of meta-decision roles.

Benn Stancil:

Doesn't that make the problem he talked about worse though? My understanding of that general thesis is:

- People explore a few avenues of inquiry

- Find something interesting and talk about it

- But don't really disclose or discuss all the other ways they tried to find something and didn't, so it becomes an informal version of the xkcd on p-values, but instead of p-hacking it's a kind of methodology hacking https://xkcd.com/882/

And if you can send a bunch of robots off to look for things, surely some will come back with some things that seem interesting. Which, if you look at their results in the aggregate, maybe that's good. Though if you look at their results and selectively talk about the fun ones, that seems bad?

(I have this theory about "foundation legitimacy" in analysis. Basically, the main reason why analytical arguments work today is because someone had to make them. It's easier to argue for something that's true than false, and so true things tend to have better arguments in support of them. But if you can make fairly compelling arguments for anything, we aren't actually that good at discerning which one is correct. AI potentially undoes that. And it seems like there's some analogy here, which is something that goes off and does tons of analysis on various things could potentially make finding true things *harder,* if compelling analytical conclusions become easier to manifest. https://benn.substack.com/p/how-analysis-dies?open=false#%C2%A7foundational-legitimacy)

Keyboard Sisyphus:

Well it ensures that the full garden of potential analyses is explored. That is really bad if only one side does it (because they get the best possible cherry picking), but maybe helpful if both sides do it and the cherry can be placed in context. I still think we can mostly rely on "true things tend to have better arguments in support of them". Even AI agents will struggle to create as many coherent arguments for a false thing as they could for a true one.

As for "foundation legitimacy", I think a ton of our society's functions rely on similar "proof of work" heuristics to indicate seriousness. But many of those don't work anymore. A resume and cover letter used to be a meaningful signal of interest in a job; now there are services that automate not just cover letter generation and form filling but also linkedIn messages and emails to the hiring manager. Maybe we need to ask for paper applications again, and paying for postage becomes the proof. I'm not sure what a similar mechanism might be for data analysis.

Knowing things is just hard. The epistemologists warned me but I didn't listen and went into a data career anyway haha

Benn Stancil:

Ah, I like that too. That it's not just "you wrote a good cover letter" that used to matter, but the fact that you did it at all. But a whole lot of things are probably going to turn into versions of that, where the easy way to judge if something is good - that it exists at all - isn't going to work anymore.

Nick E.:

"Can analysis ever be automated?" I 100% agree with your conclusion!

Laurie:

It’s hilarious that chart 2 is the only one I felt pretty confident was “real” because the data was so simple and verifiable. Chart 3 is the one I was most confident was made up.

I think your point about the CEO having to just trust the analyst is the real thing here. To non-data people, data has always been untestable magic, so this is just cheaper, faster untestable magic. For me, it’s horrifying to trust a number that I can’t reproduce or at least understand how it was produced – at least for any important decisions. And hopefully that’s what people are using data for, to make decisions… right? <insert Hayden Christensen / Natalie Portman meme>

Benn Stancil:

Yeah, and I have no idea what "net approval of the internet industry" even means? But half a million people subscribe to that Substack, so I'm guessing he can't just make stuff up. (Though if this quiz had happened this week, I guess I could've added BLS jobs numbers to the charts, and nobody would have any idea if they're right or not, because they're definitely about to start making stuff up.)

fergie:

"moving average" on a chart for no reason is classic economist stuff.

Benn Stancil:

Also, in Excel

Corneel:

Yeah, it is a funny thing that I encounter when collaborating with Claude Code... It does 'defensive programming', as in handling error cases and still providing a result (letting the script continue). But when some basic logic fails, I WANT the script to break, because I don't want to update the dashboard / model with new but most likely wrong information.
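A small sketch of that fail-fast style, with made-up column names: instead of handling the error and carrying on, the script asserts its assumptions about the data and stops when they're violated, so the dashboard never gets refreshed with numbers that are probably wrong.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast: raise instead of quietly continuing when the data breaks basic assumptions."""
    assert not df.empty, "orders extract is empty; refusing to update the dashboard"
    assert df["order_id"].is_unique, "duplicate order_id values; an upstream join probably exploded"
    assert (df["amount"] >= 0).all(), "negative order amounts; a sign or currency convention changed"
    return df
```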

Bug detection in analysis is often finding out what assumptions about the data were wrong or violated, not real technical issues. Once you move from innovation to production and you are delivering numbers and predictions, you want more technical stability. AI can help with that. So you have to do the innovation as an analyst and let AI help you move to production. From the notebook to the pipeline. The AI is able to evaluate the output. In addition, AI could make the output much better looking (instead of the ugly graphs you see everywhere now).

For the quick and dirty ad-hoc analysis or the innovative stuff, it remains extremely difficult for AI. I think that besides domain experts, you have data-domain experts, who just know a lot about the data and can smell weird results. They can't smell them in other domains though.

For coding tools to help, they would need a LOT of context, basically all those numbers, graphs, and conversations the data-domain expert looked at.

Benn Stancil:

On that first point, someone else sent me a few logs of conversations they'd had with Claude when they were doing work like this, and it was kind of remarkable how much of it fit into that type of behavior. It's got an eagerness to it, to both solve the thing, to say it solved the thing, and then, if you tell it it got something wrong, to say "oh you're right! I will fix that." Which broadly doesn't feel like quite the right personality for analytical-type questions, where people seem better off being skeptics. And I haven't seen any model do a particularly good job of that.
