benn.substack

Yeah, when you start to use it for stuff a lot, you can really start to feel it. It's like a new version of looking at your phone almost, where you have this sort of impulse to ask it for help, or to need it as a crutch. It's kind of unsettling, honestly.

Expand full comment

Patrick Moran

You wrote: "What happens when the box becomes a reflection of ourselves and our desires—and most of all, our sins?"

One of the differences between current AI and humans is that humans can go directly to measurements of the real world. "130° in the shade" is potentially highly motivating to humans because temperatures like that could kill people. To the machine intelligence it is a matter of relative indifference unless there is some built in computation that it is programmed to make. One of the problems I had in programming in C after it had been on my Macs for a few years is that the procedures built into the language could defer calculating some value until it wasn't busy doing other things. It proved very challenging to trap the compiler into having the number to be calculated ready for some other procedure that the compiler could not see the need for. So even if the AI has direct access to thermometers on the rooftop, if it does not **notice** that the temperature is a serious threat, then it won't raise alarms or take other steps needed.

An even bigger difference between AI and living systems is that we have motivations. When it starts getting too hot or to cold, if I am concentrating on writing I may not even notice at first. However, as soon as my primate-basic systems raise a ruckus, my body will start doing things, maybe without conscious awareness. Cold creeps in through my cuffs, my arms ever so gradually get to the uncomfortable stage. and my body may automatically pull my collar tighter. Living systems have direct responses and "kick it up to a higher level of executive response" responses.What would

What AI activities could be modeled on those pesky carbon units? Let's say that you have a spaceship and the spaceship **is** a computer. It has one task that is given the highest (never defer processing) status. That is, "Never let the passenger cabin temperature get above 120°," And there is another prime level command, "If computer hardware cabinets get above 120°, reduce computer activity to sleep mode until cabinet temperatures are below 120°. " That combination could set up a deadlock situation, or a kind of ratchet failure, wherein the computer mainframe gets too hot and puts itself in power-conservation or sleep mode, and the result of that stoppage is that the passenger cabins get too hot, which in turn makes it go to sleep before getting anything done about passenger cabin temperature. That kind of chatter situation is known to occur in ordinary C programming, and after such a problem is discovered, the programmer can have provisions to break infinite loops. But I'm trying to look at things from the AI's point of view.

I think making a workable brainy spaceship that would work might need to have provisions for innovation. In the given situation, perhaps the hard-wired basic operating system would have a procedure that says, "If procedure 1 did not solve the problem, go to procedure 2, and so on." Maybe it would include provisions to call up subroutines that originally had no connection with spaceship temperature control. Maybe there is a subroutine called "summon human intervention" that sends out a telephone or other signal. It was originally called whenever the printer ran out of paper. But under these unusual chatter conditions, the hard-wired routines go down the list of subroutines, finally comes to the one called "summon human intervention." Some human, who is getting hot anyway, gets a message, e.g., "Printer out of paper." S/he has enough contrxt to put the hot cabin and the request for help together, then perhaps manually vents enough air that the lower air pressure means livable conditions and recovery of computer operation.

The spaceship might have a basic operating system that gave it motivations akin to those of humans, the innate stuff that makes us social animals like chickens and not "cold blooded" animals like garter snakes. Humans have what appears to be an innate response to infants that even makes us suckers for large-eyed jumping spiders. Cowbirds thrive because of innate responses too. Maybe we could engineer core AI systems with "parental" attitudes toward humans.

If we made the "prime directive" something like, "Preserve earths biosphere," we might be signing our own death warrant.

So, two things: Direct access to environmental inputs. A core operating system that makes the AI not necessarily mushy, but certainly with a strong parental bias.

Expand full comment

https://www.buzzfeednews.com/article/tedchiang/the-real-danger-to-civilization-isnt-ai-its-runaway#.vp1ylDRDR

As a very tactical point, I know someone who's building something akin to what you're describing with the "printer out of paper" message, where it's a system to have the model call a human when it needs help. That still has to be programmed in some way, though I suppose it could be sort of self referential, where you have another thing observe the main thread, and ask itself, does it seem like it might need help?

(Here is that thing, by the way: https://www.humanlayer.dev/)

As a more philosophical point, I recently came across this article (from before this hole craze, from 2017) that makes a similar point about AI not having this sort of intuition or higher level of executive reasoning that can intervene:

"In psychology, the term “insight” is used to describe a recognition of one’s own condition, such as when a person with mental illness is aware of their illness. More broadly, it describes the ability to recognize patterns in one’s own behavior. It’s an example of metacognition, or thinking about one’s own thinking, and it’s something most humans are capable of but animals are not. And I believe the best test of whether an AI is really engaging in human-level cognition would be for it to demonstrate insight of this kind."

The other point he makes about this, which is pretty interesting, is that is also true of corporations. They too have a prime directive, they too are expected to pursue that without regard for any other ends, and they too are expected to have policing supervisors (regulation, mostly), because if they didn't, they wouldn't ever stop and think, "perhaps I shouldn't."

Expand full comment

Patrick Moran

May 2

It will take me some time to really digest all of that. I don't know what you mean. However, I have a couple things for you to think about. My recent project makes it clear that when discourse gets cut off from real-world inputs, stuff can come out that has alleged realities behind it somewhere, but also takes mere models built on real data as themselves realities. Pretty soon you have some kind of card castle reality that real people believe in. That's dangerous because Mother Nature doesn't go by human ideas of how she ought to behave.

One of the limitations currently undermining AI is that everything it does is based on language. One of the useful functions it could perform would be to locate apparently contrary observations, and facilitate digging out the source of the problem. At that point the issue has to go out of the realm of language and into the world of direct observation. In some contexts, giving an AI direct connection to sensors might be very useful. For instance, there are thousands of automated temperature and air pressure outputs. If humans deal with them they have to end a transcript of, e.,g., air pressure, after a certain number of decimal points. Collating that information and sending it forward to processors at some central locations could add built-in lags in processing. Having AI that located meaningful truncations in human processing by noting that adding an additional decimal point would make a major change in projections could be used by the AI to calculate with more and more decimal points until a difference in the nth decimal value didn't produce any difference in outcomes. I'm guessing, but I think that weather forecasts would become more accurate.

On the other hand, and especially in areas of discourse wherein the reports in language are mostly what the AI has to deal with, what AI seems bound to do is to produce a kind of group think. Working with AI to try to check my reasoning and my math, I started getting told by the AI what answers I should be finding. The answers I should be finding were more generalized than what I had started with, and on top of that they were based on the balance of opinions of great sources of information like Wikipedia but not even aware of an acute analysis of the issue by a physics prof at UCR that happened to have been published in a sort of dead-end publication that hardly anybody had even heard of. It took me hours on Google Scholar to dig the good article out. It was reliable whereas much of the rest of materials were based on conclusions that turned out to be both dogmatic and seemingly deliberately self-contradictory. (I suspect that this prime source was a deliberate practical joke -- if they believe this they will believe anything). Successive generations had taken the dubious original study as gospel. GIGO.

The AI seemed to be geared to produce "executive summary"-type stuff. It cut out the details, blurred the edges, and, in the end it might have something that was sort of groundless but substantially correct, or it could have something that was wrong but superficially equally persuasive. Tell the AI, "I'm writing this for somebody like a lawyer who is excellent in his field but systematically avoided math and science courses all the way through his schooling and now discovers that to try a trial he's got to be able to figure out what these nerds are really fighting over," and what do you think you'll get? A 50-page "explain to me like I'm a fifth grader" turned into a 7 page whiz-bang "ain't I smart" executive summary.

To use AI and use it well, you really have to ride herd on it every step of the way.

Expand full comment

The "executive summary" of groupthink thing resonates with me. I'm not sure if I've ever quite talked about this before, but it feels like a better way to think about what an LLM tells you is less "what is the answer to this question?" and more "what do people think is the answer to this question?"

In a lot of cases, those are the same thing. But while I'm sure most people want to make AI be good at the first question, I'm not sure that the second one isn't just as interesting.

(On your line that "Mother Nature doesn't go by human ideas of how she ought to behave," I wrote a related thing a while back about how that's easy to forget, and how I sometimes feel that slipping away: https://benn.substack.com/p/hotel-california)

Expand full comment

Yoni Leitersdorf

May 2

I wrote a Zapier Zap to monitor the RSS feeds of Substacks that I like reading, and write comments for me to respond in a manner that is valuable to the author, valuable to the community, and promotes Solid's own blog.

This is what it came up with for your post above (using Gemini 2.0 Flash):

---

This post really hits home. The blend of personal anecdote with the broader implications of AI's persuasive power is compelling. As we build at Solid, the question of how these technologies shape our perceptions and behaviors is always top of mind. It's not just about alignment in the technical sense, but about understanding the human element in the equation.

---

I feel like AI could do better ;) See you next week.

Expand full comment

1.) Love it, more comments that say everything here is great, upvote that model please.

2.) I told ChatGPT to respond to your comment while engaging in subtle sabotage:

---

Thanks for sharing that—and for letting me know it was AI-generated. I’m always fascinated by how generative tools reflect their training: the way they grasp tone, gesture toward relevance, and occasionally almost say something. This one did a pretty good job sounding like a comment, though I’m curious what it would’ve said if it had actually read the post.

Looking forward to next week’s edition—human or otherwise. :)

Expand full comment

Yoni Leitersdorf

Wow, quite a burn by ChatGPT there!

Expand full comment

In hindsight, I should've asked 4o, it would've been like, I need to be sycophantic *and* engage in subtle sabotage and would've probably imploded.

Expand full comment

Yoni Leitersdorf