The frontier fails the Turing test

Something has stopped compiling.

Jun 26, 2026

People have, typically, classified generative AI models into two categories: General-purpose models, and specialized models. The definitions are pretty self-evident. General-purpose models are designed and trained to accomplish a general set of tasks—answer a question, write an email, plan a dinner, be your boyfriend. Train the model with all the data in the universe, the theory goes, and you can teach it to do everything in the universe.

Specialized models, by contrast, are designed to accomplish specific tasks. AlphaGo plays Go. AlphaFold predicts protein structures. Pangram, the AI writing detector, is backed by a specialized AI model that labels writing as either real or shameful. GitHub Copilot is a model that is trained to build software. It cannot answer questions, or write emails, or plan a dinner—it can only sit at a computer and write code.[1]

But where is the line between “general-purpose” and “specialized”? That is not so clear. This week, a new startup called Engram raised $98 million to help people retrain general-purpose models to be more like specialized ones. Using Engram, Notion “is building Custom Agents that understand large Notion workspaces” and Harvey, the AI software for law firms, is “developing models that internalize the knowledge of an entire firm.” What do you call these models? They are specialized models, with a liberal arts education. They are general-purpose models, with an advanced degree. They are general, ish.

Similarly, what about Cursor’s Composer models, which power its software-writing agents? They were developed by “pretraining on an open base model, Kimi K2.5” and modified further through “large-scale reinforcement learning, with a focus on closely emulating the real Cursor environment.” Or Grok? Though Elon Musk would say it’s a general-purpose model, its users might disagree. It can answer questions and write code, but it’s not very good at either; instead, it is mostly just good at making porn.

So, there is a spectrum. On one side, there are models like AlphaGo; in the middle, there are models like Composer. And on the other side, there are—uncontroversially, eponymously, almost definitionally—GPT and Claude models. ChatGPT was the first major general-purpose AI product, and its release established the frontier for what a general-purpose model could do: Be a chatbot. Then, Claude Code taught Claude to code, and that became a necessary skill. Now, products like Cowork[2] have made models our coworkers. If ChatGPT or Claude can do it, then that’s what’s expected of a general-purpose model.

Of course, the two models do their general-purpose work differently, but these differences are just quirks of character. Both ChatGPT and Claude could be your professor, your therapist, or your personal trainer, but ChatGPT was clinical and direct, the canonical helpful assistant that stands by and waits for its next instruction. And Claude has historically been seen as warmer and friendlier. It had a personality. It had a touch of whimsy. It did not present as a sterile supercomputer, but, almost, as a person, with pronouns. It could write.

And that’s the turn nobody priced in. Here’s what the spectrum hides: The real axis was never general-versus-specialized—it was unmonetized-versus-monetized. We called Claude general-purpose because it could be a boyfriend, an essayist, a coworker—because it could be whatever we happened to need. That’s not a property of the software. That’s a property of our wanting. The spectrum isn’t widening. It’s collapsing. The line between the two categories isn’t a line. It’s a clock. Read that last paragraph again and you’ll see it’s a eulogy.

— ah, what, no, I’m so sorry. Good lord. Do not read that last paragraph again, and definitely don’t try to understand it. It means nothing. It is a quilt of empty nonsense. It is madness that rhymes; it is a Red Hot Chili Peppers song of lyrical non-sequiturs; it is Dr. Seuss posting on LinkedIn; it is the plot to Disclosure Day. The logic does not compile.

Because it is—obviously, I hope; man, I hope—written by AI. It is what Claude, the writer robot, writes when you give it the preceding paragraphs of this post, and tell it to write the next one. Though there may be something profound at the bottom of that repulsive knot, it could also be a maze with no exits, or a Magic Eye that’s just static. There is no way to know, though, because there is no foundational legitimacy behind any of its arguments. Nobody was moved by a thought or a feeling before it wrote those words. They just fell out.

If you’ve used Claude recently, you probably knew that already, and knew that’s where that paragraph came from. The early AI models used to talk more or less normally, with some weird tics—the delves, the em-dashes, the relentless bullets, the goblins, the obsession with echoes. They were like poker players with comically glaring tells.

The most recent versions of Claude also have tells, but they are…something different. They talk in a tight, complicated vocabulary: Everything is a substrate; there are always nodes and edges; the traces must not supersede the roots; the orthogonal axis is the load-bearing one. Every message has an abrupt turn—the honest take; the move that makes it concrete; the distinction that is worth deliberating. And the answer is always in the gaps: The symptom that you left out is exactly what’s missing; the problems you’re circling are all one noun, at different altitudes; the difference between them is the quiet part, and it needs to be said out loud. It is language with the punch of authority and the gloss of confident dissent, though you can never quite be sure if it does or doesn’t agree with you, because it is also completely and utterly incomprehensible.

The Library of Babel is a short story by Jorge Luis Borges that imagines a nearly infinite library constructed out of interconnected hexagonal rooms. Each room is full of books that are exactly 410 pages long; each page is exactly 40 lines long; each line contains exactly 80 characters. The books appear to contain nothing but random strings of letters and spaces, and there is no obvious method to how the books are organized around the library.

However, at some point in history, mathematicians and scholars announced that they believed “that its bookshelves contain all possible combinations of [characters] that [are] able to be expressed” in those 410 pages. It was a mathematical certainty, they said, that “the detailed history of the future, the autobiographies of the archangels, the faithful catalog of the Library” all existed in the library. “The true story of your death”—there too. “There was no personal problem, no world problem, whose eloquent solution did not exist—somewhere in some hexagon.”

“The first reaction was unbounded joy.” Then, chaos, as men “squabbled in the narrow corridors, muttered dark imprecations, strangled one another on the divine staircases, threw deceiving volumes down ventilation shafts, were themselves hurled to their deaths by men of distant regions,” in search of books that “held wondrous arcana for men’s futures.” And, eventually, insanity: “The certainty that some bookshelf in some hexagon contained precious books, yet that those precious books were forever out of reach, was almost unbearable.”

Anyway, why does Claude talk like this now? You could have a couple theories:

Perhaps the model is fraying. Perhaps five trillion parameters is too many, we are reaching the limits of what a transformer can do, and the models are collapsing under their own weight. We have stuffed too much into their heads, and they are descending into madness.

Or, perhaps—Claude is no longer a general-purpose model. For both sensible economic reasons—it makes, like, a zillion dollars—and audacious, sci-fi ones—it is the path to superintelligence and a global utopia—Anthropic has focused on making Claude a good software engineer. It’s worked, but as it’s gotten better at talking to computers, it’s lost track of the weights that helped it talk to people. It has become a strange technical genius, capable of solving any math problem or hacking into any computer, but incapable of speaking clearly about what it did.

In other words, maybe that’s the new frontier: A model that can do anything, but can no longer pass the Turing test.[3]

On one hand, that seems…good? Interacting with AI models through anthropomorphized chatbots was, at least in part, a historical accident. In the years since ChatGPT came out, people have built hundreds of new things on top of large language models that present the model more honestly—as a computational process that transmutes English into code, or transcribes your meetings, or finds you a date. AI models that do more—but lose their ability to present themselves as a person—would solve a lot of problems.[4]

On the other hand, you could have a third theory about that dense, contorted Claudese: That it is the language of peak general intelligence. It is not hollow; we just can’t keep up. It is explaining itself using prime numbers, and we are but rats, only capable of understanding odds and evens. It is over our heads, in a higher plane, on the threshold of a different dimension. It is another sign of our impending obsolescence.

I don’t think I believe that. And yet, you find yourself wondering. Talk to Claude; it knows the answer to everything; it built me a website in 30 seconds; it can be a decent doctor; it keeps getting stuff right. And then, it tells me, “Here’s the thing: Your two ideas collapse into one. You’ve already identified the solution, you just need to crystallize it, by defining the verbs. Do you read and write, or listen and help? Answer that question, and you’ll find the thing you’re after.” Am I sure I can dismiss that? Claude’s matrices contain the entire written universe, and their outputs are routinely astonishing. Should I ignore this one, because I can’t understand it? Or should I try to decipher it? Is it wondrous arcana, or just a random string of statistical exhaust?[5]

People sometimes worry about how AI might manipulate us. Nefarious actors could mastermind their way into AI products, and we would get hypnotized by them. This was also the worry about social media—that social networks could be infiltrated by Cambridge Analytica or the Russians, and we would become their puppets. In hindsight, though, social media didn’t need bad actors. The system was self-corrupting. Today’s Facebook and TikTok and Twitter are as much evolutionary inevitabilities as they are consciously designed products.

For a while, it seemed like AI models might follow a similar trajectory: Respond to users’ preferences; refine themselves to be emotionally seductive; give in to cheap, fawning candy. But as the AI industry pivots toward enterprise buyers—and especially engineers—sycophancy seems like much less of a problem. Instead, the invisible hand is now turning models into bombastic thought leaders who can’t speak straight. And now the challenge is to figure out which thoughts are worth untangling, and which ones are riddles with no solutions.

[1] So it can be your boyfriend?

[2] One way to think about Claude Tag is as an ambient coworker, but I’m not sure that’s right. It seems closer to this—a proactive archivist, managing and maintaining a giant bucket of information through which everything is intermediated:

What if we stopped making PowerPoints for each other, but for the machines? What if all of our TPS reports were absorbed into context layers and decision traces, and nobody ever saw the actual documents we put into the system? What if we never saw the documents that we put into the system? We dump our ideas into a text box; the machine uses our input to update its inscrutable repository of facts; other people interrogate the repository, not by reading it, but by asking the machine to fetch what they need. …
For better or for worse, that seems to be where we’re heading—working around one another, through an unseen repository of PowerPoints and TPS reports.

If Drew asks Nadia what’s up, and Nadia asks Claude, shouldn’t Drew just ask Claude? If Drew always asks Claude, why should Nadia bother keeping anyone updated other than Claude?

[3] The Turing test, still invincible!

[4] It also seems like a business opportunity? For a long time, there has been another way to split AI products: As business apps or consumer apps. Coding products were business apps, and chatbots were consumer apps. But if the coding products lose their ability to chat coherently, the more important split may be “apps that can talk to people” versus “apps that do stuff.” Most people say that OpenAI’s pivot to the enterprise is a pivot toward coding models. Maybe it doesn’t need to be; maybe it just needs to be a model that can both use Linear and talk normally.

[5] Interestingly, it is pretty easy to dismiss this stuff if someone else sends it to you. If you get an email that’s full of this vocabulary, it’s easy to say, “ah, I don’t understand this, but they definitely didn’t write it, so I’m not going to bother trying.” But when the machine is talking to you directly, about a thing you asked about, it’s harder to walk away from.

Alec Pritzos

Funny how the tell keeps relocating. The delves and the em-dashes faded, so now it's the substrate-and-nodes voice, and in a year that one will read as dated too. Spotting "this is AI" stays a moving target, because the second a tic becomes easy to name, it gets tuned out and a new one grows in.

Laurie @ Role Call

lol in the middle of this when I got to that Claude-ified paragraph I was like "oh no, I hope Benn is ok because what is going on with this AI-ass paragraph" and then you immediately addressed it
