Large language labor markets
Everything is about to get very weird.
Four hours before I published last week's post—which argued that, because LLMs are so expensive to develop, OpenAI shouldn’t be an iPhone but should instead be AWS—Databricks announced that they had created a chatbot that could be built cheaply and in under 30 minutes.1 Their bot wasn’t as good as OpenAI’s GPT-3-backed one—which isn’t as good as one using GPT-4—but it was still, compared to what most people thought chatbots were capable of in the middle of last year, a huge leap forward.2
People had questions. Does this change everything? Does that mean that LLMs are becoming commodities? Will every company train their own? Is OpenAI’s bet on launching an app store—and using an ecosystem (or regulation) as its moat, rather than a technology—the right strategy? Will open source models ultimately win out?
I have no idea—clearly, this blog is a lousy source of the latest news on AI.3 However, it did raise one interesting question for me about where all of this is headed: Are we going to have labor markets for LLMs?
Historically, economists have divided the labor market into two groups: Low-skilled workers and high-skilled workers. As the name suggests, low-skilled workers hold jobs that don’t typically require advanced degrees or special training, like service workers, taxi drivers, construction workers, and so on. High-skilled workers, by contrast, are lawyers, doctors, engineers, electricians, and members of any other profession that takes a long time to learn.
Though the terminology is problematic, the concepts provide a useful sketch of how labor markets work. Jobs that require what are perceived as more generic skills get paid low wages, whereas those that require a lot of expertise can usually command higher wages. And jobs that require extreme specialization—say, a heart surgeon over a primary care physician—are paid an even greater premium. Very roughly, the more a person invests in acquiring skills, the more expensive their labor is.
The introduction of Dolly suggests that LLMs and other generative AI models are also “skilled,”4 and could be divided along the same crude lines that economists use to divide the labor force. Dolly is low-skill. You might not trust it to send an important email to your boss, but you’d probably be fine with it making a restaurant reservation for you. GPT-4 is high-skill—it might actually be good enough for that email to your boss. And companies will surely develop specialized models that extend GPT-4 (or GPT-5, or GPT-6S Plus) with specific training data, and are particularly good at creative writing, or molecular chemistry, or negotiating for higher salaries.
These high-skill models are likely to be more expensive to employ than their less-skilled counterparts. They would cost more to train because they’d have to be tuned, tested, and refined against a new set of complex tasks. They would also cost more to run. In order to perform advanced tasks, LLMs would probably require longer prompts, which are much more computationally expensive.
You could imagine a world where we call Comcast, and talk to a cheap, low-skill LLM that can’t figure out how to cancel our subscription. By contrast, USAA, which markets itself as having industry-leading customer service, would be running on more refined models. If their frontline AI can’t solve our problem, we’d get escalated to another model that’s specifically designed to help us file an insurance claim or apply for a home loan. The quality of these services becomes yet another way companies try to differentiate themselves, or upsell premium packages.5
This dynamic could appear everywhere. There could be expensive AI therapists that keep years of dialogue in their conversational memory, and there could be cheap ones that start from scratch every session. There could be good AP U.S. History tutors (and test-takers), and bad ones that hallucinate facts about American history.6 There could be AI lawyers that specialize in obscure corporate tax law—for the right price. People with money may be able to train models on lookalike groups of patients and get personalized medical care, while people without it may have to rely on ChatWebMD.
The same could be true in the corporate world as well. Companies don’t pay a premium for McKinsey because of the quality of their consultants, but because McKinsey promises to train a dedicated model for all of their clients. Google builds a better writing assistant than anyone else because it’s trained on private data inside of Google Docs, which only Google has access to.7 HBO stops investing in the best writers, and starts investing in creative LLMs. Nike’s ads are developed by a team of experts and a fork of Midjourney that’s tuned to create inspiring and emotionally arresting imagery; Pepsi storyboards their ads with Microsoft Tay.
True, it’s possible that the floor for these models, including the cheap ones, is so high that the differences don’t actually matter. ChatGPT might already be a halfway decent lawyer; in a year, even the lowest-cost LLMs might be a regular Elle Woods. Prices may also fall so much that the best and worst models are all trivially cheap.
Still, it’s also possible the ceiling for some of these tools is so high that there’s still a huge gap between what’s cheap and what’s expensive—it’s just that the expensive ones are capable of things we can barely fathom today. Or, even if the differences between models are small, the advantages they provide, like being able to ship code faster, could compound quickly. Employing a marginally better AI “workforce” might not make a discernible difference in how a company operates from day to day, but those benefits could accrue into huge gaps over time.
Databricks’ Dolly was released under a headline that said it was “democratizing the magic of ChatGPT.” It’s another parallel to the iPhone: Generative AI tools could be egalitarian, and afford no privilege to the rich. The reality, however, seems much more complicated—and much weirder. Legal LLMs could drift towards being intentionally obtuse so that you have to pay them to interpret one another, and can’t replace your lawyer with your low-skilled personal assistant chatbot. A cabal of overbearing parents is going to bribe the College Board for all of its test data so that they can create a test-taking and essay-writing bot for their kids. Someone will propose that LLMs pay income tax, and someone will become the first AI labor economist.
Whatever happens, one thing feels pretty clear: We’re barreling towards a very strange future. Even if we don’t go to war with Skynet, and Bard doesn’t harvest us to be batteries for Google’s data centers, a lot of the assumptions we make today about how society works, including the basic underpinnings of how jobs work, are about to get upended.
Here’s to hoping that it doesn’t happen faster than we can handle—unless, of course, at the pace we’re going, it already happened four hours ago.
2. I’m speculating a bit here. Dolly’s creators said that it demonstrated a “surprising degree of the instruction following capabilities exhibited by ChatGPT,” which I read as acknowledging that it’s not as capable as GPT-3.
3. The big brained computer is once again reminding me to do the reading before writing one of these things.
4. We even benchmark that skill in the same (somewhat dubious) way that we benchmark it in people: with standardized tests.
5. “Our starter plan includes a maximum of 10 user licenses, all of our essential features, and our 24/7 chatbot support. Our enterprise plan includes unlimited seats, SAML-based single sign-on, and a chatbot trained on your own usage data, updated once a quarter.”
7. Is this legal? Probably not. Are they doing it? I’m sure they’ve thought about it. Could it hallucinate something from someone else’s document into your own? If they haven’t done it, I’m guessing that’s why not.