I mean, surely something in this sequence is wrong, right?
Large language models cost a fortune to build. OpenAI, which is reportedly in the process of raising $6.5 billion, needs that $6.5 billion, because, “by some estimates, it’s burning through $7 billion a year to fund research and new A.I. services and hire more employees.” Anthropic is expected to spend $2.7 billion this year. Facebook is spending billions more.
It probably won’t get cheaper. Chips might get better; compute costs might go down; Moore's law; etc, etc, etc. But as models get better, pushing the frontier further out will likely get more difficult. The research gets harder, and the absolute amount of compute required to train a new model goes up. It’s like climbing Mount Everest: The higher you go, the thinner the air, and the tougher each step gets.1 Even if it gets cheaper to do the math required to build new models, that math has diminishing returns. To build a better model in 2024, you have to do more and harder math than you had to do in 2023.
Despite these costs, people will probably keep building new models. People believe that LLMs are the next technological gold rush, and the companies that build the best ones will make their employees and investors a fortune. They are trying to build artificial general intelligence. Human nature compels us to make everything faster, higher, and stronger.
If the industry does keep building new models, the value of old models decays pretty quickly. Why use GPT-3 when you can start using GPT-4 by changing a dropdown in ChatGPT? If a competitor puts out a better model than yours, people can switch to theirs by updating a few lines of code. To consistently sell an LLM, you have to consistently have one of the best LLMs.
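The “few lines of code” point can be made concrete. Here’s a minimal sketch, assuming both vendors expose an OpenAI-compatible chat endpoint (a common pattern); the rival’s URL, model name, and keys below are hypothetical placeholders:

```python
# Sketch: if two vendors expose OpenAI-style /chat/completions endpoints,
# "switching providers" is just a new base URL, API key, and model name.
# The rival endpoint and model names are illustrative, not real services.

import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an HTTP request for an OpenAI-style chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# The entire "switch" is the diff between these two calls:
req_old = build_chat_request("https://api.openai.com/v1", "KEY_A", "gpt-4o", "hi")
req_new = build_chat_request("https://api.rival.example/v1", "KEY_B", "rival-large", "hi")
```

Everything else in the application (prompts, parsing, retries) stays the same, which is exactly why the switching cost is so low.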
Even if the industry doesn’t keep building new models, or if we hit a technological asymptote, the value of old models still decays pretty quickly. There are several open source models like Llama and Mistral that are, at worst, a step or two behind the best proprietary ones. If the proprietary models stop moving forward, the open source ones will quickly close the gap.
Therefore, if you are OpenAI, Anthropic, or another AI vendor, you have two choices. Your first is to spend enormous amounts of money to stay ahead of the market. This seems very risky though: The costs of building those models will likely keep going up; your smartest employees might leave; you probably don’t want to stake your business on always being the first company to find the next breakthrough. Technological expertise is rarely an enduring moat.
Your second choice is…I don’t know? Try really really hard at the first choice?2
Eighteen months ago, I said that foundational LLM vendors are potentially the next generation of cloud providers:
So here's an obvious prediction: AI will follow a nearly identical trajectory [as AWS, Azure, and GCP]. In ten years, a new type of cloud—a generative one, a commercial Skynet, a public imagination—will undergird nearly every piece of technology we use.
Other people have made similar comparisons. And on the surface, the analogy seems roughly reasonable. Foundational models require tons of money to build, just like cloud services do. Both could become ubiquitous pieces of the global computing infrastructure. The market for both is easily in the tens of billions of dollars, likely in the hundreds of billions, and potentially in the trillions.
There is, however, one enormous difference that I didn’t think about: You can’t build a cloud vendor overnight. Azure doesn’t have to worry about a few executives leaving and building a worldwide network of data centers in 18 months. AWS is an internet business, but it dug its competitive moat in the physical world. The same is true for a company like Coca-Cola: The secret recipe is important, but not that important, because a Y Combinator startup couldn’t build factories and distribution centers and relationships with millions of retailers over the course of a three month sprint.
But an AI vendor could? Though OpenAI’s work requires a lot of physical computing resources, they’re leased (from Microsoft, or AWS, or GCP), not built. Given enough money, anyone could have access to the same resources. It’s not hard to imagine a small team of senior researchers leaving OpenAI, raising a ton of money to rent some computers, and being a legitimate disruptive threat to OpenAI’s core business in a matter of months.
In other words, the billions that AWS spent on building data centers are a lasting defense. The billions that OpenAI spent on building prior versions of GPT are not, because better versions are already available for free on GitHub. As a stylized example, Anthropic put itself deeply in the red to build ten incrementally better models; eight are now worthless, the ninth is open source, and the tenth is the thin technical edge that is keeping Anthropic alive. Cloud providers can be disrupted, but it would almost have to happen slowly. Every LLM vendor is eighteen months from dead.3
What, then, is an LLM vendor’s moat? Brand? Inertia? A better set of applications built on top of their core models? An ever-growing bonfire of cash that keeps its models a nose ahead of a hundred competitors?
I honestly don’t know. But AI companies seem to be an extreme example of the market misclassifying software development costs as upfront investments rather than necessary ongoing expenses. An LLM vendor that doesn’t spend tens of millions of dollars a year—and maybe billions, for the leaders—improving its models is a year or two from being out of business.
Though that math might work for huge companies like Google and Microsoft, and for OpenAI, which has become synonymous with artificial intelligence, it’s hard to see how it works for smaller companies that aren’t already bringing in sizable amounts of revenue. Though giant funding rounds, often given to pedigreed founders, can help them jump to the front of the race, it’s not at all obvious how they stay there, because someone else will do the same thing a year later. They have to either raise enormous amounts of money in perpetuity,4 or they have to start making billions of dollars a year. That’s an awfully high hurdle for survival.
In this market, timing may be everything: At some point, the hype will die down, and people won’t be able to raise these sorts of rounds. And the winners won’t be whoever ran the fastest or reached some finish line, but whoever was leading when the market decided the race was over.
I guess some people would say this is true until AGI solves it for us? It gets more expensive until you build a robot that does it for you, and then it gets way cheaper.
The market needs to be irrational for you to stay solvent.
I echo what James said, which I think you completely missed in your analysis: GPT-4o now also memorizes questions to get to know you better. The more I use it, the more difficult it becomes for me to switch, since it knows all my history: the questions I asked, what I care about, when I asked them, just like a partner in life.
I’m already personally at a point where I don’t see myself able to move to any other LLM, unless it’s 10x materially better, or GPT raises its prices by 5x or something.
I’d encourage everyone in the comments who is an active GPT user to ask the following question:
“Can you tell me about myself in 400 words”
You’ll be surprised how well it already knows you.
Moving into a thought experiment on how the future could look:
I believe everyone will land with their core LLM, which will become their trained life coach or advisor, and it will become the centrepiece of all digital interactions, similar to how social media accounts became the key online credentials.
E.g., expect to be able to log into Salesforce using my ChatGPT login details (as I do with Google today), and all GenAI features/capabilities in Salesforce will use my own personalised token.
This is a great write-up. I think there is a good evolution of thought here, from LLMs being the cloud providers of 2024 to not having the same business model. The question of moat remains. To my mind, the current moat (while it lasts) is who can build a better narrative to raise more money. And I am not saying that in a bad way. When you are in an industry like, say, semis, which needs huge upfront investment, sometimes all that matters is how much more money you can raise than your competitors. Once the dust settles, we will probably have a couple of LLMs left standing, closely integrated with existing cloud providers like AWS or Azure for GTM.