A few months ago, I said there were two ways to figure out how to fix a failing bar:1
You could use data. You could track every purchase, monitor every customer interaction, and instrument every corner of the bar. Log everything that happens in some huge database, and analyze all of it—ordering patterns, foot traffic, temperature readings, weather data, demographic profiles, market trends, everything. Are people who order the cocktails less likely to come back than beer drinkers? Is the average wait time for a drink above the neighborhood average? Does your bar attract the type of people that other people are attracted to? Something is holding your bar back, and if you follow your customers’ digital footprints, they’ll lead you to it.
You could talk to people. Interview them about your music, your drinks, and your bar’s vibe. Put out a suggestion box; ask them for feedback. Rather than trying to decipher their opinions through a cloud of behavioral exhaust, just ask for them directly.
I thought that option two was better, and AI might make it better still. But there’s actually a third option that I didn’t think of: You could watch people. You could sit in the corner of the bar with a notepad, and write down everything you see. The point isn’t to quantify it—that’s just manual data capture—but to actively listen, and see if you notice anything surprising. Watch what customers order; what they do; what types of people come in your door. Do you notice patterns?2 When your DJ3 starts their set, does it get the people going, or do they start shopping on their phones? Though your conclusions may not be “statistically significant,” or even quantifiable, they still have force. You don't need to know how to define dancing to know when something is a banger.4
On one hand, this third option is terrible, because it’s a combination of the worst elements of the other two. Observations, like the behavioral datasets that you might collect, are an indirect way to understand customers’ opinions. But unlike working with data, observation is hard to scale. At the end of a night of note-taking, there’s no way to aggregate up what you saw into precise facts and figures. It’s analysis by anecdote, math by memory, and inference by vibe. We can add up structured likes and dislikes, but we don’t have a way to “average” 200,000 freeform comments5—and we don’t have a way to average eight hours of passive observation either. The best we can do is watch, and then ask ourselves if we remember seeing anything interesting.
On the other hand, watching customers includes some of the best parts of user research and of data analysis. Like behavioral logs, observations are revealed preference; they are “true,” and not self-reported representations that may not always be entirely honest. Moreover, unlike behavioral logs, observations can capture things that are difficult to quantify, or things we haven’t thought to quantify.6
Perhaps most importantly, though, watching customers can reveal surprising results that we didn’t think to ask about. This is one of the biggest problems with traditional data analysis. The insights are hidden in the data like the Isla de Muerta: They can’t be found except by those who already know where they are. User research and open-ended feedback forms don’t have that problem—and neither does watching customers from the corner of a bar. If you watch for long enough, and remember what you see well enough, you’d probably see a lot of things that you never would’ve thought to look for.
Of course, that’s a big if. You have a bar to tend; you can’t actually spend all night terrorizing your customers with your furious note-taking. It’s also hard to remember everything you see, and your conclusions, though emergent, would likely be plagued by recency and availability biases.
But what if you had an army of invisible watchmen, with a notepad long enough to perfectly record every detail they saw, and with a memory sharp enough to be able to recall all of it at once? What if you could ask that omniscient thing what it saw that was interesting and surprising? If you had three tools for saving your bar—a database of numbers to analyze, a library of customer interviews, or this third thing, a thing that watches everything, remembers all of it, and finds patterns in the noise—which would you choose?7
I mean, you might not always choose the third one, but it’d be a pretty nice option to have. Because it’s not data analysis, exactly, but it’s something bigger: It’s the goal of data analysis. Most analysis is meant to find these sorts of patterns in charts and graphs. What if we didn’t need the numbers at all, and could just watch and remember?
Attention is all you have
When ChatGPT first came out, a whole bunch of startups immediately tried to use it to help analysts with option one. On the surface, that made sense: Companies have enormous haystacks of data, there are (allegedly) patterns in that data, and LLMs are built on top of literally every recorded idea in human history. If something could connect the dots in our datasets, it would be ChatGPT.
So far, it hasn’t really worked. According to dbt Labs’ recent survey of data professionals, only a third of them are using AI in their daily workflows, and I’d guess a much smaller fraction of them are using it to do analysis directly. As Seth Stephens-Davidowitz said of his book Who Makes the NBA?: Data-Driven Answers to Basketball's Biggest Questions—which he wrote in thirty days with heavy support from ChatGPT—though AI “eliminates all the parts of my job or work process that I hate [like] cleaning datasets, merging datasets,” its “current capabilities in correlating [its] extensive knowledge to generate new theories or uncover complex patterns were limited.”
In hindsight, that also makes sense. A lot of analysis is logical reasoning. It’s asking, “Why might people not like my bar?,” and thinking through the possible causes. To do that, you need a rough theory of how bars work, and what people like, and how those things are connected.
Large language models only appear to use theory and reason. Instead, what makes them powerful is that they have incredible memories. If you ask ChatGPT to write a blog post that’s full of subliminal messages that compel people to buy technology stocks, it will recall an enormous corpus of blog posts. It will find texts that it associates with subliminal messages. It will fetch everything it knows about technology stocks.8 And it will then smash them all together into a post that represents a rough average of every related piece of content that’s ever been written.9 It doesn’t know what the prompt it was given means;10 it only knows how to create something that fits the prior patterns of everything it’s observed.
In some very rough sense, that’s what LLMs are—engines for finding patterns in words. But the underlying transformer architecture behind LLMs can be applied to any sort of sequence.11 ChatGPT and other LLMs take input text, break it up into small chunks, turn those chunks into complex mathematical objects, and predict, based on historical patterns, what should come next. People have already built models that do the same thing for audio and video, using essentially the same architecture.12 They listen to and watch everything, remember all of it, and reproduce the patterns that they see.
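To make that a little more concrete, here’s a minimal sketch, in PyTorch, of the loop I just described: chop a sequence into tokens, turn each token into a vector, and train a transformer to predict whichever token comes next. Nothing here is how ChatGPT is actually built; the sizes, the layer counts, and the random stand-in “training data” are all illustrative.

```python
# A toy next-token model, not how ChatGPT is actually built: all sizes,
# names, and the random stand-in "training data" below are illustrative.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # how many distinct chunks (tokens) the model knows about
EMBED_DIM = 64      # the size of the vector each token is turned into
CONTEXT = 32        # how many previous tokens the model can see

class TinyNextTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)   # token -> vector
        self.pos = nn.Embedding(CONTEXT, EMBED_DIM)        # position -> vector
        layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)       # vector -> next-token scores

    def forward(self, tokens):  # tokens: (batch, sequence length)
        positions = torch.arange(tokens.shape[1], device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # The causal mask means each position can only attend to what came before it.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        return self.head(self.encoder(x, mask=mask))

# The entire training objective: at every position, predict the following token.
model = TinyNextTokenModel()
tokens = torch.randint(0, VOCAB_SIZE, (8, CONTEXT))  # stand-in for tokenized text
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
loss.backward()  # nudge the weights toward the patterns in the data
```

The important part is what the model never gets: a theory. It only ever learns which tokens tend to follow which other tokens, which is exactly why the same recipe can be pointed at other kinds of sequences.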
If we can build models that do for audio and video what LLMs did for text, could we build the same sorts of models on behavioral data? Could AI save my bar, not via LLMs cosplaying as analysts writing SQL queries,13 but via transformers trained directly on sequences of customer purchases and interactions?
Large data models
I guess we’ll find out. Earlier this year, Motif Analytics14 announced that they’re building foundation models of event sequences:
To model event sequences with minimal changes to the underlying data, our approach is to mimic the training of LLMs on text data but with two key modifications:
Event tokenization: We developed tokenizers for events that richly capture their timing and various properties that have been logged in a flexible way.
Multi-time-scale loss function: Modify the training objective to make sequence models less myopic, encouraging them to learn about how sequences will evolve over the next few minutes, hours and days rather than simply the next event.
These architectures comprise what we are calling Generalized Large Event Analytics models (✨GLEAM✨).
Motif is building a model that takes streams of product event data—which includes the event itself and attributes about the event, like who the user was, the device they were using, how long they’ve been a user, and so on—breaks those events into chunks, and turns those chunks into complex mathematical objects. They then use the same transformer architecture that powers LLMs to predict what events are likely to come next, based on what happened, who did it, and the context in which they did it. It’s a machine for observing everything, and finding patterns.
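Motif hasn’t published the details of GLEAM’s tokenizer, so treat this as a guess rather than a description: a sketch, in Python, of what turning a bar’s event log into a token sequence might look like. The field names, the gap buckets, and the events themselves are all made up.

```python
# A guess at what "event tokenization" could look like; Motif hasn't published
# GLEAM's tokenizer, so the field names, buckets, and events here are made up.
from datetime import datetime

def bucket_gap(seconds: float) -> str:
    """Collapse the exact time since the previous event into a coarse token."""
    if seconds < 60:
        return "<gap:under_1m>"
    if seconds < 3600:
        return "<gap:under_1h>"
    if seconds < 86400:
        return "<gap:under_1d>"
    return "<gap:over_1d>"

def tokenize_event(event: dict) -> list[str]:
    """Turn one logged event (its name plus its attributes) into tokens."""
    tokens = [f"<event:{event['name']}>"]
    for key, value in sorted(event.get("properties", {}).items()):
        tokens.append(f"<{key}:{value}>")
    return tokens

# One customer's night at the bar, as a (fabricated) event stream:
events = [
    {"name": "order", "time": datetime(2024, 6, 1, 21, 0),
     "properties": {"item": "beer", "device": "bartender_pos"}},
    {"name": "order", "time": datetime(2024, 6, 1, 22, 30),
     "properties": {"item": "cocktail", "device": "bartender_pos"}},
    {"name": "tab_closed", "time": datetime(2024, 6, 1, 23, 45),
     "properties": {"tip_pct": "20"}},
]

sequence, prev = [], None
for event in events:
    if prev is not None:
        sequence.append(bucket_gap((event["time"] - prev).total_seconds()))
    sequence.extend(tokenize_event(event))
    prev = event["time"]

print(sequence)
# ['<event:order>', '<device:bartender_pos>', '<item:beer>', '<gap:under_1d>',
#  '<event:order>', '<device:bartender_pos>', '<item:cocktail>', ...]
```

Those token strings would then get mapped to integer IDs and fed into the same next-token machinery sketched above; the multi-time-scale loss from their second modification would, presumably, swap the plain “predict the next event” objective for one that also scores predictions a few minutes, hours, and days out.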
Will it work? I have no idea. Training LLMs is still very hard, and requires a lot of finessing and fine-tuning to get right. Training a similar model on events is surely even harder. There’s also no internet of event logs to scrape. This whole idea could be crazy.
But I’m intrigued, because it’s one of the first efforts I’ve seen that attempts to apply the bitter lesson to analytical problems. GLEAM isn’t trying “to make systems that worked the way the researchers thought their own minds worked;” it’s not trying to build knowledge or theory into AI agents; it’s not trying to do analysis by making computers act like analysts. Instead, it treats AI like what it is: a blistering calculator with a breathtaking memory. It’s feeding patterns into a pattern-generation machine. And it’s saying that maybe people can get what they want from analysis—an understanding of the patterns in the world around us—without actually doing analysis, or even directly touching data.
Which is perhaps exactly what we need. The numbers may have been the problem all along. Data quality is really hard.15 Math is even harder. Let’s be humble, and get some help.16 Transformers, if applied directly on top of the behaviors we want to find patterns in, could finally be what saves our bad bar—even if they couldn’t save some people from theirs.
Give it up for the real DJ Earworm.
These two reaction videos are good examples of how option three is different from options one and two. If you wanted to figure out how people felt about the two songs, you could try to measure it with data. But using what metric? Head nods? Smiles versus aghast stares? How would you even track that? Or, if you preferred option two, you could ask people what they thought of each song. They’d probably give somewhat political answers though, based on what they thought other people thought about them, and based on how they felt about the overall beef. But if you just watched the videos and looked for informal patterns, it’s very easy to figure out the vibe.
I.e., the stuff in the reaction videos.
I guess you also have a fourth option for saving a bad bar, which is to write a song that says, nuh uh, the pub isn’t bad, actually. I doubt it would work, but Stranger Things have happened.
This isn’t exactly true. LLMs only appear to have incredible memories too. There is no database of every book that’s ever been written in ChatGPT; it can’t “remember” all of Moby Dick. Instead, there is a giant predictive model that’s been trained on every book that’s ever been written, and Moby Dick—and everything else—exists only as alterations to the weights in that model.
Less generously, it will crush centuries of human creativity and expression into a single sterile screen.
If you need proof that ChatGPT has no theory about what words actually mean, it called the blog post that was supposed to subliminally convince people to buy technology stocks “Unleashing Future Potential: Why Investing in Technology Stocks is a Smart Move.” Though, fair, I guess it would’ve been easier to brainwash Derek Zoolander to kill the prime minister of Malaysia with a song called Derek Zoolander, go kill the prime minister of Malaysia than with one called Relax.
I think? I should caveat this by saying that I have no idea what I’m actually talking about. My career as a proper data scientist started and ended with scoring a few A/B tests using high school statistics written in raw SQL, so I’m making most of this up as I go.
The success of the LLM paradigm is enabled in part by the use of tokens that elegantly unify diverse modalities of text—code, math and various natural languages. In this work, we consider how generative models of visual data can inherit such benefits. Whereas LLMs have text tokens, Sora has visual patches. Patches have previously been shown to be an effective representation for models of visual data.
My only advice: Drop the “analytics.” Just, Motif. It’s cleaner.
It’s common for people to say that generative AI makes data quality even more important. Bad training data makes bad models; garbage in, garbage out; all that. That feels…backwards? Messy data is a problem for computing metrics, because one incorrect bar tab will make your revenue KPI wrong. However, if your goal is to understand patterns, you’re looking for approximations. Tools like Motif could reach the conclusion that customers who spend a lot of money tend to order wine on top of pretty messy data. In some ways, that’s the beauty of transformers. You can stuff a bunch of messy data underneath them and they don’t lose their balance. I mean, ChatGPT is trained on Reddit.
You can also pretend to be a customer. In other words: the Renato Rosaldo method of 'deep hanging out' (Geertz 1998)
The underlying premise here is that "the answer" can be found in simply passively observing or collecting the status quo (through different means and with different analysis tools). Wouldn't another option (that does directly apply outside of the save-the-pub context) be to make a list of things that you could change that *might* have a positive impact, and then make those changes carefully and observe the results: change up the music selection, change up the sound level, change up the lighting, etc.?
Through that lens, generative AI can become a useful brainstorming companion: "I have a bar that is [describe "failing bar" characteristics] and would like to make some changes to make the bar more successful. Give me a list of 20 things that are reasonable changes to try..."
That kicks into more of a scientific method approach: cull through that list and treat it as hypotheses, refine them and see what other ones that list sparks, prioritize them (possibly by applying human judgment to how quickly you would expect to see an impact and what the expense would be to try the change), figure out how you are going to determine if each one "worked," and start rolling them out.
What's happening with the status quo doesn't always hold the answer as to how to drive improvements (if you're concerned that your storefront is not getting sufficient foot traffic, studying the movements of the people who *are* coming to the store is like the drunk looking for his keys under the streetlight because that's where the light is, even if it's not where he dropped them).