benn.substack

Thanks, and yeah, on your last point, I bet we start to do a lot more to collect that now. Collecting structured data is pretty easy, but it wasn't always like that; we built a ton of stuff to help us do it because we had ways to work with it. But we didn't bother as much with conversations and unstructured stuff, because even if we collected it, it was hard to work with.

But now that it's easier to work with, I think there's suddenly going to be a lot more effort to collect it, and a lot of things are going to get built to collect it, the same way ad trackers figured out clever ways to collect structured stuff. Like, the blocker on recording our conversations wasn't how to record them; it was figuring out what to do with them.

Vivek Aithal

May 16

Hey Ben! Enjoyed this post a lot, as always! I keep grappling with this question - is the difficulty in figuring out "how can my team move faster/what sales deals are in trouble" in either:

1. this is not super obvious in the data, or the data just doesn't have this info - or

2. there is no way to easily ask 100 proxy questions and experiment (because nobody really knows the answer to such questions, and the only way to find out is to guess 100 tangential things first)

Neither SQL-writing copilots nor Notion docs that have transcripts can solve this problem if the bottleneck is ease-of-guessing-and-trusting-what-comes-out.

[I say this while building your classic YC startup building talk-to-data / data modeling AI agents. The future is agents, finding the insights in our database that we could not. No, really! ;) ]

Thanks! And yeah, even if something like this existed, it doesn't seem like an outright replacement for traditional analytical work (or the people/bots that do it). Though I do think the gravity could shift, where if more and more answers are in this nebulous unstructured thing, less and less time is spent looking at the normal thing.

David Wilkens

May 18

Centralizing data and bringing your software to it is the next move. Just as Tabular did this for storage/compute (bring your own compute, or a headless warehouse), Nuon or someone else will do this with SaaS. The key, as you say, is the databases. Can you build a system that is flexible enough to adapt, choose databases for you, and let you operate over all of the different types of data that make up your whole internet life? It looks like Databricks is making the move with Neon. They are the best positioned to succeed. The future is everyone having their own personal cloud account that stores all of their data, where they grant software permission to act on it.

Yeah, I'm sure everyone will try to do it. I do wonder how much of it becomes like a traditional database, where data is stored in the thing, and how much of it works as a kind of connector out to other services, though I suspect that distinction depends on what people end up doing with all of it.

That said, the personal cloud part feels optimistic? Though maybe in the sense that, just like everyone has a Google Drive, they'll all have their giant bucket of life surveillance?

Erald David

May 18 (edited)

As always, Benn articulates ideas better than I could have from what I had in my head. But I think you're underestimating the usefulness of having "unstructured, non-tabular" data that's queryable.

What's missing, I believe, is clarity on exactly how that data will be used for business impact ("reduce cost" or "identify/generate demand").

For example: I'm a big believer that by transcribing every word from Sales and Customer Service calls, you can apply the "Not Not" and Jobs To Be Done frameworks to get a more complete picture of your customers (basically the promise of "personalized service from data" that we have never been able to deliver).

Back in one of the old posts about this, I said that you could imagine statistical-ish techniques developing on top of text data, where people start figuring out concepts like averages and standard deviations of text in some way. But I hadn't thought about this angle, where instead of people figuring out stats-like things to standardize, they figure out research techniques and frameworks that become standardized.

One thing that I haven't thought about is how you could potentially apply various research techniques to this stuff automatically. Where instead of stats packages, you have more formalized ways to apply this kind of "what is the primary concern / job to be done / whatever from these documents?"

Rex

May 29

Correct me if I am wrong, but this sounds like NLP. Also I don't see how you get past doing some numeric analysis especially if you are trying to draw insights from text.

Jun 1

I think it's much more vague than that. Like, imagine that you hire a user researcher who is very good at analyzing user surveys and stuff like that. That person could read some surveys, synthesize some points, and say something like, "ah, I realized that people pretty consistently seem to love X and are bothered by Y." Or imagine watching a basketball game and seeing a player struggling on defense. I don't think you need numbers to come to those conclusions; you just need a discerning eye and good enough memory to connect those patterns together.

AI functions basically the same way as that, with a far longer and more precise memory. They can say things like "Your last 1000 customers loved X and hated Y," in the same way that a person could for 10. That's not classic NLP, nor is it numeric analysis, because the AI isn't quantifying anything to come to those conclusions.

Erald David

Benn, please remember me as a random Substack subscriber after your next startup (I can already imagine the tagline: "BI tools, but for text") blows up from this.

Jun 1

If I build another BI tool, please come relentlessly yell at me until I stop.

Kingsley Uyi Idehen

May 17

Totally nailed it!

I say that as a data practitioner who’s been on a lifelong mission to solve this exact problem through our Virtuoso Data Spaces Platform.

Why “Data Spaces”? Because it rhymes with “Database,” while shifting the focus from tables or graphs to Data Source Names (DSNs) as the core abstraction.

1. https://www.linkedin.com/posts/kidehen_dbms-filesystem-dataspaces-activity-7199873803311529984-FUC3 -- background on the foundational idea behind Virtuoso Data Spaces.

2. https://www.linkedin.com/posts/kidehen_notebooklm-opal-ai-activity-7282547137278619648-GkcN -- practical application of the concept.

3. https://www.linkedin.com/pulse/addressing-dbms-innovation-stagnation-hyperlinks-super-idehen-nbuhe -- reflections on the broader context and why this shift matters.

Yeah, this seems like roughly the idea, at least for sourcing data. I think a lot of it depends on the interfaces for interacting with it. How hard is it to query? What does a query even mean? What sort of workflows and "apps" can you build with those queries? That seems like the second part of the question.

James Borden

May 16

I am in no danger of forgetting how terrible the Hornets are because evidence of their games against the Nets showed that they were worse.

That Notion self-description sounded legitimately neat to someone who has read too many LinkedIn AI agent posts. But what it seems to do is just conglomerate everything the organization knows instead of providing a process for thinking through what it knows.

Sadly, the Hornets don't seem to be in any danger of reminding us how terrible they are either.

And yeah, the Notion thing seems like it has some potential to do fairly cool stuff, but then it's mostly a kind of search? I assume they're planning on (/trying?) to do more though. Everyone seems to have the same ambition here, to be something like the centralizing place for all of the information everywhere. And I guess at least one thing will get that right.

Guy Kerem

May 19

Prophetic.

The ultimate data source is… us. A never-ending stream of consumption, decision and interaction.

Every day we leave a mountain of information in our wake and a fraction of it accumulates in the hoards of tech behemoths.

What happens when we capture all of it in a way that’s efficient to retrieve and process?

What happens when we own it?

So I definitely think we'll get a lot closer to collecting all of it. Now that we have something to do with it, I think we'll find tons of ways to record more of it.

Us owning it, though, that seems...less likely? People seem happy to trade it away for a few free services on the internet (which is maybe a gross trade, but not really a bad one? Would I trade all of my browsing data away to Google for the services they give me? Like....probably?)

Guy Kerem