is the pronunciation “know-flake” 👀
Ok this is good
One thing: LLMs, the other thing: ChatGPT.
I don't follow?
Great perspective! ⛷️🌨️❄️❄️☃️
Enterprise search again!
It turned out that it doesn't scale; there's a ton of customization and integration work that needs to go into each account, and it was never worth more than a few hundred million in revenue per year.
Autonomy rolled up a number of failing companies in this space and got HP to buy them; the resulting fraud case was likely bigger than all the revenue enterprise search ever generated.
ok but what if we do it with llms
A lot of the barriers are upstream of the AI/ML layer -- for example, writing a parser for a proprietary document format, or hand-coding the specialized vocabulary and abbreviations that the model didn't or couldn't pick up.
It's not always deep work, but you might have hundreds of little glitches to solve, and only a hand-wavy value proposition.
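(To make those "little glitches" concrete, here's a minimal sketch of that pre-model glue work in Python. The glossary, the format parser, and the helper names are all hypothetical - each account would need its own versions of these.)

```python
import re

# Hypothetical per-customer glossary of domain abbreviations that a
# general-purpose model won't reliably know. Every account needs its own.
GLOSSARY = {
    "POs": "purchase orders",
    "DSO": "days sales outstanding",
    "RMA": "return merchandise authorization",
}

def expand_abbreviations(text: str) -> str:
    """Rewrite known abbreviations before indexing or embedding the text."""
    for abbrev, expansion in GLOSSARY.items():
        # Whole-word match so we don't rewrite substrings of other tokens.
        text = re.sub(rf"\b{re.escape(abbrev)}\b", expansion, text)
    return text

def parse_proprietary_doc(raw: bytes) -> str:
    """Stand-in for the bespoke parser each proprietary format needs;
    in practice this is where much of the per-account effort goes."""
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    doc = parse_proprietary_doc(b"Open POs flagged for RMA review")
    print(expand_abbreviations(doc))
    # -> Open purchase orders flagged for return merchandise authorization review
```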
That mostly makes sense to me, though I'd think you could make the same argument about capital-D data collection too? Gathering it is really hard; it's all this bespoke stuff; the value prop is often pretty vague. Which isn't to say we necessarily *should* be doing that for data, but we *do* do it for data.
Yes, it is comparable.
With unstructured stuff, I believe it's less likely to have critical business processes built on it, because those get migrated to structured OLTP systems, which are amenable to classic OLAP. So the value of unstructured data tends to be lower, and the cost to extract meaningful entities or actions is higher.
So I think that probably makes sense, though I'm not sure either of those things is necessarily true, as much as they've been historically true. To me, the causality goes the other way - it's been hard to extract meaning from unstructured data, so we value it lower and don't build critical business processes around it. But if we made it easier to extract stuff from it - i.e., throw a zero-shot LLM at it that seems to do pretty well - that could go in reverse. We suddenly look for more meaning, find more meaning, and start building more things on top of it.
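(As a sketch of what "throw a zero-shot LLM at it" can look like in practice - the model name, prompt, and output schema below are placeholder assumptions, not anything from the piece:)

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_entities(text: str) -> dict:
    """Zero-shot extraction: no fine-tuning, no labeled examples,
    just a prompt asking the model for structured output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract entities from the user's text. Return JSON "
                    "with keys: people, organizations, dates, action_items."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    print(extract_entities(
        "Maria from the old Autonomy team will send HP the revised contract by Friday."
    ))
```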
I agree. The exact same thing happened with other kinds of complex data, until data processing scaled enough to make modern ML feasible. Consider product events, system logs, and sensor data - it was nearly impossible to extract meaning from these discrete or small-scale data points, and 'traditional' analysis techniques failed to capture their underlying structure (time-series properties, relations between numerous features, etc.).
Now, it's the LLMs’ turn to do the same for textual data. There's still a lot of work to do in order to accomplish that reliably and at scale, though.
- btw, great piece, Benn