In my experience, a lot of confusion stems from the fact that people often don't know exactly what they mean by a metric. They say they want to know "revenue," but it's the process of calculating it that pins down their definition. This is where any purely semantic-based querying for robust reporting will struggle: most metrics naturally change over time as a company evolves.
I feel like for this to work at the BI tool level, they need to deliver an agent that behaves more like the head of BI / the person who is ultimately responsible for KPI reporting integrity, and not like a smart analyst who can just produce the number.
Even playing around with Stripe's new reporting tool - which is on a bounded and consistent dataset from Stripe's perspective - the second I ask for any important number, my next action is to cross-check it against some source of truth or a manually built query.
A great agent experience doesn't just help with the data access but also does a few of these steps:
- asks if this is an established KPI / metric
- asks for some examples of the metric so it can cross-check the query against them (maybe even go and check against some external tool)
- maybe tracks who else in the org has asked for this figure before
- plays back the latest definition to the user
- etc. etc.
This may mean that it doesn't deliver the magical answer on a plate experience, but it will be much more valuable.
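To make that a little more concrete, here's a rough sketch of what those guardrail steps might look like. None of this is a real agent framework; the metric registry, the reference values, and the function names are all stand-ins for whatever a given org actually has:

```python
# Sketch only: a "head of BI" style guardrail around answering metric questions.
# known_metrics is a stand-in for a real metric registry / context layer.

from dataclasses import dataclass

@dataclass
class MetricDefinition:
    name: str
    sql: str                # canonical query for the metric
    reference_values: dict  # e.g. {"2024-01": 1_204_000} from a trusted report
    owner: str              # who is accountable for this definition

known_metrics: dict[str, MetricDefinition] = {}  # loaded from the context layer

def answer_metric_question(metric_name: str, run_query) -> str:
    definition = known_metrics.get(metric_name)

    if definition is None:
        # Not an established KPI: ask for examples instead of inventing a number.
        return (f"'{metric_name}' isn't a metric I have a definition for. "
                "Can you share an example value or a report it should match?")

    # Play back the latest definition so the user can object before trusting it.
    reply = f"Using {definition.owner}'s definition of {metric_name}:\n{definition.sql}\n"

    # Cross-check the result against known reference values from a trusted source.
    result = run_query(definition.sql)  # assumed to return {period: value}
    for period, expected in definition.reference_values.items():
        actual = result.get(period)
        if actual is not None and abs(actual - expected) / expected > 0.01:
            reply += (f"Heads up: {period} comes out to {actual:,.0f}, "
                      f"but the trusted report says {expected:,.0f}.\n")

    return reply + f"Latest figures: {result}"
```

The point isn't the implementation; it's that the agent refuses to free-form a number for anything it can't tie back to an agreed definition.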
Yeah, all the subtleties of analytical work are so hard to capture in a rigid framework, no matter how much semantics & AI we add on top of it.
And if no one is double-checking anything, no one will ever know if the answer is any good. Sometimes AIs even get confused about when "yesterday" or "last month" is, so it's like every possible edge case that could go wrong needs to be accounted for.
I work closely with our finance team, and they may seem to ask the same questions all of the time, but every single time is a little different than the last. It's all about context and understanding the nuances of the business. What should be included or excluded is not always the same from one question to the next. And sometimes the answer is to add new capabilities/features/dimensions to the data, rather than trying to produce an answer at any cost.
But at least when it comes to finance, it's much easier to double-check the results since the accounts will either balance or not (and the people who ask usually already have a ballpark answer in mind and a spreadsheet ready to cross-check the provided results). Other types of questions do not enjoy the same safeguards.
Agreed, and that seems like what someone eventually has to figure out: how to replicate those sorts of double-checking and reasoning steps. Because they aren't that hard, and they probably can be loosely programmed, if programmed means "here are some best practices." It's programming more like how you would train a person, rather than precise instructions for a computer. I have no idea if that actually works, but it seems like the world we're lurching towards.
All metrics have to do is evolve with time; that's the key. We've become fixated on thinking of business metrics as static rather than drifting... and then we express shock and surprise when the drifted metric doesn't match our (current) mental definition of it.
Define or redefine them on the fly? Link to their Gmail and Confluence to do that? Move some up and down in importance, practically on a daily basis?
(PS: Thanks for this post, Benn. This is close to my heart because we literally just published a blog post about this exact same concern!)
- Manu
Want to build this, or help sell this?
Seems dangerously close to building BI, and never again. (Though I do think there's something interesting in it, if you or other folks are building something like it)
Aggregation theory. No one wants to be a commoditized complement. All BI vendors have a life-and-death incentive to not be just used by someone else’s agent. Demand is all that matters.
Sure, I don't think that most BI tools would want to do this, though not necessarily because of aggregation theory (which, honestly, I think is a mostly empty concept), but because every product wants to be the main thing people use. So BI tools are close enough to being able to build bots that they probably want to be the bot too, even though I think that makes for a messier market than if they didn't do that.
This post hits home for me, obviously. I think that beyond the technical challenges of the semantic layers of years previous, it was also very hard to justify purchasing one. The VP of Data would go to the CFO and say "I need a semantic layer". The CFO would ask "what's that?". Then... the VP of Data... would... struggle?
But this time around - CEO comes to the VP of Data and says "I want to ChatGPT with my data!"
VP of Data goes to CFO and says "CEO wants AI with his data. I need X, Y and Z for that."
Easier to approve :)
Also - from a technical perspective, you have the interfaces (Claude, Snowflake Cortex Analyst / Intelligence, etc).
And you have the semantic AI-focused layers (such as Solid).
Which sounds a bit like your option #2 :)
For now, that definitely seems true, since people seem willing to spend a lot on AI stuff right now. I'm not sure that lasts, though, since you could definitely imagine a version of this where the CEO says, "I want to chat with our data," the data person says, "we need this chatbot BI tool thing, plus this database, plus this context + semantic layer thing," and the CEO says, "Why can't I just do this with Claude Analyst?"
Which is probably really a question of how well it all works. And if that arrangement of tools works great, people won't ask lots of questions about the need for each piece. But if it doesn't work that well (which has historically been the problem with self-serve, I'd say), then people will pick it all apart.
This is basically a documentation problem. Historically, documentation was not very useful and often went stale: as soon as humans memorized it (or absorbed it organically without ever reading it), they never consulted it again, so no one bothered to maintain it. The value just wasn't there, and the effort couldn't be justified. New employees are usually shown the ropes rather than told to read the docs (which would be a terrifying prospect). The only docs that are ever useful & maintained end up being things like process checklists (especially for infrequent & tedious processes).
But since AI bots don't really have the ability to memorize/absorb knowledge in this way (for now, although that seems to be what Julius is trying to do), the next best thing is to give them access to that knowledge from some documentation/repository somewhere -- which means that we finally have a reason/incentive to maintain documentation, and even to start version-controlling it.
We're basically training new employees all over again, but they don't learn the same way as before (and they require a whole lot more diligence from their trainers, because they rarely (if ever) seem to stop to ask questions -- so we have to make sure everything is crystal clear ahead of time).
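On the version-controlling point, here's an invented example of what a bot-readable metric definition might look like if it lived in git next to the transformation code. The table and column names are made up; the shape is what matters:

```python
# Made-up example of documentation a bot can actually use, kept in version control.
REVENUE = {
    "name": "revenue",
    "definition": "Sum of paid invoice amounts, net of refunds and credits, in USD.",
    "sql": """
        select date_trunc('month', invoiced_at) as month,
               sum(amount) - sum(refunded_amount) as revenue
        from invoices
        where status = 'paid'
        group by 1
    """,
    "excludes": ["internal test accounts", "unpaid invoices"],
    "owner": "finance",
    # Every redefinition becomes a reviewed commit, so git history is the changelog.
    "last_reviewed": "2024-06-01",
}
```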
I think the question in there, for me, is: if we had good documentation, would these problems go away? I'm not sure they would. Because I think there's a very hard balance there, about how much you document. You can't document everything, because that would overwhelm both people and AI bots. But you have to document something, because that's the only way to figure out what's going on.
To take the extreme example, in a sense, query logs are a very complete form of documentation, and if you had enough time and a good enough memory, you could read all those old queries and probably figure out how to do anything. But that's too much to read for people and bots, so you have to trim it somehow. But what's the right amount, and what should it say?
Which I guess is a documentation problem, but it seems to be a much more complex one than needing to write stuff down. It's figuring out what to write down and how to use it. Which is maybe another way to frame what this context layer thing is: It transforms raw information into a summary that's more suitable to use.
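As a toy illustration of that trimming problem, you could imagine boiling the query log down to the shapes people actually run over and over. Everything below is invented, but it's the flavor of summarization I mean:

```python
import re
from collections import Counter

def normalize(sql: str) -> str:
    # Collapse literals and whitespace so repeated query "shapes" group together.
    sql = sql.lower()
    sql = re.sub(r"'[^']*'", "'?'", sql)        # strip string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # strip numeric literals
    return re.sub(r"\s+", " ", sql).strip()

def candidate_docs(query_log: list[str], min_uses: int = 20) -> list[str]:
    # Keep only the shapes enough people actually run; those are the ones worth
    # writing down, and the long tail stays in the logs.
    shapes = Counter(normalize(q) for q in query_log)
    return [shape for shape, uses in shapes.most_common() if uses >= min_uses]
```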
My perspective as a data leader working with business teams is that the context layer should sit closer to the database than to the BI tools. That way, it can be maintained once and reused everywhere (BI, LLMs, process automation) rather than being duplicated across tools, especially since metrics do change and context changes what metrics mean.
In my opinion, the more important benefit of a context layer is maintaining standardised data definitions (glossary, catalog) and making them usable.
I agree with Marco that this is fundamentally a documentation challenge—but the real problem is usability and maintainability of that documentation.
The key question for me is: while the context layer could be built anywhere, and AI can certainly help to a certain extent, how do we connect the definitions in the context layer seamlessly to the underlying data, and then surface the same context consistently in any tool that uses that data?
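For what it's worth, the minimal version of "define once, surface everywhere" is something like the sketch below: a single store of definitions that both the BI tool and the LLM prompt read from. The names and helpers are illustrative, not any particular product's API:

```python
# One store of definitions (here just a dict) read by every surface, so the BI
# tooltip and the LLM prompt repeat exactly the same wording.
DEFINITIONS = {
    "active_customer": "A customer with at least one paid invoice in the last 90 days.",
    "revenue": "Sum of paid invoice amounts, net of refunds, in USD.",
}

def bi_tooltip(term: str) -> str:
    # What a BI tool might show when someone hovers over a field.
    return DEFINITIONS.get(term, "No definition on file.")

def llm_context(terms: list[str]) -> str:
    # What gets prepended to an LLM prompt so the bot answers with the same definitions.
    return "\n".join(f"{t}: {DEFINITIONS[t]}" for t in terms if t in DEFINITIONS)
```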