Benn,

I work for GoodData and I believe we do exactly what you describe. Here are examples of metric definitions in MAQL: https://github.com/zsvoboda/gooddata-jdbc/blob/main/scripts/orders.maql

MAQL translates to SQL at query time based on the context. In your example about NPS, I can have a simple query "SELECT customer, nps" where nps is a metric you define once, and it translates to a SQL query with "GROUP BY customer". If you call "SELECT region, nps", it translates to SQL with "GROUP BY region". It's pretty much as if you passed a function definition as an argument.
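
To make that concrete, here's a toy sketch of the idea in Python. It's illustrative only, not GoodData's actual MAQL compiler; the table, columns, and NPS formula are all assumptions.

```python
# Toy sketch: one metric definition, compiled with whatever GROUP BY
# the query context asks for. Not real MAQL; all names are made up.

NPS = "AVG(CASE WHEN score >= 9 THEN 100 WHEN score <= 6 THEN -100 ELSE 0 END)"

def compile_metric_query(dimensions, metric_sql=NPS, table="survey_responses"):
    """Expand 'SELECT <dims>, nps' into SQL with the matching GROUP BY."""
    dims = ", ".join(dimensions)
    return (f"SELECT {dims}, {metric_sql} AS nps\n"
            f"FROM {table}\n"
            f"GROUP BY {dims}")

print(compile_metric_query(["customer"]))  # ...GROUP BY customer
print(compile_metric_query(["region"]))    # ...GROUP BY region
```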

All that sits on top of your data warehouse and you can access it via APIs and SDKs, or connect it to any visualization library. More about it here: https://medium.com/gooddata-developers/gooddata-cn-modern-headless-bi-is-now-available-for-free-f4390ad4ca29


Hey Petr, thanks for sharing this! This definitely looks similar. I'll see if I can get it running and play around with it.

I'm curious how people use it - is it mostly for developers looking to embed visualizations? For analysts? For people who want to access data but aren't quite comfortable with SQL?


Petr, MAQL would be a great way to go, and you know we've been huge proponents of it at Keboola. The only issue is that it is fully embedded into the whole GD product. You can't "just run it" as part of your stack. If that were possible, we would have it everywhere in Keboola :)) Pavel D.


Benn, your Substack and dbt Slack contributions have convinced me that an open source metrics layer on top of dbt feeding headless BI is the future of the stack. It makes a ton of sense and rounds out the ecosystem. Thanks for your thoughtfulness and willingness to share.


Thanks! I really appreciate that, and glad it's been useful!


Very sweet article, Benn. It talks about the real problems and focuses on metrics/KPIs, making it easy for the business to stay on track.

I'm working on a platform, Alphaa AI, which enables business users to ask questions around a metric/KPI; the platform converts the question into a SQL query and gets the data.

Would be great if we could collaborate on the topic.

Here is a little demo:

https://www.youtube.com/watch?v=IHODWM9UOn0

PS: happy to do a POC around this to make this concept come alive.


Any idea about products or solutions solving the same problem?


Hey Saurabh, thanks for sharing this. I don't have a ton of opinions on the tools out there today, actually. They're all pretty new (I think only Transform is publicly available) so it's hard to say which ones work and which ones don't. I'm glad that people are taking different approaches though (some SQL-based, some more configuration-based, etc.). My general bias is towards the SQL-based ones, though that could prove to be wrong.


What would be the difference between a metrics layer and the old concept of materialized views? In my view, SQL queries plus an understandable view schema are enough to solve this issue if you are in the relational world. We also implemented a similar idea with Elastic + Logstash to create a "table view" of the KPIs. Then the business user imports that table view into Power BI or QlikView to play around with the data.


I don't think materialized views would quite get you there, because you still have to predefine them like you would a table. For instance, if I want every possible combination of revenue by time interval, segment, geo, and product SKU, I still have to define the materialized view so that all of those combinations are available.

I think you can maaaaaaaybe solve that with some combination of views + stored procedures that build the views on the fly, though that probably ends up looking a lot like the metrics layer I'm describing. And while you can also get there by having PowerBI et al generate the views, that puts a lot of business logic in the BI tool rather than a globally accessible layer.
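
To put a number on the predefinition problem, here's a toy Python sketch; the dimensions, table, and metric are all hypothetical.

```python
from itertools import combinations

DIMENSIONS = ["week", "segment", "geo", "product_sku"]

# Materialized-view approach: one predefined view per grain you might
# ever need, i.e. 2^4 - 1 = 15 views for four dimensions.
grains = [c for r in range(1, len(DIMENSIONS) + 1)
          for c in combinations(DIMENSIONS, r)]
print(len(grains))  # 15

# Metrics-layer approach: one definition, any grain generated on demand.
def revenue_query(dims):
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM(amount) AS revenue FROM orders GROUP BY {cols}"

print(revenue_query(["week", "geo"]))
```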


Nice! Quite unaware of the specifics of the problem, we were moving to solve this in data virtualization by offering both the parts (week definition = ...) and the wholes (production by week = ..., based on the week definition).


Whatever happened to Supergrain? Their website now positions them as a marketing campaign creation tool and doesn't mention metrics or headless BI. A metrics layer is a sensible idea, but I wonder if it's easy to get traction in the market and convince customers with established BI implementations to insert a new layer and rewrite all their dashboards.


They changed directions a few months ago. I think they decided there was more opportunity in building marketing tools directly on top of the warehouse than in working on metrics, which got crowded quickly. George talked a bit more about it in this thread: https://twitter.com/g_xing/status/1506333772316229638


Hi Benn! Very inspiring article! I have read it several times.

Coincidentally, before I read your article and learned about Airbnb's work on metrics platforms, my company, Kyligence, helped several companies build their metrics platforms. Kyligence serves as the metrics computation engine in our users' cases.

Inspired by your piece, I wrote an article about my understanding of the metrics layer. I have also included an example of a metrics platform from our users' experience. https://medium.com/kyligence/understanding-the-metrics-store-c213341e4c25


Thanks! And yeah, I wouldn't claim that the idea is entirely new. So much of what we build is just a remix of what other folks have already done.


Good post Benn, totally agree with your POV. I am currently investigating a new open source solution that aims to fill this gap: https://metriql.com/

Might be worthwhile to take a look at it and update your post based on your findings?


Thanks Rogier! And given all that's going on with the space, I might come back to this at some point. I wrote a short follow up about Airbnb's solution to this, so might be worth another one: https://benn.substack.com/p/minerva-metrics-layer


Hi Benn. Great post and I couldn't agree with you more. As CTO and co-founder of AtScale, we've been working on building what we call a "semantic layer" for several years now. Semantic layers have been at the heart of BI platforms for years (Business Objects Universe, anyone?) so the concept is not new. However, like you, I believe the semantic layer should be separate from the consumption (i.e. BI, ML) tool so that it can be truly "universal" and provide consistency across a wide range of consumers. My co-founder wrote this blog that describes this approach: https://www.atscale.com/universal-semantic-layer/what-is-a-semantic-layer-why-would-i-want-one.


Thanks Dave! It'll be interesting to see how this evolves over the next few years. There seems to be a growing consensus around the need for something like this and (as your post alludes to), it seems a lot more achievable if everything's in a data lake vs a bunch of warehouses.

The more I talk about this though, the more I think the key to solving it is getting the onboarding right. Teams seem to agree that there's a need for a universal layer like this; the hard part is making it easy for people to adopt and integrate it. I think whoever figures that out will have a huge leg up in making something like this real.


You make a great point about onboarding. The biggest impediment to deploying a semantic layer is building the data model first. I think the key is to have a vibrant community of model contributors and a modeling language that is open and easy to work with (CI/CD is a must). With the popularity of SaaS applications, the good news is that these schemas are fairly well known and lend themselves nicely to pre-built data models. The folks at dbt and Fivetran have done a nice job of creating a community of dbt models.


Maybe...those sorts of pre-packaged models have always felt like something that's around the corner, but never seems to arrive. I took a guess as to why a while back: https://benn.substack.com/p/when-are-templates-going-to-happen


I have been looking for this information for a long time, and I was very surprised when I found it here.


benn dot substack is a land of many wonders


or, like, overwritten posts about data stuff, you decide


We have solved some of this problem at ThoughtSpot (https://www.thoughtspot.com/trial), where our answers feature takes a query written in natural language, converts it into SQL, and renders the data in dashboards. The architecture is similar to what's described as the solution here.


Life comes full circle... been following you since this article.

Excited to see the two companies coming together and working with you :)


Hi Benn. Maybe this article shows how Airbnb are solving the problem you describe? https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70


This is pretty close. It nearly solves the exact problem that I think needs to be solved, by providing a repository for metrics that can be filtered, aggregated, etc. in a bunch of different ways. The implementation, however, might not work outside of Airbnb.

To me, one of the reasons this works is because it's integrated so well with the rest of their analytics suite. Most of Airbnb's tools are homegrown, so they can make those integrations pretty seamless. But if you're buying tools from vendors, that doesn't work as well. Each vendor has to build an integration into Minerva's API, and some vendors won't do it, or will do it inconsistently.

One other potential gap as well - I'm not sure how this interacts with SQL-based analysis. They have R and Python APIs, but at a lot of companies, nearly all analytical work starts (and a lot ends) with SQL. I'm not sure how this supports that.


Hi Benn, this was a good read. Have you considered storing all granular data in a database like Druid with Turnilo/Imply on top? Done this way, you wouldn't have to give up detail (like with roll-ups). Druid is performant as well, especially given we're dealing with granular data.


Hey Ed! I think Druid could work as the compute engine. You'd still need something that translates a "metrics query" into a query run against that DB, though. But tech like Druid is what makes the possibilities of a metrics layer interesting to me, because rather than precomputing aggregations in a cube like we used to do, we can do them on the fly.
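
As a rough sketch of that translation step, assuming Druid's standard /druid/v2/sql endpoint (the metric registry, dataset, and filter here are invented):

```python
import json
from urllib import request

# A tiny "metrics layer": registered definitions, compiled to Druid SQL.
METRICS = {"revenue": "SUM(amount)"}

def run_metric(metric, dims, where="TRUE", host="http://localhost:8888"):
    cols = ", ".join(dims)
    sql = (f"SELECT {cols}, {METRICS[metric]} AS {metric} "
           f"FROM orders WHERE {where} GROUP BY {cols}")
    req = request.Request(f"{host}/druid/v2/sql",
                          data=json.dumps({"query": sql}).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # needs a running Druid
        return json.loads(resp.read())

# Example (commented out so the sketch runs without a cluster):
# run_metric("revenue", ["region"], "__time >= TIMESTAMP '2021-01-01'")
```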


Hi Benn, I definitely agree. Beyond ad hoc queries, for continuous oversight on metrics, I've been able to use the following:

a) An ETL that computes and pushes metrics information into Druid as a "measure"

b) Druid also gives you the ability to apply certain transforms on dimensions and make a measure out of them, without needing to re-ingest.

Using this, we're able to get most metrics while retaining the ability to slice and dice the dataset in a myriad of ways. Powerful stuff! The only disadvantage is with things that OLAP can't do well, like cohorts.
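
A minimal sketch of what (a) might look like in a Druid ingestion spec's metricsSpec ("count" and "doubleSum" are standard Druid aggregators; the field names are hypothetical):

```python
# Ingestion-time measures: Druid pre-aggregates these at the rollup
# grain, so queries only combine partial aggregates.
metrics_spec = [
    {"type": "count", "name": "events"},
    {"type": "doubleSum", "name": "revenue", "fieldName": "order_amount"},
]
```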


Gotcha, you're using Druid as a metric store, kind of like an ML feature store? Are those metrics accessible across other tools, or if people have these kinds of mad-lib metrics questions, do they start with Druid first?


Yes, Druid has answers to everything (except cohorts). It's quite popular across marketing, operations, product, and quant research, so it does well on the accessibility front. Just the vanilla kind, no ML :) That being said, I do agree there's an opportunity to productize this further.


Hi Benn! Thanks for the nice article. Do you think tools like AtScale and Kyligence are worth considering for solving the universal semantic/metrics layer problem? If so, how successful are they?


Thanks! I've never used AtScale (and I don't know at all how successful it is), but as best I can tell, it gets fairly close to this. My guess is that it's trying to do a bit too much. It appears to be combining dbt and a metrics layer (so, similar to LookML in that regard), which I don't think is quite what's needed here. It also appears to expose datasets for exploration (you've got dimensions and measures and whatnot), which I think is often more flexibility than what people want. Because of what I talked about here, I see the narrow interface - choose metric, choose filter, choose aggregates - as a feature. https://benn.substack.com/p/self-serve-still-a-problem
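
A minimal sketch of that narrow interface, with every name invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MetricQuery:
    metric: str                                  # choose metric
    group_by: tuple = ()                         # choose aggregates
    filters: dict = field(default_factory=dict)  # choose filters

q = MetricQuery("revenue", ("region",), {"segment": "enterprise"})
# The layer, not the caller, decides how this becomes SQL; constraining
# the caller to these three choices is the point.
print(q)
```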


I work in the Microsoft data stack. In the MS world (Power BI, Analysis Services tabular models, etc.) there is the DAX (Data Analysis Expressions) language layer, which is certainly similar to the metrics layer proposed here. If we could come up with a universal open source language similar to DAX but accessible to all BI tools, that would be wonderful! DAX is a pretty solid language.
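
As a sketch of what a portable, DAX-like measure definition might look like (invented syntax, not an existing standard; the comments give rough DAX equivalents):

```python
MEASURES = {
    # DAX: Total Sales := SUMX(Sales, Sales[Qty] * Sales[Price])
    "total_sales": {"table": "sales", "expr": "SUM(qty * price)"},
    # DAX: Avg Order := DIVIDE([Total Sales], DISTINCTCOUNT(Sales[OrderID]))
    "avg_order": {"table": "sales",
                  "expr": "SUM(qty * price) / COUNT(DISTINCT order_id)"},
}

def to_sql(name):
    """Compile a measure into plain SQL any BI tool could run."""
    m = MEASURES[name]
    return f"SELECT {m['expr']} AS {name} FROM {m['table']}"

print(to_sql("total_sales"))
```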


Great insights, Benn! I couldn't agree more with the need for a centralized metrics layer. It's fascinating to see how the "modern data stack" is evolving, and you've made a compelling case for this missing piece. It reminds me of what we discussed in our blog post about schema evolution and the importance of structuring data upfront to ensure consistency and reliability; check it out here:

https://dlthub.com/blog/next-generation-data-platform.

At dlt, we’ve been tackling similar challenges by enabling automatic schema evolution on write, which saves a significant amount of time and effort in data management—something that could definitely complement the metrics layer you're advocating for. Keep up the great work!

Aman Gupta,
dltHub Team
