benn.substack

Apr 25, 2021

Hey Petr, thanks for sharing this! This definitely looks similar. I'll see if I can get it running and play around with it.

I'm curious how people use it - is it mostly for developers looking to embed visualizations? For analysts? For people who want to access data that aren't quite comfortable with SQL?

Expand full comment

Pavel Dolezal

Apr 26, 2021

Petr, Maql would be a great way to go and you know we've been huge proponents of it at Keboola. The only issue is that it is fully embedded into the whole GD product. You can't "just run it" as part of your stack. If that would be possible we would have had it every where in Keboola:)) Pavel D.

Expand full comment

Steve

Apr 22, 2022Edited

Benn, your Substack and DBT Slack contributions have convinced me that an open source metrics layer on top of dbt feeding headless bi is the future of the stack. It makes a ton of sense and rounds out the ecosystem. Thanks for your thoughtfulness and willingness to share.

Expand full comment

https://www.youtube.com/watch?v=IHODWM9UOn0

Apr 22, 2022

Thanks! I really appreciate that, and glad it's been useful!

Expand full comment

Saurabh Moody

Oct 19, 2021

Very Sweet Article, Benn. Talking about the real problems and focussing on the Metrics / KPIs, making it easy for Business to be on track.

I'm working on a platform - Alphaa AI - which can enable business to ask questions around Metric / KPI and the platform converts it into SQL Query and gets the data.

Would be great, if we can collaborate on the topic.

Here is a little demo:

PS: happy to do a POC around this to make this concept come alive.

Expand full comment

Saurabh Moody

Oct 19, 2021

Any idea, about the products or solutions, solving the same problem?

Expand full comment

Oct 19, 2021

Hey Saurabh, thanks for sharing this. I don't have a ton of opinions on the tools out there today, actually. They're all pretty new (I think only Transform is publicly available) so it's hard to say which ones work and which ones don't. I'm glad that people are taking different approaches though (some SQL based, some more configuration based, etc). My general bias is towards the SQL based ones, though that could prove to be wrong.

Expand full comment

deletefrom

What would be the difference between a Metrics layer and the old concept of Materialized Views? In my view, SQL queries plus and understandable view schema is enough to solve this issue if you are in the relational world. We also implemented a similar idea with Elastic + Logstash to create a "table view" of the KPIs. Then, the business guy imports such table view into Power BI or Qlick View, for playing around data.

Expand full comment

I don't think materialized views would quite get you there because you still have to predefine them like you would a table. For instance, if I want every possible combination of revenue by time internal, segment, geo, and product SKU, I still have to define the materialized view so that all of those combinations are available.

I think you can maaaaaaaybe solve that with some combination of views + stored procedures that build the views on the fly, though that probably ends up looking a lot like the metrics layer I'm describing. And while you can also get there by having PowerBI et al generate the views, that puts a lot of business logic in the BI tool rather than a globally accessible layer.

Expand full comment

M van der Heijden

Jun 7, 2022

Nice! Quite unaware of the specificness of the problem we were moving to solve this in Data Virtualization, by offering both the parts (week definition =...) and the wholes (production by week=.... (based on week definition).

Expand full comment

James

May 20, 2022

Whatever happened to Supergrain? Their website now positions them as a marketing campaign creation tool and doesn't mention metrics or headless BI. A metrics layer is a sensible idea but I wonder if it is easy to get traction in the market and convince customers with established BI implementations to insert a new layer and re-write all their dashboards.

Expand full comment

May 20, 2022

They changed directions a few months ago. I think they decided there was more opportunity in building marketing tools directly on top of the warehouse than work on metrics, which got crowded quickly. George talked a bit more about it in this thread: https://twitter.com/g_xing/status/1506333772316229638

Expand full comment

Joanna He

Feb 11, 2022

Hi! Benn. Very inspiring article! I have read about it several times.

Coincidentally, before I read your article and learned about Airbnb's work on metrics platforms, my company, Kyligence, helped several companies build their metric platforms. Kyligence serves as the metrics computation engine in our user's cases.

Inspired by your piece, I wrote an article about my understanding the metrics layer. I have also included an example of a metrics platform from our user's experience. https://medium.com/kyligence/understanding-the-metrics-store-c213341e4c25

Expand full comment

Feb 11, 2022

Thanks! And yeah, I wouldn't claim that the idea is entirely new. So much of what we build it just a remix of what other folks have already done.

Expand full comment

Rogier Werschkull

Oct 13, 2021

Good post Benn, totally agree with your POV. I am currently investigating a new open source solution that aims to fill this gap: https://metriql.com/

Worthwhile to take a look at it / update your post based on your findings?

Expand full comment

Oct 18, 2021

Thanks Rogier! And given all that's going on with the space, I might come back to this at some point. I wrote a short follow up about Airbnb's solution to this, so might be worth another one: https://benn.substack.com/p/minerva-metrics-layer

Expand full comment

Dave Mariani

Sep 23, 2021

Hi Benn. Great post and I couldn't agree with you more. As CTO and co-founder of AtScale, we've been working on building what we call a "semantic layer" for several years now. Semantic layers have been at the heart of BI platforms for years (Business Objects Universe, anyone?) so the concept is not new. However, like you, I believe the semantic layer should be separate from the consumption (i.e. BI, ML) tool so that it can be truly "universal" and provide consistency across a wide range of consumers. My co-founder wrote this blog that describes this approach: https://www.atscale.com/universal-semantic-layer/what-is-a-semantic-layer-why-would-i-want-one.

Expand full comment

Sep 27, 2021

Thanks Dave! It'll be interesting to see how this evolves over the next few years. There seems to be a growing consensus around the need for something like this and (as your post alludes to), it seems a lot more achievable if everything's in a data lake vs a bunch of warehouses.

The more I talk about this though, the more I think the key to solving it is getting the onboarding right. Teams seem to agree that there's a need for a universal layer like this; the hard part is making it easy for people to adopt and integrate it. I think whoever figures that out will have a huge leg up in making something like this real.

Expand full comment

Dave Mariani

Sep 27, 2021

You make a great point about onboarding. The biggest impediment to deploying a semantic layer is in building the data model first. I think the key is to have a vibrant community of model contributors and a modeling language that is open and easy to work with (CI/CD is a must). With the popularity of SaaS applications, the good news is that these schemas are fairly well known and lend themselves nicely to pre-built data models. The folks at DBT and Fivetran have done a nice job at creating a community of DBT models.

Expand full comment

Sep 27, 2021

Maybe...those sorts of pre-packaged models have always felt like something that's around the corner, but never seems to arrive. I took a guess as to why a while back: https://benn.substack.com/p/when-are-templates-going-to-happen

Expand full comment

KenneRoce

Sep 22, 2021

I have been looking for this information for a long time, I was very surprised when I found it here.

Expand full comment

Sep 22, 2021

benn dot substack is a land of many wonders

Expand full comment

Sep 22, 2021

or, like, overwritten posts about data stuff, you decide

Expand full comment

Sangamesh patil

Aug 2, 2021

we have solved some of this problem (https://www.thoughtspot.com/trial) where our answers query which is written in natural language and converts query into SQL and renders data in dashboards .. the architecture is similar to what is described as solution here

Expand full comment

Sangamesh patil

Jul 7, 2023Edited

Life comes a full circle .. been following you since this article..

excited to the two companies coming along and working with you .. :)

Expand full comment

crunchtoinfo

May 2, 2021

Hi Benn. Maybe this article shows how Airbnb are solving the problem you describe? https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70

Expand full comment

May 3, 2021

This is pretty close. It nearly solves the exact problem that I think needs to be solved by providing a repository for metics that can be filtered, aggregated, etc in a bunch of different ways. The implementation, however, might not work outside of Airbnb.

To me, one of the reasons this works is because it's integrated so well with the rest of their analytics suite. Most of Airbnb's tools are homegrown, so they can make those integrations pretty seamless. But if you're buying tools from vendors, that doesn't work as well. Each vendor has to build an integration into Minerva's API, and some vendors won't do it, or will do it inconsistently.

One other potential gap as well - I'm not sure how this interacts with SQL-based analysis. They have R and Python APIs, but at a lot of companies, nearly all analytical work starts (and a lot ends) with SQL. I'm not sure how this supports that.

Expand full comment

Apr 29, 2021

Hi Benn, this was a good read. Have you considered storing all granular data in a database like Druid with Turnilo/Imply on top? Done this way, you wouldn't have to give up detail (like with roll-ups). Druid is performant as well, especially given we're dealing with granular data.

Expand full comment

Apr 29, 2021

Hey Ed! I think Druid could work as the compute engine. You'd still need something that translates a "metrics query" into a query run against that DB though. But, tech like Druid is what makes the possibilities of a metrics layer interesting to me, because rather than precomputing aggregations in a cube like we used to do, we can do them on fly.

Expand full comment

May 2, 2021

Hi Benn, I definitely agree. Beyond ad hoc queries, for continuous oversight on metrics, I've been able to use the following:

a) An ETL that computes and pushes metrics information into Druid as a "measure"

b) Druid also gives you the ability to apply certain transforms on dimensions and make a measure out of it without a need to re-ingest.

Using this, we're able to get most metrics while retaining the ability to slide and dice the dataset in a myriad of ways. Powerful stuff! The only disadvantage is with things that OLAP can't do well, like cohorts.

Expand full comment

May 3, 2021

Gotcha, you're using Druid as a metric store, kind of like an ML feature store? Are those metrics accessible across other tools, or if people have these kind mad lib metrics questions, they start with Druid first?

Expand full comment

May 3, 2021

Yes, Druid has answers to everything (except cohorts). Its quite popular across marketing, operations, product, quant research - so it does well on the accessibility front. Just the vanilla kind, no ML :) That being said, I do agree on opportunity to productize this further.

Expand full comment

Hi Benn! Thanks for the nice article. Do you think tools like Atscale and Kyligence are worth considering for solving the universal semantic/metric layer problem ? If yes, then how successful are they ?

Expand full comment