"it looks like the API lets people request metrics over date ranges, with filters, and grouped by different dimensions." This is really important and something I've been thinking about with metrics layers. Some metrics - especially financial metrics - are in the form of ratios, such as ROI. These can only be calculated after the user has selected their dimensions, as the numerator and denominator values of the ratio can vary independently depending on the dimensions selected. In other words they cannot be pre-calculated and stored (unless you try and store all permutations of dimension combinations) So any metrics layer needs to be able to receive the dimensional selection from the end user tool and calculate the metric on the fly. In older days we used to do this by implementing such metrics in the semantic layer or in database views.
Agreed, and my understanding is most metric layer tools can support this in some form. I think tools like Transform do a kind of smart caching where they try not to recalculate everything if they don't have to. But if they need to go back to the underlying data (for cases like this), they will.
Lol, yes that was me too. Thank you. I was looking for a while, and I thought I was overlooking it. Do you know any good, similar substitutes for data quality that are open source? In my research so far, I've found Griffin.
Yeah, I've seen tools like Profisee that are commercial, but some of these licenses for the data governance tools are pretty expensive. Azure has Purview, but the data quality feature on Purview is pretty lacking, so I wanted to use something in addition to it with Nifi. Thank you for your help, this helped me to narrow down my search. I'm going to keep working with Griffin for now.
"it looks like the API lets people request metrics over date ranges, with filters, and grouped by different dimensions." This is really important and something I've been thinking about with metrics layers. Some metrics - especially financial metrics - are in the form of ratios, such as ROI. These can only be calculated after the user has selected their dimensions, as the numerator and denominator values of the ratio can vary independently depending on the dimensions selected. In other words they cannot be pre-calculated and stored (unless you try and store all permutations of dimension combinations) So any metrics layer needs to be able to receive the dimensional selection from the end user tool and calculate the metric on the fly. In older days we used to do this by implementing such metrics in the semantic layer or in database views.
Agreed, and my understanding is that most metrics layer tools can support this in some form. I think tools like Transform do a kind of smart caching where they try not to recalculate everything if they don't have to. But if they need to go back to the underlying data (for cases like this), they will.
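I don't know Transform's internals, so take this as a hedged sketch of the general pattern rather than how any particular tool works: cache the additive components of the ratio at a fine grain, roll them up when the requested dimensions are covered by that grain, and fall back to the base table otherwise. Table and column names are again made up:

```sql
-- Hypothetical cache: additive components pre-aggregated at a fixed grain (region, channel, day)
CREATE TABLE roi_components_daily AS
SELECT
    region,
    channel,
    event_date,
    SUM(returned_amount - invested_amount) AS net_return,
    SUM(invested_amount)                   AS invested
FROM fact_investments
GROUP BY region, channel, event_date;

-- A request grouped by region alone can be served from the cache,
-- because the sums roll up cleanly and the ratio is taken last:
SELECT
    region,
    SUM(net_return) / NULLIF(SUM(invested), 0) AS roi
FROM roi_components_daily
GROUP BY region;

-- A request grouped by a dimension outside the cached grain (say, sales_rep)
-- cannot be answered from the cache and has to go back to fact_investments.
```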
The Minerva API does speak SQL, which we use with Superset (see the "Metrics for the Masses" section in https://medium.com/airbnb-engineering/supercharging-apache-superset-b1a2393278bd). Note we're in the process of writing the third blog in the series, which will describe this in more detail.
Where can you find the Minerva API? I tried to pip install it within Python, and it's saying the library doesn't exist.
Just in case you weren't the same person asking this on Twitter, https://twitter.com/bennstancil/status/1423727352207581187
Lol, yes that was me too. Thank you. I was looking for a while, and I thought I was overlooking it. Do you know any good, similar substitutes for data quality that are open source? In my research so far, I've found Griffin.
Not open source, unfortunately. I only know of the commercial tools that do this (of which there are a handful now).
Yeah, I've seen tools like Profisee that are commercial, but some of these licenses for the data governance tools are pretty expensive. Azure has Purview, but the data quality feature on Purview is pretty lacking, so I wanted to use something in addition to it with NiFi. Thank you for your help; this helped me narrow down my search. I'm going to keep working with Griffin for now.
Yeah, Robert said there was a bit more SQL under the hood (https://twitter.com/_rchang/status/1389761173982117891), though I still can't quite piece it all together in my head. Looking forward to the next post!