Twyman's law: Any statistic that appears interesting is almost certainly a mistake.
I feel like a lot of data work is a random search for surprises, performatively using the most complicated tools available, in an organization that isn't prepared to evaluate them once found.
Lately I've been reading Kohavi's book on experimentation, which is actually a deep dive into how hard it is to make data-driven decisions. There's a whole hierarchy of evidence and different types of metrics. The org has to be prepared to discard projects that don't test out, for instance, and most Aha! moments have to be followed by months of experimentation and analysis.
What you're probably talking about with reporting is guardrail metrics, which mainly tell us whether things are working properly. My 90/10 rule is that 90% of a data system is baseline reports like this, keeping things on track for the 10% that is actionable insight or higher-level analysis like ML.
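To make that concrete, here's roughly what I have in mind by a guardrail-style check, as a rough Python sketch of my own (not from Kohavi's book; the metric, thresholds, and data are all hypothetical). It's a baseline report that only asks "did anything break?", not "what's interesting?":

```python
# Minimal sketch, my own illustration (not from Kohavi's book): a guardrail
# check that only asks "did anything break?". Metric, thresholds, and data
# below are hypothetical.
import numpy as np
from scipy import stats

def guardrail_check(control: np.ndarray, treatment: np.ndarray,
                    max_relative_regression: float = 0.02,
                    alpha: float = 0.05) -> str:
    """Flag the experiment if the guardrail metric (where higher = worse,
    e.g. latency) regresses more than max_relative_regression and the
    difference is statistically significant."""
    relative_change = (treatment.mean() - control.mean()) / control.mean()
    _, p_value = stats.ttest_ind(control, treatment)
    if relative_change > max_relative_regression and p_value < alpha:
        return f"ALERT: guardrail regressed {relative_change:+.1%} (p={p_value:.3g})"
    return f"OK: change {relative_change:+.1%} (p={p_value:.3g})"

# Hypothetical page-load latencies in ms; the treatment arm is deliberately worse.
rng = np.random.default_rng(0)
control_latency = rng.normal(loc=250, scale=40, size=5_000)
treatment_latency = rng.normal(loc=260, scale=40, size=5_000)
print(guardrail_check(control_latency, treatment_latency))
```

Nothing in there is trying to be clever; that's the point of the 90%.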
There's an interesting question in there to me, about how we want to find interesting things when we do explorations, deep dives, etc. Most of those projects are, in some senses, fishing expeditions for interesting things.
That sounds bad, though as I type it out, I'm not so sure it is. Interesting and unexpected things can often be useful, even if they aren't the thing we set out to find. In other words, fishing expeditions sound like a form of p-hacking, where we're always looking for the significant result rather than actually testing a hypothesis. But if that interesting thing is real (i.e., it's not just noise), it could still be a good thing to know, even if it wasn't what we were initially hunting.
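One way to keep that honest is to remember how many things you looked at before the interesting one showed up. Here's a quick, purely illustrative simulation (every number in it is made up): fishing through a pile of metrics where nothing is actually happening still turns up "significant" results at the usual 5% level, and a blunt multiple-comparisons correction like Bonferroni knocks most of them back out.

```python
# Purely illustrative simulation (numbers invented): fish through 100 metrics
# where nothing is actually going on, and see how many look "interesting".
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_metrics, n_users, alpha = 100, 2_000, 0.05

p_values = []
for _ in range(n_metrics):
    # Control and treatment come from the SAME distribution, so any
    # "significant" difference here is pure noise.
    control = rng.normal(size=n_users)
    treatment = rng.normal(size=n_users)
    p_values.append(stats.ttest_ind(control, treatment).pvalue)

p_values = np.array(p_values)
print(f"Nominally significant at p < {alpha}: {(p_values < alpha).sum()} "
      f"(expect ~{alpha * n_metrics:.0f} by chance alone)")
print(f"Surviving a Bonferroni correction (p < {alpha}/{n_metrics}): "
      f"{(p_values < alpha / n_metrics).sum()}")
```

A less conservative correction (Benjamini-Hochberg, say) would keep more discoveries, but the point stands either way: the more you fish, the higher the bar for calling something real.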
That's a bunch of underbaked ideas, but might be something worth thinking about more.