Hey Benn, awesome article as always! Not sure if you got my email - I have a draft of an article that I want to publish on the Medium publication Towards Data Science on how to fine-tune an LLM by using your content to teach an LLM to write like you. But, I need your permission. It seems like you're pretty responsive through comments so I thought this would be a good way to reach you :) anyways, I emailed you from mathewnxwang@gmail.com if you want to check it out or if you're already down feel free to just reply to this.
Hey, thanks! And just responded to your email. Sounds like a bot that's a lousy writer, but by all means, publish away.
I think (especially for tech companies) the data-driven approach is starting to just feel more comfortable. It is what you are "supposed to do." If you are in tech/data it is easy to sit down at your keyboard and query a database looking for something to support your theory. (Or ask someone else to do this for you). It feels harder to talk to a bunch of customers, record the results, and study and make sense of what they said... AND that also feels less sophisticated, like something "anybody" could do.
People want to see predictive models and fancy charts, not manually tabulate customer data and try to find signal in a bunch of noise - when a signal might not even exist.
That seems like the rough trend to me, where a kind of clinical, quantitative approach to things became the norm, and we're swinging back away from that now. It feels most apparent in politics, honestly, where we had peak polling and stuff in the Obama years and 2016. Though that stuff is there, there's a drift back towards paying more attention to vibes. A lot of companies seem to be moving in roughly that same direction.
Reminds me of when we tried to do root cause analysis of anomalies in business metrics at a previous company. In principle, it sounds great. We're going to let you know when your revenue/engagement/traffic is lower than expected and why. Well, when those anomalies are not due to data quality (which is some of the time), it turns out this is extremely difficult. The odds that the reason why something changed is actually captured in YOUR data are lowish. It only really would work in a closed system and business metrics are most certainly not that.
This is a longer rant for another day, but I think that explains a big part of what makes data stuff hard (or, more cynically, not as useful as we thought it would be). We've all been sold on the idea that the answer is in your data, if only you look hard enough. And we've all been sold tools that are built on that premise, that if you're able to drill into something just so, you'll find the answer you're looking for. But...the answer probably isn't in there? That seems like how we should be thinking about the problem - what would we do if we assumed that the data didn't contain what we wanted to know?
*quickly skims over own take on the mess* Phew, we wound up in mostly similar directions.
I love how you point out that data-driven culture is just debate-driven culture where people club each other with charts and tables instead of debate arguments. In UX I often have to sit at a prioritization meeting and one team's got a list of 6-week projects with $X million in blocked deals, while we've got a list of 6-week "our customers complain this is hard, unknown ROI" projects. And "the culture" is more about how those two lists are merged, balanced, then executed.
Maybe this is a grass is greener thing, but I think I've become more and more of a believer that user researchers are the people who get this right. But we miss that, because data people (oops) have culturally steamrolled them in the debates like the one you're talking about. That might not happen in individual meetings, but the broader zeitgeist definitely seems to see one as more rigorous - and maybe legitimate? - than the other. And I think that whole opinion might be backwards.
This is an awesome post :)
Maybe metrics are mostly valuable not for operating and growing things, but for pricing them? "Pricing" in a general sense.
Imagine there is a metric of smoking: nicotine flow through a smoker's body. It can be measured per hour, per day, per year, per lifetime, averaged in the demographic, etc. Knowing how much the screen viewer smoked in the last day/hour could in theory be used in programmatic advertising for ad bidding. On the scale of years, it could help doctors in cost-benefit analysis ("pricing", risk stratification): whether they should schedule lung cancer screening for the patient, for example. (I leverage your example of medication intake from your previous post: https://benn.substack.com/p/insight-industrial-complex)
An insurance company uses a person's smoking history to price their insurance policy. A pharma company or a medical lab uses cohort-level smoking data and trends to assess the ROI of developing capacity for treatment of cancer or another smoking-caused disease.
In general, all these examples of "pricing" can be viewed as computing "Free Energy reduction" (https://engineeringideas.substack.com/p/gaia-network-an-illustrated-primer) when conditioning some model of reality by the given metric value.
That all seems like the dream (for the company anyway, and maybe the nightmare for everyone else), though for better or for worse, nobody seems able to pull it off. I'm friends with some folks who worked at an insurance company that started with a very similar premise, where they thought that they could use metrics like these to provide more proactive (and therefore cheaper) care. But a lot of that slowly faded away, and they became more or less a traditional insurance company. The predictive part just never really worked well enough to make that much of a difference. But, we could get there at some point, I suppose.
Yeah, my example wasn't meant to be completely realistic, as of today. Regarding insurance specifically, there is an interesting passage from Kay and King's "Radical Uncertainty" (https://www.amazon.com/Radical-Uncertainty-Mervyn-King/dp/1408712601):
> The advance of big data means that this element of randomness will
> steadily diminish. Insurers can already obtain information through a device
> that monitors your personal driving behaviour, and the premium can mirror
> more and more exactly the losses which will result from that behaviour. As
> insurance becomes precisely tailored to the individual, and the element of
> randomness is reduced, it ceases to be insurance. As more data for medical
> diagnostics become available we will progressively know more and more
> about the health prospects of any individual. And as Alexa reports back to
> her employers, more and more data about everything become available.
> When risks become certainties they cease to be insurable. For this reason,
> most countries, now including the United States, severely limit the ability of
> insurers to select their policy-holders or differentiate their premiums. This
> limits the scope for actuarial calculation of premiums based on probabilistic
> assessment of frequencies and returns insurance to a system of reciprocal
> assistance within the community.
But what I meant is the general approach. There are relatively few types of economic decisions that are digitised well in the current economy: advertising (ad auctions), decisions in supply chain and logistics, some more. BI and reporting as we know them serve "narrative protection" in Kay and King's parlance: see https://app.heymaven.com/discover/27623. BTW, that note reminded me about Kay and King's idea that "figuring out what is going on here" is abductive, not deductive reasoning: maybe that's another thread the industry should pull on to make business analytics work better.
Should've called this "Founder Mode"!
lol I came back here to say this.
Paradigm shift!
Paradigm Shift!
For me, despite the noise, I think it's a valid reflection point for data teams.
Analytics work is often rooted in Data-Driven Initiatives, stomached in order to pay the bills. A top level OKR I was involved in: Invest heavily in data. I think about that a lot.
With that in mind, my read of founder mode, and how it relates to where Nike got it wrong, is that the CEO should keep a strong product sense, and fight for it.
People engaged in The Founder Mode are less likely to be disoriented by busy-work. Busy-work that might distract from the reason the company exists.
Disoriented is the takeaway concept for me.
I feel like this is the Nike story. Disoriented by short-term goals, losing sight of the product (American exceptionalism for your feet), losing the customer.
I think that that's sort of the subtext of the whole founder mode thing - it's basically an argument for "go with your gut, and go fast." Which I kind of agree with? Like, I think the PG essay is way too biased towards believing that works often, which I don't think it does, but I think it's often your best bet?
To obnoxiously quote myself, I think it's as much what last week's post was about as this week's:
Instead, the people who are best able to run the machine are probably psychopaths, for better or for worse. They are people who are convinced that they know how the machine works. They don’t look at a bunch of levers and wonder what to do, nor do they try to experiment with every switch. They look at the dials and just decide—based on intuition? Delusion?—how to turn each one. To people of this sort, the machine isn’t stressful, because to them, the levers are labeled.
That, of course, blows up a lot of machines; you are unlikely to simply guess the right way to configure it. But both indecision and erratic action blow up a lot of machines too—more, probably, because at least guessing a plan and sticking to it could work. Plus, if you’re going to commit to running the machine for a while, you can save yourself a lot of anxiety by picking a set of switches and doing whatever it takes to flip those. It is important to configure the machine well, and you might as well tell yourself that you know how to do it.
There is a kind of interesting thing in that post to me (which sadly might turn into something longer), where he says in a footnote, "Or managers who aren't founders will decide they should try to act like founders."
There's all this talk about what "founder mode" is or whatever, and how founders *are* different than his idea of managers in practice. Which, ok, sure. But that line suggests that he thinks that managers *can't* act like founders, which is a real trap, because then he's saying 1) managers do bad things, and 2) managers can't try to do good things. So my question is, 1) huh? and 2) why not?
Totally right, it was the first article I read from you and I came back to it several times. Having worked for a long time in a best-of-breed/vendor context, I started to learn more about the MDS, which is still fascinating. While vendors with an end-to-end approach often don't understand the idea of a metric layer, for many companies with complex data stacks it is still an interesting idea.
Yeah, it is a thing that I still *want* to exist, but I'm not sure that it ever really will.
I think most people working in this world want it to exist or happen, but it comes down to what you touched on in this post and many others, which is culture, which means people. (Side-note I keep having the "Mean Girls" quote around this about "fetch" never happening) The technology has largely been there for it to a certain extent for some time.
I think my view is that it's more of a collective action problem than culture. People want standardization, and everyone would be better off with some form of it, but the incentives aren't there for it. Which I think is a bit more "solvable" than culture, because a really popular tool could compel everyone to standardize around it. But I'm not sure we'll get that really popular tool, so, we stay stuck in the xkcd comic forever.
I never understood the difference between dbt models and metrics. Just another tech learning curve with unclear boundaries between the two.
The only justification for such a layer is that it contains a clear contract (time-based numeric values). This is perfect for LLMs to apply natural language queries on top of them.
But again, still not the 10x in value that justifies an additional layer.
Eh, I will say, I do still wish that this was a thing that we could have. I see it as fairly distinct from dbt, actually. In short, dbt is a pantry of ingredients; metrics are final dishes that people can consume. Though you could prepare a bunch of dishes and leave them in your pantry (ie, put metrics in dbt core as a bunch of tables), it's pretty painful to do that. Metrics are the point where things fan out into a million different combinations, so they're really hard to manage. (In a way, I guess you could kind of call it a contract for data consumption though? The typical data contracts make sure raw data gets clean; this tries to make sure clean data is aggregated in a standard way.)
But, is that worth it? Seems like maybe not.
Your last sentence summarizes it perfectly - a standard way to aggregate clean data. I wouldn't ask for additional budget on top of dbt (or Montara 😉) just for that.
Julian Hyde has very clear thinking on this topic IMO: https://www.datacouncil.ai/talks/cubing-and-metrics-in-sql.
They shouldn't demand additional budget on top of the database or data warehouse, any more than stored procedures were billed separately in good ol' RDBMSs. That was before VCs invented the idea of slicing up the data stack into dozens of tiny layers, all owned by different SaaS providers.
That's one of the really tricky parts with this. In a "pay for your db's compute" world, charging for queries against a semantic layer feels like you're getting double charged.
I'm here for your defense of pie charts at Coalesce 2023 and that defense alone--the metric layer was an interesting idea though
Pie charts, they are not so bad.
(But also, I talked at Coalesce every year, gave that talk, and am *not* talking this year. Just a coincidence, I'm sure...)
Yeah it's definitely that and not posts calling to "disband the analytics team."
Though fwiw the actual content of that post (be humans and not robots, be more adversarial, argue on theories as much as hard numbers) is aging pretty well. It seems like the notion of operating on heuristics and taking informed (but not perfectly calculated) risks has been circling around the blogosphere quite a bit the last few months.
If I wanted to wildly speculate (and I will for fun since I've got 20 minutes before happy hour), I'd argue that a lot of people who normally want exact numbers started thinking about it when the Democratic Party was debating pushing Joe Biden out of the race. They were grappling with trading known and perfectly calculated risk with an unknown process or candidate that could have high potential upsides but also came with an unknown amount of risk. Klein, Yglesias, and the started talking in potential distributions (instead of known distributions) in ways that they hadn't previously.
Nate Silver's new book also seems to be getting at similar ideas.
Yeah, that seems like the rough evolution to me. We got excited about "big data," and the first people who did it 1) had really valuable data (eg, FB, Google, etc), and 2) were building models relative to such crude alternatives (eg, Nate Silver vs punditry about yard signs) that we tried to apply the same sort of processes to everything. But that type of thinking didn't really work in more general cases. And so now we have to rethink what this whole enterprise is good for, and how we use the things we've made. Which, clickbait headlines aside, seems like progress?