16 Comments

Love your point about LLMs “averaging previously unsummable things.”

Also, footnote 1 re the futility of teaching bridge is hilarious!

That LLM thing is slowly becoming the only thing I ever seem to say on this topic, so I'm glad it's not total nonsense...

And yeah, I'm honestly not sure I could design a worse job to have during a global pandemic.

How do you even quantify good data vs. bad?

Idk if Benn already has another post about this, but this question is really important. I was discussing it with a friend the other day and said, "Yeah, instead of saying GIGO, we actually want to define what 'garbage' is, what 'in' is, and what 'out' is first. The saying doesn't really help us analytics folks unless we can make it more concrete and actionable."

So let me go first. Good data basically comes down to three things:

- For number-related activity, it matches actual physical things (when I say "I sold 3 cans of soda," I can count the soda I sold to others, or see that I have less soda in the warehouse but more money in my bank).

- For text or other non-numeric activity, it matches actual physical things ("I live in Indonesia," not "Indonesiaa"), and it's complete.

I don't disagree with those things, though I think my definition is essentially: data that doesn't change. Certainly, it's good if it's "right," but that seems hard to define as well (e.g., what is the "right" retention rate, when retention itself is a malleable concept?). I think the most we can hope for is consistency, and specifically, consistency in the face of efforts to make it more accurate.

So like, what is good data? It's data that you can try to improve from its current state, and cannot.

Eeeh, I don't think you really do? And honestly, I think there's some danger in trying, because the whole thing becomes an exercise in trying to pass the data quality test. You build a dashboard that says your data is 90% accurate, and, putting aside the question of what that number even means, you probably then start to get twitchy about wanting that number to be higher. But does it matter? Eh, maybe not?

I love this nugget of truth you slid in here: "We haven’t gotten twitchy about broken dashboards because, frankly, people don’t get twitchy about broken things that they don’t care about..." 🤣

It really feels like half of the stuff data people fix is like this xkcd comic: not because it matters, but because something in a dashboard is wrong.

https://xkcd.com/386/

I always think about it as a chicken and egg problem.

Some companies get stuck in an “is this number right?” phase, and use analytics mainly in informational use cases where poor quality is more likely to be OK.

Other companies push on quality, which allows them to deploy analytics to many more critical use cases that more directly impact or interact with customers.

Which came first, data quality or the critical use case? Or maybe it's company and data leadership that recognise what could be done if they get the foundation right?

So my guess is that some companies will keep nailing it (both analytics and AI), some will realise they need to invest more into data quality to power the new user-facing analytics (~AI) and some will keep wondering what numbers are right (and be ok with it).

I think that's fair, though I'm not sure I'd see it as that much of a puzzle. I'm not sure I know of anyone who collected a bunch of data, did a bunch of work to make it nice and tidy and clean, and then suddenly found something useful to do with it.

I see it as:

1. Find the next use case which increases impact

2. Deliver it fully, AND strengthen the foundation along the way

3. Repeat

As you say, doing just the foundation doesn’t work. Similarly, trying to do more without the foundation doesn’t work either.

I see the puzzle as creating an approach/strategy/operating model that makes these cycles work well.

Yeah, agreed that trying to do more without the foundation doesn't really work. But I think that's sort of the trap, where people then think "ah, my problem is not having the foundation," and semi-cargo cult the foundation as the solution.

It's like when people want to make themselves more appealing to date (no idea why this was the analogy I thought of, but, here we are), so they go buy clothes from some trendy store. That's part of the package, but if nobody wants your personality, your clothes aren't gonna help.

Thanks for writing this post… it’s incredible that this is still a raging debate. I think the real problem is that it’s hard to do, and who takes accountability for it? And how do you properly measure how good or bad you are? This latest AI wave is helping to amplify that it’s foundational, but you still need to tackle the points above… does this make sense?

Yeah, though I'm not sure I would really agree? There needs to be accountability, sure, but why hasn't there been any so far? The easy answer is some tragedy-of-the-commons problem, or that people don't "understand the value of data." Which, maybe? But rather than us (the data people) worrying about who's going to take responsibility for this, it seems like we should be more worried about making it so important that we don't have to berate people into taking responsibility for it in the first place.

For the most part - not always, but usually - valuable things will get the attention they need, if people know that they're valuable.

Haha, I’m not berating anyone… Ultimately, it’s the business that needs to take ownership, and, speaking as a tech-literate business person, it’s then down to leadership to drive visibility of the elements of data that are foundational, that drive the business, and that need to be looked after. I agree that what is valuable gets attention, but with the complexity and growing scale of this, the business needs help… just my POV 😁

Oh for sure. My point is that a lot of data teams seem to fall back on berating other people about the need to provide us with high-quality data, where we expect them to do it because we ask them to. But I think the only way that works is if we show them why it's valuable for them to do that, such that they are incentivized to do it on their own (or that execs will do the berating on our behalf, because they see the obvious value of it). We have to earn leadership buy-in, not sell it.