19 Comments

Yes Benn! We haven't worked together at a company... but as a former CFO working with IT/developers on the reports/dashboards/queries you describe above, it's like you just described 10 years of our separate-yet-shared working lives! 🤣😭🤘

We all wonder how many decisions at a company are being informed by inaccurate reports from three years ago that were built for a one-off purpose... Ugh!

author

And that's the scary part. Surely they're out there - but we have no idea where.

Sep 10, 2022 · Liked by Benn Stancil

Alas, I actually came up with a dashboard that showed how many decisions were made based on inaccurate data -- but then it self-destructed!

author

hate it when that happens

As I was reading through this post (which I really enjoyed, thank you for writing it!) I was trying to relate the scenarios you describe to my current and past work. Upon reflection, it seems that the model described applies mostly to reactive data teams (which is many, if not most, of them!).

If the data team is the answer-producing black box that business stakeholders come to every time they need an answer, the observations you make about things being one-off or in that muddled we-built-it-as-a-one-off-but-people-want-it-updated area make total sense. But creating products (and taking them into production) should be more proactive and collaborative than that, or at least aspire to be, no?

In my last team, we had a very explicit set of products we set out to build. They weren't in production from day one, but the vision was a joint product of the commercial leadership team and our department. Instead of waiting for different business teams to come with their queries (almost all of which would end up being answered with one-off projects), we set out to create reusable, extensible data assets that standard models could be built on (and that were then also used for some one-off projects, or given to local teams who wanted to do one-off work). What was production grade (reliable, refreshable, quality-assured, etc.) was the underlying asset (standard data models, standard ML models), which was where the team spent most of their time.

Of course, many of the challenges you describe crept in anyway, and the products weren't perfectly... product-like from the start (I wrote about this last month, if you're interested: https://bit.ly/productisation), but they definitely weren't a continuous and recurring pain.

Going from reactive to proactive is hard, don't get me wrong. It requires business partners to change how they work, and it requires analytics translators and data product managers to successfully occupy the chasm between data and business, proactively identifying and exploring areas where these sorts of product opportunities might exist.

I started off writing this comment thinking I'd be outlining where I disagree with you, or at least where I don't think the observations you've made apply. Thinking it through, what I think I've done above is actually just fill in some blanks: If a data team is stricter about defining production, and as a result commits more but less often, then it has the luxury (and probably the prerequisites too) to be much more proactive.

author

I’m 100% behind this. In a lot of ways, I think this type of “product” is the best thing we can learn from product teams - we should make proactive choices about what to build, and recognize that we can’t support everything that people ask for. Just as I can’t expect some product I use to immediately jump up to address any feature request (and I fully accept when they’re like, nah, we’re just not gonna do that), I think data teams should be similarly stubborn.

But, as you say, much easier said than done. And I think that’s the crux of it - it’s one thing to define production and try to build data products in the way you describe. It’s quite another when suddenly you’re getting asked important questions, you want to help, but those questions don’t fit into the scope of what you wanted production to be. That’s where I think we get ourselves in (understandable) trouble. I don’t have a great answer for how to deal with that, other than being a lot more explicit about things that *aren’t* meant to be reused.

The approach we've gone with (still trialling it, tbh, but with senior sponsorship) is to set reasonably clear boundaries between 'standard product', 'new feature for an existing standard product', and 'custom solution' (which may evolve into a future product). Each bucket comes with its own expected time to delivery (assuming it gets prioritised) and the $$$ of value it is expected to bring (i.e. a sort of "you're probably not getting a custom solution if it's not gonna bring at least X digits in value" warning). This puts the onus on BD/the business to quantify value (even if it's potential value), and sets expectations upfront instead of having one-off conversations that lead to disappointed clients (or salespeople).
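To make the bucketing concrete, here's a minimal sketch of what such a gate could look like; the bucket names follow the comment above, but the lead times, dollar thresholds, and function names are invented placeholders, not anyone's real policy:

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    name: str
    weeks_to_delivery: int  # expected time to delivery once prioritised
    min_value_usd: int      # the "at least X digits in value" gate

# Illustrative numbers only -- ordered from most standard to most bespoke.
BUCKETS = {
    "standard": Bucket("standard product", 2, 0),
    "feature": Bucket("new feature for existing standard product", 6, 25_000),
    "custom": Bucket("custom solution", 16, 250_000),
}

def gate(bucket_key: str, stated_value_usd: int) -> str:
    """Accept or push back based on the value the business quantified upfront."""
    bucket = BUCKETS[bucket_key]
    if stated_value_usd < bucket.min_value_usd:
        return (f"'{bucket.name}' needs at least ${bucket.min_value_usd:,} "
                f"in stated value -- consider a more standard bucket")
    return f"accepted; expect delivery in ~{bucket.weeks_to_delivery} weeks once prioritised"

print(gate("custom", stated_value_usd=40_000))   # pushed back on value
print(gate("feature", stated_value_usd=40_000))  # accepted
```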

author

I think the time piece is really important. I've got a longer half-baked piece about this somewhere, but I think that's really the key: We never specify how long something should be kept around, and most people asking for stuff don't specify how long they _want_ it around. If we actually answered those two things, I think we could get pretty far.
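As a minimal sketch of that idea, suppose every asset had to carry an explicit lifespan when it was created, and a janitor job flagged anything past it. The catalog, asset names, and TTLs below are all hypothetical; in practice this metadata might live in a dbt `meta` block or a catalog tool:

```python
from datetime import date, timedelta

# Hypothetical catalog: every asset declares how long it should live.
# ttl_days=None marks an asset as production, with no automatic expiry.
catalog = {
    "q3_board_deck_numbers": {"created": date(2022, 9, 1), "ttl_days": 30},
    "core_revenue_model":    {"created": date(2022, 1, 15), "ttl_days": None},
}

def expired(asset: dict, today: date) -> bool:
    if asset["ttl_days"] is None:
        return False
    return today > asset["created"] + timedelta(days=asset["ttl_days"])

for name, meta in catalog.items():
    if expired(meta, date.today()):
        print(f"{name} has outlived its stated lifespan -- archive or re-certify it")
```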

Ah, we're talking about slightly different things here - I meant when we can pick something up, rather than how long it'll be supported for. But for sure I agree with you (and quite enjoyed the idea of self-destructing dashboards)

Most of my current users are external, paying customers - so it's generally very clear when someone wants continued access and refreshes ($$$) versus when something is a one off. Where the waters are (or could get) more muddy, however, and scope creeps in, this sort of clear, upfront definition is essential.

Sep 19, 2022 · Liked by Benn Stancil

Upvote for self-destructing dashboards or aggressive janitors (see https://www.linkedin.com/feed/update/urn:li:activity:6966069137496293376?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A6966069137496293376%2C6966258477782433792%29)... combined with telemetry and lineage (very much easier said than done).

Not sure if we can find inspiration from the design world, but I certainly see very similar challenges. Lots of ephemeral assets and loose boundaries around "production". Perhaps Figma (aka Adobe) has an answer for us.

author

I like the "this will self-destruct" idea if for no other reason than that it makes you paranoid about properly organizing things in other ways. A bit of a tangent, but I always actually wanted Slack to work this way. I think it's great as an ephemeral chat app. But when you treat it as the way things get documented, it's terrible. If people knew that what was in Slack was going to go away in a week, yeah, you lose some history - but you gain a habit of properly writing down the things you want to keep a history of.

Great post.

Two thoughts:

(1) This reminds me of the problems associated with providing a centralized feature store (managing SCD type II data) to make it easy for people to build machine learning models. Without proper versioning constructs and SLA tiering, it's hard to set expectations for everyone involved, and it's easy for it to devolve into a mess as you describe...

(2) Is part of the problem that we don't have the lineage & telemetry insights to determine what's actually running and whether it's being used or not? What if you could get column-level lineage from source to dashboard, and then display which "paths" are being used and by whom? Would that help catch where "production" expectations are misaligned?
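A minimal sketch of what (2) could look like, assuming column-level lineage is available as a simple edge list and dashboard access logs supply the "who"; every table, column, and user below is made up for illustration:

```python
# Upstream column -> downstream columns it feeds (hypothetical edges).
lineage = {
    "raw.payments.amount":   ["models.revenue.mrr"],
    "models.revenue.mrr":    ["dashboards.exec_kpis.mrr_chart"],
    "raw.events.session_id": ["models.usage.dau"],  # nothing consumes this yet
}

# Dashboard column -> users who viewed it recently (from access logs).
usage = {
    "dashboards.exec_kpis.mrr_chart": {"cfo@company.com", "benn@company.com"},
}

def paths_from(col, path=()):
    """Yield every source-to-leaf lineage path starting at `col`."""
    path = path + (col,)
    children = lineage.get(col, [])
    if not children:
        yield path
    for child in children:
        yield from paths_from(child, path)

for source in ("raw.payments.amount", "raw.events.session_id"):
    for path in paths_from(source):
        viewers = usage.get(path[-1], set())
        status = f"used by {sorted(viewers)}" if viewers else "unused -- deprecation candidate?"
        print(" -> ".join(path), "|", status)
```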

author

I think 2 is necessary but not sufficient? It would definitely be a huge help to see what's being used. But it seems to me that that would immediately lead to the question of what you do about it. So we find out stuff is used all over the place, and is messy. We can tell people not to do it...but they probably will, because they've got jobs to do that are more important than our rules.

Sep 15, 2022 · edited Sep 15, 2022 · Liked by Benn Stancil

Yep that's fair. Where my mind wanders is using it to manage expectations, and determine where to focus:

1. Are the stakeholders that matter really using what they asked you for? Nothing worse than fulfilling a PM's desires and finding out they don't actually need it.

2. User comes to you with a question that supposedly uses some known company metric. You ask where they got it from and trace its lineage. "Look, you're using table XXX that comes from Bob's ad-hoc analysis, this isn't a tier-1 support level issue." Most tables (looking up where the metric is used) use YYY as their source -- consider using that instead. Or maybe you can flag this to the user with a self-service portal so they don't even need to bother you in the first place...

3. You use it in a CI/CD step so that people can understand what they're modifying, impacting, or adding to (see the sketch after this list)...

4. Oh look, we have a lot of people doing the same downstream work -- if we centralize it we'll be able to kill all these other things... or the opposite, everyone is doing something different, we shouldn't spend more time here.
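For point 3, a minimal sketch of what that CI step could look like, assuming a downstream-dependency graph and tier labels already exist somewhere; the graph, model names, and tiers here are illustrative assumptions:

```python
import sys

# Hypothetical downstream-dependency edges and support tiers.
downstream = {
    "stg_payments": ["fct_revenue"],
    "fct_revenue":  ["exec_kpi_dashboard", "finance_forecast"],
}
tier = {"exec_kpi_dashboard": 1, "finance_forecast": 2}

def blast_radius(changed):
    """Walk downstream edges to find everything a change could impact."""
    seen, stack = set(), list(changed)
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

changed_models = ["stg_payments"]  # in a real CI job, parsed from the PR diff
impacted = blast_radius(changed_models)
print("Impacted:", sorted(impacted))
if any(tier.get(asset) == 1 for asset in impacted):
    sys.exit("Tier-1 asset in the blast radius -- require a data team review")
```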

author

2 is a really interesting point, actually. That happens so often, where people end up having different numbers, and you spend a lot of time trying to track down what explains the difference. If you could quickly skip to the end of that process, it'd not only save time, but I suspect it would make it much easier to consolidate towards the right versions. It's one thing to say "use this version because I said so and I just think yours is wrong;" it's another to say "use this version because we can see exactly where these two versions diverged."

Sep 13, 2022 · Liked by Benn Stancil

In our company, the term "certified reports" was coined sometime around 2000, so the product idea is not very new. It would be new if product thinking were applied to data entry (taking that seriously at last) and to each of the intermediate steps before the report comes out. Which also is not completely new, if you consider that the data mart was supposed to be 'a disposable product'.

author

Yeah, it's definitely not even close to a new concept. The tricky part to me is that nobody really puts it into practice. Sure, you've got these certified things, or gold datasets, or whatever, but then you also have a bunch of half-production assets that people still use regularly.

To me, "production" requires two things: A clear definition of what is production, and a clear definition of what *isn't.* We sometimes do the former, but very rarely seem to do the latter.

Sep 12, 2022 · Liked by Benn Stancil

Hey Benn, this looks incomplete. I mean, deprecation is easier said than done since - I bet - data teams largely won't be able to find out what to deprecate (unless they run costly catalog and governance projects defining ownership, stewardship, etc.).

Also, "Data Team" itself is difficult to define - are these meant to be central data orgs, embedded BI teams in different business units, CEO office, consultants, etc, data mesh enthusiasts etc. How would they now explain internally - production on the top of the jargons of trusted/gold data assets/tables/reports etc.

I think the thinking around production, conformed dimensions, etc. has been there since Kimball's days. It was just painfully slow and costly to implement.

Data teams' products, in my opinion, are more or less like MVPs that never reach production status. Maybe that's because the MVP's value diminishes over time (if a report tells me that payment issues are the reason for churn, the report's value drops drastically after the discovery).

(To quote your earlier blog, it's actually not the web-services world that data products should mimic, but the journalistic one.)

author

I agree with you on the first point, about deprecation being hard today. That's a big part of the reason why I think teams should try to put a hard line around production, because then you *can* know what needs to be deprecated. To me, a lot of the problems we have stem from that: we don't actually know what's being used and where. If we can make progress on that front (while also setting expectations that most things aren't in production and therefore not meant to be reused), I think we can get a lot closer to making this manageable.
