29 Comments

Benn. You did not have to come for me like this. It's not OK to attack your friends this way! 😅

1) I think you're spot on about people & process over tools; however, tools shape the process. Just as Conway's law says that you "ship the org chart," sometimes what you ship creates that org chart. Clothes make the man and tools make the analytics?

2) I think you're committing a fallacy regarding the data that Netflix, Airbnb, or Facebook have. Don't forget that at some point in the past, Netflix was a DVD-by-mail subscription service, Airbnb was some dudes with an air mattress, and Facebook was a way to rank girls' hotness. They didn't always have that volume of data. They used good product insights to build and grow, then used data to get bigger and grow faster. In other words, your earlier post about product > data is right, regardless of what your readers say. It's just that data makes for better products if you know how to use it.

3) Knowing how to use data means _wanting_ to use data. You gotta believe there's some "there" there, first. Everything else follows from that.

4) If tools make it easier to understand data, warts and all, does that make it easier for the business to know there's value in data?

5) Why _everyone_ doesn't feel the way you do about Pinterest, I'll never know. Literally every piece of content on the platform is _some_ form of advertisement for _something_. It's a tap straight into the desire center of the consumer brain. Quite possibly the most valuable data set in history, and yet most analysts are like "yeah yeah, it's a social media site." Infuriating.

6) Maybe FB, Airbnb, Uber, etc. were just _lucky_: phenomenally, inconceivably, irrationally, mind-bogglingly _lucky_.

1) For sure, and as I said in the second footnote, I think tooling conversations are actually pretty useful, both in and of themselves, and as a means to talk about other things. It's not the end all be all, but they play a very important part in moving things forward.

2) So I made this point too clumsily - I agree with this actually. My intended point wasn't that data was useful at these companies because of their scale; it was that data was helpful because data could dramatically improve how they solved the business problem they needed to solve. That's more about the structure of the business than scale (though eventually, obviously, scale certainly helps). My question then is, how many businesses actually have that characteristic? I'd argue a gas station does not (it might at huge scale, though everything does at huge scale).

3) Yes, though I think the danger now is the opposite: People read hype about data and believe there's a there there when there's not actually much there.

4) For sure, but at some point that feels like a cheat. If our tools for extracting energy from peat got good enough, would peat be a good energy source? Yeah. But you can say that about anything. If our science fair potato batteries were efficient enough, potatoes would be a good energy source. The question to me is, is it worth it now, with things being as efficient as they are today?

5) It blows my mind.

6) On one hand, yes. There were tons of other efforts to do things like what they did, and they failed. If we just looked at the lottery winners, we'd think they picked their numbers with unbelievable skill. If we look at everyone who bought a ticket, it's obvious they got lucky. (That's an extreme case, and there's plenty of skill in growing those businesses; I just mean that luck is certainly a big part of it too.) On the other hand, even if their success is all skill, that doesn't mean data was necessarily a big part of it. Zuck could just be a prophet with incredible product vision (ok, now that I say that, maybe it was data).

I thought a little more about your gas station analogy over the weekend (I lead an exciting life, what can I say!).

Here's where it falls down: most gas stations make basically $0 on gas sales. Also, gas is a commodity, so there's little differentiation in quality or price between stations. So what a gas station actually does is provide a captive audience for selling snacks. The better you are at optimizing that, the more money your station will make. Therefore, analytics would actually be quite useful in understanding what gets people from the pump into the store (to wit: Buc-ee's as a "destination").

Maybe I'm just hopelessly optimistic about how data can be transformative. But then again, I would be, wouldn't I?! I think the first mover in the space to understand the value of data is going to be the one that captures the market.

You can have a good product without "data" and a bad product with data. You have to have the culture to build the flywheel, but the flywheel goes: good data -> good product insights -> better product -> better data -> better product insights -> better product.

Yeah, I think you're on to something. Someone else reached out to me who's actually worked for gas stations, and said that it was all about "gas, cokes, and smokes." So as a store that sells gas, maybe there's not much you can do. But as a store that's choosing what to sell gas with, and how to market all of those other things, there's probably a lot you can do.

Re Airbnb: "Some dudes with an air mattress" who decided to hire a data scientist/analyst as one of the first 10 hires (iirc)-- and subsequently invest in building out a world class data team. The data didn't just follow with their growth-- there were critical decisions that allowed that data to be captured/logged/leveraged.

For sure - I didn't make this point very well. Bullet 2 here is what I meant to say: https://benn.substack.com/p/day-of-reckoning/comment/10423407

I think it's a seriously excellent post- spot on in all the ways too many decision-makers (and vendors) want (or choose) to overlook.

Interesting, but if you had that magic dataset about your gas station customers and you couldn't figure out how to leverage the information to make more money, that is just bad business...

1) Sell more than just gas; "cross-sell" into other stuff your customers need while pumping gas. Food, drinks, car repairs, car wash, propane tanks for bbq, etc. If the data can point you to which of those ancillary goods and services would be best without trial-and-error, that would be amazing.

2) Retain customers by making the experience better for them; if the data can tell which improvements would cause customers to come back or what is causing them to never come back again, that would also be amazing. Are you really losing business because your pumps are too slow? Because the handles are sticky? Because your station smells like sewage?

But the more valuable dataset would probably be on your competition:

- what are their operational costs? margins?

- how long do you need to squeeze them on price before they shut down and you get all the traffic?

So a couple people who actually work at gas stations reached out, and their answer was very much 1, and not at all 2. They said that customers don't really "retain" in a traditional sense, and that you essentially have to treat every customer as a new one (i.e., it's all about how you attract a new customer, and not about how to keep ones that come once coming back). But, as one person put it, the business of a gas station is "gas, cokes, and smokes." You can't really do anything about gas margins, costs, prices, or anything; you're just a price taker. So all the variability is in the other services you have, how good your store is, etc. So it's almost like the gas is just a low-margin business that's going to get a certain number of people to consider parking in your store's parking lot; the rest of the business is trying to figure out how to get those people into your store.

How many businesses should have data teams is a pretty close inverse to how many businesses you think look like perfect competition in econ 101.

Wait, I'm not sure if this is a joke or a real point, but I don't think I get it.

On some level, the job of a data team is to look for opportunities to make an outsized return on an investment for the business. In the scenario where you are in a business that has been in a competitive equilibrium such that you are a price taker, you should not generally assume that those opportunities exist.

In Moneyball, Michael Lewis takes time out to explain that the A’s only get away with their statistical arbitrage because the rest of MLB was too protected from competition to really need it. The market was a long way from perfect competition, so there were actual $20 bills on the sidewalk.

So if you think more businesses are operating in markets that are mature and competitive, the payoff to data investments is not likely to be at the scale of Google or Netflix.

Ah, gotcha. I'll buy that.

context is the only prison that punishes when you leave its walls.

I most of all take this as a warning / wake up call / way to communicate for those that think you should do as big tech does. If you have some physical product and you don't have a customer base like Walmart, you may not be the type of company to build your business around the data. Meaning that you need to be careful in what you spend to get your data right. While this is true for the tooling end, maybe on the governance end your challenges are greater than those of big tech, for with the little data you have you need it to be as accurate as can be.

I don't know that big tech vs a physical good is the dividing line, but generally, yeah, I'd say that companies should be thoughtful about what it means to put a big investment in data, and what they think they might get out of it.

I'm not sure about the governance point - that's an interesting thought.

New subscriber here but loved the gas station analogy. Thanks for sharing it.

Thanks, glad you like it!

As an industry our spending controls just stink, though. There just isn’t enough pressure to be good at it yet, is my guess, plus the linked tweet is dead on. We have a lot of conflict averse people who try to solve people problems with tools.

On conflict aversion, I think there's something bigger in there, where we try to over-engineer stuff, partly for classic reasons (https://xkcd.com/974/), partly because Silicon Valley thinks too many things can be solved with tech, and partly because people want to solve things with tech to avoid solving them with people. But that might be a bigger conversation.

On spending controls, sure, but I think that kinda hides the point. Is the problem that we spend too much money on it, or that it's not worth enough to justify any inefficient spend? I get that you can solve that equation by lowering cost or raising value, but if we have to keep saying "it'll all be worth it when it's cheaper!," it starts to feel a lot like motivated reasoning.

Yes! I’ve often wondered how many data requests are an Oracle-of-Delphi-like fig leaf to make us feel better about impossibly uncertain decisions.

Maybe if we had the guts to admit we always need to make decisions based on imperfect information, we could focus our energy on the marginal cases where there is genuine return-on-data-quality, rather than stressing about stuff that doesn’t actually move the needle.

I think we've more or less come around to the idea that we have to make decisions based on imperfect information. The thing that seems to be missing is any discussion of the marginal value we get from making a data-informed decision vs going with our gut. I'm sure the former is better, but how much better? Is it worth the effort? That we can improve a decision isn't enough; we have to be able to improve it enough to justify the cost necessary to improve it by whatever marginal amount we do.

This week I joined a data Slack group.

It took all of 3 days for it to devolve into influencers and vendors peddling various ideas.

The high water mark was a dbt analytics engineering data influencer who sells SQL books and SQL courses and the vendors put him on their webinars and such, and he is now working on a new product he and his business partner are pitching, which they started drumming up and hyping up in the Slack, as they do on the socials.

However, in the #general channel the very same dbt influencer was crowd sourcing answers about how to do a self-join in SQL, and he was fanning out expensive compute in loops. SQL 101, or if not 101, 102.

That's why this will all fail and it already is. Everyone is an influencer. Everyone just changed their job title.

Low interest rate data teams and gigantic stacks of tables, tables, tables and product and people are over.

The YAGNI data stack is in vogue.

BRB changing my LinkedIn job title to "data influencer" XD

True as that may be, that doesn't seem necessarily damning; it just says it's early. I could imagine (though I have no idea if this is true) in the early days of cars, people did all sorts of dumb and dangerous things with them. But cars were valuable enough that we figured it out, and now we have a reasonably well functioning world with lots of cars. (And yea, I get that cars have all sorts of problems; my point is just that cars weren't a hype bubble.)

With data tools, we're figuring it out. Every new thing has problems; it'll attract grifters and inexpert experts and all that. But the more foundational question seems to be, is there real value underneath it? If there is, we get the cloud. If there's not, we get crypto. But the difference there is value, not cost.

Well, in as much as anything I write is typically misrepresented or muddled around, the fact is that 'Fivetran + dbt' is being shared by Snowflake reps with customers and I know this from several cases and friends/associates there. The storage and compute centers have all the power, in a distant second is BI/viz (no offense, Mode fan here, but you all can charge license fees and interface with revenue centers in a way other MDS categories can't), and in a distant also-ran is everything else which relies on human middleware to configure it as middleware.

Human middleware is the biggest liability and all the MDS tooling promotes increased human middleware bloat.

Alan Cohen's piece on human middleware, from 2012, hits the nail on the head. Single best piece on the Modern Data Stack imho, before the MDS even existed.

https://readwrite.com/if-youre-human-it-middleware-its-time-to-find-a-new-job/

What Cohen likely didn't realize when he wrote this was a decade of Q.E. that has only created more human middleware working with weaker products.

This is all an interest rates story, as I (and others) have long said. It's all more processes and more humans in the loop and more products and more incremental compute to get a number on a dashboard.

I'll need to post a followup blog to crystallize.

Also, Fivetran's bleeding out accounts. They only have roughly 4k accounts. They knew this day would come and it's a reason they purchased HVR and paid 2021 multiples for it.

The next card to fall is dbt Cloud - which many have begun to realize is collecting literally all of their data.

- I agree that compute centers have the power, because the closer you get to the commodity itself, the better you'll be (AWS > Snowflake > whatever's on top). And agree that BI (or any end-user facing SaaS) has an easier time because you're just selling straight up user applications, which has an easy business model.

- But, I'm not sure why Fivetran is _that_ different than Snowflake. Both are fancy applications that are reselling compute. In Snowflake's case, they offer a DB you could run yourself on bare AWS, but nobody wants to do that; in Fivetran's case, they run scraping APIs and data processing pipelines you could run yourself, but nobody wants to do that either. Structurally, they don't seem that different.

- Which is why I keep bringing up value. It doesn't seem like the gap between the two is actually structural in how they're built, but that your take is that Fivetran fundamentally isn't that useful. Which, ok, but that seems different than one being human middleware and the other not.

- And on the middleware point, that article actually seems *in favor* of something like Fivetran. We used to have to babysit scripts to extract data; now, Fivetran does it on its own (putting aside your gripes about performance, which, not my experience, but "it doesn't work" is both clearly damning and not very interesting, so I'm gonna assume it does).

Yeah, but there are way better options than Fivetran out there. Nexla, Keboola, Ascend, just writing a pipeline - all have won share away over the last few months and keep chipping away, and I can name several examples.

More transparent pricing, pricing not based on rows in a fixed normalization schema with Fivetran credits wrapped on top, much larger connector libraries, ability to do the 'T' before to shape data and reduce the in warehouse transformations...plus orchestration built in, plus Reverse ETL built in if you want it.

I work with many customers - Fivetran's had its day among many who have gone down this route for 2-3 years and now feel they've outgrown or want more transparent pricing.

There's a Silicon Valley narrative and a customer narrative. The two are not the same.

Here's a text from an old client of mine this week, who knows my opinions on things. He was done with the credits on top of Monthly Rows at his old company, and at his new one he just ripped this out and will rip more out and they'll just do most of the transforms in the schema of the pipelines, then only use dbt a little for last mile in warehouse, only if needed.

https://imgur.com/a/VNgmpgb

There's a bubble narrative of people who only talk to one another, and then there's the customer narrative. Customers are over it with a lot of this stuff.

Extra headcount, products that display a perception of rent-seeking behavior, tons of joins and transforms...it's all getting examined now and the Great Unwinding is underway.

All of that's fair, though it seems like a more mundane criticism of Fivetran as just having a less complete product than it being structurally broken in some way. (You could say that the transformation part is somewhat structural, I guess, though I'm not sure why that's more affordable, since it just moves compute from Snowflake to ELT. Though I guess it can be cheaper, since transformations are real-time rather than redundant in batch.)

And I have no doubt that customer narratives aren't always (or often) vendor narratives. I think I'm more optimistic that companies can adjust to that though, and if Fivetran bleeds customers because of pricing like that text, they'll change the pricing.
