I also sometimes wonder whether the increased reliance on consultancy firms is a net benefit. I'm from one myself, so I'm not speaking against our own company; we do good work and our clients see results. But as you question how data teams work and think about how to improve them, it's worth taking into consideration, especially now that affordable tooling lets smaller and mid-tier companies set up their own data stacks.
As consulting analysts and analytics engineers, the greatest perk is that you get to work on many different problems. You can learn fast and re-apply what you learn across accounts. If you're waiting on access or approval for one account, you can switch to a different one and keep going. Another benefit is that you (often) work with a larger group of varied talent that you can rely on as a sounding board or for direct support. For some companies, it wouldn't be possible to hire the same expertise full-time. And given the average tenure of people in the space, does it even warrant investing in in-house staff?
There is, however, a loss for companies as well. Does working with external teams allow for strategic focus? Long-term thinking? How does it affect focus on the most important problems? How does it affect the feedback loops and iteration that are so vital for developing healthy data products? We've helped multiple companies hire and train their own teams during engagements. But striking the right balance is no easy feat, and it should be part of the discussion as we look to improve as a field.
Ooh, that's a really interesting point. There's definitely a lot of consulting work out there now, and I haven't seen anyone try to figure out what that means. Is it good, because it makes data work more accessible? Does it cram one-size-fits-all solutions onto companies that don't need them? Does it encourage a kind of short-termism? Do we all actually need internal data teams? It seems like there's a very big difference between hiring a team and using consultants (and it goes beyond just the cost), but I haven't seen any discussion of that. That's a good idea for a future post, perhaps.
I'm also a consultant, and I think about this a lot as well. One thing I've noticed is that, because of the standardization of tooling, consulting bids are more tech-focused, e.g. "we're looking for help setting up Looker / Snowflake". Part of me thinks this is good, because the client has more time to focus on application rather than technology. But I can also see how the relative ease and price of hiring consultants might induce demand for more unnecessary technology and actually distract from strategic work.
Would also love to see a future post on this!
That's a good point too - when tooling becomes more standardized, it seems like there are a lot of implementation specialists to set those tools up. Which might be good, because people could—in theory—then focus on making the tools useful. But it seems like (based entirely on feel) what actually happens is people just try to set up the tool because they think that's what they're supposed to do, and worry *less* about what it's for.
I took over managing Yammer's Vertica cluster in 2013, and I figured out why nodes were going down. The spread daemon, which is basically the control plane for the cluster, was competing with queries for network and CPU, and a few seconds of starvation during a heavy query would trigger a partition event. I found that nice'ing spreadd up to realtime priority eliminated node failures.
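For anyone curious, here's roughly what that fix looks like, as a minimal sketch rather than our exact change: it assumes Linux, root privileges, and an illustrative daemon name, and the real mechanism is the kernel's realtime scheduler (what `chrt` does) rather than nice itself:

```python
# Minimal sketch: promote a cluster-control daemon (e.g. spread) to
# realtime round-robin scheduling so heavy queries can't starve it of CPU.
# Assumes Linux and root privileges; the daemon name is illustrative.
import os
import subprocess

def promote_to_realtime(process_name: str, rt_priority: int = 50) -> None:
    # Look up PIDs by exact process name (pgrep ships with procps on Linux).
    result = subprocess.run(
        ["pgrep", "-x", process_name], capture_output=True, text=True
    )
    for pid in result.stdout.split():
        # SCHED_RR preempts all normal (SCHED_OTHER) processes, so query
        # load can no longer delay the daemon's heartbeats long enough
        # to trigger a spurious partition event.
        os.sched_setscheduler(int(pid), os.SCHED_RR, os.sched_param(rt_priority))

if __name__ == "__main__":
    promote_to_realtime("spread")  # shell equivalent: chrt -r -p 50 <pid>
```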
Vertica originally recommended separate switches for this traffic, and this type of hiccup made it hard to simply switch to cloud or generic servers with one network interface. The cloud at that time also lacked data-intensive instance types -- I used to test Azure instances every few months to see if they could even get near our I/O specs, and I never found one. It took years for cloud data servers to become available.
Maybe I'm a little grizzled, but the problem I'm having nowadays is how disappointing the new offerings are. I've always gravitated towards high-value, performance-critical stuff like analytics-based web apps, and I still haven't found anything that beats Vertica (which is now available as a SaaS, by the way).
Oh nice - I left in 2013, so missed out on that. But that's awesome.
And I remember having the same reaction to Redshift for a while, after I'd gotten used to Vertica. Despite being a bit fragile, Vertica was obviously very high-performance, whereas Redshift (in the early days) felt kind of like a cheap imitation. It was like going from a BMW to a Kia - the BMW seemed to need a lot of care, but it was clearly engineered with a lot of craft, while the Kia had more of a sheen of quality but some loose parts under the hood. (I.e., Redshift lacked a lot of functionality then; queries would get hung up; large tables had to be pretty carefully managed in messy ways.)
I don't feel that way about Snowflake today. Though I'm not sure if that's because cloud warehouses have now had enough time to grow up, or if it's because I've forgotten what Vertica was like.
The market seems to get segmented into different subcategories. Redshift is a warning to us all; I believe Amazon just never figured out how to manage it effectively.
As to Yammer, it was interesting to work with the tools that were created there. Avocado is still my model for what a data tool should be, and I believe that many orgs need a flexible self-service tool for moving data around more than they need to fixate on one warehouse platform.
The point about fixating on one platform is interesting. This is a longer idea for another day, but I think we build a little too much for a future world that we want, rather than the one that's realistic. Like, if we assume we never solve some of the disorder, what would we do then?
On Redshift, what is it that you think they didn't figure out?
(by which I mean, I agree they missed an opportunity; curious why you think they missed it)
It just seems to have a lot of defects and unfinished features. There's even a website that tests them (https://www.amazonredshiftresearchproject.org/white_papers/index.html), and the writeups largely echo the experience we've had.
For example it only sorts and searches on the first eight bytes of a key, even for strings, so it's entirely possible for a unique key search to scan 65% of a sorted table.
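To make that concrete, here's a toy sketch; synthetic data and my own simplification of block-level min/max pruning, not Redshift's actual internals:

```python
# Toy model: a sorted column split into blocks, where the engine keeps
# only the first 8 bytes of each block's min/max key for pruning.
# Synthetic data; a simplification, not Redshift's actual internals.
BLOCK_SIZE = 1_000

def blocks_scanned(sorted_keys, target, prefix_len):
    blocks = [
        sorted_keys[i:i + BLOCK_SIZE]
        for i in range(0, len(sorted_keys), BLOCK_SIZE)
    ]
    t = target[:prefix_len]
    # A block can be skipped only if the target prefix falls outside
    # the block's [min, max] prefix range.
    return sum(1 for b in blocks if b[0][:prefix_len] <= t <= b[-1][:prefix_len])

# 100,000 unique keys that all share the same first 8 bytes.
keys = sorted(f"customer-{i:08d}" for i in range(100_000))
print(blocks_scanned(keys, "customer-00050000", prefix_len=8))   # 100 (every block)
print(blocks_scanned(keys, "customer-00050000", prefix_len=64))  # 1
```

Truncate the comparison to eight bytes and every block looks like a candidate, which is how a "unique key" search ends up scanning most of a sorted table.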
I don't know what went on internally with Paraccel and so on, but it seems like they stopped developing the core engine at some point, possibly when Amazon bought them.
Gotcha, yeah, and it seems like they're kind of walking away from it. Which might make sense, if they're just going to resell compute underneath Snowflake. But it also seems like there's an opportunity to build a "pure" database - just a fast, almost bare-metal warehouse that's easy to get up and running - to compete against the soon-to-be-bloated "data platforms" that are becoming the standard. But, seems unlikely to happen.
I tend to agree with most of the points. I could draw parallels to the "Analytics / AI / ML" world too. Essentially, people are becoming more tool-centric and attribute their success (or failure) to a tool's features (or lack thereof). When we say "data science", we often tend to overlook the "science" part of it. Usually, it starts with a good hypothesis or with asking the right question. That's an art that is badly missing these days. I think the last paragraph in your article captures the importance of asking the right questions.
Yeah, another commenter hypothesized that as tools have become more important, we've tended to measure our value by how well we use those tools rather than by what we do with them. Which certainly seems possible, though it also feels a bit like the easy criticism in the community these days. Like we've come to accept "we talk about tools too much" as a given problem, but don't really ask what that means.
Chuck, a soccer buddy of mine, was a professional still photographer working in the movie industry 20 years ago. When digital photography came out he scoffed and said that a better word processor doesn’t make you a poet so the industry would always need experts like him. Then the consumption of photos moved suddenly from glossy print magazine ads and 27 x 40 inch posters to 468 x 60 pixel online ads. No poetry needed in 468 x 60 pixels. Chuck lost his photographer role and instead became an expert in organizing thousands of digital photos and managing the distribution of ads to online networks.
Caveat: I’m an evangelist for a data/AI/machine learning platform vendor. (If the following is too salesish let me know.) IMHO some data teams’ impact hasn’t changed much because they’re still doing old tasks that are now less valuable, like taking still photos. The cheese moved. Instead of central data teams being “honorary members of the marketing and customer success leadership,” those departments often now have actual team members who are data experts. A new, valuable role for the data team is to empower those data experts at the edges of the business by providing them with powerful, self-service platforms and best practices.
We have hundreds of customer success stories to back it up including Standard Chartered Bank’s 3,000% productivity increase, a multinational telecom’s 16,000% productivity increase, and Unilever’s data-driven evaluation of new product ideas that’s 100 million times faster than their previous method.
https://www.oreilly.com/library/view/data-mesh/9781492092384/ch04.html
https://www.dataiku.com/stories/detail/standard-chartered/
https://www.dataiku.com/stories/detail/telecommunications-service-failure-prediction/
https://community.dataiku.com/t5/Dataiku-Frontrunner-Awards/Unilever-Creating-Data-driven-Product-Ideas-Based-Exclusively-on/ta-p/29099
I've periodically seen arguments like this, that data teams need to evolve to be more like platform teams (it's similar to the data-as-a-product argument too). It doesn't feel wrong to me, but I'm not sure it's an either/or choice. The data-as-a-platform-team approach just seems like a different role than the one a lot of today's teams are trying to fill. There certainly can be value in that role (maybe even more than in traditional data teams), but it doesn't really address the question about the traditional role of data teams in helping businesses understand how they're performing and things like that. Though the answer there could be what you suggest - that the value there is relatively limited (or is capped), and so new tools need to find problems with higher ceilings.
Hi Ben!! Maybe the perceived value is a factor of investment as well. Years back, the cost of data infrastructure was 4-6x the cost of data professionals' salaries; now it's a fraction of that. Back then the attitude was more "let's respect these hard-to-find folks who can get some value out of the large investment we've made," versus now, "we've invested so much in these folks, they'd better deliver X times over."
Hmmm. I could see some kind of dynamic like that, where, as data teams become more of a widespread thing, there's more of a transactional expectation around what they deliver and less of an expectation of "expert" work (whatever that is). But even if that were true, it still doesn't quite make sense to me why the tools wouldn't make us collectively way more effective. Unless they do, and we've all subconsciously raised the bar so much for what we expect that it doesn't feel that way.
I think so generally.
Eh, I'm not sure I'd buy that it's that zero sum, though I think there's probably something about how hiring works that creates this effect. Tech skills (and to some extent, familiarity with tools) are easier to put on resumes and easier to screen for. And the more the industry is defined by tools, the stronger that dynamic.
"Why has data technology advanced so much further than value a data team provides?" We need to stop using the term "data team" -- instead use "team".
Why? (That’s an honest question)
I was trying to formulate an answer to your question: "Why has data technology advanced so much further than value a data team provides?"
Because we treat the data team as a specialised team, thus creating a dichotomy between data and non-data teams. Of course, there are specialisations - data architects, engineers, etc. - and there are overlaps between these roles. But in most organisations, data teams are seen as functional, not essential the way, say, HR or finance are. The data team often emerges as an afterthought. Some of these issues have been addressed by the "data as product" or "design thinking" approaches, where communication and empathy are crucial, but these skills are usually not thought of as essential for data teams. In my current role as a Data Architect, I have done a similar (in some cases greater) amount of communication, liaison, and team-building work alongside the actual technical architecture work.
Gotcha. I could see that going either way. On one hand, there's an argument to be made that data work is more of a set of skills that everyone needs to learn, in the way that companies don't have "sending good emails" teams or "typing" teams. Over time, as more people learn the skills, the need for a centralized team will go away. On the other hand, I could see data becoming like HR and finance, where, even if it's not essential now, it becomes that way. There was a point when IT wasn't everywhere; now, every company has IT people.
Having lived through the era you describe: there were more generalists in the field because they had to be, to juggle the tooling at their disposal. So far it appears to be a zero-sum game between tool advancement and skill capacity.
Meaning, if we have more specialized tools, people end up training themselves on the tool rather than “business need” (or whatever thing the tool is meant to solve)?
Welcome back. I’m glad you survived the election and the Twitter apocalypse. :-)
give it time…
https://www.goodreads.com/quotes/7440855-in-economics-things-take-longer-to-happen-than-you-think