Discussion about this post

Marco Sollie

I also sometimes wonder whether the increased reliance on consultancy firms is a net benefit. I work at one myself, and I'm not speaking against our own company: we do good work and our clients see results. But since you are questioning how data teams work and thinking about how to improve this, it's worth taking into consideration, especially now that affordable tooling lets smaller and mid-tier companies set up their own data stack.

As consulting analysts and analytics engineers, the greatest perk is that you get to work on many different problems. You can learn fast and re-apply what you learn across accounts. If you're waiting on access or approval for one account, you can switch to a different one and keep going. Another benefit is that you (often) work with a larger group of varied talents that you can rely on as a sounding board or for direct support. Some companies would not be able to hire the same expertise full-time. Or, given the average tenure of people in the space, is investment in in-house staff even warranted?

There is, however, a loss for companies as well. Does working with external teams allow for strategic focus? Long-term thinking? How does it affect focus on the most important problems? How does it affect the feedback loops and iteration that are so vital for developing healthy data products? We've helped multiple companies hire and train their own teams during engagements. But striking the right balance is no easy feat, and it should be part of the discussion as we look to improve as a field.

Kendall Willets

I took over managing Yammer's Vertica cluster in 2013, and I figured out why nodes were going down. The spread daemon, which is basically the control plane for the cluster, was competing with queries for network and CPU, and a few seconds of starvation during a heavy query would trigger a partition event. I found that nice'ing spreadd up to realtime priority eliminated node failures.
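
For readers who want to see what a priority change like this can look like on Linux, here is a minimal sketch. It is not the exact procedure used in the comment above, which does not give the commands. The helper name `priority_commands` and the PID are hypothetical. The sketch only builds the argument lists for the standard `renice` and `chrt` utilities; running them against a daemon such as `spreadd` would normally require root. Note that a nice value only adjusts priority within the normal scheduler, so the optional `chrt` step (an assumption here) is one way to request an actual realtime policy.

```python
from typing import List, Optional


def priority_commands(pid: int,
                      niceness: int = -20,
                      rt_priority: Optional[int] = None) -> List[List[str]]:
    """Build (but do not run) commands that raise a process's priority.

    Hypothetical helper for illustration only.
    - niceness: Linux nice value, -20 (most favorable) to 19.
    - rt_priority: if given, also request SCHED_FIFO at this priority (1-99).
    """
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    # renice changes the nice value of an already running process.
    commands = [["renice", "-n", str(niceness), "-p", str(pid)]]
    if rt_priority is not None:
        if not 1 <= rt_priority <= 99:
            raise ValueError("rt_priority must be between 1 and 99")
        # chrt -f -p <priority> <pid> switches the process to SCHED_FIFO.
        commands.append(["chrt", "-f", "-p", str(rt_priority), str(pid)])
    return commands


if __name__ == "__main__":
    # 1234 is a placeholder PID; look up the real one with e.g. pgrep.
    for cmd in priority_commands(1234, rt_priority=50):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes it easy to review exactly what would be run before touching a production cluster.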

Vertica originally recommended separate switches for this traffic, and this type of hiccup made it hard to simply switch to cloud or generic servers with one network interface. The cloud at that time also lacked data-intensive instance types -- I used to test Azure instances every few months to see if they could even get near our I/O specs, and I never found one. It took years for cloud data servers to become available.

Maybe I'm a little grizzled, but the problem I'm having nowadays is how disappointing the new offerings are. I've always gravitated towards high-value, performance-critical stuff like analytics-based web apps, and I still haven't found anything that beats Vertica (which is now available as SaaS, by the way).
