16 Comments
Mar 28, 2022Liked by Benn Stancil

Great post and thank you for addressing an oft-overlooked topic. I define databases by what I think of as their fundamental modeling unit. RDBMSs model everything as a relation (or table). XML databases model everything as an XML document. Hierarchical databases as hierarchies. Network databases as a directed acyclic graph. Multi-dimensional DBMSs (MDDBMSs) aka OLAP cubes model everything as hypercubes. A hypercube has (hierarchical) dimensions as edges and store metrics in the intersections -- the classic example being product, geography, and time. As in, show me sales (the metric) of widgets (product) in NY (geo) in 3Q20 (time). Not sure if that helps much but it tries to give a conceptual definition of them. They were initially launched to be generally useful but found only one real use-case: financial planning and budgeting apps which do a slot of slicing, dicing, and drilling (and aggregation) across dimensions. EPM vendors today all deal with the basic nightmare that OLAP databases have died off and, to my knowledge, only Anaplan has built a real substitute. You are correct that they were fast, trading precomputation and storage for speed. You are also correct that they are not relevant in the database industry these days and that, e.g., Microsoft is basically phasing SSAS out leaving something in place that almost but not quite replaces it if you're building an EPM application. FWIW, I don't find the cube-viewer interfaces counter-intuitive; Cognos made a fortune selling PowerPlay for years which was a cube-viewer. At BusinessObjects, where I worked, we tended to build cubes on the fly (which size limited them) and presented a more dynamic reporting interface as opposed to a slicer/dicer interface which people, to your point, seemed to like better.

Expand full comment

Good post! I have to disagree with the Census as the worst government data portal, though; my personal nominee is trying to get soil data from the USDA (though can think of plenty of other critical data sets/sources that are only a smidge better):

https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm

Expand full comment
Mar 26, 2022Liked by Benn Stancil

One way to build more intuitive data tools is to go back to the first principles, in this case understanding why relational databases are called relational. A table is not necessarily describing a set of objects. Otherwise Codd would have called his invention object databases. A table in relational databases describes an n-ary relation between a set of attributes--not even a set of things but a set of attributes that might belong to different things. Sometimes the set of attributes happen to map well to distinct objects such as the purchases in Benn's example but other times it might just describe the relations between aggregated attributes that map better to the business questions. Relational tables are a genius way to model the real world where everything is somehow related to everything else. It's a way to draw some boundaries around blobs of things in the problem domain and store them in databases. So if I were to experiment with designing a more intuitive data tool for business users that didn't have to rely on cubes, I would start from 'relations'. A tool that would let business users describe how sales, states, months of the year and products are related. I have a hunch that it might lead to some interesting or even maybe more helpful new designs.

Expand full comment
Mar 25, 2022Liked by Benn Stancil

I have to disagree with your point that OLAP is less intuitive for users. To me, OLAP has two main benefits: (1) it's fast and (2) it's a business-friendly semantic layer that allows "anything by anything" queries. It's the business-friendly semantic layer that makes OLAP intuitive for everyone, not just data engineers. The speed makes it even better because users can ask questions as fast as they can think of them. I would agree with you that OLAP as a technology, meaning pre-calculation of aggregates, is dead. By pre-calculating views of data you inherently limit which questions can be asked. The key is to keep the OLAP benefits of speed and semantics and ditch the technical implementation that stopped scaling in the 90s.

Expand full comment

I both agree and disagree with your point. If we focus on olap cubes as a discrete structure in themselves, I think we miss the point. Instead, I would suggest looking at a cube as a model of dimensions and possible intersections. Each "cell" in the cube is a crossing of dimensions and descriptions of that. Cubes get super complicated because they represent so many possibilities.

Expand full comment