I just want to note that philosophers, and mathematicians, and a whole lot of other people... invented formal languages because natural language is such a complete shitshow of ambiguity and now a lot of people are gonna reinvent this multi-century old wheel at a ridiculous cost... To the extent that natural language can clearly express what I want so a bot can write it? Great! Bring those on! But there's a big gap from that and everything else that needs to be done.

Expand full comment
Jul 18, 2023Liked by Benn Stancil

Love the article, thanks for that!

I agree that text-to-sql does not work for complex and messy DB schemata.

That's exactly why I am now investing a lot into how to implement text-to-semantic-model. Then, it follows with sematic-to-sql translation, nothing new here (Malloy, DAX, GoodData MAQL).

It requires a huge effort to create such semantic models on top of DB models, with all the business titles/descriptions for entities such as attributes, facts, and metrics.

Thinking about if LLM could help with the onboarding (bootstrap), generating as good as possible titles/descriptions from table/column names.

Experimenting, looks like most powerful models nowadays are surprisingly good in this area as well.

Btw., companies having nice DB models (or even dbt models with DOC) will have competitive advantage here!

Expand full comment
Jul 16, 2023Liked by Benn Stancil

At least for Snowflake, SnowBuilder (patent pending)😎 Virtual transformation for.your cryptic column names into natural language names. Fully qualified names automatically, plus provide UPPER_CASE columns for key column. Different names, same column.

No more hitting the wrong key when reaching for quotes. In private preview.

The initial version of SnowBuilder was in fact used to build exactly what Benn described. Completely converted seven, QlikSense applications into seven delivered Snowflake views.

17 total VIEWs, 10 of them reusable.

2500 total columns.

29 existing data sources.

Only 2 simple hand written CREATE VIEW statements; everything else generated.

2267 name transformations.

121 expression transformations

124 UDF calls (created about 50 UDFs in place of QlikSense functions).

We will be providing a SnowBuilder repository for this for demo, exploration and teaching purposes. You won't be able to execute the resulting views, obviously, as you won't have the underlying data warehouse. Other demos will be executable.

Expand full comment

Love it!

Imo a blind spot in the BI space is not exploring using AI to accelerate or support the dashboard or analytics creation experience. I remember traditional BI vendors saying the majority of the time in a journey to building and deploying a dashboard (~80%) is actually not spend on creating the dashboard or analysing data. It's spent on finding data, understanding data, exploring data, preparing a semantic layer (defining dimensions, measurer, aliases, buckets, drilldown groups), and then writing expressions (think DAX, set analysis etc.) & designing/styling charts or dashboards.

And to get to this result, you would typically need to be an experienced BI developer combining a unique venn diagram of skills compromising data, business domain and design knowledge.

Can AI help simplify some parts of the journey in form of a BI developer co-pilot? A semantic layer generation based on a prompt? Chart or data suggestions to pick from? Generate a good-looking and user-friendly dashboard? Style a chart to look like the economist styling guide?

Suppose you can enable your average business user or citizen analyst to become more and more self-sufficient. In that case, you'd allow a hyperagile workforce to bring more meaningful data to data consumers faster. The analysis support definitely has value, but people usually just want to see sums, counts, averages (maybe ratios).

Expand full comment
Jul 14, 2023Liked by Benn Stancil

Shameless plug from atoma.ai - a pivot from patterns.app where we spent the past 2 years building an end-to-end python and SQL data orchestration and visualization tool where atoma now get's all of it's juicy context to do it's job...

Idk... hard to deny the enabling capabilities of LLMs for truly self-serve BI which has been a pipe dream for decades. Are virtual intelligences vs. dashboards a real enough product differentiation to win buyers wallets? We'll see!

(also no need for filters or any complex dashboard UI because the AI can just write vegalite)

Expand full comment
Jul 21, 2023Liked by Benn Stancil

So here is my idea:

- collect semantic model using API/SDK of BI vendor

- generate training data (Q/A, ...) from the model

- train (fine-tune) or prompt a LLM

When end users ask questions to such LLM, the quality of answers should be much better thanks to semantic properties mapped to column names.

Under-the-hood, LLM can translate questions e.g. to a report definitions consisting of entities from the semantic model.

For instance: Report{ attributes => [customer_name], metrics => [revenue] }

Such report definition can be sent to BI server, which can translate it to SQL, execute it and return data (JSON, data frame, arrow, ...).

Expand full comment
Jul 18, 2023Liked by Benn Stancil

Snowflake has always had the ability to create mixed case with spaces. column names. You just have to embed them in double quotes, like most RDBMS (except SQL Server). Upper case is the SQL standard unless qualified by double quotes.

I figured it was joke, but didn't get it.

But having to constantly qualify mixed case with double quotes is a royal PITA, as is much else in writing VIEW, i.e. COLUMN COMMENTs

SnowBuilder (patent pending) removes 90+ of that.

It's really much, much more. As soon as I get domain issues resolved for new website, I will publish much more.

It's pure SQL. It generates DDL and DML from meta-data repositories (Snowflake tables). It performs what I call Virtual Transformations. Instead of transforming physical data, it does it as VIEWs. Semantic (and presentation, not that anybody talks about that) are build from layered VIEWs. "Serially reusable meta-data".

Extract, Load, Virtual Transform (ELVT) is the workflow. But the Extract is meta-data, not data.

We call it "SnowBuilder SQL", e.g user functions. SBLD_VIEW, SBLD_SELECT take the name of a "VIEW"

and generate CREATE VIEW DDL and the SELECT statement respectively using the source VIEW name in a repository.

NAME transformation is a key feature. Everything generated is always fully qualifed, e.g "Customer Name".

I refer to the as "Freedom from the tyranny of the pinky finger"!

No SnowBuilder UI. Editing either direct SQL for bulk editing, or pull meta-data into spreadsheet, edit. and load back into repository.

All user managed and delivered as SECURE UDF via SHARE (marketplacee). Unlimited usage and no limit on repositories.

Expand full comment

I think Benn should write a Children’s Book - “If You Give a Mouse a Datum.”

Expand full comment

fire article! thank you!

Expand full comment
Jul 15, 2023Liked by Benn Stancil

Almost every one of your post makes us looks at things a different way, introduces a new paradigm, like *no-one else* does. It's especially true for that specific post. I am so amazed... and thankful for all these thought-provoking articles you are writing and sharing with us.

Expand full comment
Jul 14, 2023Liked by Benn Stancil

Benn, you have an amazing talent to be extremely right and extremely wrong at the same time

Expand full comment

Why not just provision a vector store with context info then feed the bot the prompt it needs to translate the queries you're discussing? Your take is naive.

Expand full comment