7 Comments

It seems that "high quality text" will almost certainly need a clear quantification of "quality" before accurate monetization can occur.

My 2024 brain has zero ideas for quantifying the quality of text across the web. I could start by saying heuristically that the average NYT article would probably be at least a 7 or 8 out of 10, while The New York Post couldn't be more than a 4 or 5 out of 10. But then, after major news outlets, you get into weird data sources like Reddit, where r/wallstreetbets would score a 0/10 but a subreddit like r/personalfinance could score an 8/10, and the scores would of course also vary by user and context.

As I'm writing this, it seems that only large organizations like the NYT would have the argumentative and lobbying/lawyer power to ever have their text monetized. And then how do you monetize it?

Interesting read and thought provoker. I originally thought the court case was a bit silly on the Times' part, but now I agree with your point that they are just looking to "make money from that tide."


I was just having a conversation the other day with someone who thought of an LLM's usefulness as being only in what it "knows". We talked about an example of an LLM acting as a company historian that knows what decisions were made at your company and why. As soon as this idea came up, we discussed the problems with such a historian. Granted, our problems were mostly unrelated to licensing, although privacy/HR was a potential problem. The biggest consideration, however, was how companies would want to redact certain information, restrict other information, and completely forget a third set of information. Those would be pretty difficult things to achieve with 100% accuracy, and mistakes could be costly, dangerous, harmful, etc. Plus, what company (or person) actually wants a perfect memory of what happened? We choose to forget stuff all the time. :-) Anyway, all that to say I'm in the camp of "doing" being more useful than "knowing". However, I think part of making the doing useful is referencing external data through databases, APIs, and web crawling (like a search engine would). The knowing is therefore less oracle-like and more search-engine-like. If the oracle route were chosen, though, I agree: there is no way the thing would work without lots of copyrighted material.
