The New York Times has sued OpenAI and Microsoft for copyright infringement over how ChatGPT was trained. This is not the first lawsuit of its kind, but it promises to be momentous, as it will likely set precedent in the U.S. for what can (or cannot) be done to train large language models.
What interests me here is how this legal action might spark the creation of an umbrella deal repeating the history of streaming vs. the music industry.
Historically, streaming rights negotiations have evolved from basic licensing agreements to more intricate deals encompassing user engagement, data sharing, and exclusivity. AI might follow a similar path: major content producers will architect a package deal that would-be AI developers have to sign and pay for when training their models on copyrighted content.
Soon enough, the NYT, Facebook, or (hopefully) Wikipedia will reach an acceptable deal with Microsoft and others. They might even time-lock some of their content in this deal – meaning LLMs will not be able to crawl content fresher than a month or a week old, so that you still have an incentive to go to the NYT website. And leading social media platforms will simply add another disclaimer that consumers won't read when clicking accept - accept - accept to sign up for Instagram or TikTok.
But unlike the music industry, where there's a manageable number of major and independent labels, LLMs cast an exponentially wider content net. We could discuss how the long tail of self-hosted blogs, newsletters, personal pictures, and so much more has zero chance of being compensated.
But the pivotal shift at play might be with major brands.
What happens when L'Oréal or Nestlé realize these models have been trained on decades of their brand assets? What happens when we ask DALL-E to generate an image of a luxury car that looks like a Mercedes-Benz? Could Microsoft and others afford a packaged copyright deal with every brand on the planet? Will we invent 'reasonable' criteria, such as being listed on the S&P 500, to be eligible for compensation?
On the very day that Disney is supposed to relinquish the earliest version of Mickey Mouse to the public domain, the complex discussion about content, trademarks, and copyrights is being propelled into the exponential age of AI. It's going to be a slow burn, but you can bet that the way major content producers manage their content, and soon enough the way major brands manage their assets, is going to change significantly.
The early stage of this might be rapidly growing paywalls around desirable content, putting the early-days dream of an open, free-flowing Internet out of reach for good.