There's a concept taking hold as a central narrative for how the web will evolve: AI has started an irreversible transformation of the web as we knew it, and before we even realize it, 99% of the content produced and shared will be AI-generated.
This is both somehow obvious and totally fascinating, not unlike standing in the centre of Pompeii as Vesuvius starts fuming and bursting.
The core idea is that AI-generated content will be produced at Moore's law speed and acceleration because (a) the nature of these AI systems allows them to churn out content with incredible speed and volume, soon out-pacing anything three billion humans glued to their phones and laptops would ever be able to achieve, and (b) once started, these systems will also begin to learn from their own content in an exponential self-feeding loop (more on this in a moment).
I agree with my colleague from the Copenhagen Institute for Futures Studies, Timothy Shoup, when he states that “in the scenario where GPT-3 ‘gets loose’, the internet would be completely unrecognizable”. And in this scenario, he would bet on 99% to 99.9% being AI-generated by 2025 to 2030. – Sofie Hvitved
The contamination of the internet has already started, as a large part of digital business is about generating SEO-optimized clicks and ad revenue. For many, this poisoning of the web is a boon. Since Google is the only gatekeeper you need to fool to get abusive content referenced, it has become the main target of artificial, AI-generated content:
NewsGuard, a company that provides tools for vetting news sources, has exposed hundreds of ad-supported sites with generic-sounding names featuring misinformation created with generative AI. It’s causing a problem for advertisers. Many of the sites spotlighted by NewsGuard seem exclusively built to abuse programmatic advertising, or the automated systems for putting ads on pages. In its report, NewsGuard found close to 400 instances of ads from 141 major brands that appeared on 55 of the junk news sites. - Kyle Wiggers
And Google will probably speed things up as it starts to use AI itself to digest search results, creating a shittier version of our already not-so-stellar web experience:
Again, it’s the dynamics of AI — producing cheap content based on others’ work — that is underwriting this change, and if Google goes ahead with its current AI search experience, the effects would be difficult to predict. Potentially, it would damage whole swathes of the web that most of us find useful — from product reviews to recipe blogs, hobbyist homepages, news outlets, and wikis. Sites could protect themselves by locking down entry and charging for access, but this would also be a huge reordering of the web’s economy. In the end, Google might kill the ecosystem that created its value, or change it so irrevocably that its own existence is threatened. - James Vincent
In its latest investigation, NewsGuard revealed 49 websites entirely generated by AI language models, simulating "human-produced" content without any disclosure of their nature. These sites are designed to publish high volumes of content across politics, health, entertainment, finance, and technology. False narratives are prevalent, and the content features repetitive phrases and bland language laced with low-quality programmatic ads, perfect for search engine optimization.
To quote Melissa Heikkilä: "We may be witnessing, in real-time, the birth of a snowball of bullshit."
Large language models are trained on data sets that are built by scraping the internet for text, including all the toxic, silly, false, malicious things humans have written online. The finished AI models regurgitate these falsehoods as fact, and their output is spread everywhere online. Tech companies scrape the internet again, scooping up AI-written text that they use to train bigger, more convincing models, which humans can use to generate even more nonsense before it is scraped again and again, ad nauseam.
This obviously concerns images and videos, too. In all of these cases, the notion of poisoning or contamination becomes central: the more content AI produces on the web, the more AI will learn from that content to produce even more content:
The internet is now forever contaminated with images made by AI. The images that we made in 2022 will be a part of any model that is made from now on. - Mike Cook
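The loop described above can be caricatured in a few lines of arithmetic. Here is a toy sketch of the dynamic — every number in it is an illustrative assumption, not a measurement:

```python
def human_fraction(generations: int, growth: float = 3.0) -> float:
    """Fraction of the corpus that is human-written after n crawl/train/publish
    cycles. Illustrative assumptions: humans add a constant 1 unit of content
    per cycle, while AI adds `growth` units derived from the existing corpus
    (human- and AI-written alike), which it has just been retrained on.
    """
    human, ai = 1.0, 0.0
    for _ in range(generations):
        total = human + ai
        ai += growth * total   # models regurgitate multiples of what exists
        human += 1.0           # human output stays roughly constant
    return human / (human + ai)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"cycle {n}: {human_fraction(n):.1%} human")
```

With these made-up growth numbers, the human share drops below 1% within five cycles — which is all the 99% scenario really requires: not that any single model is prolific, but that output compounds on output.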
Ironically, AI specialists and LLM proponents are starting to get worried themselves. What will be the quality of future AI crawling and learning from a web where most of the content is already AI-generated? From their viewpoint, there will be a pre-2023 and a post-2023 period: on one side, a vintage, pre-AI web whose content was human-generated, rich, and flavourful; on the other, a post-AI web where next-gen models will either have to feed on the limited set of pre-2023 content or on far more abundant but much poorer material.
I can easily imagine a time when we will have "Certified Human-Only Content" labels and certifications. Or, more probably, "Certified 90% Human Content." A question for SF authors would be how activists or competing corporations would design content bombs, making the AI systems that crawl them go mad and spew nonsense afterward. Where's William Gibson when we need him?
Meanwhile, in what is probably the most useless thing to do, we updated our Terms and Conditions:
(8) You may not use any automated or manual methods, including but not limited to web crawlers, spiders, or any other software, to extract, scrape, or crawl our content without obtaining our prior written consent, with the sole exception of referencing links to our articles. We explicitly state that we do not authorize the crawling or scraping of our content by AI software such as LLMs (Large Language Models) or any other type of current or future technology. Unauthorized crawling violates our intellectual property rights and is strictly prohibited.
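A slightly more practical complement to T&C language is a robots.txt file naming the known AI crawler user-agents. As a sketch — GPTBot is OpenAI's crawler and CCBot is Common Crawl's, but compliance with robots.txt is entirely voluntary, so this is a request, not an enforcement mechanism:

```txt
# Ask AI crawlers to stay out (honored only if the crawler chooses to comply)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Which is to say: about as binding as the clause above, but at least machine-readable.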