AI needs high-quality human-generated data for training. Much of that comes from the internet. But the internet is becoming increasingly overrun with AI-generated garbage. How screwed is future AI training? Absolutely fucked.
>But the thing is, models gravitate toward the most common output. It won't give you a controversial snickerdoodle recipe but the most popular, ordinary one. And if you ask an image generator to make a picture of a dog, it won't give you a rare breed it only saw two pictures of in its training data; you'll probably get a golden retriever or a Lab.
>Now, combine these two things with the fact that the web is being overrun by AI-generated content and that new AI models are likely to be ingesting and training on that content. That means they're going to see a lot of goldens!
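A toy simulation makes the "lot of goldens" mechanism concrete. This is a minimal sketch, not the method from the Nature paper. It assumes a "model" is nothing more than a table of breed frequencies, and that each generation is trained only on samples drawn from the previous one. A hypothetical `temperature` parameter below 1 stands in for the tendency to favor the most common output. The breed names and starting shares are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical starting "training data": share of each dog breed.
# These numbers are invented for illustration, not taken from the paper.
INITIAL = {
    "golden retriever": 0.40,
    "labrador": 0.30,
    "beagle": 0.15,
    "dachshund": 0.10,
    "azawakh": 0.04,
    "otterhound": 0.01,
}


def sharpen(dist, temperature):
    """Bias a distribution toward its most common outputs (temperature < 1)."""
    weights = {k: p ** (1.0 / temperature) for k, p in dist.items() if p > 0}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def next_generation(dist, n_samples, temperature, rng):
    """Generate samples from the current model, then 'train' the next model on them."""
    model = sharpen(dist, temperature)
    breeds = list(model)
    samples = rng.choices(breeds, weights=[model[b] for b in breeds], k=n_samples)
    counts = Counter(samples)
    # Maximum-likelihood fit: the new model is just the observed frequencies,
    # so any breed that was never sampled drops out entirely.
    return {b: counts[b] / n_samples for b in counts}


def simulate(generations=10, n_samples=200, temperature=0.8, seed=0):
    """Run several generations, each trained only on the previous one's output."""
    rng = random.Random(seed)
    history = [dict(INITIAL)]
    for _ in range(generations):
        history.append(next_generation(history[-1], n_samples, temperature, rng))
    return history


if __name__ == "__main__":
    for gen, dist in enumerate(simulate()):
        top = max(dist, key=dist.get)
        print(f"gen {gen}: {len(dist)} breeds left, {top} = {dist[top]:.2f}")
```

In this toy setup, once a breed's share hits zero it can never come back, because each model only ever sees what the previous one produced. Rare breeds are the first to fall into that trap, which is the loss of the tails the quote describes.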
The paper in Nature is pretty technical.
...and the supplementary material even more so, but the TechCrunch article has an image that explains everything. In just four steps, the AI goes from a fairly representative idea of dogs to complete garbage.