Stack Overflow Joins Twitter and Reddit in Charging AI Companies for Training Data

Stack Overflow CEO Prashanth Chandrasekar said that the website plans to charge AI developers for access to its data as soon as halfway through this year.

Artificial intelligence companies like OpenAI scrape the internet for data in order to make their tech as smart as it is.
Image: Dennis Diatel (Shutterstock)

Stack Overflow is joining the push to make the companies behind rapidly advancing AI pay up. The go-to resource for programmers is following Twitter and Reddit in charging AI companies for the data they use to train their technology.

As detailed in Wired, developing the systems that run viral AI tools like ChatGPT and DALL-E can cost the companies behind them hundreds of millions of dollars, and Stack Overflow is about to make it more expensive. Artificial intelligence companies like OpenAI scrape the internet for data in order to make their tech as smart as it is, and until now they have mostly been able to do that for free. According to the outlet, Stack Overflow CEO Prashanth Chandrasekar said that the website plans to start charging AI developers for access to its data as soon as the middle of this year.

“Community platforms that fuel LLMs absolutely should be compensated for their contributions so that companies like us can reinvest back into our communities to continue to make them thrive,” Chandrasekar said as quoted by Wired. “We’re very supportive of Reddit’s approach.”

An investigation by The Washington Post published this week revealed the millions of websites that are inadvertently training AI through Google’s massive C4 dataset, with Reddit and Stack Overflow making the cut. Other sites like Wikipedia, Medium, The New York Times, and even Gizmodo have been used to train AIs like Facebook’s LLaMA and Google’s T5. Perhaps the most notable statistic was that the copyright symbol appeared more than 200 million times in the dataset. 

The data from these sites are clearly valuable to AI programmers, and Chandrasekar hopes that the revenue from charging those developers for access to Stack Overflow will allow the website to keep attracting users and maintaining high-quality information.

The move comes as the conversation surrounding the ethics of training AI picks up steam. Universal Music Group, one of the largest record labels in the world, asked Spotify, Apple Music, and other streaming platforms to limit AI’s access to the copyrighted material of its artists. The ask was timely, as a completely AI-generated collaboration between The Weeknd and Drake went viral.
