A coalition of publishing companies representing nearly 400 newspapers has filed a federal lawsuit against OpenAI and Microsoft in the Southern District of New York, accusing the tech giants of scraping and repurposing journalistic content without authorization to train their artificial intelligence systems.
The complaint, filed Wednesday, alleges that articles produced by the plaintiffs were systematically collected from their news sites, copied, and stored in the defendants’ internal systems to fuel large language models powering tools such as ChatGPT and Microsoft Copilot. The publishers claim the material was exploited without any compensation to the copyright holders and that copyright management information was removed during the scraping process.
“A Vital Force Under Threat”
The legal action represents one of the largest coordinated efforts to date by local and regional news outlets against AI developers. Matthew Platkin, former attorney general of New Jersey and counsel for the plaintiffs, told Bloomberg Law News that the lawsuit aims to protect the economic foundation of community journalism across the United States.
“Local journalism is a trusted source of information for the vast majority of Americans. It is the lifeblood of our democracy, and this business model has put local news at risk of extinction.”
Platkin emphasized that any settlement or court ruling should extend benefits beyond major media conglomerates, ensuring that smaller, regional outlets—which often cover issues neglected by national publications—also receive fair compensation.
The Legal Argument
According to the petition, OpenAI and Microsoft accessed news websites, reproduced full articles, and retained copies to train AI models capable of generating derivative works on demand. The publishers argue that the value generated by AI products rests in part on decades of journalistic investment, including billions spent on content creation, subscription systems, and paywalls designed to protect intellectual property.
Among the remedies sought are statutory damages under U.S. copyright law and injunctive relief to halt what the plaintiffs describe as ongoing, unauthorized use of their content.
Broader Industry Conflict
The lawsuit arrives amid a wave of similar legal challenges from media and reference organizations. The case joins recent actions brought by CNN, The New York Times, Reddit, Encyclopaedia Britannica, and Merriam-Webster, all of which allege that their copyrighted material was used to train AI models without permission.
Defendants’ Response
OpenAI issued a statement asserting that its models are trained on publicly available information and that its activities fall under the legal doctrine of fair use. “We believe our work is lawful and consistent with long-standing principles of copyright law,” the company said. Microsoft had not issued a public response as of the latest reporting from Bloomberg Law News.
Legal analysts say the outcome could set a significant precedent for how copyright law applies to machine learning and whether AI developers must negotiate licensing agreements with content creators before training their models. For local news organizations, the stakes extend beyond damages: they view the case as pivotal to preserving a sustainable future for independent journalism.
Source: Olhar Digital
