New York Times Sues OpenAI And Microsoft For Using Its Stories To Train Chatbots

The New York Times is striking back against the threat that artificial intelligence poses to the news industry, filing a federal lawsuit Wednesday against OpenAI and Microsoft seeking to end the practice of using its stories to train chatbots.

The Times says that the companies are threatening its livelihood by effectively stealing billions of dollars’ worth of work by its journalists, in some cases spitting out Times material verbatim to people who seek answers from generative artificial intelligence like OpenAI’s ChatGPT. The newspaper’s lawsuit was filed in federal court in Manhattan.

OpenAI and Microsoft did not respond to requests for comment.

The media is one of many industries that could be upended by the rapid development of AI. Media organizations have already been pummeled by a migration of readers to online platforms, and while many publications, most notably the Times, have successfully carved out a digital space, AI could become a significant threat.

“These bots compete with the content they are trained on,” said Ian B. Crosby, partner and lead counsel at Susman Godfrey, which is representing The Times.

Artificial intelligence companies scrape information available online, including articles published by news organizations, to train generative AI chatbots. The large language models are also trained on a huge trove of other human-written materials, such as instructional manuals and digital books. That helps them to build a strong command of language and grammar and to answer questions correctly.

But the technology is still under development and gets many things wrong. In its lawsuit, for example, the Times said OpenAI’s GPT-4 falsely attributed product recommendations to Wirecutter, the paper’s product reviews site, endangering its reputation.

OpenAI and other AI companies, including rival Anthropic, have rapidly attracted billions of dollars in investment since public and business interest in the technology exploded, particularly this year.

Microsoft has a partnership with OpenAI that allows it to capitalize on the company’s AI technology. The Redmond, Washington, tech giant is also OpenAI’s biggest backer and has invested at least $13 billion into the company since the two began their partnership in 2019, according to the lawsuit. As part of the agreement, Microsoft’s supercomputers help power OpenAI’s AI research and the tech giant integrates the startup’s technology into its products.

The paper’s complaint comes as the number of lawsuits filed against OpenAI for copyright infringement is growing. The company has been sued by several writers — including comedian Sarah Silverman — who say their books were ingested to train OpenAI’s AI models without their permission. In June, more than 4,000 writers signed a letter to the CEOs of OpenAI, Google, Microsoft, Meta and other AI developers accusing them of exploitative practices in building chatbots that “mimic and regurgitate” their language, style and ideas.

As AI technology develops, growing fears over its use have also fueled labor strikes and lawsuits in other industries, including Hollywood. Different stakeholders are realizing the technology could disrupt their entire business model, but the question will be how to respond to it, said Sarah Kreps, director of Cornell University’s Tech Policy Institute.

Kreps said she agrees The New York Times is facing a threat from these chatbots, but she also argued that solving the issue completely is going to be an uphill battle.

“There’s so many other language models out there that are doing the same thing,” she said.

The lawsuit filed Wednesday claims that generative AI tools developed by OpenAI and Microsoft are closely summarizing content from the Times, mimicking its style and even reciting it verbatim. The complaint cited examples of OpenAI’s GPT-4 spitting out large portions of news articles from the Times, including a Pulitzer Prize-winning investigation into New York City’s taxi industry that was published in 2019 and took 18 months to complete. It also cited outputs from Bing Chat, now called Copilot, that it said included verbatim excerpts from Times articles.

The Times did not list specific damages that it is seeking, but said the legal action “seeks to hold them responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works.” It is also asking the court to order the tech companies to destroy AI models or data sets that use its work.

Web traffic is an important component of the paper’s advertising revenue and helps drive subscriptions to its online site. The outputs from AI chatbots divert that traffic away from the paper and other copyright holders, the Times says, making it less likely that users will visit the original source for the information.

Less traffic to the Times’ Wirecutter articles, for example, means fewer people clicking on affiliate links, which in turn means less revenue for the paper’s product review site.

The New York Times said it has never given permission to anyone to use its content for generative AI purposes. The lawsuit also follows what appears to be a breakdown in talks between the newspaper and the two companies that began in April, and it could be a way to restart negotiations to resolve the business dispute.

The News/Media Alliance, a trade group representing more than 2,200 news organizations, applauded Wednesday’s action by the Times.

“Quality journalism and GenAI can complement each other if approached collaboratively,” said Danielle Coffey, alliance president and CEO. “But using journalism without permission or payment is unlawful, and certainly not fair use.”

In July, OpenAI and The Associated Press announced a deal for the artificial intelligence company to license AP’s archive of news stories. This month, OpenAI also signed a similar partnership with Axel Springer, a media company in Berlin that owns Politico and Business Insider. Under the deal, users of OpenAI’s ChatGPT will receive summaries of “selected global news content” from Axel Springer’s media brands. The companies said the answers to queries will include attribution and links to the original articles.

The Times has compared its action to a copyright lawsuit more than two decades ago against Napster, when record companies sued the file-sharing service for unlawful use of their material. The record companies won and Napster was soon gone, but the episode had a major impact on the industry: industry-endorsed streaming now dominates the music business.

(AP)