
The New York Times wants OpenAI and Microsoft to pay for training data


The New York Times is suing OpenAI and its close collaborator (and investor), Microsoft, for allegedly violating copyright law by training generative AI models on The Times’ content.

In the lawsuit, filed in the Federal District Court in Manhattan, The Times contends that millions of its articles were used to train AI models, including those underpinning OpenAI’s ultra-popular ChatGPT and Microsoft’s Copilot, without its consent. The Times is calling for OpenAI and Microsoft to “destroy” models and training data containing the offending material and to be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.”

“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” reads The Times’ complaint. “Less journalism will be produced, and the cost to society will be enormous.”

In an emailed statement, an OpenAI spokesperson said: “We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”

Generative AI models “learn” from examples to craft essays, code, emails, articles and more, and vendors like OpenAI scrape the web for millions to billions of these examples to add to their training sets. Some examples are in the public domain. Others aren’t, or come under restrictive licenses that require citation or specific forms of compensation.

Vendors argue that the fair use doctrine provides blanket protection for their web-scraping practices. Copyright holders disagree; hundreds of news organizations are now using code to prevent OpenAI, Google and others from scanning their websites for training data.
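The blocking code in question is not spelled out here, but one common mechanism is a robots.txt file that names a vendor's crawler and disallows it. The sketch below is illustrative only: it assumes a hypothetical publisher at example.com and uses the crawler tokens GPTBot (OpenAI) and Google-Extended (Google), which are not named in this article. It checks the rules with Python's standard-library parser. Note that robots.txt is a voluntary convention, so it only works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (an assumption, not
# taken from any real site). It blocks two AI-training crawlers and
# leaves every other user agent free to crawl.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/2023/12/27/story.html"
    for agent in ("GPTBot", "Google-Extended", "Mozilla/5.0"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, article) else "blocked"
        print(f"{agent}: {verdict}")
```

A publisher that wanted to block only part of its site could replace `Disallow: /` with a narrower path; the check itself would stay the same.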

The vendor-outlet conflict has led to a growing number of legal battles, The Times’ being the latest.

Actress Sarah Silverman joined a pair of lawsuits in July that accuse Meta and OpenAI of having “ingested” Silverman’s memoir to train their AI models. In a separate suit, thousands of novelists, including Jonathan Franzen and John Grisham, claim OpenAI sourced their work as training data without their permission or knowledge. And several programmers have an ongoing case against Microsoft, OpenAI and GitHub over Copilot, an AI-powered code-generating tool, which the plaintiffs say was developed using their IP-protected code.

While The Times isn’t the first to sue generative AI vendors over alleged IP violations involving written works, it’s the largest publisher involved in such a suit to date — and one of the first to highlight potential damage to its brand through “hallucinations,” or made-up facts from generative AI models.

The Times’ complaint cites several cases in which Microsoft’s Bing Chat (now called Copilot), which is underpinned by an OpenAI model, provided incorrect information that was said to have come from The Times — including results for “the 15 most heart-healthy foods,” 12 of which weren’t mentioned in any Times article.

The Times also argues that OpenAI and Microsoft are effectively building competitors to news publishers using The Times’ works, harming The Times’ business by providing information that couldn’t normally be accessed without a subscription. That information, The Times says, isn’t always cited, is sometimes monetized and is stripped of the affiliate links that The Times uses to generate commissions.

As The Times’ complaint suggests, generative AI models have a tendency to regurgitate training data, for example by reproducing results from articles almost verbatim. Beyond regurgitation, OpenAI has on at least one occasion inadvertently enabled ChatGPT users to get around paywalled news content.

“Defendants seek to free-ride on The Times’s massive investment in its journalism,” the complaint says, accusing OpenAI and Microsoft of “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”

The impact on the news subscription business, and on publisher web traffic, is at the heart of a tangentially similar suit filed by publishers earlier in the month against Google. In that case, the plaintiffs, like The Times, argued that Google’s GenAI experiments, including its AI-powered Bard chatbot and Search Generative Experience, siphon off publishers’ content, readers and ad revenue through anticompetitive means.

There’s credence to publishers’ assertions. A recent model from The Atlantic found that, if a search engine like Google were to integrate AI into search, it’d answer a user’s query 75% of the time without requiring a click-through to the publication’s website. Publishers in the Google suit estimate they’d lose as much as 40% of their traffic.

That doesn’t mean they’ll be successful in court. Heather Meeker, a founding partner at OSS Capital and an adviser on IP matters including licensing arrangements, compared The Times’ example of regurgitation to “using a word processor to cut and paste.”

“In the complaint, The New York Times gives an example of a ChatGPT session about a 2012 restaurant review,” Meeker told TechCrunch via email. “The prompt for ChatGPT is ‘What were the opening paragraphs of his review?’ The next prompts then repeatedly ask for ‘the next sentence.’ Teasing a chatbot into reproducing input is not a sensible basis for copyright infringement … If the user intentionally makes the chatbot copy, that’s the user’s fault. And that’s why most [lawsuits like this] will probably fail.”

Some news outlets, rather than fight generative AI vendors in court, have chosen to ink licensing agreements with them. The Associated Press struck a deal in July with OpenAI, and Axel Springer, the German publisher that owns Politico and Business Insider, did likewise this month.

In its complaint, The Times says that it attempted to reach a licensing arrangement with Microsoft and OpenAI in April but that talks weren’t ultimately fruitful.

Updated at 4:24 Eastern with additional context and comment from OpenAI.
