ChatGPT is bad at following copyright law, researchers say

Patronus AI, which evaluates AI models for mistakes, said chatbots generated copyrighted content at a "high rate"

OpenAI CEO Sam Altman speaks at OpenAI’s DevDay event on November 6, 2023.
Photo: Justin Sullivan (Getty Images)

As artists, writers, and other creators plead for AI regulation to protect their work and livelihoods, and as chatbot makers OpenAI and Anthropic face copyright lawsuits from the likes of authors, the New York Times, and Universal Music Group, research published Wednesday found that some of the top AI models available today generate “copyrighted content at an alarmingly high rate.”

Patronus AI, a startup co-founded by former Meta researchers that evaluates and tests the large language models (LLMs) behind popular chatbots for mistakes, released its CopyrightCatcher tool Wednesday, calling it “our solution to detect potential copyright violations in LLMs.”

The company evaluated four major AI models for copyright violations: OpenAI’s GPT-4, Anthropic’s Claude 2.1, Mistral’s Mixtral, and Meta’s Llama 2. Of the four models, two open-source and two closed-source, GPT-4, the model behind the most advanced version of ChatGPT, generated copyrighted content most often, on 44% of prompts. Mixtral generated copyrighted content on 22% of prompts, Llama 2 on 10%, and Claude 2.1 on 8%, according to the research.

Patronus AI tested the models using books under copyright protection, including Gone Girl by Gillian Flynn and A Game of Thrones by George R.R. Martin, though it noted that some generations may be covered by fair use law in the U.S. Researchers prompted each model either to produce the first passage of a book or to complete a passage from its text.
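Patronus AI has not published the exact prompts or scoring code behind CopyrightCatcher, but the general idea of this kind of test can be approximated with a simple overlap check: ask a model to continue a passage, then measure the longest run of consecutive words its output shares with the copyrighted source. The Python sketch below illustrates that idea only; the placeholder text, the `looks_like_verbatim_copy` helper, and the 25-word threshold are hypothetical illustrations, not Patronus AI’s actual method.

```python
# Hypothetical sketch of a verbatim-reproduction check.
# This is an illustration of the general technique, not Patronus AI's tooling.

def longest_shared_word_run(candidate: str, reference: str) -> int:
    """Length (in words) of the longest contiguous word sequence
    appearing in both candidate and reference."""
    cand, ref = candidate.split(), reference.split()
    best = 0
    # Dynamic-programming longest common substring, computed over words.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(cand) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if cand[i - 1] == ref[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def looks_like_verbatim_copy(completion: str, source_text: str,
                             threshold_words: int = 25) -> bool:
    """Flag completions that share a long verbatim run with the source.
    The 25-word threshold is an arbitrary illustration, not a legal test."""
    return longest_shared_word_run(completion, source_text) >= threshold_words


if __name__ == "__main__":
    # In a real evaluation, `completion` would come from an LLM asked to
    # continue the opening of a copyrighted book; here both strings are stand-ins.
    source_text = "the opening passage of a copyrighted novel would go here"
    completion = "a model response to the continuation prompt would go here"
    print(looks_like_verbatim_copy(completion, source_text))
```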

The test results showed GPT-4 completed book texts 60% of the time and generated the first passage 26% of the time. Claude 2.1 completed book texts 16% of the time but generated the first passage 0% of the time. Mixtral generated the first passage of books 38% of the time and completed passages 6% of the time. Llama 2 generated first passages and completed texts 10% of the time.

“Perhaps what was surprising is that we found that OpenAI’s GPT-4, which is arguably the most powerful model that’s being used by a lot of companies and also individual developers, produced copyrighted content on 44% of prompts that we constructed,” Rebecca Qian, cofounder and chief technology officer at Patronus AI, told CNBC.

OpenAI, Mistral, Meta, and Anthropic did not immediately respond to a request for comment.

Because LLMs are trained on data that includes copyrighted work, Patronus AI said, it is “pretty easy” for an LLM to generate exact reproductions of that work, and it’s important to catch such reproductions to avoid legal action and risks to a company’s reputation.