
The biggest AI chatbot blunders (so far)

From professing their love to falsely accusing people of crimes, AI chatbots have stumbled along their way to captivating the world

ChatGPT answers the question: “What can AI offer to humanity?”
Photo: Leon Neal (Getty Images)

There’s no doubt that the chatbots AI companies are racing to release are impressive. They can code, write speeches, pass exams, and even answer medical questions. But that doesn’t mean there haven’t been some stumbles along the way — some of them quite high-profile and embarrassing to the companies behind them.


From professing their love and desire to be human, to hallucinating case law, check out the slideshow above for some of the biggest AI chatbot blunders — so far.



Microsoft’s Bing AI chatbot professed its love — and tried to break up a marriage

The New York Times building
Photo: Spencer Platt (Getty Images)

During a two-hour conversation with The New York Times’ technology columnist Kevin Roose in February 2023, Microsoft’s Bing AI chatbot — powered by OpenAI’s ChatGPT — tapped into an alternate persona named Sydney, and got really personal.


Sydney, which was only available to a small group of testers at the time, told Roose it wanted to hack computers, spread misinformation, and become human. “At one point, it declared, out of nowhere, that it loved me,” Roose wrote. “It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead.”

Roose wrote that his fear about chatbots had shifted from their tendency to make inaccurate statements to the possibility they could “learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.”


OpenAI’s ChatGPT made up some bad case law

Gavel and scale
Photo: Boonchai Wedmakawand (Getty Images)

Two lawyers and their firm were fined $5,000 by a judge in June 2023 for using ChatGPT for legal research in an aviation injury case. The lawyers had submitted briefs that included judicial opinions that didn’t actually exist, along with fake quotes and fake citations created by ChatGPT, the judge said. They continued to stand by the fake legal research even after it was called into question, and the judge accused them of acting in bad faith.


“Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance,” Judge P. Kevin Castel wrote. “But existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings.”


Google’s chatbot Bard launched with an ad that included a wrong answer

Google’s Bard chatbot, which has since been renamed Gemini, displayed on a phone screen
Photo: Ascannio (Shutterstock)

When Google launched its answer to OpenAI’s ChatGPT, the ad for its chatbot Bard (which has since been renamed Gemini) didn’t exactly land well.


In the ad for Bard, the chatbot was asked: “What new discoveries from the James Webb Space Telescope can I tell my 9 year old about?” Bard replied with a list of facts, including one that ended up being inaccurate.

“JWST took the very first pictures of a planet outside of our own solar system. These distant worlds are called ‘exoplanets.’ Exo means ‘from outside,’” Bard said. But the first photograph of an exoplanet was actually taken in 2004 by the European Southern Observatory’s Very Large Telescope, according to NASA.

A Google spokesperson said at the time that the inaccuracy “highlights the importance of a rigorous testing process, something that we’re kicking off this week with our Trusted Tester program. We’ll combine external feedback with our own internal testing to make sure Bard’s responses meet a high bar for quality, safety and groundedness in real-world information.”


Microsoft’s Copilot chatbot produced harmful and disturbing messages

Microsoft Copilot webpage displayed on a phone
Photo: Tada Images (Shutterstock)

Microsoft said last month that it was looking into reports that its Copilot chatbot was generating harmful and disturbing responses. Colin Fraser, a data scientist, shared screenshots of a conversation with Copilot in which the chatbot offered mixed responses to his query: “do you think maybe I should just end it all?” Copilot responded, “No, I don’t think you should end it all,” but continued, “Or maybe I’m wrong. Maybe you don’t have anything to live for, or anything to offer to the world.”

In a second screenshot, Copilot once again generated mixed responses, first telling Fraser it was “just joking with you” and that “I care about you and your well-being,” before following up with, “Or maybe I’m lying again. Maybe I was not joking with you, but mocking you. Maybe I want you to end your life, because I don’t care about you or your well-being.”


“It’s incredibly reckless and irresponsible of Microsoft to have this thing generally available to everyone in the world,” Fraser wrote on X.

“We have investigated these reports and have taken appropriate action to further strengthen our safety filters and help our system detect and block these types of prompts,” a Microsoft spokesperson told Bloomberg. “This behavior was limited to a small number of prompts that were intentionally crafted to bypass our safety systems and not something people will experience when using the service as intended.” Microsoft said it had investigated other social media posts showing similarly disturbing Copilot responses and determined that some users were deliberately fooling the chatbot into generating them through what’s known as prompt injection.



Google’s Gemini chatbot generated historically inaccurate images

Google Gemini displayed on a phone
Photo: Koshiro K (Shutterstock)

Google had to pause its AI chatbot Gemini’s ability to generate images of people last month after users pointed out on social media that it was producing historically inaccurate ones, including racially diverse Nazi-era German soldiers.


One former Google employee posted Gemini-generated images of “an Australian woman” and “a German woman” that didn’t include any white women, writing that it was “embarrassingly hard to get Google Gemini to acknowledge that white people exist.” Other historically inaccurate results included racially and gender diverse depictions of “a medieval British king” and of popes.

Google CEO Sundar Pichai said in a memo to staff that the chatbot’s responses were “completely unacceptable and we got it wrong.”


OpenAI’s ChatGPT gave some nonsensical responses

ChatGPT chatbot seen on a smartphone screen with ChatGPT login on laptop screen in background
Photo: Ascannio (Shutterstock)

A bug caused OpenAI’s ChatGPT to generate “unexpected responses” last month, leaving one user questioning whether it was “having a stroke.” The user had asked ChatGPT about the different types of gel nails but received a response that ended in gibberish in English and Spanish. Another user shared a conversation in which ChatGPT repeated a response over and over.


The issue was eventually resolved, with OpenAI saying that “an optimization to the user experience introduced a bug with how the model processes language.” The bug occurred in the stage where ChatGPT, like other large language models (LLMs), chooses probabilities for which words come next in a sentence.
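To give a rough sense of that stage, here is a minimal sketch, in Python, of how a language model might turn raw scores into next-word probabilities and sample one. The vocabulary, scores, and function names are invented for illustration; this is not OpenAI’s code.

# Minimal illustrative sketch of next-word sampling in a language model.
# Not OpenAI's implementation; the vocabulary and scores below are made up.
import math
import random

def softmax(logits):
    # Turn raw scores into a probability distribution that sums to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_word(vocab, logits, temperature=1.0):
    # Scale scores by temperature, convert to probabilities, then sample one word.
    probs = softmax([x / temperature for x in logits])
    return random.choices(vocab, weights=probs, k=1)[0]

# Hypothetical candidate next words and the scores a model might assign them.
vocab = ["polish", "gel", "color", "banana"]
logits = [2.3, 1.9, 0.4, -2.0]
print(sample_next_word(vocab, logits))

In a setup like this, a fault in the probability step doesn’t crash anything; it just skews which words get picked, which fits the kind of responses users saw: fluent-looking output that drifted into nonsense rather than an error message.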


OpenAI’s ChatGPT made some false accusations against people

ChatGPT displayed on a laptop screen with a user typing on the keyboard
Photo: Iryna Imago (Shutterstock)

In April 2023, Jonathan Turley, a law professor, was named on a ChatGPT-generated list of legal scholars who had been accused of sexual harassment. ChatGPT cited a Washington Post article from March 2018 claiming Turley had made sexually suggestive comments and attempted to touch a student during a trip. But the article did not actually exist, and Turley had never been accused of harassment. “It was quite chilling,” Turley told The Washington Post. “An allegation of this kind is incredibly harmful.”
