AI

Cloudflare launches a tool to combat AI bots

Comment

grey robot head on red background
Image Credits: Getty Images

Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, allow website owners to block the bots they use for data scraping and model training by amending their site’s robots.txt, the text file that tells bots which pages they can access on a website. But, as Cloudflare points out in a post announcing its bot-combating tool, not all AI scrapers respect this.

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

So, in an attempt to address the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune automatic bot detection models. The models consider, among other factors, whether an AI bot might be trying to evade detection by mimicking the appearance and behavior of someone using a web browser.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers and says that it’ll continue to manually blacklist AI bots over time.

The problem of AI bots has come into sharp relief as the generative AI boom fuels the demand for model training data.

Many sites, wary of AI vendors training models on their content without alerting or compensating them, have opted to block AI scrapers and crawlers. Around 26% of the top 1,000 sites on the web have blocked OpenAI’s bot, according to one study; another found that more than 600 news publishers had blocked the bot.

Blocking isn’t a surefire protection, however. As alluded to earlier, some vendors appear to be ignoring standard bot exclusion rules to gain a competitive advantage in the AI race. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites, and OpenAI and Anthropic are said to have at times ignored robots.txt rules.

In a letter to publishers last month, content licensing startup TollBit said that, in fact, it sees “many AI agents” ignoring the robots.txt standard.

Tools like Cloudflare’s could help — but only if they prove to be accurate in detecting clandestine AI bots. And they won’t solve the more intractable problem of publishers risking sacrificing referral traffic from AI tools like Google’s AI Overviews, which exclude sites from inclusion if they block specific AI crawlers.

More TechCrunch

Fast food chain Whataburger’s app has gone viral in the wake of Hurricane Beryl, which left around 1.8 million utility customers in Houston, Texas without power. Hundreds of thousands of…

Whataburger app becomes unlikely power outage map after Houston hurricane

Bumble’s new reporting option arrives at a time when, unfortunately, AI-generated photos on dating apps are common

Bumble users can now report profiles that use AI-generated photos

The concept of Airchat is fun, especially if you’re someone who loves to send voice memos instead of typing out long paragraphs on your phone keyboard.

Talky social app Airchat gets a major overhaul, making it more like an asynchronous Clubhouse

Featured Article

The fall of EV startup Fisker: A comprehensive timeline

Here is a timeline of the events that led fledgling automaker Fisker to file for bankruptcy.

3 hours ago
The fall of EV startup Fisker: A comprehensive timeline

Ahead of these potential competitors comes Openvibe, a simple aggregator for the open social web.

Openvibe combines Mastodon, Bluesky and Nostr into one social app

Welcome to TechCrunch Fintech! Last week was a holiday in the United States, so news was a bit lighter than normal. But there was still fintech-related items to report, including…

Should venture capitalists be held accountable when startups screw up?

Fisker Inc. co-founders Henrik Fisker and his wife, Geeta Gupta-Fisker, are lowering their salaries to $1 in order to keep their failed EV startup’s bankruptcy proceedings funded, as lawyers work…

Henrik Fisker drops salary to $1 to keep Fisker Inc. bankruptcy case alive

After announcing a whopping $20 million seed last year, Unlikely AI founder William Tunstall-Pedoe has kept the budding U.K. foundation model maker’s approach under lock and key. Until now: TechCrunch…

Alexa co-creator gives first glimpse of Unlikely AI’s tech strategy

We’re excited to invite Jesse Pollak to TechCrunch Disrupt 2024 to talk about the future of decentralization.

Jesse Pollak will tell us why Coinbase is launching its own Base blockchain at TechCrunch Disrupt 2024

Infactory is a kind of fact-checking search engine that will be focused exclusively on data at launch.

Humane execs leave company to found AI fact-checking startup

In a first, the Federal Trade Commission is banning an app from serving users under the age of 18. The agency announced on Tuesday that it’s banning NGL, an anonymous…

FTC bans NGL from offering its anonymous social app to minors

When people start navigation on Google Maps, the vehicle’s speed is shown in miles or kilometers, depending on the region.

Google Maps is rolling out speedometer, speed limits on iPhone and CarPlay globally

Design and animation are core to the Duolingo experience, which makes learning a new language or skill more like a game rather than a task to be dreaded.

Duolingo acquires Detroit-based design studio Hobbes

Two of my friends died within the last three years. By some coincidence, both of their birthdays fall in the beginning of July. So, twice this week, Facebook has reminded…

Facebook keeps asking me to say ‘happy birthday’ to dead people

Running a small business means doing more with less. AI agents can help, but building custom agents for specific workflows remains challenging, even with today’s low-code/no-code tools. The idea behind…

With $6M in seed funding, Enso plans to bring AI agents to SMBs

The feature puts Spotify in more direct competition with YouTube as a place where creators can interact with their listeners.

Chasing YouTube, Spotify adds comments to podcasts

A new iOS app called Wayther wants to help you better plan your road trips by giving you real-time road conditions and weather forecasts along your route. Created by indie…

Meet Wayther, an iOS weather forecast app designed specifically for road trips

Evolve has confirmed that the personal data of at least 7.6 million people was accessed during LockBit’s ransomware attack.

Evolve Bank says ransomware gang stole personal data on millions of customers

Etsy has been grappling with an influx of generic “junk” and AI-generated products on its platform. The service revised its seller policy on Tuesday, introducing new labels that clarify whether…

Etsy adds AI-generated item guidelines in new seller policy 

Seae Ventures is acquiring Unseen Capital after the death of founder Kayode Owens in 2021. The combined firm will continue to invest in healthcare for minorities and underserved populations. Owens,…

Seae Ventures acquires Unseen Capital after founder death

Apple released the third developer beta version of iOS 18 on Monday. While there are no major new features like Apple Intelligence in this update, there are some neat design…

With the latest iOS 18 developer beta, Apple makes flashlight UI more fun

A startup called DreamFlare AI is emerging from stealth on Tuesday with the goal of helping content creators make and monetize short-form AI-generated content. The company, co-founded by former Google…

Ex-Googler joins filmmaker to launch DreamFlare, a studio for AI-generated video

Nala, a remittance startup that is now widening its portfolio through a new B2B payments platform, has raised $40 million equity in a rare deal that becomes one of the largest…

Nala to use $40M Series A to build B2B payments platform, scale remittance services

Solo founder Cat Jones took the plunge on setting up a travel business right around the time the pandemic was hitting Europe in March 2020. Fast-forward to summer 2024 and…

Byway is using AI to help travelers slow down and take the scenic route

An adtech business owned by Microsoft is the target of a complaint backed by European privacy advocacy group, noyb — a nonprofit that punches far above its weight when it…

Microsoft-owned adtech Xandr accused of EU privacy breaches

Quora says that Previews works best with chatbots that “excel” at programming, like Claude 3.5 Sonnet, GPT-4o and Google’s Gemini 1.5 Pro.

Quora’s Poe now lets users create and share web apps

For over a decade, real-money gaming companies and fantasy sports startups have marketed themselves as video game companies. But as these businesses face increasing regulatory scrutiny, a coalition of more…

Indian game firms want to distance themselves from fantasy sports

Huffington Post founder Arianna Huffington and OpenAI CEO Sam Altman are throwing their weight behind a new venture, Thrive AI Health, that aims to build AI-powered assistant tech to promote…

OpenAI Startup Fund backs AI healthcare venture with Arianna Huffington

The essential labor of data work, like moderation and annotation, is systematically hidden from those who benefit from the fruits of that labor. A new project puts the lived experiences…

Data workers detail exploitation by tech industry in DAIR report

Hello and welcome back to TechCrunch Space. I hope everyone had a great Independence Day. On to the news!

TechCrunch Space: SpaceX’s big plans for Starship in Florida