Scraping: Cloudflare traps AI in an infinite maze

Cloudflare is diverting AI bots with a maze of AI-generated content.

Cloudflare unveils AI Labyrinth, a new tool designed to combat AI-powered data scraping. Here’s how it works.

In a post published on its blog, Cloudflare unveiled a new tool to combat automated data collection by AI. It’s called AI Labyrinth, and it addresses the problem of “the explosion of new crawlers used by AI companies to retrieve data for model training,” the web infrastructure provider said. The feature is already available on an opt-in basis, including on free plans.

AI Labyrinth: Cloudflare Beats AI at Its Own Game

Cloudflare started from the following observation: existing tools for blocking malicious bots can be counterproductive. By blocking crawlers outright, sites risk alerting the attackers to the fact that they have been detected, “which leads to a change of approach and an endless arms race,” Cloudflare notes.

So the company took a different approach, turning AI’s own weapons against it. AI Labyrinth “uses AI-generated content to slow down, confuse, and waste the resources of AI crawlers and other bots that don’t follow no-crawl guidelines,” the blog post states. As a result, crawlers aren’t blocked: they’re led into a series of AI-generated pages that are “convincing enough to entice a crawler to crawl them.” This distracts crawlers from the site’s actual content and wastes their resources.
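The “no-crawl guidelines” mentioned above are typically published in a site’s robots.txt file. As a minimal sketch of what following them looks like from the crawler’s side, the snippet below uses Python’s standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` user agent, and the `may_crawl` helper are hypothetical and only serve the illustration; none of them come from Cloudflare.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not taken from Cloudflare).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt allows this user agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print(may_crawl("ExampleAIBot", url))  # the disallowed AI crawler
    print(may_crawl("SomeBrowser", url))   # any other user agent
```

A crawler that runs a check like this and respects the answer would not request the disallowed pages at all. According to the blog post, AI Labyrinth targets the bots that skip it.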

A safely constructed maze

To trap AI crawlers, Cloudflare relies on Workers AI to automatically generate content that mimics human style, designed to be varied and believable. To ensure the system’s effectiveness while avoiding unwanted side effects, several precautions have been taken:

  • Content pre-generation: Content is generated ahead of time (not on the fly), so as not to impact website performance.
  • Content sanitization: A cleaning step is applied to avoid cross-site scripting (XSS) vulnerabilities.
  • Fact-based content: Although not relevant to the site, the content is based on verified scientific information, in order to avoid spreading misinformation.
  • Invisible embedding: Links to these pages are hidden in the HTML code, without affecting how the site is displayed to human visitors.
  • SEO protection: Each generated page contains tags that prevent it from being indexed by search engines.
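To make three of these precautions more concrete (sanitization, invisible embedding, and SEO protection), here is a rough sketch in Python. It is not Cloudflare’s implementation: the function names, the `/maze/1` path, and the exact markup are assumptions chosen for illustration, and simple HTML escaping stands in for whatever sanitization step Cloudflare actually applies.

```python
from html import escape


def build_decoy_page(title: str, body_text: str) -> str:
    """Build a decoy page (hypothetical markup, not Cloudflare's).

    The generated text is HTML-escaped so that it cannot inject scripts,
    and a robots meta tag asks search engines not to index the page.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta name="robots" content="noindex, nofollow">\n'
        f"<title>{escape(title)}</title>\n"
        "</head><body>\n"
        f"<p>{escape(body_text)}</p>\n"
        "</body></html>\n"
    )


def hidden_link(href: str) -> str:
    """Return a link that is present in the HTML but not shown to visitors."""
    return (
        f'<a href="{escape(href, quote=True)}" rel="nofollow" '
        'style="display:none" aria-hidden="true" tabindex="-1"></a>'
    )


if __name__ == "__main__":
    print(build_decoy_page("Decoy", "Generated text with <script> in it."))
    print(hidden_link("/maze/1"))  # "/maze/1" is a made-up path
```

The idea is that a human visitor never sees or follows the hidden link, while a bot that parses the raw HTML and ignores no-crawl guidelines may request it and land on the decoy page.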

How to enable AI Labyrinth on your site

Enabling AI Labyrinth takes only a few seconds. Go to your Cloudflare dashboard, then, in the Security section, open Settings and enable the AI Labyrinth option. You can also do this from the Bots section, combining this tool with the AI bot blocking tool.

You can pair AI Labyrinth with the bot-blocking option. © Cloudflare