Cloudflare will automatically block mixed web crawlers that collect data for AI companies

Cloudflare will change its policy for web crawlers (specialized automated programs that systematically scan the web, follow hyperlinks and collect information) that both index sites for search engines and are used to train or operate artificial intelligence models.

Engadget reported the change.

The hosting platform will block such web crawlers starting in September this year. The company wants to give site owners back control over the use of their content.

What is known about the new rules 

Cloudflare previously let site owners choose to prohibit data collection for AI chatbots.

Now that approach will become the default for new customers and new sites of existing users.

Beginning September 15, 2026, new Cloudflare customers and new sites of existing users will automatically receive settings that allow indexing for search engines but block the use of pages with advertising for training AI models and operating agents.

Mixed web crawlers that do not provide site owners with the ability to separately control the use of content for artificial intelligence will also be blocked.
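The default rules described above can be read as a small decision rule. The following is only an illustrative sketch under stated assumptions, not Cloudflare's implementation: the `Crawler` type, its field names and the `decide` function are hypothetical. It also ignores the detail that the AI block applies to pages with advertising, and it assumes that a mixed crawler offering a separate AI opt-out stays allowed for search indexing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    """Hypothetical description of a bot; not a Cloudflare data structure."""
    name: str
    indexes_for_search: bool   # feeds a traditional search index
    used_for_ai: bool          # trains models or powers AI agents
    separate_ai_control: bool  # site owner can opt out of AI use on its own


def decide(crawler: Crawler) -> str:
    """Return "allow" or "block" under the defaults as the article describes them."""
    if not crawler.used_for_ai:
        # Crawlers with no AI use keep their access (search indexing stays allowed).
        return "allow"
    if not crawler.indexes_for_search:
        # AI-only crawler: blocked by default.
        return "block"
    # Mixed crawler: blocked unless AI use can be controlled separately.
    # Allowing it in that case is an assumption of this sketch.
    return "allow" if crawler.separate_ai_control else "block"


if __name__ == "__main__":
    for bot in (
        Crawler("search-only", True, False, False),
        Crawler("ai-only", False, True, False),
        Crawler("mixed-no-control", True, True, False),
        Crawler("mixed-with-control", True, True, True),
    ):
        print(bot.name, decide(bot))
```

The point of the last branch is the incentive the article mentions: an operator of a mixed bot keeps access only by letting site owners separate search from AI use.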

Free-tier users will also be switched to these settings unless they opt out before the new policy takes effect.

Cloudflare co-founder and CEO Matthew Prince said that because most internet traffic is no longer human, the company must move faster to create a sustainable ecosystem. The new tools are intended to encourage operators of mixed bots to clearly separate traditional search functions from model training.

Launch of Pay Per Use monetization 

As part of the update, Cloudflare is also relaunching Pay Per Crawl, a feature introduced in 2025 that unblocked AI bots only if they paid a scraping fee.

The updated tool is called Pay Per Use. Now payments to site owners will be made not for the mere fact of a page being crawled, but when their content appears in AI chatbot responses. 
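The difference between the two billing models can be shown with a short calculation. This is a hedged sketch only: the `PageActivity` type, the function names and the fee values are hypothetical, and the article gives no actual rates or billing mechanics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageActivity:
    """Hypothetical per-page counters for one billing period."""
    crawls: int        # times an AI bot fetched the page
    ai_citations: int  # times the page's content appeared in an AI answer


def pay_per_crawl(pages: list[PageActivity], fee_per_crawl: int) -> int:
    """Old model: the owner is paid for every crawl, whether or not it is used."""
    return sum(p.crawls for p in pages) * fee_per_crawl


def pay_per_use(pages: list[PageActivity], fee_per_use: int) -> int:
    """New model: the owner is paid only when content shows up in an AI response."""
    return sum(p.ai_citations for p in pages) * fee_per_use


if __name__ == "__main__":
    # Fees are in arbitrary integer units chosen for illustration.
    pages = [
        PageActivity(crawls=100, ai_citations=0),  # crawled often, never used
        PageActivity(crawls=10, ai_citations=40),  # crawled rarely, used often
    ]
    print(pay_per_crawl(pages, fee_per_crawl=1))  # 110
    print(pay_per_use(pages, fee_per_use=2))      # 80
```

Under these made-up numbers, the heavily crawled but unused page earns nothing in the second model, while the frequently cited page accounts for the whole payout.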

As TechCrunch notes, Cloudflare’s new policy is also indirectly aimed at Google, since that company has access to roughly twice as much information as other AI firms.

The company’s bot, Googlebot, both indexes pages for search results and collects data to train Gemini and power other AI features.

Although Google allows sites to disable the separate Google-Extended crawler (so data won’t be used for model training), publishers cannot consent to appearing in AI Mode results without their content being used to train AI.

Cloudflare’s new policy is intended to force companies to change that practice.

As a reminder, Cloudflare is cutting more than 1,100 employees worldwide, one fifth of its staff. The decision is tied to the company’s shift to an AI-first operating model.

https://en.ain.ua/2026/07/03/cloudflare-will-automatically-block-mixed-web-crawlers-that-collect-data-for-ai-companies/