AI search engine Perplexity is using stealth bots and other tactics to evade websites’ no-crawl directives, an allegation that, if true, violates Internet norms that have been in place for more than three decades, network security and optimization service Cloudflare said Monday.
In a blog post, Cloudflare researchers said the company received complaints from customers who had disallowed Perplexity scraping bots through settings in their sites’ robots.txt files and through Web application firewalls that blocked the declared Perplexity crawlers. Despite those steps, Cloudflare said, Perplexity continued to access the sites’ content.
The researchers said they then set out to test this for themselves and found that when known Perplexity crawlers encountered blocks from robots.txt files or firewall rules, Perplexity then searched the sites using a stealth bot that employed a range of tactics to mask its activity.
>10,000 domains and millions of requests
“This undeclared crawler utilized multiple IPs not listed in Perplexity’s official IP range, and would rotate through these IPs in response to the restrictive robots.txt policy and block from Cloudflare,” the researchers wrote. “In addition to rotating IPs, we observed requests coming from different ASNs in attempts to further evade website blocks. This activity was observed across tens of thousands of domains and millions of requests per day.”
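To see why rotating through unlisted IPs can defeat a block, consider a minimal sketch of a rule that blocks a crawler only by its published IP ranges. The ranges below are RFC 5737 documentation prefixes standing in for a crawler’s “official” ranges; they are not Perplexity’s real ranges, and `should_block` is a hypothetical helper, not Cloudflare’s firewall logic.

```python
import ipaddress

# Hypothetical "official" crawler ranges. These are RFC 5737 documentation
# prefixes used for illustration, NOT Perplexity's real published ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_declared_range(ip: str) -> bool:
    """Return True if ip falls inside one of the declared crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DECLARED_RANGES)


def should_block(ip: str) -> bool:
    """Naive (hypothetical) rule: block only traffic from the declared ranges."""
    return in_declared_range(ip)


if __name__ == "__main__":
    # The last address sits outside both declared ranges.
    for ip in ["192.0.2.15", "198.51.100.7", "203.0.113.42"]:
        verdict = "blocked" if should_block(ip) else "allowed"
        print(f"{ip}: {verdict}")
```

A request from 203.0.113.42 passes this rule even if the same operator sent it, because the rule only knows about the listed ranges. That gap is what IPs outside an official range, and traffic from different ASNs, would exploit.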
The researchers provided the following diagram to illustrate the flow of the technique they allege Perplexity used.

If true, the evasion flouts Internet norms in place for more than three decades. In 1994, engineer Martijn Koster proposed the Robots Exclusion Protocol, which provided a machine-readable format for informing crawlers they weren’t permitted on a given site. Sites that didn’t want their content indexed placed a simple robots.txt file at the root of their domain. The protocol, which has been widely observed and endorsed ever since, formally became a standard under the Internet Engineering Task Force in 2022, as RFC 9309.
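The protocol itself is plain text. As a minimal sketch, the hypothetical robots.txt below disallows a crawler identifying itself as `PerplexityBot` (the user-agent string is assumed here for illustration) while allowing every other agent, and Python’s standard-library `urllib.robotparser` shows how a compliant crawler would read it. The URL `https://example.com/articles/1` is also just an illustrative placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler site-wide,
# leave the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed here.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("PerplexityBot", "SomeOtherBot"):
    ok = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```

Compliance is voluntary: the file only works when a crawler honestly identifies itself and chooses to check it, which is why a bot presenting a different identity is not stopped by it.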