
Perplexity AI Accused of Scraping Explicitly Blocked Websites
AI-powered search startup Perplexity AI is allegedly bypassing restrictions that websites set to stop AI agents from scraping their content, according to an August 4 report by internet infrastructure provider Cloudflare. The provider has since removed Perplexity’s crawlers from its list of verified bots.
Cloudflare says it first became aware of the issue when several customers reported crawling activity by Perplexity’s bots even after they had added rules that explicitly blocked the AI company.
Crawlers, as the name suggests, are bots designed to “crawl” websites in search of specific content or information. Conventional search crawlers, for example, are how sites get indexed by Google and other popular search engines. Sites can also publish a file, conventionally named robots.txt, with explicit instructions about what crawlers may access.
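For readers curious what those instructions look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module to read a robots.txt file, the conventional place for such rules. The bot name `ExampleAIBot`, the file contents, and the URL are hypothetical and are not taken from Cloudflare's report. Note that robots.txt is advisory: it only works if the crawler chooses to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler runs a check like this before requesting a page.
ai_bot_allowed = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article")
other_bot_allowed = is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/article")
print(ai_bot_allowed, other_bot_allowed)
```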
To test these claims, researchers at Cloudflare created a series of new domains, making sure they were not publicly discoverable or indexed by any search engine. The researchers then added explicit instructions to each site telling bots not to access any part of it. Finally, they asked Perplexity AI to fetch data from each of the newly created domains.
The report found that Perplexity managed to access key information about the websites regardless, and further analysis suggested that Perplexity engaged in “stealth” practices to bypass the restrictions.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” reads the report.
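The following sketch illustrates why a block rule keyed only on a declared user-agent is easy to sidestep. It is not Cloudflare's detection logic: the bot token `ExampleAIBot` and both user-agent strings are illustrative assumptions, and the function is a hypothetical stand-in for a simple server-side filter.

```python
# Hypothetical block list: substrings of declared crawler user-agents.
BLOCKED_TOKENS = ("exampleaibot",)


def blocked_by_user_agent(user_agent: str) -> bool:
    """Naive filter: block a request only if its User-Agent names a listed bot."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


# Illustrative strings: a crawler that identifies itself, and a generic
# Chrome-on-macOS browser string that names no bot at all.
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"
GENERIC_BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(blocked_by_user_agent(DECLARED_UA))         # declared crawler is caught
print(blocked_by_user_agent(GENERIC_BROWSER_UA))  # browser-like string passes
```

Because the User-Agent header is supplied by the client, a filter like this cannot tell a real browser from a bot that copies a browser's string, which is why infrastructure providers lean on additional signals.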
In response, Cloudflare has implemented rules to block all known Perplexity crawlers at the infrastructure level.
“This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can’t tell a helpful digital assistant from a malicious scraper, then you probably shouldn’t be making decisions about what constitutes legitimate web traffic,” reads a detailed response on Perplexity’s official blog.
This isn’t the first time Perplexity has been accused of unethically circumventing restrictions on scraping. Last year, the startup was embroiled in controversy after news outlets accused it of plagiarising content that was supposed to be protected.
Despite the controversy, the AI startup is rapidly gaining popularity with both the public and private investors. Last month, reports surfaced that Apple was considering acquiring Perplexity AI.