Perplexity accused of scraping websites that explicitly blocked AI scraping

August 4, 2025

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they do not want to be scraped, according to internet infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying it observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages, in what Cloudflare’s researchers described as an attempt to circumvent websites’ preferences.

AI products such as those offered by Perplexity depend on ingesting large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. Recently, websites have tried to fight back using the web standard robots.txt file, which tells search engines and AI companies which pages may be indexed and which may not. Those efforts have seen mixed results so far.
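How a robots.txt rule is meant to be honored can be sketched with Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the file contents, and the URL below are hypothetical and not taken from Cloudflare's research. The sketch only illustrates that the check runs on the crawler's side and keys on the name the crawler chooses to declare.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    print(is_allowed("ExampleAIBot", url))    # the named crawler
    print(is_allowed("SomeOtherCrawler", url))  # any other declared name
```

Because the file is advisory, a crawler that declares a different name, or never consults the file, is not stopped by it.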

Perplexity appears willing to circumvent these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by device and version type, and by changing its autonomous system numbers, or ASNs, which are essentially numbers that identify large networks on the internet, according to Cloudflare.
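To make the two signals concrete, here is a minimal sketch of a naive server-side block rule that looks only at the declared user agent and the source ASN. Everything in it is an assumption for illustration: the `Request` type, the `ExampleAIBot` token, and the ASNs (drawn from the range reserved for documentation) are hypothetical, and this is not how Cloudflare's detection works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_agent: str  # value of the User-Agent header, chosen by the client
    asn: int         # autonomous system number of the source network


# Hypothetical block list. ASNs 64496-64511 are reserved for documentation.
BLOCKED_UA_TOKENS = ("exampleaibot",)
BLOCKED_ASNS = {64500}


def is_blocked(req: Request) -> bool:
    """Naive rule: block on a known user-agent token or a known ASN."""
    ua = req.user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return True
    return req.asn in BLOCKED_ASNS


if __name__ == "__main__":
    declared = Request("ExampleAIBot/1.0", 64500)
    disguised = Request(
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0 Safari/537.36",
        64501,
    )
    print(is_blocked(declared))
    print(is_blocked(disguised))
```

A rule of this kind catches the declared crawler but not a request that presents a generic browser string from a different network, which is why detection of that behavior has to rely on other signals.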

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.

Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to WAN that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed that the bot named in the Cloudflare blog “isn’t even ours.”

Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites even after they had added rules to their robots.txt files and rules specifically blocking Perplexity’s known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.

“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” Cloudflare said.

The company also said that it has removed Perplexity’s bots from its verified list and added new techniques to block them.

Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace that lets website owners and publishers charge AI scrapers that visit their sites. Cloudflare’s chief executive, Matthew Prince, sounded the alarm at the time, saying that AI is breaking the business model of the internet, particularly that of publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.

This is not the first time that Perplexity has been accused of scraping without permission.

Last year, news outlets such as Wired alleged that Perplexity was plagiarizing their content. Weeks later, Perplexity’s Aravind Srinivas could not immediately answer when asked to provide the company’s definition of plagiarism during an interview with Devin Coldewey of WAN at the Disrupt 2024 conference.
