Open source devs are fighting AI crawlers with cleverness and vengeance

AI web-crawling bots are the cockroaches of the internet, many software developers believe. Some developers have started fighting back in ingenious, often humorous ways.
Although any website can be targeted by bad crawler behavior, sometimes to the point of taking the site down, open source developers are “disproportionately” affected, writes Niccolò Venerandi, developer of a Linux desktop known as Plasma and owner of the blog LibreNews.
By their nature, sites hosting free and open source software (FOSS) projects share more of their infrastructure publicly, and they also tend to have fewer resources than commercial products.
The problem is that many AI bots do not honor the robots.txt file (the Robots Exclusion Protocol), the tool that tells bots what not to crawl, originally created for search engine bots.
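For illustration, a minimal robots.txt might look like the sketch below; the crawler names and paths here are just examples, and nothing about the file is enforced, which is exactly the problem:

```
# Example robots.txt (illustrative names and paths).
# Ask two AI crawlers to stay away entirely:
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else is welcome, except in the Git repositories:
User-agent: *
Disallow: /git/
```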
In a “call for help” blog post in January, FOSS developer Xe Iaso described how AmazonBot relentlessly hammered a Git server site to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants to download the code or contribute to it can do so.
But this bot ignored Iaso’s robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.
“It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,” Iaso complained.
“They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over again. Some of them will even click the same link multiple times in the same second,” the developer wrote in the post.
Enter the God of Graves
So Iaso fought back with cleverness, building a tool called Anubis.
Anubis is a reverse proxy proof-of-work check that must be passed before requests are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
The funny part: Anubis is the name of the god in Egyptian mythology who leads the dead to judgment.
“Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died,” Iaso told TechCrunch. If a web request passes the challenge and is determined to be human, a cute anime picture announces success. The drawing is “my take on anthropomorphizing Anubis,” says Iaso. If it’s a bot, the request gets denied.
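Anubis itself is open source, but as a rough sketch of the general proof-of-work idea (not Anubis’s actual code or parameters): the server hands the browser a random challenge, the browser burns a little CPU finding a nonce whose SHA-256 hash meets a difficulty target, and the server verifies the answer with a single cheap hash. A hypothetical Go version:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce such that SHA-256(challenge + nonce),
// rendered as hex, starts with `difficulty` zero characters.
// This is the costly step a visitor's browser would perform.
func solve(challenge string, difficulty int) int {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		if hashStartsWith(challenge, nonce, prefix) {
			return nonce
		}
	}
}

// verify is the cheap server-side check: one hash per request.
func verify(challenge string, nonce, difficulty int) bool {
	return hashStartsWith(challenge, nonce, strings.Repeat("0", difficulty))
}

func hashStartsWith(challenge string, nonce int, prefix string) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), prefix)
}

func main() {
	challenge := "random-server-issued-token" // would be random per visitor in practice
	difficulty := 4                            // illustrative: leading hex zeros required

	nonce := solve(challenge, difficulty) // done by the client (browser JS in practice)
	fmt.Println("nonce found:", nonce)
	fmt.Println("server accepts:", verify(challenge, nonce, difficulty))
}
```

The asymmetry is the point: a human’s browser pays the cost once, while a crawler re-requesting thousands of pages would have to pay it again and again.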
The project has spread like wildfire through the FOSS community. Iaso shared it on GitHub on March 19, and in just a few days it collected 2,000 stars, 20 contributors, and 39 forks.
Revenge as a defense
The instant popularity of Anubis shows that Iaso’s pain is not unique. In fact, Venerandi shared story after story:
- SourceHut founder and CEO Drew DeVault described spending “from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale” and “experiencing dozens of brief outages per week.”
- Jonathan Corbet, a famed FOSS developer who runs the Linux industry news site LWN, warned that his site was being slowed by DDoS-level traffic “from AI scraper bots.”
- Kevin Fenzi, the sysadmin of the enormous Linux Fedora project, said the AI scraper bots had become so aggressive that he had to block the entire country of Brazil from access.
Venerandi tells TechCrunch that he knows of several other projects experiencing the same problems. One of them “had to temporarily ban all Chinese IP addresses at some point.”
Let that sink in: developers “even have to ban entire countries” just to fend off AI bots that ignore robots.txt files, says Venerandi.
Beyond weighing the soul of a web request, other developers believe vengeance is the best defense.
A few days ago, Hacker News user xyzal suggested loading robots.txt-forbidden pages with “a bucket load of articles on the benefits of drinking bleach” or “articles about the positive effect of catching measles on performance in bed.”
“I think we should aim for the bots to get _negative_ utility value from visiting our traps, not just zero value,” xyzal explained.
In January, an anonymous creator known only as “Aaron” released a tool called Nepenthes that aims to do exactly that. It traps crawlers in an endless maze of fake content, a goal the dev admitted to Ars Technica is aggressive, if not outright malicious. The tool is named after a carnivorous plant.
And Cloudflare, perhaps the largest commercial player offering tools to fend off AI crawlers, released a similar tool called AI Labyrinth last week.
It is intended to “slow down, confuse, and waste the resources of AI crawlers and other bots that don’t respect ‘no crawl’ directives,” Cloudflare explained in its blog post. Cloudflare said it feeds misbehaving AI crawlers “irrelevant content rather than extracting your legitimate website data.”
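Conceptually, tools in this family are just web servers that never run out of pages. The sketch below is neither Nepenthes nor AI Labyrinth, only a minimal, hypothetical illustration of the tarpit idea: every URL under a trap path returns filler text plus links that lead further into the maze, with a small delay to waste the crawler’s time.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"log"
	"math/rand"
	"net/http"
	"time"
)

// trapHandler serves an endless maze: every path under /trap/ yields a page
// of filler text plus links to more generated pages. Content is derived
// deterministically from the URL so revisits look like a stable site.
func trapHandler(w http.ResponseWriter, r *http.Request) {
	h := fnv.New64a()
	h.Write([]byte(r.URL.Path))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	time.Sleep(2 * time.Second) // slow the crawler down a little

	fmt.Fprintf(w, "<html><body><h1>Archive %d</h1>", rng.Intn(100000))
	for i := 0; i < 10; i++ {
		fmt.Fprintf(w, "<p>Filler paragraph %d with no informational value.</p>", rng.Intn(1000000))
	}
	// Links lead only deeper into the maze, never back out.
	for i := 0; i < 5; i++ {
		fmt.Fprintf(w, `<a href="%spage-%d/">more</a> `, r.URL.Path, rng.Intn(1000000))
	}
	fmt.Fprint(w, "</body></html>")
}

func main() {
	// In a real deployment the trap would live under a path that robots.txt
	// disallows, so only crawlers that ignore the rules ever wander in.
	http.HandleFunc("/trap/", trapHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Putting the trap behind a robots.txt Disallow line is what ties this back to xyzal’s point: well-behaved bots never see it, and the ones that do are, by definition, the ones ignoring the rules.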
SourceHut’s DeVault told TechCrunch that “Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells, but ultimately Anubis is the solution that worked” for his site.
But DeVault also issued a public, heartfelt plea for a more direct fix: “Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop.”
Since the chances of that happening are zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.