ROFLMAO.

Claude decided to crawl one of the sites on my new server, where known bots are redirected to an iocaine maze. Claude has been in the maze for 13k requests so far, over the course of 30 minutes.
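
A minimal sketch of what that kind of redirect can look like with Caddy as the reverse proxy; the bot list and the iocaine listen address below are placeholders, not the actual setup:

    # Sketch only: match a few known crawler user agents (illustrative list)
    # and hand them to iocaine instead of the real site.
    example.com {
        @aibots header_regexp User-Agent (?i)(claudebot|gptbot|ccbot|bytespider)
        handle @aibots {
            # iocaine's listen address is a placeholder here
            reverse_proxy 127.0.0.1:42069
        }
        handle {
            # everyone else gets the real site
            reverse_proxy 127.0.0.1:8080
        }
    }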

I will need to fine-tune the rate limiting, because it didn't hit any rate limits: it scanned from 902 different client IPs, so simply rate limiting by IP doesn't fly. I'll rate limit by (possibly normalized) user agent instead (they all used the same UA).
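
A sketch of per-agent rate limiting, assuming a Caddy build that includes the third-party github.com/mholt/caddy-ratelimit module; the zone name and the numbers are arbitrary:

    # Needs a Caddy build that compiles in github.com/mholt/caddy-ratelimit.
    # As a third-party directive it has no default order, so it may need an
    # `order rate_limit before reverse_proxy` global option or a route block.
    rate_limit {
        zone crawlers {
            # key on the User-Agent header; normalizing it first (for example
            # with the map directive) would fold minor UA variations together
            key    {http.request.header.User-Agent}
            events 100
            window 1m
        }
    }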

Over the course of those 30 minutes, it downloaded roughly 300 times less data than it would have if I had let it scrape the real thing, and each request took about a tenth of the time to serve. So I saved bandwidth, saved processing time, likely saved RAM too, and served garbage to Claude.

Job well done.

Link card: algernon/iocaine (MadHouse Git Repositories), "The deadliest poison known to AI."

@algernon this is awesome and I want to do it myself too. Is there a write-up or blog on how you set it up?

@Infosecben There are some notes in #iocaine's repo, here, and my exact setup is documented here (the server config is also free software).

Hope that helps! But if you have questions, feel free to @ me, I'm more than happy to help you serve garbage to the robots. :flan_evil:

Link card: algernon/iocaine, docs/deploying.md at main (MadHouse Git Repositories), "iocaine - The deadliest poison known to AI."

@algernon @Infosecben I thought I had heard that some of the bots use fake user agents that don't identify them as crawlers at all (so your proxy config there wouldn't catch them). Is that true?

@aburka @Infosecben Yep, some of them use fake user agents, and those are not caught in this trap. Yet.

I just configured my reverse proxy to direct /cgi-bin/ to the maze, and I will be adding links to the sites hosted there, so that crawlers will find it. I can then do some digging in the logs and figure out how to handle the misbehavers.
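
In Caddy terms that trap can be as small as this (the iocaine address is again a placeholder):

    # Anything under /cgi-bin/ goes straight to the maze; no real CGI lives there.
    handle /cgi-bin/* {
        reverse_proxy 127.0.0.1:42069
    }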

@algernon @aburka @Infosecben Is there a way to do dynamic config so that any source IP that requests something suspicious gets added to the maze list? I'm thinking there are some well-known resources that it makes no sense to see a request for in the course of a normal human visit to the website...

@arichtman @aburka @Infosecben I don't know if it is possible to set that up with Caddy out of the box. If there isn't, I can always write a module.

But first things first: trapping & limiting known baddies. Leading other baddies into the maze, and rate limiting within the maze, is the next step, and I'll iterate from there, likely by adding IP ranges or new user agents to the known baddies list.

It's a bit manual, but I'm not automating it until it turns out that automating it would save time.
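
Extending the known-baddies matching with an IP range is the same pattern with a different matcher; a sketch inside the same site block, using documentation-only example ranges:

    # Hypothetical offender ranges; replace with whatever the logs turn up.
    @badnets remote_ip 203.0.113.0/24 198.51.100.0/24
    handle @badnets {
        reverse_proxy 127.0.0.1:42069
    }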

@arichtman Anything that hits something robots.txt tells it not to hit is a candidate... @algernon @aburka @Infosecben

@BenAveling @arichtman @aburka @Infosecben A candidate, yes, but that in itself is far from enough indication. I think a better indicator is how much time it spends in the maze. A human won't spend much time there, and won't crawl links at lightning speed.

@algernon @Infosecben I liked an idea I saw around here: put a link on the main page saying "if you're human, don't click here", put the target URL of the link in robots.txt, and then put iocaine on the other end. That way humans won't click (at least not more than once...), well-behaved crawlers will stay out, and the bastards will get caught.
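
A sketch of that honeypot-link idea, assuming a hypothetical /trap/ path: disallow it in robots.txt, link to it from the page with a warning humans can read, and route it to iocaine:

    # robots.txt served by the site should contain:
    #   User-agent: *
    #   Disallow: /trap/
    # and the landing page carries a visible "if you're human, don't click this"
    # link pointing at /trap/.
    handle /trap/* {
        reverse_proxy 127.0.0.1:42069
    }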

@aburka @Infosecben yep, that's the plan (in addition to the current setup)!