Cassandrich (@dalias):

Got some complaints that the git server was sometimes giving errors from too much load, caused by abusive LLM scrapers hammering cgit. So I added frontend rules that apply a 10 byte/sec rate limit to client IPs hammering it with >5 requests per 10 sec, for the duration of the hammering. Load average has plummeted. 😈

I wasn't actually able to confirm the limits kicking in on anything I casually tried hitting the server with, but it appears to be working, so 🤷

haproxy recipe in case it's useful to anyone (or if anyone has recommendations to fix/improve it):

stick-table type ip size 10k expire 10s store http_req_rate(10s)
filter bwlim-out mylimit default-limit 1 default-period 1s
http-response set-bandwidth-limit mylimit if { sc_http_req_rate(0) gt 5 }
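
For context, a minimal sketch of how these directives might fit into a full config, assuming a plain HTTP frontend in front of cgit. The section names, bind address, and backend server below are placeholders, and the http-request track-sc0 line is added here because sc_http_req_rate(0) only reports a rate for a tracked source (the real setup presumably has an equivalent tracking rule somewhere):

# hypothetical frontend wrapping the posted recipe; names and addresses are made up
frontend cgit_front
    bind :80
    # per-source-IP table storing the HTTP request rate over a 10s window
    stick-table type ip size 10k expire 10s store http_req_rate(10s)
    # track each client IP in the table above so sc_http_req_rate(0) has data
    http-request track-sc0 src
    # bandwidth-limit filter; default-limit 1 / default-period 1s throttles
    # matched responses to about 1 byte/sec (raise default-limit for ~10 bytes/sec)
    filter bwlim-out mylimit default-limit 1 default-period 1s
    # apply the throttle only to clients exceeding 5 requests per 10 seconds
    http-response set-bandwidth-limit mylimit if { sc_http_req_rate(0) gt 5 }
    default_backend cgit_back

backend cgit_back
    server cgit 127.0.0.1:8080

Note that the bwlim-out filter needs a reasonably recent HAProxy, and because default-limit/default-period make it a per-stream limit, throttled clients just have their responses trickled out slowly rather than receiving an error.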

@lanodan I specifically did super-slow rather than error to keep them bogged down with open connections. 😈

@dalias On my server I block them at the firewall level. Hosts are blocked for 72h, and on any given day there are at least 500 of them in the list; today there are more than 700.

The way I did it was through a Varnish rule that redirects to a daemon, which logs the hit and adds a rule to the firewall (the rules are then pruned after a few hours).

This list is updated hourly: tia.mat.br/blocked-ips.php (FWIW, it's not actually PHP; the extension is just a hack so Varnish picks the right rule.)

@dalias Some of the rules also catch scanners probing for WordPress or phpMyAdmin and the like, and this has severely reduced the load and the amount of logs.

Another thing I did was add a fake robots.txt with a honeypot that also blocks an IP address for 3 days if it doesn't follow the Disallow rule: tia.mat.br/robots.txt

@lafp For now I'm happy with what I did. It got rid of all the excessive load with no risk of blocking any legitimate access.

@dalias Oh, absolutely! Limiting the bandwidth is a pretty good idea, especially since a lot of bots only handle timeouts for connections, not for fetching data.

@dalias 5 requests per 10 seconds seems like a threshold a human could easily hit by accident when looking through commit logs.

@alwayscurious Yeah, probably should make it more like 20-30 per 10 sec.
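
In terms of the recipe above, that would presumably just be a matter of raising the threshold in the condition, e.g. (hypothetical value):

    # throttle only clients making more than 25 requests per 10 seconds
    http-response set-bandwidth-limit mylimit if { sc_http_req_rate(0) gt 25 }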