Over the years, I made a handful of maps of various things in Cambridge; I have collected some, but not all of them, on this page about housing things in Cambridge.

This includes things like maps of where you could legally build a fourplex (short answer: not many places!); the distribution of tax paid per parcel (Kendall Square pays a lot!) and more.

crschmidt.net/housing/cambridg

crschmidt.net: Housing Explorations in Cambridge. Housing-related explorations in Cambridge.

Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human).

Every Mastodon instance fetches and stores its own local copy of my 750 KB preview image.
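To put those numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above (bytes, requests, and the 60-second window). The variable names are just for illustration.

```python
# Figures quoted in the post above.
total_bytes = 112_772_802
total_requests = 430
window_seconds = 60

# Request rate over the window (the post rounds this to ">7 r/s").
requests_per_second = total_requests / window_seconds

# Average payload per request, mixing page fetches and image fetches.
avg_bytes_per_request = total_bytes / total_requests

# Sustained bandwidth over the window, in megabits per second.
megabits_per_second = total_bytes * 8 / window_seconds / 1_000_000

print(f"{requests_per_second:.2f} requests/s")
print(f"{avg_bytes_per_request:,.0f} bytes per request on average")
print(f"{megabits_per_second:.1f} Mbit/s sustained over the window")
```

That works out to roughly 15 Mbit/s for a full minute from a single shared link, with an average of about 262 KB per request.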

(I was inspired to look by @jwz's post: mastodon.social/@jwz/109411593.)

jwz (@jwz@mastodon.social): Mastodon stampede. "Federation" now apparently means "DDoS yourself." Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops... https://jwz.org/b/yj6w

@crschmidt well, this sounds like a p0 bug. Mastodon is going into robots.txt on many servers once this gets noticed widely.

@cshabsin Don't worry! I just confirmed that Mastodon doesn't respect robots.txt for any of these fetches, so even if it's added to robots.txt, it will have no effect!
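For reference, this is roughly what honoring robots.txt before a preview fetch could look like. It is a minimal sketch using Python's standard `urllib.robotparser`, not Mastodon's actual code; the user-agent token `ExamplePreviewBot`, the `may_fetch` helper, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish. The user-agent
# token "ExamplePreviewBot" is an assumption made for this sketch.
ROBOTS_TXT = """\
User-agent: ExamplePreviewBot
Disallow: /housing/

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExamplePreviewBot", "https://example.com/housing/map.html"),
        ("ExamplePreviewBot", "https://example.com/about"),
        ("SomeOtherBot", "https://example.com/housing/map.html"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

A real fetcher would also have to download and cache robots.txt per host, which is itself one more request per instance.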

@crschmidt @cshabsin that definitely seems ... inappropriate.

@tw @crschmidt @cshabsin link preview bots all ignore robots.txt, so mastodon is at least following precedent here.

Except that I think Mastodon's implementation is wrong: on a centralized network the preview is created at the 'request' of the person sharing, so robots.txt doesn't apply. But here it's created fully automatically, so it really should apply. The fix would be to capture the site at sharing time and send it along in the post, which is also more efficient (though prone to abuse?)
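The proposed fix (capture the preview once at sharing time and ship it with the post) can be sketched as a toy model. Everything here is hypothetical: the `Post` and `LinkPreview` types, the `share` and `receive` functions, and the fake fetcher are illustrations, not any real fediverse API or wire format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LinkPreview:
    url: str
    title: str


@dataclass
class Post:
    author: str
    text: str
    url: Optional[str] = None
    preview: Optional[LinkPreview] = None  # embedded by the origin server


Fetcher = Callable[[str], LinkPreview]


def share(author: str, text: str, url: str, fetch: Fetcher) -> Post:
    """Origin server: fetch the preview once, at sharing time, and embed it."""
    return Post(author=author, text=text, url=url, preview=fetch(url))


def receive(post: Post, fetch: Fetcher) -> Optional[LinkPreview]:
    """Receiving server: reuse the embedded preview, fetch only if it is missing."""
    if post.preview is not None:
        return post.preview
    return fetch(post.url) if post.url else None


# Fake fetcher that records every hit on the linked site.
hits: List[str] = []


def counting_fetch(url: str) -> LinkPreview:
    hits.append(url)
    return LinkPreview(url=url, title="Example page")


post = share("alice", "look at this", "https://example.com/page", counting_fetch)
previews = [receive(post, counting_fetch) for _ in range(1000)]
print(len(hits))  # number of times the linked site was actually fetched
```

In this model the linked site is hit once no matter how many instances receive the post. The trade-off is the one raised in the thread: receivers have to trust whatever preview the origin server embedded.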

@jefftk @tw @cshabsin yeah, "prone to abuse" and "hard to standardize across all implementations" are the reasons it was rejected in 2017 and has languished as an untouched feature request since 2020 (respectively). Time to rethink that. (I don't love that a single implementation is 95% of the fediverse, but it is; standardization is frankly secondary to making sure the core implementation works well.)

@gme @crschmidt @cshabsin @jefftk That's a pretty dismissive take on software violating an agreed-upon Internet standard...

I read the blog post and at the very top OP even admits that Mastodon is not a crawler. So what "standard" is being broken?

@gme @crschmidt @tw @cshabsin where do you see that in the blog post? I agree that scraping a preview isn't crawling if you do it at send time, but doing it automatically at retrieve time is.

@gme @crschmidt @tw @cshabsin in the portion of the post that you've screenshotted I'm only talking about fetching at posting time

You also wrongly assume that the fediverse is only made up of Mastodon servers. When I receive a post on Pleroma my Pleroma instance also fetches the URL to generate a preview.

Again, the technical solution to this technical problem is for a site to be behind a CDN. If a site is getting hammered where it can't handle the legitimate traffic to it then it should be placed behind a CDN.

@gme @crschmidt @tw @cshabsin saying "Mastodon is doing the wrong thing here" doesn't mean Pleroma isn't also doing the wrong thing!

Scott Hanselman 👸🏽🐝🌮

@gme @crschmidt @tw @cshabsin @jefftk I will likely regret wading in here but this is a rather bizarre thread. Spec or not, it’s poor design. It’s irresponsible. It’s messy. It’s resource intensive. To say “put a CDN on it” doesn’t change the waste - it hides the bug/implementation and shifts the responsibility. @jwz and friends are correct to bring attention to it. Don’t bury it. Fix it. @Gargron

@shanselman I will also likely regret wading in on this, but:

1: Using robots.txt requires also fetching robots.txt first, so that only marginally reduces the total number of requests.

2: Caching and CDNs are already a well-established pattern on the web, and necessary for a lot of things that have nothing to do with Mastodon/fediverse. Solving that problem other ways will likely create new problems, so at least that falls back on existing solutions that are known to work.
but also, I'm reading back up this thread... come on everyone, be nice! We are all on the same side!

@shanselman @gme @crschmidt @tw @cshabsin @jefftk @jwz @Gargron Maybe it’s because I’m coming at this from the view of a native app developer, but “just put it behind a CDN” in response to inefficiencies in the fediverse feels like an app developer telling users “just charge your phone more often” in response to their app eating all the battery. It’s up to devs to be good platform citizens.

And I was approaching this from both a sysadmin and infosec perspective.

Anyone that deploys an application today that is not behind a WAF is foolish.

Cloudflare offers both: a WAF and a CDN.

Double-win in my book.

@gme @crschmidt @tw @cshabsin @jefftk @Gargron @jwz @pilky both things can be true. Yes, a WAF/CDN is a must-have in 2022. But Mastodon still behaves poorly. So we likely agree.

@shanselman @gme @cshabsin Yeah, to use my battery example again it’s the difference between an app draining your battery because it’s doing something genuinely processor intensive vs draining your battery because the devs need to optimise better. This feels a lot more like the latter case.

A CDN is helpful for dealing with high traffic, but it’s not an excuse for unnecessarily causing high traffic

@shanselman @gme @crschmidt @tw @cshabsin @jefftk @jwz @Gargron

Does anyone know whether Mastodon also fetches previews for links pointing to individual toots / posts? Can the fediverse DDoS itself with this?