Over the years, I made a handful of maps of various things in Cambridge; I have collected some, but not all of them, on this page about housing things in Cambridge.
This includes things like maps of where you could legally build a fourplex (short answer: not many places!), the distribution of tax paid per parcel (Kendall Square pays a lot!), and more.
Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human).
Every Mastodon instance fetches and stores its own local copy of my 750 KB preview image.
(I was inspired to look by @jwz's post: https://mastodon.social/@jwz/109411593248255294.)
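Here is how those figures work out, as a back-of-the-envelope sketch. The totals are the ones quoted above and the rates and averages are derived from them. The fan-out example is only an illustration: the 1,000-instance count is hypothetical, and it treats 750 KB as 750,000 bytes.

```python
# Totals quoted in the post (first 60 seconds after sharing the link).
TOTAL_BYTES = 112_772_802
REQUESTS = 430
WINDOW_S = 60

def load_summary(total_bytes: int, requests: int, window_s: int) -> dict:
    """Derive request rate, bandwidth, and mean response size from the totals."""
    return {
        "requests_per_s": requests / window_s,
        "bytes_per_s": total_bytes / window_s,
        "avg_bytes_per_request": total_bytes / requests,
    }

def fanout_bytes(instances: int, image_bytes: int) -> int:
    """Bytes served if each instance downloads its own copy of the preview image once."""
    return instances * image_bytes

summary = load_summary(TOTAL_BYTES, REQUESTS, WINDOW_S)
print(f"{summary['requests_per_s']:.2f} req/s")                   # 7.17 req/s
print(f"{summary['bytes_per_s'] / 1e6:.2f} MB/s")                  # 1.88 MB/s
print(f"{summary['avg_bytes_per_request'] / 1e3:.0f} KB/request")  # 262 KB/request

# Hypothetical: 1,000 instances, image taken as 750,000 bytes.
print(fanout_bytes(1_000, 750_000))  # 750000000
```

The mean response (about 262 KB) is well below the 750 KB image size. Because the post does not break the 430 requests down by type, this arithmetic cannot say how many of them were image fetches.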
@crschmidt well, this sounds like a p0 bug. Mastodon is going into robots.txt on many servers once this gets noticed widely.
@cshabsin Don't worry! I just confirmed that Mastodon doesn't respect robots.txt for any of these fetches, so even if it's added to robots.txt, it will have no effect!
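By way of contrast, checking robots.txt before a background preview fetch takes only a few lines. The sketch below uses Python's standard-library `urllib.robotparser`. It is not Mastodon's code (as noted above, Mastodon does not check), and the `ExamplePreviewBot` user agent and example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one preview bot but allows everyone else.
ROBOTS_TXT = """\
User-agent: ExamplePreviewBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch_preview(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_fetch_preview(ROBOTS_TXT, "ExamplePreviewBot", "https://example.org/housing"))  # False
print(may_fetch_preview(ROBOTS_TXT, "SomeOtherBot", "https://example.org/housing"))       # True
```

A real fetcher would load the file with `set_url()` and `read()` (ideally caching it per host) instead of parsing a string.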
@crschmidt @cshabsin that definitely seems ... inappropriate.
@tw @crschmidt @cshabsin link preview bots all ignore robots.txt, so mastodon is at least following precedent here.
Except that I think Mastodon's implementation is wrong: on a centralized network the preview is created at the 'request' of the person sharing, so robots.txt doesn't apply. But here it's created fully automatically, so it really should apply. The fix would be to capture the site at sharing time and send it along in the post, which is also more efficient (though prone to abuse?)
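The "capture at sharing time" idea can be sketched roughly as follows. The sharer's server extracts Open Graph metadata from the page once and attaches it to the outgoing post, so receiving servers can render a preview without fetching the page themselves. This is a sketch under stated assumptions: `PreviewCard`, the `preview` field, and `make_post` are hypothetical and are not Mastodon's or ActivityPub's actual schema. To keep the example offline, the page HTML is passed in rather than fetched.

```python
from dataclasses import asdict, dataclass
from html.parser import HTMLParser
from typing import Optional

@dataclass
class PreviewCard:
    """Hypothetical preview payload; not Mastodon's or ActivityPub's real schema."""
    url: str
    title: Optional[str] = None
    description: Optional[str] = None
    image: Optional[str] = None

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""
    def __init__(self) -> None:
        super().__init__()
        self.og: dict = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each og:* property.
            self.og.setdefault(prop[3:], content)

def build_card(url: str, html: str) -> PreviewCard:
    """Run once, on the sharing user's server, at posting time."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return PreviewCard(url=url, title=parser.og.get("title"),
                       description=parser.og.get("description"),
                       image=parser.og.get("image"))

def make_post(text: str, url: str, html: str) -> dict:
    """Hypothetical outgoing post that carries its own preview card."""
    return {"content": text, "preview": asdict(build_card(url, html))}

page = ('<html><head><meta property="og:title" content="Cambridge housing maps">'
        '<meta property="og:image" content="https://example.org/preview.png"></head></html>')
post = make_post("New maps!", "https://example.org/housing", page)
print(post["preview"]["title"])  # Cambridge housing maps
```

This also shows where the "prone to abuse" concern comes from: receivers would have to trust whatever card the sender attaches, because they can no longer verify it against the page without fetching it.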
@jefftk @tw @cshabsin yeah, "prone to abuse" and "hard to standardize across all implementations" are, respectively, the reasons it was rejected in 2017 and the reason it has languished as an untouched feature request since 2020. Time to rethink that. (I don't love that a single implementation is 95% of the fediverse, but it is; standardization is frankly secondary to making sure the core implementation works well.)
@gme @crschmidt @cshabsin @jefftk That's a pretty dismissive take on software violating an agreed-upon Internet standard...
@gme @crschmidt @tw @cshabsin where do you see that in the blog post? I agree that scraping a preview isn't crawling if you do it at send time, but doing it automatically at retrieve time is.
@gme @crschmidt @tw @cshabsin in the portion of the post that you've screenshotted, I'm only talking about fetching at posting time.
@gme @crschmidt @tw @cshabsin saying "Mastodon is doing the wrong thing here" doesn't mean Pleroma isn't also doing the wrong thing!
@gme @crschmidt @tw @cshabsin @jefftk I will likely regret wading in here but this is a rather bizarre thread. Spec or not, it’s poor design. It’s irresponsible. It’s messy. It’s resource intensive. To say “put a CDN on it” doesn’t change the waste - it hides the bug/implementation and shifts the responsibility. @jwz and friends are correct to bring attention to it. Don’t bury it. Fix it. @Gargron
@shanselman @gme @crschmidt @tw @cshabsin @jefftk @jwz @Gargron
If it really bothers you so much:
Contribute, @shanselman.
@AstridSawatzky @shanselman @gme @crschmidt @tw @cshabsin @jwz @Gargron first step is getting agreement that this actually needs fixing https://github.com/mastodon/mastodon/issues/21738 https://github.com/mastodon/mastodon/issues/4486
@jefftk @shanselman @gme @crschmidt @tw @cshabsin @jwz @Gargron thank you so much for your answer.
@shanselman @gme @crschmidt @tw @cshabsin @jefftk @jwz @Gargron Maybe it’s because I’m coming at this from the view of a native app developer but “just put it behind a CDN” in response to inefficiencies in the fediverse feels like an app developer telling users “just charge your phone more often” in response to their app eating all the battery. It’s up to devs to be a good platform citizen
@shanselman @gme @cshabsin Yeah, to use my battery example again it’s the difference between an app draining your battery because it’s doing something genuinely processor intensive vs draining your battery because the devs need to optimise better. This feels a lot more like the latter case.
A CDN is helpful for dealing with high traffic, but it’s not an excuse for unnecessarily causing high traffic
@shanselman @gme @crschmidt @tw @cshabsin @jefftk @jwz @Gargron
Does anyone know whether Mastodon also fetches previews for links pointing to individual toots / posts? Can the fediverse DDoS itself with this?