Over the years, I made a handful of maps of various things in Cambridge; I have collected some, but not all of them, on this page about housing things in Cambridge.
This includes things like maps of where you could legally build a fourplex (short answer: not many places!); the distribution of tax paid per parcel (Kendall Square pays a lot!) and more.
Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human).
Every Mastodon instance fetches and stores their own local copy of my 750kb preview image.
(I was inspired by to look by @jwz's post: https://mastodon.social/@jwz/109411593248255294.)
@crschmidt well, this sounds like a p0 bug. Mastodon is going into robots.txt on many servers once this gets noticed widely.
@cshabsin Don't worry! I just confirmed that Mastodon doesn't respect robots.txt for any of these fetches, so even if it's added to robots.txt, it will have no effect!
@crschmidt @cshabsin that definitely seems ... inappropriate.
@tw @crschmidt @cshabsin link preview bots all ignore robots.txt, so mastodon is at least following precedent here.
Except that I think Mastodon's implementation is wrong: on a centralized network the preview is created at the 'request' of the person sharing, so robots.txt doesn't apply. But here it's created fully automatically, so it really should apply. The fix would be to capture the site at sharing time and send it along in the post, which is also more efficient (though prone to abuse?)
@jefftk @tw @cshabsin yeah, the prone to abuse and "hard to standardize across all implementations" are the reasons it was rejected in 2017, and has languished as an untouched feature request since 2020 (respectively). Time to rethink that. (I don't love that a single implementation is 95% of the fediverse, but it is; standardization is frankly secondary to making sure the core implementation works well.
@gme @crschmidt @cshabsin @jefftk That's a pretty dismissive take on software violating an agreed-upon Internet standard...
@gme @crschmidt @tw @cshabsin where do you see that in the blog post? I agree that scraping a preview isn't crawling if you do it at send time, but doing it at automatically at retrieve time is
@gme @crschmidt @tw @cshabsin @jefftk
Lack of alt text in a text only image... I think many clients even offer to OCR it for you. This small thing goes a long way to making this place more open and accessible!