Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.

"It works by generating an endless sequence of pages, each with dozens of links, that simply go back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."

zadzmo.org/code/nepenthes/
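
To make the quoted description concrete, here is a minimal Python sketch of the "random but deterministic" idea: each page is generated from a random number generator seeded by the requested path, so the same URL always returns the same page, and every link points back into the tarpit. This is not Nepenthes' actual code. The names `tarpit_page`, `respond`, `WORDS`, the `/tarpit/` prefix, and the `secret` parameter are all hypothetical, and the Markov-babble step is left out.

```python
import hashlib
import random
import time

# Hypothetical vocabulary used to build link slugs.
WORDS = ["amber", "pitcher", "nectar", "vine", "moss", "dew", "thorn", "petal"]


def tarpit_page(path: str, n_links: int = 12, secret: str = "change-me") -> str:
    """Return HTML for `path`; the same path always yields the same page."""
    # Seed a private RNG from the path, so the output looks random
    # but is repeatable, like a flat file that never changes.
    digest = hashlib.sha256(f"{secret}:{path}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    links = []
    for _ in range(n_links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        # Every link leads to another generated page inside the tarpit.
        links.append(f'<a href="/tarpit/{slug}-{rng.randrange(10**6)}">{slug}</a>')

    body = "\n".join(links)
    return f"<html><body>\n{body}\n</body></html>"


def respond(path: str, delay_s: float = 5.0) -> str:
    """Stall on purpose before answering, then return the generated page."""
    time.sleep(delay_s)  # intentional delay: wastes crawler time, limits server load
    return tarpit_page(path)
```

Because nothing is stored, the set of pages is effectively endless: a crawler that follows any link just gets another page built the same way.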

@tante I have mixed feelings.

Crawlers should respect robots.txt….

At the same time: there is clearly an emotionally based bias happening with LLMs.

I feel weird about the idea of actively sabotaging. Considering it is only towards bad actors… and considering that robots.txt files are often too restrictive, in my opinion… the gray areas overlap a bit.

Why should we want to actively sabotage AI dev? Wouldn’t that lead to possibly catastrophic results? Who benefits from dumber AI?

@altruios @tante Because what they're doing is without consent, in violation of law in ways that normal ppl have had their lives ruined over, but they're backed by asshole billionaires so it's fine when they do it. We all benefit from sabotaging their scam products.

@dalias @altruios @tante They're crawling the web, running code against it and providing a service based on the results. How's that any different to what search engines have been doing for the last 30 years?

@woe2you @altruios @tante No, they're preparing unauthorized derivative works. And they're explicitly and intentionally disregarding opt-out. Ability to access something via the web does not imply right to incorporate it into other works, republish altered versions of it, etc.

@dalias @woe2you @tante

The difference between a human reading a website and writing an article ‘inspired by’ what they’ve read, and an LLM consuming and outputting content the same way, is that we recognize an LLM is a tool that can do the same thing faster.

Reading is training. Reading isn’t copying. Output is the issue: not input. It’s worrisome to see so many not grasp this.

Looking/copying isn’t stealing. It just isn’t. No one lost their website.

@altruios @woe2you @tante You really have no understanding of the mechanics of the LLM fraud and are just repeating their false talking points about learning/training.

LLMs are overfitted (that's the large part) models, and as such, they actually encode copies of large swaths of their training material. The best description is "lossy compression algorithm". I don't get a free pass pirating movies because the compression is lossy.

@altruios @woe2you @tante In any case, it's not up to the AI simps to decide if we should be furious at this industry for scraping & pirating our stuff. It's up to us. And questioning/challenging our response is an asshole move.

@dalias @woe2you @tante I would love to be wrong about something I am angry about or hate. I try to make that happen as often as possible. I don’t see challenging discourse as disrespectful or an asshole move. Everyone has an opinion. Sharing is how we learn.

I really do think a lot of anger is misdirected at AI/LLMs when it would be more accurate to say that capitalism as a system is the thing at fault here.

AI is just the easy thing to point towards.

@altruios @woe2you @tante LLMs and capitalism are inseparable. Nobody would be making slop machines except for capitalist motives.

@dalias @woe2you @tante That is inaccurate.
There are motives other than profit for making AI. LLMs are a reasonable attempt: they would have been made regardless of capitalism.

Capitalism is the problem. And AI, while an accelerant to capitalism… would be an accelerant regardless of political systems deployed…

If we had a leftist government: AI development would be optimized for citizens first as a public open source inspectable utility…

@altruios @woe2you @tante You really have no idea what you're talking about, and seem to be basing this on marketing copy not any understanding of the actual math.

@dalias @woe2you @tante

Railing against capitalism: didn’t see that in the marketing brochure /s

But really? What have I said that would indicate I’m only consuming marketing material?

@altruios @woe2you @tante Thinking that LLMs are useful, or that even if they were, that the astronomical environmental and economic costs of building them would be justified by anything other than VC "last mark holding the worthless stock" shenanigans.

@altruios hey, itsa walrus!
How many questions can you respond with? Are you so supportive of AI in your day-to-day life, or only on social media? What makes you think people who create have an obligation to permit corporations to use LLMs to copy and paste creator sites (not just 'looking' ffs, that requires sentient beings) and then use that to make their own version, one that benefits them and not the creator?
Why are you so concerned with people accepting LLM and AI scraping?