Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.

"It works by generating an endless sequence of pages, each with dozens of links, that simply go back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."

zadzmo.org/code/nepenthes/
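
For readers curious how the mechanism described in that quote could work, here is a minimal Python sketch. It is not Nepenthes' actual code. The corpus, the `/tarpit/` URL prefix, and every function name are made up for illustration. The idea shown is that seeding a random generator from the requested path makes each page look like a flat file that never changes, while every link on it leads deeper into the tarpit.

```python
import hashlib
import random
import time
from collections import defaultdict

# Hypothetical seed text for the babble; a real tarpit would use a far larger corpus.
CORPUS = (
    "the crawler follows the link and the link leads to another page "
    "and the page holds another link that the crawler follows again"
)


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def rng_for_path(path):
    """Seed a PRNG from the URL path so the same path always yields the same page."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def babble(rng, chain, length=20):
    """Walk the Markov chain for `length` words, restarting at dead ends."""
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


def render_page(path, chain, n_links=12):
    """Build a page whose links all point back into the (assumed) /tarpit/ prefix."""
    rng = rng_for_path(path)
    links = [f"/tarpit/{rng.getrandbits(48):012x}" for _ in range(n_links)]
    anchors = "\n".join(f'<a href="{href}">{href}</a>' for href in links)
    text = babble(rng, chain)
    return f"<html><body><p>{text}</p>\n{anchors}</body></html>"


def respond(path, chain, delay_seconds=2.0):
    """Sleep before answering, so each request costs the crawler time."""
    time.sleep(delay_seconds)
    return render_page(path, chain)
```

Calling `respond("/tarpit/start", build_chain(CORPUS))` twice returns the same HTML both times, and each of its links produces a different but equally stable page. The two-second delay is an arbitrary placeholder, not a value taken from the project.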

@tante I have mixed feelings.

Crawlers should respect robots.txt….

At the same time: there is clearly an emotionally based bias happening with LLMs.

I feel weird about the idea of actively sabotaging. Considering it is only aimed at bad actors… and considering that robots.txt files are often too restrictive, in my opinion… the gray areas overlap a bit.

Why should we want to actively sabotage AI dev? Wouldn’t that lead to possible catastrophic results? Who benefits from dumber AI?

@altruios @tante Because what they're doing is without consent, in violation of law in ways that normal ppl have had their lives ruined over, but they're backed by asshole billionaires so it's fine when they do it. We all benefit from sabotaging their scam products.

@dalias @altruios @tante They're crawling the web, running code against it and providing a service based on the results. How's that any different to what search engines have been doing for the last 30 years?

@woe2you @dalias @tante Data rates. They're DDoSing people like they don’t care. That’s the difference. One crawl a week vs 100,000 every hour, or whatever the actual rate is…

I fundamentally have no problem with a crawler. I do not approve of crawlers that are that aggressive and disrespectful.

@altruios @dalias @tante I agree that a misbehaving crawler is annoying. I'm just trying to understand why people seem to have a knee-jerk reaction against LLMs in general.

@woe2you @altruios @tante Because if we gobbled up millions of copyrighted works and produced derivatives of them, we'd be facing decades in prison, billions in fines. But when they seize our stuff that way and enclose it and use it to manufacture slop, in gross violation of copyright, it's deemed legitimate business.

@woe2you @altruios @tante You should understand that LLMs have no legitimate purposes. They do not produce intelligence or knowledge or information. They produce *information-shaped* slop. The only way they get better is getting better at deceiving fools that the slop is what it looks like.

If you don't understand what they are, what they're doing, and how they do it, then kindly stop calling the reactions by people who do "knee-jerk".

@dalias @altruios @tante I can talk to my smart home in natural language instead of having to sound like a BASIC program and get the name of every device precisely right, and it can respond in natural language. How is that not a "legitimate purpose" per your definition?

If you could explain without calling me an asshole or a simp that would be appreciated too.

@woe2you @altruios @tante You don't need LLMs to do any of that.

@dalias@hachyderm.io @woe2you@toot.wales @altruios@mastodon.social @tante@tldr.nettime.org Yeah. Microsoft Power Automate, Scratch, and several other "languages" I can think of off the top of my head let you more or less skip the BASIC syntax and program things by moving blocks of logic around. You don’t need an LLM for that, and you shouldn’t use one either. Tell an LLM to do 1+2 and it won’t calculate the answer. Instead it’ll hallucinate: I’ve seen one spit out 5 for 2+2, because that’s a common reference to 1984 and the Radiohead song "2+2=5".

@dalias@hachyderm.io @woe2you@toot.wales @altruios@mastodon.social @tante@tldr.nettime.org basic voice commands do not need an LLM or a language (I mean, command line programs don’t need to compile their arguments). voice detection isn’t built on the same thing.

@puppygirlhornypost2 @dalias @tante @woe2you Voice detection is different from the (seemingly) semantic understanding of unexpected keywords given context clues. You can talk in regular diction… have it transcribed accurately… congratulations: you’ve detected a voice!!!

Turning that into a command proved tricky… an LLM (at that step), however, seems to work better than anything we’ve seen…

@altruios @puppygirlhornypost2 @tante @woe2you Because you haven't seen much and you're bedazzled by the hype machine.

@dalias @puppygirlhornypost2 @tante @woe2you Neither of those accusations is accurate.

I’ve seen a lot in smart assistants. Can you point to anything that works better? I don’t think I’ve seen everything… I just suspect that in my active searching I would have found an example of what you imply exists…