Discovered this morning that Maven heymaven.com (a social media startup whose CEO is ex-OpenAI: "Ken Stanley: leading the Open-Endedness Team at OpenAI") is mass importing public posts from the #fediverse with no links back to the originals and no way to delete them. There seems to be no opt-out or opt-in mechanism at all. It also has posts from #Bluesky, pulled in via @bsky.brid.gy, that are likewise not linked back to the originals.

Here's an example: app.heymaven.com/profile/66927

1.12 million fediverse posts scraped by AI startup Maven founded by ex OpenAI lead...

Confirmation by Maven CTO Jimmy Secretan: app.heymaven.com/discover/1190

UPDATE: Looks like it's a bit more complex (isn't it always).
So the CTO is here at @jsecretan and has clarified that they are in the process of implementing bidirectional #ActivityPub, but in the meantime they ingested the "federated timeline" of Mastodon.social.
You can look at their AP response here: staging.maven.ly/mastodon/acto though it doesn't seem to be live on their main domain.

UPDATE 2: So it looks like @jsecretan is deleting all 1.12 million scraped posts from Maven after this thread blew up. So cool, I guess? But it also sorta totally comes off as "whoopsies, we had no idea what would happen if you scrape millions of posts with no link back to the original." I hope to see an official post-mortem on this incident from Maven.

RealGene ☣️

@liaizon @jsecretan
Maybe this is naive, but once an LLM has "ingested" source material, what remains is a bunch of statistics; the "source" is no longer required or stored.

Trying to "remove" it from a model sounds a lot like trying to unbake a cake.

@RealGene @liaizon @jsecretan Is there an option to dispose of the cake, just like a CD with pirated software, and rebake it, this time in a legal way?

@RealGene @liaizon @jsecretan I assume they could roll it back to an earlier version, pre-scrape?

@nev @liaizon @jsecretan
Yes, and monkeys could fly out of my butt…

@RealGene @liaizon @jsecretan Just load a backup from before it ate my posts. If I decide to allow it to keep my posts, I should be credited for my contribution to its statistics & there should be a publicly available list linking to every post it ate.

@jackemled @liaizon @jsecretan
Oh, and you think these primates have ever successfully restored from backup, or even tried to?

@RealGene @liaizon @jsecretan Then they can simply delete it & start over, since they have all of the data to recreate it & the money to pay the power bills for it.