Discovered this morning that Maven https://heymaven.com (a social media startup whose CEO is ex-OpenAI "Ken Stanley: leading the Open-Endedness Team at OpenAI") is mass-importing public posts from the #fediverse with no links back to the originals and no way to delete them. There seems to be no opt-out or opt-in mechanism at all. It also has posts from #Bluesky, pulled in via @bsky.brid.gy, that are likewise not linked back to the originals.
Here's an example: https://app.heymaven.com/profile/66927
1.12 million fediverse posts scraped by AI startup Maven founded by ex OpenAI lead...
confirmation by Maven CTO Jimmy Secretan https://app.heymaven.com/discover/1190743
UPDATE: Looks like it's a bit more complex (isn't it always)
So the CTO is here at @jsecretan and has clarified that they are in the process of implementing bidirectional #ActivityPub, but in the meantime ingested the "federated timeline" of Mastodon.social
You can look at their AP response here: https://staging.maven.ly/mastodon/actor/1 though it doesn't seem to be live on their main domain.
UPDATE 2: so it looks like @jsecretan is deleting all 1.12 million scraped posts from Maven after this thread blew up. So cool, I guess? But it also sorta totally comes off as "whoopsies, we had no idea what would happen if you scrape millions of posts with no link back to the original." I hope to see an official post-mortem on this incident from Maven.
@liaizon @jsecretan
Maybe this is naive, but once an LLM has "ingested" source material, what remains is a bunch of statistics; the "source" is no longer required or stored.
Trying to "remove" it from a model sounds a lot like trying to unbake a cake.
@RealGene @liaizon @jsecretan I assume they could roll it back to an earlier version, pre-scrape?
@nev @liaizon @jsecretan
Yes, and monkeys could fly out of my butt…
@RealGene @liaizon @jsecretan Just load a backup from before it ate my posts. If I decide to allow it to keep my posts, I should be credited with my contribution to its statistics & there should be a list containing links to every post eaten publicly available.
@jackemled @liaizon @jsecretan
Oh, and you think these primates have ever tried to restore from backup, let alone done it successfully?
@RealGene @liaizon @jsecretan Then they can simply delete it & start over, since they have all of the data to recreate it & the money to pay the power bills for it.