I haven't been very worried about AI, even though I'm a writer.
Why?
Because it takes a while for the legal teams employed by the titans of old media to rumble into action, but it was always clear they were coming. These are the teams that don't sue other companies unless they're certain of winning.
And today, the New York Times sued OpenAI for several billion dollars.
https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
BTW: this isn't even the lawsuit OpenAI is *really* scared of.
The House of Mouse has yet to step up to the plate against them, or Midjourney, or the like.
Modern copyright laws are terrifying, y'all, and the courts have found definitively and repeatedly that AI products are derivative material and cannot be copyrighted. Unless copyright law is completely rewritten (and BTW, it needs to be), the *only* thing AI can be used for is to build better search.
Just another tech hustle, like crypto & NFTs.
@Impossible_PhD Heh. Was wondering when that'd happen.
I guess, regarding the whole thing, I've been of a mindset to give it time and see where it ends up after the FUD/hype shakes out.
@Impossible_PhD what I'm curious about is: how much effort is it to re-train such a generative model?
They're gonna get sued again and again, and each time it's gonna end with "remove our stuff from your model and pay us damages, or keep it in and pay us damages and licensing fees". And as far as I understand the way these models work, it's impossible to "remove" anything, because the training data isn't stored inside as discrete units; it all contributes to the weights and biases of the artificial neurons.
So will they have to re-train their models after each lawsuit with one source fewer? Or how is that gonna work?
@amberage @Impossible_PhD Pretty much. It takes the same effort and cost as it took to train them in the first place, every time you need to remove something from the training set. There are shortcuts, but they might not hold up in a legal sense.
@theartlav @amberage Correct. Post-training, the way these things are made, material *cannot* be removed. You have to retrain from scratch.
@Impossible_PhD @theartlav @amberage And and and...you have to have a method for enumerating what it's been trained on, which right now no one seems to care about. Can't wait for pipeline engineers to become all the rage.
@Impossible_PhD@hachyderm.io @theartlav@hachyderm.io @amberage@eldritch.cafe I tried an experiment. I have a coding assistant called Tabnine that is only trained on code licensed under something like MIT or Apache, i.e. code you can use in free or non-free software. It is helpful for looking things up, for one thing.
I asked it "Who is Karl Marx?" and it started to answer before it refreshed and told me off for breaking the terms of service regarding politics, religion, etc. It is possible to add guardrails after the fact, imperfectly: as demonstrated, there was enough in the dataset for it to attempt an answer, so it did, and the flicker of an answer I got before it cut out appeared correct, not nonsense.
@Impossible_PhD@hachyderm.io @theartlav@hachyderm.io @amberage@eldritch.cafe I expect they may also be using a generic base model that has been trained on more than just code, though they don't seem to disclose what it is.
@Impossible_PhD@hachyderm.io @theartlav@hachyderm.io @amberage@eldritch.cafe I think for the package I'm using they may be able to train on my code as well, but that's not a problem in this case because I am willing to consent to that.
@amberage@eldritch.cafe @Impossible_PhD@hachyderm.io @simontoth@hachyderm.io they could train models on only copyright-free/copyleft/their own sources, but 1) the quality would be lower, and 2) they still couldn't copyright the results, so they couldn't use it for certain things; they'd just be, like, lazy stock art generators
Edit: copyleft/other licenses still wouldn't be enough for them to use. The main point was that they'd be limited significantly by being held to legal uses.
@rachel @amberage @simontoth The problem with that is the 95-year threshold for copyright expiry. Anything trained on such old material would spit out prose that's *hopelessly*, unrecognizably weird.
@Impossible_PhD@hachyderm.io @amberage@eldritch.cafe @simontoth@hachyderm.io true. There is art/writing/etc. that people produce today and publish with explicit copyright-free/copyleft licences. Some realms have more of it than others, and it is absolutely in lower volume than what is otherwise obtainable for training.
Overall my biggest fear is how it'll be used to accelerate the crumbling of the web as we know it, and the torrent of shit news articles.
@Impossible_PhD@hachyderm.io @amberage@eldritch.cafe @simontoth@hachyderm.io mostly thinking of things like open-source code, or social media posts used for training by the social media companies themselves, and even that is still full of legal landmines
@rachel @amberage @simontoth Courts have already found that tweets are copyrightable material, and property of their writers. So yeah, they can't be used.
@Impossible_PhD@hachyderm.io @amberage@eldritch.cafe @simontoth@hachyderm.io it would please me so very much to see musky get sued by his own site's users for his attempts at making an anti-woke fancy autocomplete
They probably added something into the unenforceable EULA saying they can do it
@rachel @Impossible_PhD @amberage @simontoth
If the business case gets made well enough, I’ve figured for a while we’d end up in an uncomfortably dystopian scenario where content farming by AI companies becomes normal.
I.e., people are hired just to produce stuff for which they've signed away the rights.
Historically high rates of education/skill plus the capitalism hellscape lead pretty neatly to such a state, IMO.
These “content cattle” will be doing what they’re passionate about anyway.
@rachel @Impossible_PhD @amberage @simontoth
Like, “are you a philosopher but need to pay rent? Come, debate ideas and publish for pay!”
“Music career not panning out and sick of the YouTube algorithm? … just jam everyday with us!”
Seems to tie too well into the education industrial complex that’s developed around the middle class and their “passion coddling”.
@rachel @Impossible_PhD @amberage @simontoth stuff I publish (code) under open-source or copyleft licenses still isn't suitable training data: there are license obligations of varying intensity, and nobody training these massive models is obeying them. I adamantly oppose treating publicly shared content as obligation-free, even if all the tech bros wish it were for their bottom line. I have a bigger rant in the footer of my website, but I won't dox my alt
@rachel @Impossible_PhD @amberage @simontoth tldr: use my code and follow the associated license. Don't train your plagiarism machine on it.
@Specialist_Being_677@hachyderm.io @Impossible_PhD@hachyderm.io @amberage@eldritch.cafe @simontoth@hachyderm.io oh yeah, I definitely should not have included copyleft there; just like GitHub Copilot's training on GPL software is certainly a violation
@rachel @Impossible_PhD @amberage @simontoth
There are a lot of newer results showing that a smaller model trained on a smaller but well-curated data set can be as good as these huge models trained on the garbage that is the entire internet.
My guess is the legal issues will accelerate the move towards this approach. It lets you use traceable and properly licensed data, and it costs less overall to both train and use.
@Impossible_PhD @rachel @amberage @simontoth
My favourite bit of meaningless trivia is that both James Joyce's *Ulysses* and Timothy Dexter's *A Pickle for the Knowing Ones* are in the public domain, and I would kill to see a plagiaristic, stochastic parrot poisoned by them and so, so many others.
@rachel @Impossible_PhD @amberage @simontoth No, copyleft is even worse for them. It means they have to make the model and all derivative works free under the same license, if they can. If they can't (because of other conflicting legal obligations), they can't distribute the derivative works at all.
@Impossible_PhD Unless I missed something, the only cases around AI were simply about "not made by a human and therefore cannot be copyrighted".
@simontoth Mmm hmmm. Thing is, that means it's derivative *by definition*.
Originality is the determinant of what is and is not copyrightable. Everything that is not original is either derivative or out-and-out theft. In either case, massive infringement. Proving originality is the only defense against a robust copyright infringement lawsuit.
@Impossible_PhD Well, no. The "Monkey selfie" was original, but it was rejected for the same reason.
@Impossible_PhD I will make a personal speculation: the Mouse had two events to wait for: a) the twin strikes, which centred on generative AI, and b) what happens to the actual mouse on 1 Jan, when it goes out of copyright.
those outcomes no doubt affect some of the phrasing in the filings.
not that my speculations matter in the slightest, it's just fun to muse on it
@Impossible_PhD This article https://web.law.duke.edu/cspd/mickey/ makes me think that the house of mouse might not be anxious to enter this fray, given their reliance on both sides of copyright. I also think that OpenAI/MSFT will use a very different tactic in court, relying on the argument that AI training is not an expressive use of the material. EU regulation and US law can be read in a way that would support such an argument. Only the courts can decide if it would apply.
@Impossible_PhD but i sure do wish we’d rewrite copyright law
@Impossible_PhD
Hmm. That's not quite enough to stop me worrying, but it's nice to see.
@Impossible_PhD Download PrivateGPT while you can…
@knutson_brain @Impossible_PhD
Seriously, I'd recommend GGML. No dependencies, works with all sorts of models, all local.
And I do recommend playing with it; it shows just how bad these things can be on average, without having to deal with the sites and limits.
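If you want to try that locally, here's a minimal sketch using the llama-cpp-python bindings (one common wrapper around the GGML/llama.cpp runtime); the model filename below is a placeholder for whichever quantised model you download, not a specific recommendation.

    # Run a local GGML/GGUF model entirely offline (pip install llama-cpp-python).
    from llama_cpp import Llama

    llm = Llama(model_path="./some-7b-model.Q4_K_M.gguf", n_ctx=2048)  # placeholder filename
    out = llm("Q: Why run a language model locally?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])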
@Impossible_PhD hahaha awesome! I'm a writer too and so far am not worried about AI either (art is a different story). I played with ChatGPT and the output is awful. It could not do my job. I hope the NYT wins!!
@Jennifer Nah. Disney gets to eat the Midjourneys of the world, and Disney has no mercy whatsoever. When the House of Mouse eventually files its lawsuit, it'll make this look like amateur hour.
@Impossible_PhD I'm not familiar with Midjourney. I'm not a fan of Disney, but if OpenAI is infringing on any of their copyrights, that will be one hell of a lawsuit!
@Jennifer Midjourney is ChatGPT, except with pictures. OpenAI incorporating image generation into GPT-4 was a duuuumb move, because now Disney gets to come for them too.
@Impossible_PhD ooooh ok! LOL. My husband is an artist so AI for art makes me kind of mad.
@Impossible_PhD This is good news. Waiting to see if they succeed.
@anne_twain I'd bet just about everything I own that they will. OpenAI has no actual legal defense here, given prior findings that OAI product is non-copyrightable. Their only way out is to settle, and I'm 99% sure that the Times won't settle, because their actual objective here is to shut down OpenAI.
@Impossible_PhD @anne_twain
Whether or not the output of generative AI can be copyrighted isn't really a question here though. It's unrelated to the question of copyright infringement.
I'm still of the opinion that these various copyright infringement cases are going to fail, that OpenAI and others have a relatively strong fair use defense. But we'll see how things are ruled in the end.
@hybridhavoc @Impossible_PhD Well spotted. It's not the question we're discussing.
The question we're asking is whether generative AI is infringing the copyright of the newspaper. That's what the lawsuit is about.
@Impossible_PhD@hachyderm.io Except we know OpenAI and Microsoft will have planned for this, and Microsoft has very much been through and won on anti-trust suits, so getting an unethical and potentially illegal business model past legal challenges isn't new ground for them. My guess would be that, as then, they intend to come through with most but not all of it intact.
@alastair This isn't a regulatory case, though. Not even close. The law is completely different, and copyright has almost no loopholes--decades of lobbying from Disney and other old media titans have closed almost every single one, and the remainders don't stand a chance of protecting OAI here.
@Impossible_PhD@hachyderm.io I'm inclined to assume that, broadly speaking, anything Disney etc. have considered, OpenAI/Microsoft will have too. So to my mind that means they have planned for losing this, which makes me wonder what the next gambit would be. They could perhaps pay the media companies for training data, and I'm fairly skeptical that would lead to much, if anything, extra for a lot of writers in many cases.
@alastair They've already said publicly that even the lowest possible training licensing fees would make AI impossible: we're talking tenths of a cent per trained item here. The volume they need to feed these things is *mind-shatteringly* large.
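For a sense of scale, some purely illustrative arithmetic; both numbers below are assumptions made up for the example, not figures from OpenAI or the lawsuit.

    # Back-of-envelope: a "tenths of a cent" fee at web-crawl scale (assumed numbers).
    items_in_training_corpus = 250_000_000_000   # assume ~250 billion crawled pages
    fee_per_item_usd = 0.002                     # assume 0.2 cents per trained item
    total = items_in_training_corpus * fee_per_item_usd
    print(f"${total:,.0f} per full training run")  # -> $500,000,000 per full training run

And, per the retraining discussion upthread, every forced removal would repeat that bill.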
@Impossible_PhD @mattly the NYT is something you consider aligned with good?
“We have IP extortion and protection rackets on our side” doesn’t make me feel better
The NYT is going to sue the AI makers, win some large amount of money, and then make a deal with them that totally fails to protect the rest of us.
The real reason AI will fail is that it can't actually do any of the things that it's supposed to do.
@Impossible_PhD@hachyderm.io OpenAI is great for creating first drafts of things, especially for people who aren't hugely comfortable writing. Then, once that first draft has been generated, a human will need to take over and confirm information, verify sources, and begin the editing/rewriting process.
@thelaughingmuse Yeah, fun fact: if you start with copyright infringement, no matter what you do with it, the product is a derivative work.
It's still copyright infringement.
And just as a note, I'm a professor of writing. I know very, very well the complexities of the writing process. There are far better ways, which will yield a superior product regardless of comfort or proficiency.
@Impossible_PhD The Times is suing over verbatim reproduction of their articles.
It remains to be decided if training on copyrighted material is a problem or not.
Assuming ChatGPT really will spit out articles verbatim in response to appropriate prompts, I think the Times has a good case.
I'm personally much more skeptical about the training copyright violation assertions.
@colo_lee I mean, courts have already determined that AI writing is not copyrightable because it's inherently derivative work.
I'm not skeptical. It's barely a hard breath from current findings to that.
@Impossible_PhD Yeah, we'll see.
Human work is clearly copyrightable, and AI output is not. That seems right.
But it doesn't mean the use of copyrighted works to train AI is a violation.
It'll be interesting to see how this plays out legally
Thank you @Impossible_PhD for the hopeful take.
For those who want the article without the NYT site silliness: https://web.archive.org/web/20231227131618/https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html