Jesus, even NaNoWriMo has decided to get it wrong for no good reason.
@ajroach42 Context? I see they have an announcement on their site about closing the forums and other places where minors may be present and doing background checks, so I assume there was some sort of harassment going on, but I can't seem to find any details about it?
@sam They appear to have gone all in on AI, and are claiming that any concerns about AI are invalid because of classism and ableism?
or at least that's what I took away from the article being shared today.
@ajroach42 @sam @forestine I can see that. I know plenty of writers who have to *hide* the fact that they are using AI tools. I benefit so much from AI tools due to (at least that's what I can trace it to) my neurodivergence. There are plenty of uses of text models that don't entail "ripping off other artists' work".
1/
@ajroach42 @sam @forestine
Similarly, I see at meetups and when mentoring online so many less privileged kids benefit from learning tech with LLMs and achieving things they never thought they could do. Being assisted gives them the impulse and confidence necessary to continue putting in the actual work and in fact allows me to mentor them towards fundamentals that much more quickly.
2/
@ajroach42 @sam @forestine
The article mentions "AI criticism often has undertones of classism and ableism" and yeah, I *totally* see that. It's almost impossible, say on Mastodon, to bring up #LLMs and how they help me with my ADHD/autism brain and not get responses invalidating that.
3/3
The company with which nanowrimo has partnered appears to operate using services built on plagiarism and without consent.
It's possible that I've misunderstood their business model (or that I'm just generally biased against the AI industry as one built on plagiarism and the violation of consent) but I didn't see evidence to the contrary.
NaNoWriMo are defending this decision by pointing out the potential assistive benefits of AI.
I'm not here to argue against those benefits. I've seen people receive a big confidence boost from working with AI assistive tools (and I've seen people get themselves into a lot of trouble when those AI assistive tools malfunctioned in ways they did not notice, but that's neither here nor there.)
I am here to argue against a creative organization, one focused on highlighting individual creativity, promoting a suite of services built on plagiarism and then hiding behind disability and neurodiversity to shield itself from criticism.
If I've misunderstood the organization at hand, and they have managed to build AI powered services that are not centered on plagiarism and the violation of consent, that's wonderful. WriMo, in defending themselves, should perhaps lead with that instead of trying to use people with disabilities and neurodiversity as human shields.
I hope that we can agree that consent is important, plagiarism is generally not great, and NaNoWriMo should center an ethical argument, if they have one, and not use disabled people as a shield from criticism if they do not.
@ajroach42 @sam @forestine There is certainly a lot of criticism about how training corpuses are built, but I don't see that company promoting their product as a plagiarism tool or a tool to replace writers; they instead clearly design and position it as a writing aid.
I personally did react (I have my own chips on my shoulder) because as you might have seen, the mere fact that I mentioned that I agree that AI criticism can have undertones of ableism and classism got me... blocked.
@mnl @ajroach42 @sam it's because those undertones are not there. use a different tool. this thing is destroying the planet.
@mnl @ajroach42 @sam i have had ableism and classism used against me in academia against my art and writing and this argument is crap.
@mnl @ajroach42 @sam ai is a plagiarism tool because its datasets are plagiarized, scraped without people's consent
@ajroach42 @sam @forestine For context, I assume we are talking about this company: https://prowritingaid.com
@mnl @sam @forestine Regardless of how they market the tool, unless they can clearly demonstrate that it was built with consent, it absolutely was not. That's just the truth of the industry.
And if they have somehow managed to build a useful AI tool that isn't built on plagiarism, there's still the problem of LLMs consuming significantly more power than basically anything else we do with computers. Boiling the oceans, as it were.
I can certainly understand having an emotional reaction if you find that these tools provide a genuine assistive value to you, and you see them being criticized. I think a lot of that criticism does assume that the assistive argument is made in bad faith, not because it can't be true, but because it is used as a distraction from the actual issues at hand.
AI as assistive tech is used as a red herring by companies looking to launder their own unethical behaviors. The fact that it is also occasionally truly useful as an assistive technology doesn't negate the rest of the criticisms leveled against it.
(And AI based technology is, in general, unreliable as an assistive technology because it is unreliable in general. It can absolutely be helpful occasionally, but at what cost?)
That being said, we're likely at an impasse here, so I'm going to bow out.
@ajroach42 @mnl@hachyderm.io @sam
i would add that tools existed before AI and will exist long afterwards
@ajroach42 @sam @forestine@sunny.garden You're doing the thing that I think is ableist here too. The debate around the ethics of building training corpuses is way out of my league, but I would certainly welcome models built on more nuanced approaches than The Pile or whatever.
But, "And AI based technology is, in general, unreliable as an assistive technology because it is unreliable in general. It can absolutely be helpful occasionally" to me means that I somehow am not able to use something "unreliable"
@ajroach42 @sam To me it's akin to telling a person in an electric wheelchair that wheelchairs are less agile than a person on two legs and that the battery uses rare earth minerals of questionable provenance.
Yes, valid, and also I shouldn't necessarily bring it up if people say they find wheelchairs useful. Maybe the better take is to build nicer wheelchairs, now that we see that they are useful.
@ajroach42 @sam Like, I'm perfectly aware that say, large language models are... statistical models of human language and are not omniscient gods of reasoning that always do the perfect thing.
But I know how to make reliable, practical use of them.
Then you're doing better than me! I absolutely do not know how to make reliable practical use of them, and have been made to look foolish every time I've tried.
But, that's absolutely not the point.
The point is that nanowrimo is a group that exists to support writers, and has partnered with a product that is built on the exploitation of writers, and they are using disabled people as an excuse to do so.
@ajroach42 @sam I disagree with the last part, and while arguing this is out of my depth, it's like... "we live in a society". For a long time we were all ok using search engines, which are just as much statistical engines trained on scraped text.
I personally think the more immediate access to those models (which you can run locally) is more ethical than min-maxed ad revenue, but where you draw the ethical boundary is up to you.
@ajroach42 @sam I certainly fully understand people not participating and criticizing nanowrimo for that choice.
But even getting across that one can enjoy using LLMs and find them useful *and* have a nuanced take on how they were built or how they are operated is *really* difficult. I think that's what the "categorical" in nanowrimo's statement refers to, and it makes me think that whoever is behind the statement probably has a more nuanced take too.
I react strongly because it makes any kind of debate very fraught for me to even participate in, because I'm pretty sure I'll just get yelled at as a plagiarist, lazy, incompetent techbro instead of tackling much more interesting questions like "can we actually use this technology for progressive/subversive/empowering means, for example for less privileged categories of people?" (strong yes from me).
(also, sorry for the walls of text, but i guess you know where that comes from now :)
And I get that.
But I think we have a lot more questions to answer before we get to "can we use this technology for positive ends", and well before we answer those questions, we'll have to grapple with the ethics of how LLMs come into existence.
and, before grappling with that question, an organization that is supposed to be advocating for writers has chosen to partner with a company built on plagiarism, and is drawing criticism for that.
I'd argue that search engines were also pretty unethical in their initial implementation, but that the open nature of the web made them a little easier to swallow and their voluntary consent mechanism came pretty early on in the life of the search engine.
It's not perfect, but it's better than the situation with LLMs, and it's better than the situation with LLMs will ever be.
It's a false equivalency.
@mnl @sam But that's not what happened.
I said Nanowrimo was doing something unethical and using disabled people as a shield for their actions, and you, as a disabled person, stepped in to shield nanowrimo from the consequences of their actions.
It doesn't matter if AI makes a useful assistive tool in this particular context. What matters is that Nanowrimo has decided to behave unethically, and is using disabled people as an excuse to do so.
@ajroach42 @sam I guess I am missing a big part of the context then. I did indeed react to that part, and I already find the headline of the linked article pretty specious. Nowhere do I see nanowrimo saying that not writing your novel with AI is classist/ableist (who would say that?), and I do agree with nanowrimo's statement, which is:
...
@ajroach42 @sam
"NaNoWriMo does not explicitly support any specific approach to writing, nor does it explicitly condemn any approach, including the use of AI."
and
"We also want to be clear in our belief that the categorical condemnation of Artificial Intelligence has classist and ableist undertones, and that questions around the use of AI tie to questions around privilege. "
I find this totally fine, and didn't find anything more in the way of criticism linked from the article.
@ajroach42 @sam It takes just one click on the twitter thread announcement to see a flurry of comments like "stop destroying the craft". How do you think that feels, as a person who is finally able to overcome a disability and better participate in something (say, me writing documents at work and my life shit), to hear they are destroying the craft?
Just because they have a sponsor that uses AI (which AI? who knows?) for a tool that is clearly geared at assisting (not replacing) writers.
@mnl @sam I'm not sure how it feels, because I've never found a good way to use AI as an assistive tool to supplement my personal abilities and disabilities.
I'm also not going to take responsibility for the flurry of comments that other people have made on twitter.
Regardless of what the tool is *for*, there is very little chance it was not built on the exploitation of writers.
An organization that is supposed to advocate for and celebrate writers being sponsored by a company whose product can only exist through the exploitation of writers is bound to make some people angry.
I'm certain some of these reactions are going to be overblown. The linked article was, for example.
But that does not change the fact that this is a pretty obvious and easy mistake.
@ajroach42 @sam One of the things that irritates me is how "AI" is being hyped as something that can "do X" or reason or whatever, and thus the criticism reacts to that. I think the "working" usecase of LLMs is massaging language around and being able to match patterns and apply "transformations" to text, and that this can often appear as "magic", at least at first glance and demo stage.
If you approach their use by centering human communication and finding which "massaging" steps can help ...
@ajroach42 @sam humans communicate, practical and "reliable" usecases become much more apparent. For example, meeting notes or code documentation or test coverage tend to be notoriously poor. But most of the knowledge shared in the meeting and all the knowledge about software is contained in the transcript or in the code itself.
Models along with human guidance can transform meeting transcripts into clear and concise actionable points and knowledge entries ...
@ajroach42 @sam while models can also generate nice READMEs and HOWTOs and error strings and logging lines and additional unit tests (or at least their names) and tools that assist people working on that codebase. Do they suck at writing code? Yeah, pretty much. Do they do ok at making up APIs or transforming a rambling paragraph into a bullet point list? 100%. Do they use much compute to do that? Not really.
I'm just as upset by tools that burn massive amounts of inference for no reason.
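For the curious, a minimal sketch of the transcript-to-action-items flow described above might look like the following, assuming an OpenAI-compatible endpoint for a locally run model (the base URL, model name, and prompt are illustrative placeholders, not any particular product's API):

```python
# Minimal sketch: "massage" a raw meeting transcript into actionable notes.
# Assumes an OpenAI-compatible server (e.g. a locally hosted model); the
# base_url, api_key, and model name below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def transcript_to_action_items(transcript: str) -> str:
    """Transform a rambling transcript into clear, concise bullet points."""
    response = client.chat.completions.create(
        model="local-model",  # whatever model the local server exposes
        messages=[
            {
                "role": "system",
                "content": "Rewrite the following meeting transcript as "
                           "concise, actionable bullet points. Do not add "
                           "anything that is not in the transcript.",
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("meeting-transcript.txt") as f:
        print(transcript_to_action_items(f.read()))
```

The same pattern (a fixed instruction plus human review of the output) covers the README and test-name uses mentioned above.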
@mnl @sam Again, I'm not talking about if AI has practical use cases (and, again, I worked in the industry, and I've seen the failure modes. I absolutely would not trust any existing LLM to do any of the tasks that you describe here without enough human oversight that it would be easier for most people to do the same task without the LLM, unless it's non-critical enough that it's difficult to justify using the LLM to begin with.)
There are lots of potential practical use cases for AI.
They all come with a huge embedded energy cost, even if the actual end use doesn't consume much power, because of the power consumed in training (and the actual end use consumes more power than you might expect, even when the results appear very quickly.)
But the core of the issue is having an LLM tool that has not disclosed how its models were trained sponsor a celebration of writing, without clear public assurances that they will not use text from WriMo to train future models without author consent.
That's one of the only statements they've made on AI. It's not explicitly relevant in this context, but it's one of the only things they've said publicly, so it's all we have to go on.
But they partnered with a company that sells an AI product that is almost certainly built on the exploitation of writers.
And they are an organization that claims to champion writers.
The article in question is pretty bad, and makes some bad faith assumptions and claims (including the awful headline that I somehow entirely missed) but the core of the issue remains.
@ajroach42 @sam But is the core of the issue then the fact that we don't know what data and models prowritingaid is using, or the fact that they (I think genuinely) do so out of wanting to make writing more accessible?
We can (easily, I see your point for sure) agree to disagree, and I apologize if I came barging into a thread that was more for venting, but I do think of LLMs as assistive technology *and* see a lot of ableism/classism in the categorical rejection of LLMs I encounter.
I spent 5 years of my life working for a company that trained many LLMs. I don't have to know which models they're using to know that it would be basically impossible for them to have done so in an ethical fashion, and if they had done so, it would be the first and only thing they said about their work.
(Because they'd have spent hundreds of millions of dollars to do it.)
I'm very familiar with the field. I make a lot less money now than I used to because I decided I couldn't participate in something as morally bankrupt as what I saw happening in AI.
The field, as a whole, is built on plagiarism and ignoring consent and any positive outcomes that come from that have to be viewed in light of the toxic stew from which they are born.
If they want to make writing more accessible, that's great. If they can do it in a way that doesn't abuse writers in the process, they have my blessing.
I've seen no evidence that this is what is happening, and I have enough personal experience to believe that without evidence to the contrary, they are most likely engaging in a massive violation of consent.
Classism and ableism absolutely creep in to the LLM conversation frequently, and that sucks.
But LLMs are almost categorically produced unethically, and they consume more power than almost anything else we do on a computer (at a time when that power consumption is a leading contributor to global warming.)
If they are also useful for you, and you're comfortable with those tradeoffs, that's fine. It's great even. I'm glad you have a tool that is useful for you.
But that still leaves us with an LLM based company sponsoring a writing competition.
@ajroach42 @sam I also refused two offers from openai with life-changing amounts of money, I feel you. Discussion about what consent means in building training corpuses is out of my depth; I come from a pretty punk rock "information wants to be free" approach, use libgen daily (and also spend four figures on books every month), and see the situation a bit like, say, my lithium batteries, except that here I can't even quantify the physical harm.
@ajroach42 @sam Ironically, I believe the only profession that is legitimately under threat (in the long run) is software developers as they are today (something I both embrace and that makes me anxious about what will become of people in the same profession as me).
@ajroach42 @sam
To close this off, and thanks for engaging with me, here's a talk which maybe sheds a bit more light on my uses. I don't share this to toot my own horn, but because I do indeed believe this is transformative technology for progressives, just not in the way 99% of the public discourse addresses it, and I am annoyed that "my peeps" don't see its subversive and empowering potential.
https://youtu.be/zwItokY087U?si=NGbPN7bW5RRorvmu
https://media.ccc.de/v/gpn21-80-go-go-golems-computers-should-compute
I can't subscribe to "information wants to be free" as a statement or as a "punk" ideology.
Information doesn't want anything, information doesn't have agency. People want to be free.
Part of personal freedom is consent.
LLMs as they exist today were built without consent.
But I suppose this reveals the fundamental disagreement at the core of this conversation. We have different ideas of what is harmful and different tolerances for harm.
Thanks for chatting.
@ajroach42 @sam I was more using it as a signifier of my roots in hacker/warez/cracking groups :)
https://en.wikipedia.org/wiki/Information_wants_to_be_free (sorry if you were aware).
And yes we disagree on that core assumption, and also that doesn't mean we can't have interesting and mutually nourishing conversations around it, because I think a fair amount of our values still overlap (I got a lot of your zines! :} )
@mnl @sam I'm familiar with the roots of the phrase, I just think it's being misapplied in this context (and, again, information doesn't want anything. It has no agency. What do people want?)
I was paraphrasing Doctorow's Information Doesn't Want to be Free, which I think is a pretty great text and which I would love to see updated for the era of the LLM.
https://en.wikipedia.org/wiki/Information_Doesn't_Want_to_Be_Free
@ajroach42 @sam doesn't that book actually support more my side of the argument?
As I understand it, the criticism around consent/plagiarism is that the training corpus for the pretraining comes from big piles of relatively indiscriminately scraped data, including copyrighted works.
...
@ajroach42 @sam
My own more articulated criticism (but I'm really out of my depth) around the training data is:
- no one knows wtf is in it, and the companies don't either. Just requiring the companies to lay open their datasets would put a proper damper on things
- the labor for the reinstruct pass is gig-economy based and outsourced. This is where I see open-source communities being able to *crush* the big players at their own game
@mnl @sam
>doesn't that book actually support more my side of the argument?
No! It doesn't support either side of the LLM conversation, because it was written in a world where LLMs didn't exist. Hence my comment that I'd like to see the book updated for LLMs.
Most of what Doctorow writes in that book is about how people might be able to get paid on the internet, which most LLMs sidestep by making sure no one gets paid.
>As I understand the criticism around consent/plagiarism is that the training corpus for the pretraining comes from big piles of relatively indiscriminately scraped data including copyrighted works.
More or less, yeah. Indiscriminately scraping lots of things: old tweets, tumblr posts, old blogs, authors' websites, books, comment sections. These datasets are huge, and full of things that were acquired under shady circumstances, often in direct violation of the TOS or the license on a given site (and they're still growing. It's not like this was a one-time thing that happened and was done. These datasets are still being created, and they're still being created without the consent of most of the people whose writing is being included.)
> - noone knows wtf is in it, and the companies don't either. Just requiring the companies to lay open their datasets would put a proper damper on things
That's not exactly true. Some of these training datasets are public and we can see exactly what's in them (and it's bad!) But yeah, there are also a lot of unknowns and none of these companies have been especially forthcoming, because they're waiting for the courts to tell them that they can do what they're doing, in spite of the fact that it's probably actually illegal.
@ajroach42 @sam It's been a while since I read it, and thanks for bringing it to my attention again. I was thinking of the fact that so much copyrighted material is in the training sets, which to me seems to be the same violation of consent as, say, having PDFs of your work on libgen.
I'm gonna read/skim it again and I do indeed hope to see it updated for #LLMs. Cory Doctorow unnerves me with his tech-demagogy so I don't follow him, but I'll grep around a bit to see his writing on the topic.
@ajroach42 @mnl That was the title of the article, FWIW. I also linked directly to the zendesk FAQ thing because I thought the title took it out of context and made it sound like NaNoWriMo was saying something they weren't (though I'm still not okay with what they did say, so maybe it's a minor point).
@ajroach42 @sam that's literally the title.
@ajroach42 @sam no worries :]
@ajroach42 @mnl @sam @forestine having been an ML (Municipal Liaison) and dealt with HQ directly, this doesn't surprise me in the least.
They want to look like the good guy but they just want money and they don't care who they have to abuse and step over or steal from to get that done.
I can absolutely believe they are just hiding behind these arguments and don't give a shit about the issue. They just want that VC techbro sponsorship money.
@mnl @ajroach42 @sam @forestine the problem is that the big popular LLMs were created by ripping off artists’ work. I use them myself for certain things (mainly writing code) so I can’t condemn people who do, but also you can’t pretend there’s a non-exploitative use of these tools.