The enshittification of AI has led to groans over VLC's choice to use AI. I even saw a post cross my feed of someone looking for a replacement for VLC.
VLC is working on on-device realtime captioning. This has nothing to do with generating images or video using AI. This has nothing to do with LLMs.
(edit: There are claims that VLC is using a local LLM. It will use whisper.cpp, and will not be using OpenAI's models. I don't know which models they will be using. I cannot find any reference to VLC using an LLM.)
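For a sense of what "on-device transcription" means in practice, here is a minimal sketch using the open-source `whisper` Python package as a stand-in for VLC's actual whisper.cpp integration (whisper.cpp is a C/C++ port of the same model family, and the flow is similar). The model size and the `lecture.mp4` input file are just illustrative; the point is that the audio never leaves the machine.

```python
# Minimal sketch of on-device transcription with a Whisper-family model,
# using the open-source `whisper` Python package as a stand-in for VLC's
# whisper.cpp integration. Everything runs locally; no API calls are made.
import whisper

model = whisper.load_model("base")         # small model, fetched once, then run locally
result = model.transcribe("lecture.mp4")   # hypothetical input file

# Each segment carries start/end timestamps, which is exactly what a caption needs.
for seg in result["segments"]:
    print(f"{seg['start']:7.2f} -> {seg['end']:7.2f}  {seg['text']}")
```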
While human-generated captions would be preferable for accuracy, they are not always possible to get. That leaves a lot of video media inaccessible to people with hearing impairments.
What VLC is doing is something that will contribute to accessibility in a big way.
AI transcription is still not perfect. It has its problems. But this is one of those things that we should be hoping to advance.
I'm not looking to replace humans in creating captions. I think we're very far from ever being able to do this correctly without humans. But as I said, there's a ton of video content that simply does not have captions available, human-generated or not.
So long as they're not trying to manipulate the transcription with generative AI, this is the wrong thing to demonize.
@bedast Then don't call it AI. Call it speech to text. But if it uses a language model to predict words from context, rather than doing an analyzable, mechanical, local transformation, it is at least partly the "bad kind of AI": it has the capacity to introduce biases from its training data, producing output that "sounds right" but means the wrong thing. That is much worse than substituting nonsensical homophones now and then, which the reader will immediately recognize as mistakes. Same principle as why autocorrected text is worse than text with typos.
@bedast Enthusiastically calling new functionality "AI" signals to your audience that you're aligned with the scams and makes them distrust you.
This is not hard.
If you have privacy-respecting, on-device, non-plagiarized, ethically built, statistical-model-based processing, DON'T CALL IT "AI".
@dalias @bedast Speech recognition has used language models for decades now. It was one of the original applications of language models, way before they scaled up to aping Shakespeare.
But even without language models, the act of transcription is very close to generative AI: it's the task of predicting the next text token given the previous tokens and the encoded audio sequence.
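To make that concrete, here is a toy sketch (not anyone's real implementation) of how an ASR decoder produces text: one token at a time, each choice conditioned on the audio encoding and on the tokens already emitted. The vocabulary and the scoring function below are invented purely for illustration; a real system like Whisper uses a learned transformer decoder over subword tokens.

```python
# Toy illustration of why transcription is structurally "generative": the
# decoder predicts the next text token given the audio features and the
# tokens produced so far. The vocabulary and scores here are made up.

VOCAB = ["<eot>", "hello", "world", "hollow"]

def score_next_token(audio_features, prefix):
    """Stand-in for a learned model: one score per vocabulary entry.
    A real decoder computes p(token | audio, prefix) with a neural network."""
    if not prefix:
        return [0.0, 0.9, 0.1, 0.4]   # audio alone: "hello" vs "hollow" is ambiguous
    if prefix[-1] == "hello":
        return [0.1, 0.0, 0.8, 0.0]   # textual context makes "world" the likely next token
    return [0.9, 0.0, 0.1, 0.0]       # otherwise, end the transcript

def greedy_transcribe(audio_features, max_tokens=10):
    prefix = []
    for _ in range(max_tokens):
        scores = score_next_token(audio_features, prefix)
        token = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        if token == "<eot>":
            break
        prefix.append(token)
    return " ".join(prefix)

print(greedy_transcribe(audio_features=None))  # -> "hello world"
```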
Just because it came out of academia doesn't mean it's "ethical" in the modern copyrightist sense. Academia did not care much about copyright and scraped, cleaned, and used web data at will. It is only now, after commercial entities have labour-replacing AIs in the oven, that copyright maximalism has gained ground.