Firefox's new offline translation is certainly a nice feature.
My question is, does clicking on these buttons install non-free software on my computer?
I don't think that the free software community has settled on an answer to this question.
Did a quick test with a fresh install of Firefox: it automatically uses the ML model for detecting language without being configured. It also automatically downloaded the Spanish translation model even though I didn't ask it to translate anything.
I'm unsure if the language detection model is being shipped in Firefox or downloaded on first run.
The https://fasttext.cc/ model is shipped in the Firefox package, verified with this experiment.
edit: turns out not to be, see followups to this post
or did I? I forgot about .cache/mozilla
Looking in the source, they have horrible things like 100+ kB of minified JS that loads the fasttext LLM, but I didn't find the LLM binary itself
yeah, verified this using a fresh user account
where they hide the LLM in the source code I don't know, wild!
@joeyh
Err, does any of that show LLM use for *detection*? Language classification alone turns out to be easy (trigram or n-gram stats, I think? It was an "intro to NLP" problem set over a decade ago), so I'd expect they'd start with that instead...
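To give a sense of how small the trigram approach can be, here is a toy sketch: it counts character trigrams in a sample of each language and picks the language whose profile is most similar (by cosine similarity) to the input. The sample sentences, the `detect` function, and the choice of cosine similarity are all assumptions for illustration; this is not Firefox's implementation, and a usable profile would be built from far more text.

```python
import math
from collections import Counter

def trigrams(text):
    """Count overlapping character trigrams, with a space padded on each end."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy training samples, one per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is "
          "what the people think about that",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esto "
          "es lo que la gente piensa de eso",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}

def detect(text):
    """Return the language code whose trigram profile best matches text."""
    observed = trigrams(text)
    scores = {lang: cosine(observed, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    print(detect("this is the thing that people think"))
    print(detect("esto es lo que piensa la gente"))
```

The "model" here is just a table of trigram counts, which is part of why the question of what counts as source code for it is less murky than for a neural network.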
@eichin yeah, and Firefox does contain such ngrams (I think used by something else)... but according to their docs about this, it's using the fasttext LLM for classification.
Unless perhaps it falls back to the ngrams when fasttext can't be downloaded? Could be.
@eichin oho! A browser.translations.languageIdentification.useFastText config option exists and is apparently false by default. So maybe it is using the ngrams.
@eichin what difference, if any, there is between ngrams and an LLM when it comes to providing source code is an interesting thing to ponder...
@joeyh
Training recognizers (specifically ones that could *not* produce any of their training material as output) was pretty well believed by lawyers not to infringe the copyright of a work 15 years ago, so the more interesting part to ponder is what you want to enable the users of your code to do, and what "preferred form for editing" even means. Today, disk is cheap, maybe just check in your training data :)