Have you considered federating hashes of positive matches, and working with the Lemmy team so that a local positive match is not federated outward (with the hash potentially going out instead)?
The former would reduce overhead and electricity use, and the latter would stop more distribution and help admins without a GPU who can't run the scanner.
Over time, the hash DB will grow and get better. In addition, perhaps there is metadata that can be used to track image similarity to positive matches to reduce false-positives, but I imagine that algorithm would be much more complicated.
Hashes won't work for novel generative-AI images. For that kind of thing we'd need to be sharing tensors and comparing distances, so that matching survives format changes and compression artifacts. Theoretically possible; practically, I don't know how feasible it is.
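To make "comparing distances" concrete, here's a minimal sketch of what I mean. It assumes embeddings are L2-normalized float32 vectors from some image encoder; the function name and threshold are purely illustrative, not part of any existing tool:

```python
# Illustrative only: compare a new image's embedding against known-positive
# embeddings by cosine similarity instead of an exact hash lookup.
import numpy as np

def is_probable_match(new_embedding: np.ndarray,
                      known_embeddings: np.ndarray,
                      threshold: float = 0.9) -> bool:
    """new_embedding: shape (d,), L2-normalized.
    known_embeddings: shape (n, d), L2-normalized.
    threshold: cosine similarity above which we call it a match."""
    # For normalized vectors, cosine similarity is just a dot product,
    # and small perturbations (re-encoding, compression) only move the
    # score slightly, unlike exact hashes.
    similarities = known_embeddings @ new_embedding
    return bool(similarities.max() >= threshold)
```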
How large is each tensor? If it can be stored as JSON or Base64 and is sufficiently small, integration into ActivityPub wouldn't be all that bad. The time-consuming part would likely be integration into Lemmy itself.
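For a rough sense of size, here's a sketch of a 512-dimension float32 embedding Base64-encoded inside a JSON payload. The field names are made up for illustration and not an actual ActivityPub extension:

```python
# A 512-dim float32 vector is 2 KiB raw, ~2.7 KiB as Base64, so it fits
# comfortably in a JSON object.
import base64
import json
import numpy as np

embedding = np.random.rand(512).astype(np.float32)  # stand-in vector

payload = {
    "type": "Note",                      # whatever object it rides along with
    "contentHashTensor": {               # hypothetical field name
        "dims": int(embedding.shape[0]),
        "dtype": "float32",
        "data": base64.b64encode(embedding.tobytes()).decode("ascii"),
    },
}

encoded = json.dumps(payload)
print(len(encoded))  # a bit under 3 KB for 512 float32 values

# Decoding on the receiving side:
raw = base64.b64decode(payload["contentHashTensor"]["data"])
restored = np.frombuffer(raw, dtype=np.float32)
assert np.array_equal(restored, embedding)
```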
Another option would be a separate service, similar to how Lemmy Explorer works, where a list of the latest tensors can be downloaded. It's centralized rather than distributed, but probably easier to implement: just an API that admins can register for to send and get tensors. I would be happy to assist with this if it's a route you would like to explore. Feel free to DM me.
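Roughly what I'm picturing is something like this. The service, URL, endpoints, and fields are all hypothetical; nothing like it exists today:

```python
# Hypothetical client for a central tensor-sharing registry: admins
# register for an API key, push tensors for locally confirmed positives,
# and periodically pull everything new.
import requests

BASE = "https://tensor-registry.example.org/api/v1"
API_KEY = "issued-to-registered-admins"

def push_tensor(instance: str, b64_tensor: str) -> None:
    # Submit a tensor for a locally confirmed positive match.
    requests.post(f"{BASE}/tensors",
                  headers={"Authorization": f"Bearer {API_KEY}"},
                  json={"instance": instance, "tensor": b64_tensor},
                  timeout=10).raise_for_status()

def pull_tensors(since: str) -> list[dict]:
    # Fetch everything submitted since a given timestamp.
    resp = requests.get(f"{BASE}/tensors",
                        headers={"Authorization": f"Bearer {API_KEY}"},
                        params={"since": since},
                        timeout=10)
    resp.raise_for_status()
    return resp.json()
```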
@fmstrat each tensor is small. The problem is when you have millions of them and you have to compare each new image against every one of them. You can't index this; it has to be done one by one. And you still need to convert the new image to a tensor as well, which still needs a GPU. I just don't see anything useful here. The current system would be faster.
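Back-of-the-envelope sketch of the cost I mean (sizes are illustrative and scaled down to stay runnable): every upload still needs a GPU pass through an encoder, and then a brute-force linear scan over every stored tensor, because there's no exact-lookup index like a hash DB.

```python
# One linear scan per uploaded image: n_known dot products, plus the GPU
# embedding step that isn't shown here.
import time
import numpy as np

dim = 512                 # hypothetical embedding size
n_known = 1_000_000       # 1M stored tensors is already ~2 GB of float32 in RAM

rng = np.random.default_rng(0)
known = rng.random((n_known, dim), dtype=np.float32)
query = rng.random(dim, dtype=np.float32)

start = time.perf_counter()
scores = known @ query    # one dot product per stored tensor, per upload
best = float(scores.max())
print(f"linear scan over {n_known:,} tensors: {time.perf_counter() - start:.3f}s")
```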
Good point. I wonder how the commercial hash-based systems are doing it…