Shafik Yaghmour

"Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.": dl.acm.org/doi/pdf/10.1145/361

If you have to basically verify everything then what is the point? I can see how it may point you in the right direction and that could be helpful but it could easily do the opposite as well.

It is essentially a coin flip.

@shafik This. The whole magic of 🐈💨 is exactly the same phenomenon as cold reading by psychics. It's not that it has any significant likelihood of being correct. It's that it frames everything with filler that sounds impressive to gullible fools.

@dalias @shafik
1. This data is based on GPT 3.5. Their small test of GPT 4 suggests it'd have ~66% accuracy.
2. These are not binary yes/no answers. The random baseline is 0, not 1/2.
3. The biggest error category in this test was misunderstanding the question. The chat interface allows adding clarifications, the test did not. Factual errors were less common.

@kornel @dalias 66%, while better than a coin flip, is still astonishingly bad.

This is before we even consider deeper issues such as hallucinated Python packages (lasso.security/blog/ai-package) and hackers squatting on those package names.

Adversarial poisoning is another issue not yet deeply explored.

The issues get very hairy once you dig into them. Folks are just scratching the surface right now.

LLM devs are clearly not thinking deeply about infosec and the decades of lessons we have.

www.lasso.security: "Lasso Security Research: Diving into AI Package Hallucinations". Discover Lasso Security's recent research on AI Package Hallucinations, extending the attack technique to GPT-3.5-Turbo, GPT-4, Gemini Pro, and Coral.
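One possible guard against the hallucinated-package problem mentioned above is to refuse to install anything that is not on a list you have vetted yourself. The following is only a minimal sketch of that idea: `find_unvetted`, the allowlist contents, and the example requirement lines are all hypothetical, and the parsing is deliberately simplistic (it only reads the leading project name from each line and skips option lines).

```python
import re

# Leading project name of a requirement line, e.g. "requests" in "requests>=2.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_unvetted(requirement_lines, allowlist):
    """Return normalized names from requirement_lines that are not in allowlist.

    Hypothetical helper: blank lines, comments, and option lines
    (those starting with "-") are skipped rather than checked.
    """
    vetted = {normalize(n) for n in allowlist}
    unvetted = []
    for line in requirement_lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "-")):
            continue
        match = _NAME.match(stripped)
        if match is None:
            continue
        name = normalize(match.group(1))
        if name not in vetted:
            unvetted.append(name)
    return unvetted


if __name__ == "__main__":
    # Assumed example data: an LLM-suggested requirements list and a vetted set.
    suggested = ["requests>=2.0", "Flask_Login", "# comment", "totally-made-up-pkg==1.0"]
    allowlist = {"requests", "flask-login"}
    print(find_unvetted(suggested, allowlist))  # ['totally-made-up-pkg']
```

This does not tell you whether a flagged name is malicious, only that nobody on your side has reviewed it yet, which is the point at which a squatted package would otherwise slip in.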

@shafik hmm apart from the slightly distorting basics, which is probably the intention, we should also consider this for human-human communication.
@kornel @dalias