Hachyderm @hachyderm

Shafik Yaghmour @shafik@hachyderm.io

"Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.": https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

If you have to basically verify everything then what is the point? I can see how it may point you in the right direction and that could be helpful but it could easily do the opposite as well.

It is essentially a coin flip.

Abstract from linked paper:

Q&A platforms have been crucial for the online help-seeking behavior of programmers. However, the recent popularity of ChatGPT is altering this trend. Despite this popularity, no comprehensive study has been conducted to evaluate the characteristics of ChatGPT’s answers to programming questions. To bridge the gap, we conducted the first in-depth analysis of ChatGPT answers to 517 programming questions on Stack Overflow and examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Furthermore, we conducted a large-scale linguistic analysis, as well as a user study, to understand the characteristics of ChatGPT answers from linguistic and human aspects. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose. Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style. However, they also overlooked the misinformation in the ChatGPT answers 39% of the time. This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.

May 24, 2024, 08:47 PM · Web

6 boosts · 21 favorites

**Cassandrich** @dalias · May 24, 2024

@shafik This. The whole magic of is exactly the same phenomenon as cold reading by psychics. It's not that it has any significant likelihood of being correct. It's that it frames everything with filler that sounds impressive to gullible fools.

**Kornel** @kornel@mastodon.social · May 24, 2024

@dalias @shafik
1. This data is based on GPT 3.5. Their small test of GPT 4 suggests it'd have ~66% accuracy.
2. These are not binary yes/no answers. The random baseline is 0, not 1/2.
3. The biggest error category in this test was misunderstanding the question. The chat interface allows adding clarifications, the test did not. Factual errors were less common.

**Shafik Yaghmour** @shafik · May 25, 2024

@kornel @dalias 66%, while better than a coin flip, is still astonishingly bad.

This is before we even consider deeper issues such as hallucinated Python packages (https://www.lasso.security/blog/ai-package-hallucinations) and hackers squatting on those package names.

Adversarial poisoning is another issue not yet deeply explored.

The issues get very hairy once you dig into them. Folks are just scratching the surface right now.

LLM devs are clearly not thinking deeply about infosec and the decades of lessons we have.

Link preview (www.lasso.security): Lasso Security Research: Diving into AI Package Hallucinations. Discover Lasso Security's recent research on AI Package Hallucinations, extending the attack technique to GPT-3.5-Turbo, GPT-4, Gemini Pro, and Coral.
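
One way to act on the package-hallucination concern in this reply is to treat every model-suggested dependency as untrusted until someone has vetted it. Below is a minimal Python sketch of that idea, assuming a hand-maintained allowlist. `VETTED_PACKAGES`, `split_suggestions`, and the sample names are hypothetical and do not come from the linked research; only the name normalization follows the PEP 503 rule.

```python
import re

# Hypothetical allowlist: packages a team has already reviewed by hand.
VETTED_PACKAGES = {"requests", "numpy", "huggingface-hub"}


def normalize(name: str) -> str:
    """Normalize a project name per PEP 503: lowercase it and
    collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_suggestions(names, vetted=VETTED_PACKAGES):
    """Split suggested package names into (vetted, needs_review).

    Anything not on the allowlist is flagged for manual review
    instead of being installed automatically.
    """
    vetted_norm = {normalize(v) for v in vetted}
    ok, review = [], []
    for name in names:
        if normalize(name) in vetted_norm:
            ok.append(name)
        else:
            review.append(name)
    return ok, review


# Example: names as they might appear in a generated answer.
# "totally-made-up-pkg" is an invented placeholder.
suggested = ["requests", "huggingface_hub", "totally-made-up-pkg"]
ok, review = split_suggestions(suggested)
print("vetted:", ok)
print("needs manual review:", review)
```

An allowlist is deliberately conservative. Checking only whether a name exists on a package index would not help here, because the squatting attack described above works precisely by registering the hallucinated name.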

**anmey** @anmey@social.anoxinon.de · May 25, 2024

@shafik hmm apart from the slightly distorting basics, which is probably the intention, we should also consider this for human-human communication.
@kornel @dalias
