"Our analysis shows that
52% of ChatGPT answers contain incorrect information and 77%
are verbose.": https://dl.acm.org/doi/pdf/10.1145/3613904.3642596
If you have to verify basically everything, then what is the point? I can see how it might point you in the right direction, which could be helpful, but it could just as easily do the opposite.
It is essentially a coin flip.
@shafik This. The whole magic of it is exactly the same phenomenon as cold reading by psychics. It's not that it has any significant likelihood of being correct. It's that it frames everything with filler that sounds impressive to gullible fools.
@dalias @shafik
1. This data is based on GPT-3.5. Their small test of GPT-4 suggests it would have ~66% accuracy.
2. These are not binary yes/no answers, so the random baseline is 0, not 1/2.
3. The biggest error category in this test was misunderstanding the question. The chat interface allows adding clarifications, but the test did not. Factual errors were less common.
@kornel @dalias 66%, while better than a coin flip, is still astonishingly bad.
This is before we even consider deeper issues such as hallucinated Python packages (https://www.lasso.security/blog/ai-package-hallucinations) and hackers squatting on those hallucinated names; see the rough sketch at the end of this post.
Adversarial poisoning is another issue not yet deeply explored.
The issues get very hairy once you dig into them. Folks are just scratching the surface right now.
LLM devs are clearly not thinking deeply about infosec and the decades of lessons we have learned.
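
To make the package-hallucination risk concrete, here is a minimal sketch (my own illustration, not from the paper or the Lasso post) of one cheap mitigation: before installing a dependency an LLM suggested, check that the name actually exists on PyPI and look at its release history before trusting it. The use of PyPI's public JSON endpoint is real; the exact checks and wording are assumptions.

# Rough sketch: sanity-check an LLM-suggested package name against PyPI
# before installing it. Illustrative only, not a complete defense.
import json
import sys
import urllib.request
from urllib.error import HTTPError

def check_pypi_package(name: str) -> None:
    # PyPI's public JSON metadata endpoint for a project
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            meta = json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            print(f"'{name}' does not exist on PyPI -- likely hallucinated; do NOT pip install it.")
            return
        raise
    info = meta["info"]
    releases = meta.get("releases", {})
    print(f"'{name}' exists: {len(releases)} release(s), summary: {info.get('summary')!r}")
    print("Still review the project page, maintainers, and release history before installing.")

if __name__ == "__main__":
    check_pypi_package(sys.argv[1] if len(sys.argv) > 1 else "requests")

A nonexistent name is the strongest signal that the LLM made the package up; an existing but brand-new package with one release and no history is exactly what a squatter registering hallucinated names would look like, so existence alone proves nothing.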