Hachyderm @hachyderm

**Ben Weinstein-Raun** @ben@superstimul.us · Nov 16, 2024

I've said this and things like it elsewhere, but o1-preview feels qualitatively better than other LLMs to me, in a way that I don't think I experienced even with GPT-4 vs GPT-3. My implicit superintelligence timelines actually grew a bit longer with GPT-4's release, and have grown a bit more in the time since, but using o1-preview has shrunk them again. It's also increased my felt probability that AI systems will be scheme-y in ways that are hard to detect.

**Ben Weinstein-Raun** @ben@superstimul.us · Nov 16, 2024

One aspect of the qualitative difference is that o1-preview appears better at answering difficult science and engineering questions in 10-30s than I am in 10m with Google (on topics I'm unfamiliar with), to roughly the same accuracy, which hasn't seemed true of 4o or Claude. I fairly often catch 4o in reasoning mistakes or simple fabrications, but this has only happened maybe once with o1-preview, in about a month of using up my credits. And in that case I'm not even hugely confident that it was wrong.

It's seriously limited by having a 3-year-old knowledge cutoff and no access to external tools, and yet I find myself needing to be selective about which things I ask it for fear of running out of credits.

[object Object] @objectObject@hachyderm.io

@ben interesting. I haven't found it to be much better than trying a different prompting approach when 4o loses the plot. What's an example prompt that does much better than 4o?

(That said, I've mostly been using Claude Sonnet 3.5 lately. Also, my recent science/engineering work has mostly been stuff I'm already good at.)

Nov 23, 2024, 05:35 AM
