Study finds ChatGPT outperforms (human) undergraduates at certain kinds of problems that appear on the SAT.
I would venture that this says more about the SAT not measuring quite what it purports to measure than it says about whether ChatGPT is “intelligent.”
Off-the-cuff thoughts threaded below.
1/
A perennial criticism of the SAT and similar tests is that they tacitly assume specific cultural knowledge. This is a long-running controversy, but •if• that hypothesis is correct…
…wouldn’t we expect a machine that is incapable of reasoning but can regurgitate patterns present in a vast internet crawl to •outperform• humans, whose database of cultural tidbits is naturally more limited?
2/
Things to note from the article (haven’t read the study, so grain of salt here):
GPT was better than humans at…
• Classic analogy questions
• “Identify next shape in the sequence,” but with shapes first converted to text by humans
GPT was worse than humans at…
• Mathematical reasoning problems
• Problems involving physical space
• Identifying which pairs of short stories carried the same essential meaning
• “Social interactions” (What does that mean in this context??)
3/
Let’s just assemble a casual grab bag of hypotheses here:
• Hypothesis: GPT is good at identifying / repeating patterns that show up when people talk on the internet
• Hypothesis: GPT is bad at logic, accuracy, things we generally call “reasoning”
• Hypothesis: Some SAT questions test for specific cultural knowledge, and are not just abstract intelligence tests (whatever that means)
4/
Scan the article with that list in mind: https://www.theguardian.com/technology/2023/jul/31/chatgpt-better-than-undergraduates-at-solving-sat-problems-study-suggests
Are the findings consistent or inconsistent with those casual hypotheses?
What if we reject the article’s “intelligence” framing, and instead view the study as •testing• those hypotheses above? In what ways is it or is it not a good test of them?
What do the study’s results really tell us?
5/
Imagine an alternative headline: “SAT tests for cultural knowledge not intelligence, researchers find in ChatGPT study”
Could you write that article? I think I could.
Would it hold up well under scrutiny? Probably not. But I’m not sure it would hold up any worse than the article as written.
DISCLAIMER: Again, haven’t read the actual study yet! If the article overstates / misstates / sensationalizes its findings, that would be par for the course.
/end
@inthehands Analogies are something I would expect it to do really well at. That's literally 99% of what I ever see it doing. Also, I thought they removed those from the SAT a long time ago? (Made me sad because they were my favorite part.)
@misc
Yeah. We could even try to view GPT as an “analogy machine” and it wouldn’t be completely wrong.