Study finds ChatGPT outperforms (human) undergraduates at certain kinds of problems that appear on the SAT.

I would venture that this says more about the SAT not measuring quite what it purports to measure than it says about whether ChatGPT is “intelligent.”

theguardian.com/technology/202

Off-the-cuff thoughts threaded below.

1/

The Guardian: “ChatGPT better than undergraduates at solving SAT problems, study suggests” by Dan Milmo

A perennial criticism of the SAT and similar tests is that they tacitly assume specific cultural knowledge. This is a long-running controversy, but •if• that hypothesis is correct…

…wouldn’t we expect a machine that is incapable of reasoning but can regurgitate patterns present in a vast internet crawl to •outperform• humans, whose database of cultural tidbits is naturally more limited?

2/

Things to note from the article (haven’t read the study, so grain of salt here):

GPT was better than humans at…

• Classic analogy questions
• “Identify next shape in the sequence,” but with shapes first converted to text by humans

GPT was worse than humans at…

• Mathematical reasoning problems
• Problems involving physical space
• Identifying which pairs of short stories carried the same essential meaning
• “Social interactions” (What does that mean in this context??)

3/

Let’s just assemble a casual grab bag of hypotheses here:

• Hypothesis: GPT is good at identifying / repeating patterns that show up when people talk on the internet

• Hypothesis: GPT is bad at logic, accuracy, things we generally call “reasoning”

• Hypothesis: Some SAT questions test for specific cultural knowledge, not just abstract intelligence (whatever that means)

4/

Scan the article with that list in mind: theguardian.com/technology/202

Are the findings consistent or inconsistent with those casual hypotheses?

What if we reject the article’s “intelligence” framing, and instead view the study as •testing• those hypotheses above? In what ways is it or is it not a good test of them?

What do the study’s results really tell us?

5/


Imagine an alternative headline: “SAT tests for cultural knowledge not intelligence, researchers find in ChatGPT study”

Could you write that article? I think I could.

Would it hold up well under scrutiny? Probably not. But I’m not sure it would hold up any worse than the article as written.

DISCLAIMER: Again, haven’t read the actual study yet! If the article overstates / misstates / sensationalizes its findings, that would be par for the course.

/end

@inthehands Analogies are something I would expect it to do really well at. That's literally 99% of what I ever see it doing. Also, I thought they removed those from the SAT a long time ago? (Made me sad because they were my favorite part.)

Paul Cantrell

@misc
Yeah. We could even try to view GPT as an “analogy machine” and it wouldn’t be completely wrong.