Study finds ChatGPT outperforms (human) undergraduates at certain kinds of problems that appear on the SAT.
I would venture that this says more about the SAT not measuring quite what it purports to measure than it says about whether ChatGPT is “intelligent.”
Off-the-cuff thoughts threaded below.
1/
A perennial criticism of the SAT and similar tests is that they tacitly assume specific cultural knowledge. This is a long-running controversy, but •if• that hypothesis is correct…
…wouldn’t we expect a machine that is incapable of reasoning but can regurgitate patterns present in a vast internet crawl to •outperform• humans, whose database of cultural tidbits is naturally more limited?
2/
Things to note from the article (haven’t read the study, so grain of salt here):
GPT was better than humans at…
• Classic analogy questions
• “Identify next shape in the sequence,” but with shapes first converted to text by humans
GPT was worse than humans at…
• Mathematical reasoning problems
• Problems involving physical space
• Identifying which pairs of short stories carried the same essential meaning
• “Social interactions” (What does that mean in this context??)
3/
Let’s just assemble a casual grab bag of hypotheses here:
• Hypothesis: GPT is good at identifying / repeating patterns that show up when people talk on the internet
• Hypothesis: GPT is bad at logic, accuracy, things we generally call “reasoning”
• Hypothesis: Some SAT questions test for specific cultural knowledge, and are not just abstract intelligence tests (whatever that means)
4/
Scan the article with that list in mind: https://www.theguardian.com/technology/2023/jul/31/chatgpt-better-than-undergraduates-at-solving-sat-problems-study-suggests
Are the findings consistent or inconsistent with those casual hypotheses?
What if we reject the article’s “intelligence” framing, and instead view the study as •testing• those hypotheses above? In what ways is it or is it not a good test of them?
What do the study’s results really tell us?
5/
Imagine an alternative headline: “SAT tests for cultural knowledge not intelligence, researchers find in ChatGPT study”
Could you write that article? I think I could.
Would it hold up well under scrutiny? Probably not. But I’m not sure it would hold up any worse than the article as written.
DISCLAIMER: Again, haven’t read the actual study yet! If the article overstates / misstates / sensationalizes its findings, that would be par for the course.
/end
@inthehands Analogies are something I would expect it to do really well at. That's literally 99% of what I ever see it doing. Also, I thought they removed those from the SAT a long time ago? (Made me sad because they were my favorite part.)
@misc
Yeah. We could even try to view GPT as an “analogy machine” and it wouldn’t be completely wrong.