Synthetic data are not the solution.

"Once the training began, researchers discovered a problem in the data: It wasn’t as diversified as they had thought, potentially limiting how much Orion would learn."

This is beyond obvious from a statistical perspective.

wsj.com/tech/ai/openai-gpt5-or

@cigitalgem I'm thinking about how naive (and/or desperate) these statements make them look. Aren't these guys supposed to be "AI industry leaders"? One would think they'd realize how limited the applicability of synthetic data to the real world is, and on top of that there's the issue of recursive pollution.

@elias_sorensen I had some interesting talks with the synthetic data guys in the fall. They were delusional and did not listen to reason.

@cigitalgem some of these guys are just chasing their own tails in their hype echo chamber at this point. While the transformer architecture stuff is cool, I wonder how much the overemphasis on LLMs will delay other necessary advances in the field of data science. Also, people getting excited by the benchmarks is driving me nuts. How do we know this isn't just overoptimization for these benchmarks?