Synthetic data are not the solution.
"Once the training began, researchers discovered a problem in the data: It wasn’t as diversified as they had thought, potentially limiting how much Orion would learn."
This is beyond obvious from a statistical perspective.
@cigitalgem I'm thinking about how naive (and/or desperate) these statements make them look. Aren't these guys supposed to be "AI industry leaders"? One would think they'd be able to realize the limited applicability of synthetic data to the real world, not to mention the issue of recursive pollution to top it off.
@elias_sorensen I had some interesting talks with the synthetic data guys in the fall. They were delusional and did not listen to reason.
@cigitalgem some of these guys are just chasing their own tails in their hype echo chamber at this point. While the transformer architecture stuff is cool, I wonder how much the overemphasis on LLMs will delay other necessary advances in the field of data science. Also, people getting excited by the benchmarks is driving me nuts. How do we know this isn't just overoptimization for these benchmarks...
@elias_sorensen @cigitalgem I prefer the term "high on their own supply".