This from @caseynewton brings to the foreground something that’s been eating at me:
What exactly do Google and OpenAI and Microsoft and the rest of the AI bubble think is going to happen here when LLMs disincentivize the creation of the data that feeds them?!
https://www.platformer.news/google-io-ai-search-sundar-pichai/
Working that out a bit:
The premise of the ad-supported web is that you profit by driving traffic to your site. The premise of LLMs is in large part to •supplant• traffic to web sites. LLMs need people to keep creating web sites. (More in the article.)
In private, behind closed doors, is the expectation that…well, what? That people will just keep posting useful bot-visible information for free?
Is there an expectation that AI model trainers will end up paying for content, and ad revenue is supplanted by “LLM training data fee” revenue? (I can’t imagine GoogleAISoft’s investors think that’s the high-ROI lucrative future they’re buying.)
Do they privately know this LLM stuff is a bubble, and expect it to burst before data source die-out kills it?
At a guess, it’s all just FOMO greed-panic and nobody’s thinking that far ahead. But truly, I wonder! What •do• they think will happen?!
Yeah, I think @sysadmin1138 nailed it.
https://octodon.social/@sysadmin1138/112442908788514243
@inthehands I think they're hoping to make underpaid canyons generate the content. That would be very strange.
@ravenonthill Have canyons ever been paid fairly? Did the Grand Canyon ever make a cent off its fame?
@inthehands DYAC, but it was funny.
@ravenonthill
I figured, and figured I should embrace it!
This situation reminds me of the cartoon strip Zippy The Pinhead, particularly the character Shelf Life - a fast-talking schemer always looking for "the next big thing"
https://en.wikipedia.org/wiki/Zippy_the_Pinhead#Characters_and_story
https://www.zippythepinhead.com/pages/aacast.html
https://zippythepinhead.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=ZTP&Product_Code=1stsl
@inthehands @sysadmin1138 not only privately held data that already exists
Just like you get access to certain services in exchange for you watching ads, in the future companies may compel you to produce new data for the continued use of services
Who wants to be the first company offering free or reduced price phone service in exchange for them recording your calls for AI training purposes?
@crazybutable @sysadmin1138 Oh god, you’re right
@inthehands @sysadmin1138 (I’m thinking of that apple patent of a camera in the TV that makes you stand up and say “McDonald’s!” before the ad unpauses)
Any one party consent state can have an army of people with small recorders in their backpacks recording all nearby conversations in any park, restaurant, theater, or any other public space
People who submit recordings of any conversation can get credits / coupons / etc based on how much useful text is recoverable from them
@crazybutable @inthehands We've been training machine learning vision models for cars for literal years at this point?
@inthehands @sysadmin1138 or sports venues putting clauses in saying they can use any captured footage of you to train their AIs (you already generally agree to a waiver to appear in ads)
@crazybutable @sysadmin1138 I mean, you’re still thinking small. Your phone doesn’t just record your calls; the virtual assistant doesn’t just record your interaction. The mic of every internet-connected device is always on, and records and submits •everything• for training.
@inthehands @sysadmin1138 I mean, I trust the Apple tech stack, that they are not doing that without getting my informed consent, and that they are requiring any app that uses the microphone to gain that consent
Apple may be bastards but they aren’t liars. Of course that could change in the future, I don’t trust that to be true forever.
@inthehands @sysadmin1138 imagine every seat with a microphone when you travel on Spirit airlines
Or “these premises under audio/visual surveillance” sign with tiny buried fine print with what they do with the recordings.
Restaurants get a small kickback for installing the “customer insights” cameras and microphones
@inthehands @crazybutable When I said round two, I meant what's happening right now. You can already hear it in investor calls. What you're talking about is in the next stage, when the regulatory environment (or work/educational surveillance market TOSed enough) permits it, and we've solved the classification problem better.
@inthehands @crazybutable So, of course, today is the day I notice (https://awscommunity.social/@Quinnypig/112452921509401306) that Slack is giving themselves permission to train models on DMs.
@sysadmin1138 @inthehands any possible source of data from real humans
It’s like when explorers landed on islands in the pacific and discovered mountains of guano they could use as fertilizer
Strip that shit and sell it! Great business until it’s all mined and gone
@inthehands if we're to believe the hype I guess the idea is that magic AGI that no longer requires training data is the goal before the collapse
@inthehands I think that most of these execs are making decisions in terms of the next quarter's share price, and little else. "How do we train the models when people stop creating data," is a mess they'll leave for the future. And when they get there, you can expect them to continue to do the most dystopian thing they can think of.
@inthehands The next round will be models trained on privately held data, not internet spidering. Think gdrive, box, and discord. Places where people mostly don't have a choice but to engage.
@inthehands @caseynewton interesting angle. As an app developer, one of my concerns with big tech is that, much like how search UX has been de-prioritised & broken, the app ecosystem could face a similar fate. Since many people are now relying on apps for day-to-day activities, the consequences are significant.
I think we could end up with not only a broken web UX but broken native device UX too.
That's before considering the impact of use of LLMs for app development, discovery & delivery.
@inthehands @caseynewton It's not hard to imagine big tech trying to supplant apps with their own AI powered 'functions'. Much as search was turned from web search into everything search, AI responses could become not just text and images but actual native code: generated apps that interact with local and web services for you, using your data (which it coerces you to give it access to), and guess what: 'who needs apps anymore?' Great opportunity for them to create another monopoly. Just a thought.
@inthehands @caseynewton I think part of the answer will be that they will become a bit of a closed loop in that they will be training off of the content they create. This will reify the type of output they create and we will stop expecting something else. The AI output will become the new standard and it will be self-reinforcing.
@inthehands @caseynewton Maybe they assume that data creation will stagnate anyway from the climate catastrophe and widespread collapse of food/water supply security?