Claude Is 24% of the Way to Stealing Your Job
Not actually, but it did hit 24% on a new stealing-your-job benchmark
Today, via friend of the newsletter Ray Sarraga, who asked in the comments of last week’s AGI Friday, I’d like to react to a spate of news articles mocking some AI research out of Carnegie Mellon University. Here are the headlines:
“Silicon Valley’s Biggest Comedy Show Yet: AI Tries (And Fails) To Run A Company”
“A Fake Company Staffed Only With AI Agents Was a Total Disaster”
And so on. They call it the nail in the coffin for claims that AI is on track to steal your job.
(This all reminded me that 11 years ago I wrote a post called “Welcome, Job-Destroying Robots” which has aged… interestingly. It explicitly set aside the question of AGI and just ranted about people being confused about how economics works. I guess I still stand by it. Just that I no longer think AGI is so far off. Another 11 years or so might be about right.)
So I don’t want to just mock these “lol look how dumb AI is” articles. They have value in counterbalancing the hype. But this actually gets at the core of what I'm hoping to convey with AGI Friday. The hypesters and the pooh-poohers are both deeply wrong. Depending on how the future plays out, one or the other group will be able to pretend they knew it all along.[1] But it's kind of a coin toss, depending on the timeframe. If we take 2030 as the cutoff then I personally think the pooh-poohers have the edge. But when the articles say “the machines aren't coming for your job anytime soon” that sure sounds like it means more than a 5-year horizon. Your job is very safe this year and probably safe this decade. Beyond that, literally (almost literally “literally”) anything is possible.
Having said all that, these articles do seem a bit dumb and disingenuous. It’s perfectly predictable how a fake-company experiment, as these articles describe it, would go. It’s like putting a bunch of toasters and waffle irons in an empty building and gloating that they failed to start a viable bistro. The next prediction from those of us worried about the trajectory towards AGI is that by the end of 2025 we'll have the first so-called agents that are actually useful, ones that can go out and do specific tasks for you on the internet. If even that fails to happen, that'll be the first clue to lengthen our AGI timelines.
And a final note: the news articles are dumb, but the research they think they’re making fun of is great. The authors have built a new benchmark for measuring progress towards bona fide job-stealing. Not that Claude’s high score of 24% means very much yet. Claude can do 24% of the somewhat contrived tasks in the benchmark, but the authors don't claim that hitting 100% is sufficient (or even necessary) for AGI.
One nice thing about a benchmark like that is that it’s a candidate line in the sand. If you think I’m still too far on the hype side and those articles aren’t so far off in their ridicule, do you confidently predict AI won’t hit 100% on that job-stealing benchmark this year? Mocking the current 24% and then shrugging off the news some months from now that AI can run fake companies with aplomb is… I won't say it’s necessarily inconsistent, but it looks a lot like goalpost-moving.
In the News
Scott Alexander and friends at the AI Futures Project try to make sense of OpenAI’s miasma of models. (Review for end users: o3 is the new hotness, o4-mini-high is a math genius, 4o is garbage, and 4.5 is maybe best at creative writing and emotional intelligence, I guess.)
Speaking of 4o being garbage, OpenAI accidentally broke 4o last week by making it sycophantic to the point of uselessness, till they reverted it. I continue to recommend Claude.ai, especially if you don’t want to give OpenAI $20/month. Some claim Google’s fanciest version of Gemini (2.5 Pro) is the smartest of all, but I haven’t been able to corroborate that. It seems to fail at basic geometric reasoning in my tests.
This week in video, Stephen Fry narrates some AI doomerism and Rational Animations evocatively describes where AI is headed.
Scott Alexander again, being gobsmacked at how inhumanly good AI is at GeoGuessr.
No actual new news on the robotaxi launch, but Tesla continues to stake their credibility on pulling this off. Recall that AGI Friday is staking its credibility on them not pulling it off. In the meantime I’ve made a new prediction market for another claim from Elon Musk, about millions of fully autonomous Teslas on the road in 2026. I’m actually not at all confident in calling BS on that one, but Manifold traders think it’s even less likely than this summer’s robotaxi launch.
Title image by Kelly Savage via Messy Matters
Post-publication edit: Added the final sentence about goalpost-moving.
[1] In other words, they’ll be suffering hindsight bias. This reminds me of Scott Alexander’s argument about how Polymarket got to tout that they predicted Trump's 2024 win better than anyone else but, viewed ex ante, Polymarket was mispriced.