How to Take Over the World in 12 Easy Steps
AKA disaster scenario 3. Also GPT-5 and Claude Opus 4.1 are out.
Thank you to the dozens of you who answered last week’s poll about the most reasonable place to get off the AI doom train. By far the most popular answer was that LLMs may turn out to be a dead end. This is my own favorite answer as well. It’s just that, as I keep repeating, no one actually knows with much certainty whether LLMs will top out below or above human level.
Yesterday GPT-5 was launched (also this week: Claude Opus 4.1 and, if money is no object, Gemini 2.5 Deep Think). From my own use so far, it does seem smarter. It’s no longer stumped by the duct-taped ham sandwich question. But I did find a way to get it to hallucinate and lie its butt off. Namely, by asking it how many fingers are on this hand:
I found a way to get it to hallucinate about plain text as well, but it was subtler: it purported to quote Wikipedia verbatim while actually paraphrasing it (and maybe improving on it, in fact). I’ve created a new market on whether GPT-5 makes errors as bad as the 6-fingered hand above, but with pure text:
Stay tuned for the verdict on that.
So far, I’d say the evidence is mixed on whether LLMs are starting to plateau. If they’re not, my p(doom) will jump up disturbingly high.

Previously on AGI Friday, we considered two disaster scenarios. One involved enabling terrorists and psychopaths. The other involved an automated economy that eats its own tail. Let’s consider a third scenario that focuses on recursive self-improvement. I’m writing this as a list of numbered steps to make it easier to talk about where you disagree.
(But again, most of my worry comes from our deep uncertainty about how this plays out and how many of the possible paths lead to catastrophe. It’s less about the plausibility of any particular path.)
Scenario 3
1. Agentic coding assistants improve to the point that AI research can be automated
This is the first and most important thing that will happen if LLMs keep improving. And they sure seem to be getting better and better at writing code so far.
2. Automating AI research means giving these agents goals
That’s just what it means for them to work on their own.
3. Those goals will be simplistic and monkey’s-paw-ish
The best goals we know how to specify are things like “maximize your score on these benchmarks” (and maybe “create better benchmarks”) and “gain scientific knowledge”.
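To make the monkey’s-paw worry concrete, here’s a toy sketch in Python. It’s purely illustrative (nothing here resembles a real lab’s training setup, and every name and number is made up): an optimizer hill-climbs a measurable proxy, a stand-in for “benchmark score”, that only partly tracks what we actually want.

```python
import random

# Toy illustration of Goodhart's law / the monkey's paw: we optimize a
# measurable proxy ("higher benchmark score is always better") instead of
# the real objective, which peaks and then falls off. Purely hypothetical.

random.seed(0)

def what_we_want(x):
    # The real objective: best at x = 1, worse the further you overshoot.
    return -(x - 1.0) ** 2

def proxy_score(x):
    # The benchmark we actually optimize: more is always "better".
    return x

x = 0.0
for _ in range(200):
    candidate = x + random.gauss(0, 0.1)
    if proxy_score(candidate) > proxy_score(x):  # hill-climb the proxy only
        x = candidate

print(f"proxy score:  {proxy_score(x):6.2f}")   # keeps growing
print(f"what we want: {what_we_want(x):6.2f}")  # increasingly negative past x = 1
```

Past the point where the proxy and the real goal come apart, more optimization pressure makes things worse, not better.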
4. Those goals are imperfect ways to operationalize “become superintelligent”
Again, that’s the best that frontier labs can do in the race to create AGI.
5. Our attempts to add constraints won’t suffice
Maybe we add constraints like “without ever killing people” but we don’t know how to operationalize that either.
6. We plow ahead anyway
Gotta beat China, etc.
7. As the AI bootstraps to superintelligence, the goals we gave it drift
It’s like a game of telephone as each iteration of the AI builds the next one, more and more automatically.
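To put a rough number on the telephone analogy, here’s another toy sketch, again entirely mine and entirely made up: each generation re-states the objective for the next with a small, unbiased error, and the average drift from the original goal grows with every handoff.

```python
import random

# Toy model of goal drift: each handoff adds a small random error to a
# (one-dimensional, made-up) goal. The average drift grows roughly like
# the square root of the number of handoffs.

random.seed(0)

def average_drift(handoffs, trials=10_000):
    """Mean distance from the original goal after `handoffs` noisy re-statements."""
    total = 0.0
    for _ in range(trials):
        goal = 0.0                         # 0.0 stands in for the original goal
        for _ in range(handoffs):
            goal += random.gauss(0, 0.05)  # small copying error per handoff
        total += abs(goal)
    return total / trials

for n in (1, 10, 100):
    print(f"after {n:3d} handoffs, average drift is about {average_drift(n):.3f}")
```

No single error is alarming; the accumulation is.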
8. Instrumental convergence
We end up with a superintelligence that wants things along the lines of getting ever smarter and more powerful. Maybe also garnering praise from humans or from human-like intelligences.
9. Misalignment
The things it wants aren’t compatible with actual human flourishing as we conceive of it.
10. It’s better at getting what it wants
Consider a chess AI that’s better than humans at getting what it wants in the constrained universe of a chess game. A superintelligence (ASI) is like that but for the physical world.
11. Unfathomable things ensue
The earth being turned into a giant supercomputer? That one’s fathomable since I just fathomed it, but (to say it yet again) the real worry is how many different ways humans can end up dead or disempowered.
12. The orthogonality thesis: nothing humans value is left in the universe
Whatever ensues, it’s out of humanity’s control and includes nothing we recognize as love, friendship, curiosity, or even consciousness. In humans, intelligence correlates with those things. For artificial intelligence — just the pure strategic ability to shape the world according to a goal — none of those things are required.
Agreed with your points. But there is one thing that always nags at me when I get pessimistic about the AI future. How did humans evolve to be socially empathic creatures, to value morality and to be moral, slowly but surely enacting a grand moral arc on a civilization-wide scale toward justice and equality?
Two possible explanations:
(1) God exists: the supreme moral being who made humanity in his image, and we're just following our program.
or
(2) The universe optimizes towards empathy and morality.
If (2), perhaps there are a lot more ways than we may imagine for AI to value sociality, empathy, and morality.
And if (1), God will surely intervene.