Does AlphaEvolve Count as Recursive Self-Improvement?
Also AI is on the cusp of crushing me personally at all math. Is that a big deal? Or more like how it can crush me at chess?
Boy howdy is the AI news coming fast and thick. I see three possibilities:
The end of the world (as we know it) is nigh
A lull is nigh
I’m very gullible and can be strung along indefinitely by point releases and new “breakthroughs”
Note the parenthetical for possibility 1. As I keep saying, we don’t know whether AGI will be good or bad for humanity, just that it will turn the world upside down. Check out the diagrams in my previous post on the technological Richter scale. AGI might be amazing. We just don’t have enough surety of that.
What about possibility 3, that we’re strung along indefinitely? If we can rule that out, then we have a nice empirical test coming up: if there’s no lull by early 2026, my own probability for AGI this decade is going to go up a lot. And I actually have some credibility to deny possibility 3. I’ve been publicly writing down progress-on-AGI updates, if less prominently than here on AGI Friday, since 2008. And I’ve explicitly noted a lull before. Some highlights:
2008: I record my wager that AI won’t pass the Turing test by 2018
2016: The game of Go falls (another wager, with Eliezer Yudkowsky) and I talk about AI alignment but expect it to be a purely academic subject for the foreseeable future
2021: My 80% confidence interval for when we’ll hit AGI is 2040-2140
2022: I start freaking out about what LLMs can do and start posting lots of updates about their latest tricks
2023: I’m especially impressed by GPT-4
2024: I note the post-GPT-4 lull and say my p(doom) has been inching down
So there you go. I claim that if there’s another lull, I’ll acknowledge it. Hopefully my commitment to posting these updates every Friday won’t distort my incentives.
In any case, right now is very much not a lull. Let’s focus on Google DeepMind’s big reveal of AlphaEvolve this week. I talked last month about “Math in the Crosshairs” and now… I mean it’s not solving Millennium Prize Problems but it just pushed past the human state of the art on a spate of problems of much less consequence, such as “what’s the smallest hexagon into which you can pack 12 unit hexagons?”. Here’s the answer no human could find:
More pragmatically, it has optimized an implementation of a matrix multiplication algorithm and will speed up some of Google’s own machine learning training — including for AlphaEvolve itself — by about 1%.
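For scale, here’s a toy compounding sketch (my own arithmetic and framing, not DeepMind’s numbers beyond the ~1% figure) of what a recurring 1% self-speedup adds up to. It compounds geometrically rather than exploding, unless later generations also find bigger gains:

```python
# Toy model (my assumption, not from the post): suppose each AlphaEvolve
# generation shaves ~1% off its own training time, and that same 1% gain
# repeats every generation.
remaining = 1.0
for generation in range(10):
    remaining *= 0.99  # each generation trains 1% faster than the last

print(f"training time after 10 generations: {remaining:.4f}x")  # ≈ 0.9044x
```

A steady ~10% cut after ten generations, under the (big) assumption that the gain per generation stays fixed at 1%.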
I think scoffing at that might be like shrugging off the first few hundred infections of the Covid pandemic. But we’ll see. Hitting a wall is still very much on the table. My question is whether that happens before or after all of math falls.
When it comes to math problems and puzzles, the cases where I personally can hold my own against ChatGPT’s o4-mini-high in particular are getting few and far between. I’ve been feeding it puzzles, some of my own creation, and it either solves them with aplomb or makes very human mistakes — generally fewer than I make. It was giving me some existential panic, so I kept feeding it these things until I found a case where I understood something it didn’t. More reassuringly, it got itself stuck in a robotic loop of denying a mathematically true fact.1 So definitely not AGI, but hoo-boy, how hard I had to try to find a case like that. And this improvement has been sudden: at the end of 2024, no model could solve geometric reasoning problems that any non-technical person could solve easily (at least given pencil and paper).
So here’s my philosophical question: in terms of AGI, is math like chess? In the sense that it’s fine that computers can crush grandmasters: chess AI is an idiot savant that can do that one thing, which doesn’t mean AI can outmaneuver you in the real world.
I personally sure thought math was more general than that. I mean, the whole freaking universe runs on math. But it’s starting to look (somewhat) more likely that no, math is also a relatively narrowly scoped game. No matter how brilliant you are at that game, it doesn’t necessarily mean you have any common sense in the real world. Ok now that I put it that way maybe that should’ve been obvious!
But, still, it was a big deal in 1997 when Deep Blue beat Garry Kasparov, and the equivalent milestone we could be on the cusp of with All Of Freaking Mathematics is going to be a much bigger deal. So, yeah, all this progress on math is shrinking my AGI timelines. But I do also assign some probability to it being at least a bit like chess was. Namely, that math is conquerable without that entailing much AGI progress.
In the News
Still non-news so far, but I’m nervously refreshing Google (and getting AI to do so) on Tesla’s robotaxi launch, now supposedly just a couple weeks away. Recall that I’m staking this newsletter’s credibility on Tesla failing to deliver on this. To be clear, I won’t be surprised if there’s something they call a launch. But there will be humans in the driver’s seats. If the drivers don’t need to watch the road — i.e., level 3 autonomy — then we can quibble.
Also not really news but Manifold’s AI dashboard is nice. Great thing to keep an eye on, like the big countdown-to-AGI clock, currently showing 7 years and 8 months.
Gary Marcus on why the proposed 10-year federal moratorium on state AI laws is a terrible idea. And, of course, a Manifold market about it.
Post-publication edit: Added a footnote with the math puzzle I’d alluded to.
Here’s the puzzle, courtesy of Spencer Pearson:
Chuck and Swarna are at a perfectly circular lake in the wilderness. Swarna is swimming. Chuck is on the shore, and can't get in the water, because he is carrying a chainsaw. Swarna wants to get out of the lake, but doesn't want Chuck to be next to her when she emerges, because he is carrying a chainsaw. Once she's on land, she can outrun him, because he is carrying a chainsaw. But even with the chainsaw, he can still run faster than she can swim. Call Chuck's speed c. Not the speed of light; this is all non-relativistic.
In terms of c, how fast does Swimming Swarna need to be able to swim to escape Chainsaw Chuck?
If you let ChatGPT search the web, it gets it right. If you don’t, it can get itself stuck.
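Spoiler warning: this is a version of the classic “lady in the lake” pursuit problem, and the standard analysis (my assumption that it’s the intended one; the post doesn’t spell out the strategy) pins down the critical speed ratio numerically. Swarna spirals out to radius R/k, with k = c/v, while keeping the lake’s center between herself and Chuck, which is possible because inside that radius her angular speed matches his. From there she swims the straight tangent to shore. A minimal sketch that bisects for the breakeven ratio:

```python
from math import sqrt, acos, pi

# Standard lady-in-the-lake analysis, with k = c/v: from radius R/k, Swarna's
# tangent swim to shore has length R*sqrt(1 - 1/k^2), and her landing point is
# a shore arc of R*(pi + acos(1/k)) away from Chuck. She escapes exactly when
# her swim takes less time than his run, i.e. sqrt(k^2 - 1) < pi + acos(1/k).
def margin(k: float) -> float:
    """Positive when Chuck wins at ratio k, negative when Swarna escapes."""
    return sqrt(k * k - 1) - (pi + acos(1 / k))

# Bisect for the critical ratio where the margin is exactly zero.
lo, hi = 1.001, 10.0  # margin(lo) < 0 (escape), margin(hi) > 0 (caught)
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if margin(mid) < 0 else (lo, mid)

k_crit = (lo + hi) / 2
print(f"critical ratio c/v ≈ {k_crit:.4f}")  # ≈ 4.6033
print(f"so Swarna escapes iff v > c/{k_crit:.4f} ≈ {1 / k_crit:.4f}·c")
```

Note this beats the simpler strategy of swimming straight for the far shore from radius R/k, which only works for v > c/(1+π) ≈ 0.2416c.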
Why can’t AI drive better than you, despite a lot more investment in driving than in math or chess? Is this all about domains where the rules are clear and the rewards are easy to validate?