We Regret To Inform You There Will Be No Tesla Robotaxis This Summer
Also how to think about self-driving autonomy levels, and my geometric reasoning benchmark falls
Oh hi, there are suddenly drastically more of you since Scott Alexander plugged AGI Friday on Astral Codex Ten. Welcome! Let me start with a quick update from last week about superhuman math: ChatGPT’s “o4-mini-high” has now saturated my personal geometric reasoning benchmark. I defy you to confuse it with spatial reasoning questions, even ones you yourself can’t untangle without pencil and paper. I consider this a big deal since failures like these1 seemed, as recently as 6 months ago, to be central to the claim that AGI was a long way off. It may still be a relatively long way off (or not) but we’ve taken another step closer.
(If you think I’m falling for the hype — and there’s plenty of hype — my challenge to you is to pick your line in the sand. What’s the least impressive thing you’re confident AI won’t be able to do in 2 years? If it’s “be gainfully employed in a full-time remote job” then, well, I kind of agree and merely claim there’s a lot of uncertainty. The disagreement will be more fruitful if we can find more of a leading indicator. And if we can’t then we should admit that AGI in 2 years isn’t totally out of the question.)
Alright, but by popular demand, let’s talk more about self-driving cars. I previously predicted that you’ll get your private self-driving car in 2028. Then I kept obsessively making post-hoc edits to the footnote about autonomy levels so let’s start with that. Skip to the Tesla section if that’s what you’re here for.
Autonomy levels for self-driving cars
There’s an industry standard for talking about how autonomous a car is. Here’s how I think of it:
Level 0 = a totally normal old-school car
Level 1 = old-school assistance like cruise control + lane-keeping assist
Level 2 = self-driving BUT the human may have to yank back control if the car is about to kill you ← Tesla is here
Level 3 = the human can read a book but must be ready to take over quickly if the car beeps ← Mercedes is dipping a toe into here
Level 4 = no human in the driver’s seat, car safely stops if confused ← Waymo is here
Level 5 = the AGI of self-driving, car handles anything a human can
The difference between level 1 and level 2 is whether you’re mostly still driving, vs mostly just supervising. The difference between level 2 and 3 is dangerously subtle. If interventions are rarely needed in a level 2 autonomous car, you may get complacent. It feels the same as level 3. The key is whether you can trust the car to tell you that it needs you to take over. If for some reason you don't take over, a level 3 car will safely stop, albeit perhaps in the middle of the freeway. So “safely” in scare quotes I guess.
At level 4, the car can handle almost anything but if there's an angry moose blocking the road or cops directing traffic around an accident or something else very out of the ordinary, it may need to stop and call a human for guidance on what to do. No tele-operation, just effectively asking a human things like “can I proceed through here or do I need to turn around?”
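If it helps, here's the same taxonomy as decision logic, in a minimal Python sketch (my own made-up names, not any official SAE API), showing who watches the road at each level and what happens when the car gets confused:

```python
from enum import IntEnum

class SAELevel(IntEnum):
    """SAE autonomy levels, roughly as described above."""
    L0 = 0  # totally normal old-school car
    L1 = 1  # cruise control + lane-keeping assist; human still drives
    L2 = 2  # car drives, human supervises constantly (Tesla's supervised FSD)
    L3 = 3  # human can read a book but must respond when the car beeps (Mercedes, barely)
    L4 = 4  # no human in the driver's seat (Waymo)
    L5 = 5  # handles anything a human can

def who_watches_the_road(level: SAELevel) -> str:
    # Through level 2, a human must monitor at all times; from level 3 up, the car does.
    return "human" if level <= SAELevel.L2 else "car"

def when_confused(level: SAELevel) -> str:
    # What happens when the car hits a situation it can't handle?
    if level <= SAELevel.L2:
        return "human must already be watching and yank back control"
    if level == SAELevel.L3:
        return "car beeps; if the human doesn't take over, it stops, maybe mid-freeway"
    if level == SAELevel.L4:
        return "car stops safely and may phone a human for guidance"
    return "car handles it, full stop"
```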
As of this writing, Tesla is at level 2 — that’s the “supervised” in “supervised full self-driving (FSD)” — and Waymo is at level 4. Mercedes is dipping a toe into level 3 (as are BMW and Honda, though not yet in the US) but it’s so restrictive it hardly counts. As I said last month:
It’s at level 3 autonomy on certain highways in California and Nevada when there's a traffic jam keeping the speeds below 40mph, with another car it can follow, in daylight in good weather with clear lane markings etc. It’s a start!
Why I Gotta Be a Debbie Downer about Tesla Robotaxis?
First can I just say what a nightmare it is trying to research this on the internet? Talk about hostile epistemic environments. Tesla is either an Enron-style fraud about to implode or Elon Musk is the messiah. (To put my cards on the table, I think it’s plausible Musk has suffered some kind of literal mental illness but I don’t want to underestimate him. He still seems to be something of a force of nature, for good or ill.)
So there are headlines like this:
Tesla's Robotaxi Hits Roads in Two Cities, Logs 15,000 Miles Ahead of Full Rollout
And then you check the details and find they’re testing with Tesla employees only and the thing they’re testing is… just the smartphone app? When you summon a car, it’s just a normal Tesla with a human driver using normal supervised FSD.
Elon Musk has been very explicit in promising a robotaxi launch in Austin in June with unsupervised FSD. Let’s give him some leeway on the timing but I’m ready to stake my own credibility on the prediction that that is not happening.
I should admit that I haven’t tried the latest Tesla FSD version and I don’t deny it’s drastically better than what I tried a couple years ago. It seems it often completes whole trips without human intervention. It’s just that, impressive as that is, it’s still a world away from what’s needed for a robotaxi launch without humans in the driver’s seats.
I’ve actually been beefing with Elon Musk about this for going on 10 years now, ever since he tweeted in January of 2016 that “in ~2 years, summon should work anywhere connected by land & not blocked by borders, eg you're in LA and the car is in NY”. I was bullish on self-driving cars back then myself (in 2011 I had predicted level 3 self-driving cars on freeways by 2021) but said I’d eat my hat if Musk was remotely correct in his 2016 prediction. He has famously repeated his “in a couple years” or “next year” prediction every year since then. But I don’t want to mock that too much. Eventually it will be true! And the other extreme is even worse: mocking all the ways it currently falls short and implying that it will never happen. Like Gary Marcus does about AGI!
Let me also admit that I've lost money (well, mana on Manifold anyway) underestimating Musk before (like whether he was serious about launching xAI) so I'm not totally confident he doesn't somehow pull this off. Apparently the stock market still thinks he has a chance?2
But here’s me going out on a limb:
No robotaxi launch in Austin in June (or July or August) with actual level 4 autonomy like Waymo has.
If that does ever happen it will be after Tesla does all the things Musk has mocked Waymo for. Namely Lidar and radar sensors, hi-def pre-mapping of roads, and the phone-a-human feature for when the car is confused.
For gory details and edge cases, see my market on Manifold.
In the News
The AI Futures Project might already be on track for uncannily accurate predictions
An AMA with the AI Futures Project team is happening right now (when I’m sending this out)
Demis Hassabis went on 60 Minutes which I mention because he’s the most impressive of the AI company CEOs and manages to make all the stuff we talk about here sound respectable instead of like science fiction. His estimate is that we’ll be at AGI in the early 2030s — and he’s been consistent about that for forever!3
Another AI company CEO, Dario Amodei, seems to be taking AI alignment more seriously than he previously did.
By “failures like these” I mean where the AI fails because it doesn’t have a coherent model of the world, or even a mini world involving shapes on a piece of paper. And when it succeeds, it’s hard to argue that it doesn’t “understand”. Looking at solutions it comes up with to math problems, the understanding it exhibits sure feels deep and genuine. Or at least it gets problems solved in a way that would unambiguously demonstrate deep and genuine understanding if done by a human. (Then other times it falls on its face in ways no human possibly could. It’s wild. But the face-falling seems to be happening less and less. Which is also wild.)
Maybe I shouldn't say “he” has a chance to pull it off. Maybe Tesla is still producing cars despite him instead of because of him at this point, I have no idea. I’m trying as hard as I can here to make this about self-driving cars and not Elon Musk.
Random thought for fellow AGI Friday readers: Those cheaply generated AI summaries at the top of Google Search results can't be good for the reputation of AI as a whole, no?
I find that Gemini 2.5 Pro and the Gemini thinking/chain-of-thought models so far very rarely hallucinate, while Gemini Flash and similar Gemini models hallucinate so consistently that I never trust Flash with any factual lookup, even though both models have access to Google Search.
I read in a Wired article about the history of Gemini that while internal factions complained, surveys of end users found that the addition of AI-generated summaries was overwhelmingly preferred.
"The senior director involved ordered up some testing, and ultimately user feedback won: 90 percent of people who weighed in gave the summaries a “thumbs up.”
https://www.wired.com/story/google-openai-gemini-chatgpt-artificial-intelligence/
However, as someone who uses AI daily (never for creative work), I DO NOT TRUST the AI summaries at the top of the search results, even on the most common queries, ones that must be run millions of times a day, because they have been so consistently wrong as to make me permanently distrust them and mentally skip over them every time I see them. If I want actual results, I tap the AI Mode button.
I can imagine the internal reasoning going something like this, Socratic-dialogue style:
A) Using the Flash model is cheaper, since there are a lot of unique Google searches, and it allows us to customize the summary for every single user.
B) Fair, but shouldn't we use a more computationally expensive model?
A) No, because users are clearly already content with our worse model.
B) The one that generates really inaccurate results? We have a model that doesn't hallucinate right here. Why don't we use it?
A) I can show you a cost-benefit analysis right here showing that the improved results barely matter to the end user, for reasons x, y, and z.
Here's me chiming into the hypothetical:
C) Why don't we generate an expensive, high-quality summary for the 1 million most common queries, or whatever threshold covers the most users possible?
A) (My guess of what they'd say) We already do that with the Flash model, but generating the summary live makes users feel more engaged with the AI than instantly displaying a result would. Additionally, using the Flash model allows us to generate individualized results that better reflect the flow of information sources presented to each user, and lets us react to real-time events, like if the Pope were to suddenly pass away.
C) Why don't we cache one expensive result per hour for the most common search queries (see the sketch after this exchange)? Users aren't expecting personalization there (although that'd be really cool), because users aren't expecting to opt into AI personalization when they use Google Search (yet), and doing so might prompt a bigger backlash than we already have. We'd treat it like the data we extract from Wikipedia articles.
A) Cost, benefit, analysis.
C) This small, expensive change could do a lot to convert users to Gemini. Who in their right mind would pay us for Gemini if Flash keeps hallucinating on every second result?
A) You're exaggerating.
C) I am frustrated! This is our chance to get AI in front of as many people as possible, and we could convert a lot of skeptics into believers by providing good-quality results. That's what they come to Google for: good-quality results. And frankly, while the summaries are good enough for some, long-term growth might be hampered if people start mentally skipping over them, even if your cost-benefit analysis and statistics show that this is the wisest short-term decision!
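To make the proposal concrete, here's a minimal sketch of the cache-once-an-hour idea. All names here are hypothetical, and the two summarize functions are stand-ins for real model calls, not actual Google APIs:

```python
import time

CACHE_TTL_SECONDS = 3600  # refresh each popular query's summary at most hourly
TOP_QUERIES = {"weather today", "nba scores"}  # stand-in for the ~1M most common queries

_cache: dict[str, tuple[float, str]] = {}  # query -> (timestamp, cached summary)

def expensive_summarize(query: str) -> str:
    # Stand-in for a call to the expensive, rarely-hallucinating model.
    return f"[high-quality summary of {query!r}]"

def cheap_summarize(query: str) -> str:
    # Stand-in for a call to the cheap, per-user (Flash-like) model.
    return f"[cheap personalized summary of {query!r}]"

def summary_for(query: str) -> str:
    if query in TOP_QUERIES:
        hit = _cache.get(query)
        if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]  # serve the shared, unpersonalized summary
        summary = expensive_summarize(query)  # pay the big-model cost once per hour, not once per user
        _cache[query] = (time.time(), summary)
        return summary
    return cheap_summarize(query)  # the long tail still gets the cheap model
```

The point being: for head queries the expensive model's cost amortizes over millions of users, and the hourly refresh still catches real-time events.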
I have very strong feelings about AI summaries. 😅
"the thing they’re testing is… just the smartphone app"
They are testing both the FSD part and any remote operations (monitoring the car, even remotely stopping/starting it).
"When you summon a car, it’s just a normal Tesla with a human driver using normal supervised FSD."
That's both a safety and a legal feature. They want to avoid reporting any issues to the NHTSA while testing. You can argue that it's bending the rules or whatever, but it's not evidence against the technical feasibility of them starting an actual service in June.
As for what will happen in June: in the Q1 2025 earnings call they spoke of "10-20 cars" and Musk specifically mentioned "June or July". So it will be a tiny start, and they can still claim "victory."