20 Comments
Emerald Fleur

Random thought for fellow AGI Friday readers: Those cheaply generated AI summaries at the top of Google Search results can't be good for the reputation of AI as a whole, no?

I find that Gemini 2.5 Pro and all the Gemini thinking or chain-of-thought models so far very rarely hallucinate, while Gemini Flash and similar Gemini models nearly always hallucinate, to the point where I never trust Flash with any factual lookup, even though both models have access to Google Search.

I read in a Wired article about the history of Gemini that, while internal factions complained, surveys of end users found that the AI-generated summaries were overwhelmingly preferred.

"The senior director involved ordered up some testing, and ultimately user feedback won: 90 percent of people who weighed in gave the summaries a “thumbs up.”

https://www.wired.com/story/google-openai-gemini-chatgpt-artificial-intelligence/

However, as a person who uses AI daily (never for creative work), I DO NOT TRUST the AI summaries at the top of the search results, even on the most common queries, which must be run millions of times a day. They have been so consistently wrong that I now permanently distrust them and mentally skip over them every time I see them. If I want actual results, I tap the AI Mode button.

I can imagine the internal reasoning going something like this, Socratic-dialogue style:

A) Using the Flash model is cheaper, since we serve a huge number of unique Google searches, and it lets us customize the summary for every single user.

B) Fair, but shouldn't we use a more computationally expensive model?

A) No, because users are clearly already content with our worse model.

B) The one that generates really inaccurate results? We have a model that doesn't hallucinate right here. Why don't we use it?

A) I can show you a cost-benefit analysis right here showing that the improved results barely matter to the end user, for reasons x, y, and z.

Here's me chiming in on the hypothetical:

C) Why don't we generate a computationally expensive summary for the 1 million most common queries, or whatever threshold covers the most users possible?

A) (My guess at what they'd say.) We already do that with the Flash model, but watching the summary generate makes users feel more engaged with the AI than instantly displaying a cached result would. Additionally, using the Flash model lets us generate individualized results that better reflect the flow of information sources presented to each user, and lets us react to real-time events, like the Pope suddenly passing away.

C) Why don't we cache one expensive result per hour for the most common search queries, along the lines of the sketch after this exchange? Users aren't expecting personalization there (although that'd be really cool), since they haven't opted into AI personalization when they use Google Search (yet), and personalizing might prompt a bigger backlash than we already have. We'd treat it like the data we extract from Wikipedia articles.

A) Cost, benefit, analysis.

C) This small, expensive change could do a lot to convert users to Gemini. Who in their right mind would pay us for Gemini if Flash keeps hallucinating on every second result?

A) You're exaggerating.

C) I am frustrated! This is our chance to get AI in front of as many people as possible, and we could convert a lot of skeptics into believers by providing good-quality results. That's what they come to Google for: good-quality results. And frankly, even if the summaries are good enough for now, long-term growth might be hampered if people start mentally skipping over them, even if your cost-benefit analysis and statistics show this is the wisest short-term decision!
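To make option (C) concrete, here's a minimal sketch of the kind of head-query caching I'm imagining. The model calls and the top-query list are made-up placeholders, not anything Google actually does:

```python
import time

# Hypothetical sketch: serve head queries from an hourly cache filled by an
# expensive model; fall back to the cheap model for the long tail.
CACHE_TTL_SECONDS = 3600      # refresh cached summaries once an hour
_cache = {}                   # query -> (timestamp, summary)

def summary_for(query, top_queries, expensive_summary, cheap_summary):
    """top_queries: the N most common queries; the two summary args are
    stand-in callables for a strong model and a cheap model."""
    now = time.time()
    if query in top_queries:
        hit = _cache.get(query)
        if hit and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]                      # fresh cached answer
        result = expensive_summary(query)      # pay for quality on head queries
        _cache[query] = (now, result)
        return result
    return cheap_summary(query)                # long tail stays on the cheap model
```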

I have very strong feelings about AI summaries. 😅

Markos

"the thing they’re testing is… just the smartphone app"

They are testing both the FSD part and any remote operations (monitoring the car, even remotely stopping/starting it).

"When you summon a car, it’s just a normal Tesla with a human driver using normal supervised FSD."

That's both a safety feature and a legal one. They want to avoid reporting any issues to the NHTSA while testing. You can argue that it's bending the rules or whatever, but it says nothing about the technical feasibility of them starting an actual service in June.

As for what will happen in June: in the Q1 2025 earnings call they spoke of "10-20 cars", and Musk specifically mentioned "June or July". So it will be a tiny start, and they can still claim "victory".

Daniel Reeves

Thanks, these are all solid points. Where does this leave you on the probability of driverless Teslas this summer?

Markos

It’s summer in Texas, so weather conditions will be favourable and we’re discussing a fleet of 10 cars. You can already see videos of people in California and Texas using FSD and it does fine. Of course they will have remote monitoring to avoid accidents. So I would say 90% chance of Tesla claiming victory and 150% chance of critics arguing this is pointless because it is a single city and just 10 cars. 😆

Daniel Reeves

I don't think remote monitoring can reliably avoid accidents. Network lag would be akin to drunk driving.

But I agree that Tesla is likely to do something they can call a launch and that this will be hotly debated. I'm trying to pin down the question in the Manifold market highlighted in the post:

https://manifold.markets/dreev/will-tesla-count-as-a-waymo-competi?play=true

It's sitting at a 21% chance at the moment.

Markos

Well, here's Vay (https://vay.io/), a car-rental company now operating in Las Vegas, which remotely delivers the car to you. You do your own driving, but you can leave it anywhere you want, and then they remotely drive it back.

And Baidu has been operating Apollo Go, a fully autonomous taxi service (but with LiDAR, of course), for some time now. You can see their remote consoles here https://evcentral.com.au/chinas-google-getting-into-electric-cars/

Daniel Reeves

Ah, thank you! I hadn't realized there was this much tele-operation happening, or that it was legal. I looked into Vay and it's less than it seems:

1. They don't tele-operate with passengers in the car; they just use it to get rental cars to customers, who then drive the cars normally.

2. They limit the speed during tele-operation to 26 mph (42 km/h).

Like I said, I think the network delay is comparable to drunk driving, so presumably they've calculated that up to that speed it's not too much of a risk, for Las Vegas at least.
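For a rough back-of-envelope check (the latency figure here is my own assumption, not a measurement from Vay or anyone else):

```python
MPH_TO_MPS = 0.44704  # metres per second per mile per hour

def blind_distance_m(speed_mph, round_trip_latency_s):
    """How far the car travels before a remote operator's input can take effect."""
    return speed_mph * MPH_TO_MPS * round_trip_latency_s

print(round(blind_distance_m(26, 0.25), 1))  # ~2.9 m at the 26 mph cap, 250 ms lag
print(round(blind_distance_m(70, 0.25), 1))  # ~7.8 m at highway speed, same lag
```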

I'm still thinking about what this means for what Tesla might do in Austin. Thanks again for pointing me to that.

Markos

Here's how they could have Starlink on each car for additional connectivity. Starlink is now available on aeroplanes, so why not on cars?

https://chatgpt.com/s/dr_682c20762f70819190ea399fc1ec8364

Markos

I had ChatGPT search for this. It says it's doable, but there may be issues at higher speeds, as you mentioned. It also depends on the specific area (Austin) and how good the network is there:

https://chatgpt.com/s/dr_682bad91be648191beee7e70e3419303

Markos

By the way, from the link on Manifold, "our Event Response agents are able to remotely move the Waymo AV under strict parameters" is basically tele-operation (on rare occasions, which are not fully reported), which you said would be a "huge scandal" if Tesla did it ;)

Daniel Reeves

The line is not perfectly bright but it's bright enough. They can remotely get a car onto the shoulder at walking speed if it's stuck and safety requires it. The scandal would be remote supervision at driving speed.

Daniel Reeves

And here are my rough thoughts on the Nature article, which I may also want to turn into a future AGI Friday:

The idea of model collapse is a big deal, and may predict that we peter out below AGI. In domains where the AI can generate its own synthetic data, like playing chess or Go, it bootstraps itself to unfathomably superhuman capability. Intuitively it seems that that wouldn't work with text; it's like eating its own tail. But it's not obvious. Maybe by doing chain-of-thought or other tricks, it can generate text that's good enough to train on. If so, you can improve the base model, then use the same tricks to generate even better text. Then retrain the base model on that, rinse and repeat. If that works, that's recursive self-improvement that may have no upper bound, or at least no subhuman one.

The Nature article seems to say that's impossible, but consider the counterpoint -- https://arxiv.org/abs/2406.07515 -- that verifying high-quality data is easier than producing it, so if you have an LLM smart enough to separate the wheat from the chaff, you can avoid model collapse (and, I guess, potentially bootstrap to superintelligence).

(Or consider the counterpoint -- https://openreview.net/forum?id=5B2K4LRgmz -- that model collapse only happens if you replace the original data with synthetic data. If you just keep appending synthetic data then the AI's performance plateaus rather than degrades. Does it plateau below human level? Who can tell!)
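To make the loop concrete, here's a toy sketch. The generate/verify/train callables are placeholders for real components (this is not anyone's published recipe), and the replace_data flag marks the replace-vs-append distinction from that second counterpoint:

```python
def self_improvement_loop(model, real_corpus, generate, verify, train,
                          rounds=3, replace_data=False):
    """Generate synthetic text, keep only what a verifier accepts, retrain, repeat.
    generate(model, n) -> list of texts; verify(model, text) -> bool;
    train(model, corpus) -> new model. All three are assumed stand-ins."""
    corpus = list(real_corpus)
    for _ in range(rounds):
        candidates = generate(model, 1000)                   # e.g. chain-of-thought samples
        kept = [c for c in candidates if verify(model, c)]   # separate wheat from chaff
        # Replacing the original data is the collapse-prone regime; appending
        # tends to plateau instead (per the OpenReview counterpoint above).
        corpus = kept if replace_data else corpus + kept
        model = train(model, corpus)                         # retrain on the updated corpus
    return model
```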

One more article suggesting recursive self-improvement is possible: https://ar5iv.labs.arxiv.org/html/2502.13441

I should also mention how much an LLM (o3, specifically) is helping me with this lit review!

PS: If the point is that an LLM can't bootstrap itself without an external reward signal, like with a well-defined game, then what happens if you turn the real world into a well-defined game? Give an agent internet access and a bank account and tell it to make the balance go up, in whatever ways it can come up with to do that. Maybe instrumental convergence blah blah blah we all die, is what happens.

PPS: Thanks so much for asking about this stuff!

Daniel Reeves

Definitely! In fact, I'm working on turning my reaction to that "fake company staffed by AI agents" article into today's AGI Friday. I may do the same for a future AGI Friday about that Nature article and recursive self-improvement.

My general thought is that articles like these are doing valuable hype-deflation work. This actually gets at the core of what I'm hoping to convey with AGI Friday. The hypesters and the pooh-poohers are both deeply wrong. Depending on how the future plays out, one or the other group will be able to pretend they knew it all along. But it's kind of a coin toss. I mean, not literally, and it depends on the timeframe. If we take 2030 as the cutoff then I personally think the pooh-poohers have the edge. But one of the articles you linked to says "the machines aren't coming for your job anytime soon" and that sure sounds like it means more than a 5-year horizon. So I want to vehemently disagree with that. Your job is very safe this year and *probably* safe this decade. Beyond that, literally (and I mean pretty literally, literally) anything is possible.

Clive F

Whenever people ask what the line in the sand is for testing if AGI is smarter than humans, I think of that line from one of Robert Heinlein's books (one of the Lazarus Long ones, iirc):

“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.”

How many can we tick off for the current LLMs? What's going to fall next? (Apparently robotic bricklaying is way trickier than people think, for example; we've been trying and failing since the Industrial Revolution to produce general-purpose bricklaying machines.)

A number are stuck behind Moravec's paradox, I guess (which gives us my favourite test of human-level AGI, from Wozniak: "enter a random house, and make a cup of coffee, finding all the things you need in there").

Daniel Reeves

Amen. But what do you think of making the physical/nonphysical distinction? Maybe it's mostly robotics that's lagging? I notice that all the things on Heinlein's list that don't require a physical body are done or seem close. The definition of AGI I like best these days is based on the idea of an artificial drop-in remote worker.

SorenJ

If you had taken the situation in 2016 and looked at the log-scale progress curve for self-driving cars, would you have predicted, based on that curve, that self-driving cars were 2 years away?

If so, that might suggest we are in a similar situation today with regard to AGI, or at least superhuman software-development AI. Right now the METR log-scale trends suggest we are maybe 2 years away from AI that can fully do the work of a software engineer. But if our situation is more like the 2016 one, then we are probably about 10 years away, given that the last 20% gets harder and harder to capture. (I was just trying to code with Gemini 2.5 today and the experience was... frustrating, to say the least.)

Daniel Reeves

Great question. I do think human-level driving *could've* been here a lot sooner. Waymo has had it for years and is now pretty clearly superhuman. Of course AGI could be similar in these regards as well. It's just that there are plenty of reasons to think AGI will be very different from other technologies.

In any case, intuitively I agree that 10 years seems a lot more reasonable than 2 years. Here's Ege Erdil, who I previously characterized as bullish on AGI timelines, making the case yesterday that we're decades away:

https://epochai.substack.com/p/the-case-for-multi-decade-ai-timelines

SorenJ

For your geometric reasoning benchmark, on problem 0 (original) I initially get a minimum of 1 or 2. I found the phrasing confusing: "What is the minimum and maximum number of line intersections in that drawing?" I assumed this meant, "What is the minimum and maximum number of line intersections FROM THE LINE THAT WE DREW IN STEP 4?"

Anyway, I now see that you meant all possible line intersections. But I still think there is another ambiguity. The square is "fully inside", but given that lines have zero thickness, you can make the edge of the square intersect the circle. Then the exiting of the square and the entering of the circle really only count as one intersection of 3 different lines. So I get a minimum of 3 or 4.

Daniel Reeves

You're totally right about these ambiguities, and LLMs sometimes have the acuity to clarify them before answering. Which you could treat as part of the test, I guess. I only counted the AI as wrong if it didn't have a coherent model of the shapes and lines being asked about. Also I think I managed to get rid of the ambiguities for all the subsequent problems, making it much easier to grade them. Except for the one where the AI was smarter than me.
