I’m a bit of an LLM skeptic for real-world applications, but I have to say Claude building that color app from a single prompt was extremely impressive.
(I can’t remember if it was you or ACX who was pondering the question of why there’s such a mismatch in different people’s attitudes to AI, and I agree that it has a lot to do with coding. If you only use LLMs for help with real-world tasks, and if those tasks are niche enough that you couldn’t just get the same answer by googling, then LLM performance still lags quite a bit behind the hype.)
Ha, I'm extremely flattered that you're mixing me up with Scott Alexander but in this case it's presumably just that I talked about Scott's "Ask Machines Anything (AMA)" post as the final bullet item in the Random Roundup of https://agifriday.substack.com/p/crashla
I don't blame you for LLM skepticism, even after seeing impressive examples like that color app. One can just never be sure how much cherry-picking (or outright cheating) is happening when one sees examples like that touted. In this case I promise it's very much representative of what the latest Claude Code can do, at least for small, self-contained apps like that. I think I'm even more impressed by the thorough understanding (or, if you prefer, "understanding") it demonstrates when you ask for changes.
There's a lot more I want to say about that, like how, if the me-from-5-years-ago could see those interactions, I'd 100% have called it AGI. Wait, maybe I already said all that in https://agifriday.substack.com/p/goalposts
PS: Let me clarify a key distinction between level 2 and level 3 autonomy.
In the official levels of self-driving, both levels 2 and 3 require you, the human, to be ready to take control in real time. The difference is that at level 2 it's your responsibility to decide when to take control. You're supervising and it's up to you to disengage the AI if it's about to screw up. At level 3 you no longer have to supervise everything it does. You have to be ready to take over at any moment but the AI will get your attention if it needs you. You can read a book or otherwise do your own thing much of the time.
I advocate carrying that distinction over to whatever else we're applying these autonomy levels to.
Writing: Level 2 means you're considering every word the AI generates and using that word if and only if it's what you endorse saying, in your own voice. (Better yet, the plagiarism litmus test: don't use anyone's or anything's exact words without explicitly quoting them.) At level 3 you're still the one in charge and should read every word before publishing since you're vouching for the finished product, but you're not supervising the writing word by word as it's written.
Coding: Level 2 means you're in the integrated development environment (IDE) with all the code. At level 3 you ditch the IDE and just talk to the AI in English. You don't worry about the literal code, but you're involved in implementation decisions, or at least some of them.
In short, level 2 means the AI is assisting you and level 3 means you're directing the AI.
I continue to think about this and discuss it with people. Above I'm harping on the 2-vs-3 threshold: Level 2 means an AI assistant. Perhaps it assists extensively, but the human's fully in the loop on every action, if not taking the actions directly. At level 3 you can delegate, without constant supervision. Directed AI.
Level 4 is where the AI is meaningfully autonomous. The human isn't in the driver's seat, literally or figuratively. The human is still needed when the AI gets stuck, but more at the level of answering specific questions the AI may have. Again, that's very literally the case if we're talking about L4 self-driving. See https://agifriday.substack.com/p/waymo
For software engineering, I'd like to retract my original statement in this post that wizards and AI-whisperers like Christopher Moravec have their coding agents at level 4 (other than for simple apps). The fact that Christopher can get as close as he does to level 4 and I can't is what makes it still level 3. If the AI were at level 4 -- autonomous coding -- it wouldn't need that kind of human skill to pull it off. Christopher may not be touching actual lines of code anymore, but the work he's doing to make his AI agents sing still counts as software engineering.
When all software development is at level 4 (as sure seems inevitable at this point, but who knows), there will be essentially no such thing as human software engineers. The AI will still sometimes need human input when building software, but those humans will be product managers or executives or other things, not software engineers.
Bigger picture, the reason I'm fussing so much about where the exact boundaries are is that I think level 4 for all software, including AI R&D, has a decent chance of leading to recursive self-improvement, which has a decent chance of leading to AGI, which has a decent chance of leading to superintelligence, which has a decent chance of leading to literally anything. It all adds up to profound and utter uncertainty about the future and whether humans are even in it.