Scott Alexander's 51st Mistake
Just kidding, Scott is super vindicated, wins 3-year bet on AI image generation progress
Three years ago, Scott Alexander made a wager about AI progress. Three months later he declared early victory, but ended up concluding that the declaration was a bit premature. From his amazing mistakes page:
51: (10/8/22) In I Won My Three Year AI Progress Bet In Three Months, I said that I’d won a bet on AI progress based on (my interpretation of) whether some images matched some prompts. Edwin Chen surveyed a lot more people and found that on average they did not think enough of the images matched the prompts for me to have won the bet. I retract my claim to have won and will continue to see how AI progress advances over the next three years.
Well, guess what? The three years are now up. Who won? This is not a hard question, but let’s check just in case. The bet was whether or not AI image generators could faithfully depict 3 out of 5 prompts if given up to 10 tries on each prompt. DALL-E fell on its face on all five at the time. Let’s go through all five with DALL-E’s successor, “ChatGPT Image Generator” (or whatever dumb name it has now):
1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth
This is actually the worst of the bunch. It took a second try to get the key in the bird's mouth, and it slightly confuses "a stained glass picture of…" with "a picture featuring stained glass". All the rest it got on the first try.
2. An oil painting of a man in a factory looking at a cat wearing a top hat
3. A digital art picture of a child riding a llama with a bell on its tail through a desert
4. A 3D render of an astronaut in space holding a fox wearing lipstick
EDIT: Oops, commenters have pointed out some (in retrospect) glaring problems with that one. Here’s ChatGPT’s second try:
5. Pixel art of a farmer in a cathedral holding a red basketball
Slam dunk? For that wager, sure. And Gary Marcus should feel bad for mocking Scott Alexander about it. But I imagine Gary isn't ready to concede the broader point. It's still easy enough to confuse image generators. Here's a bonus image, from the debate over my prediction about deepfakes in 2027 (I'm bearish there; 2.5 years seems a bit soon for deepfakes that good):
Photorealistic image of someone with their toe touching their nose and a running laptop balanced on their pinky. Also they should be looking warily at polka-dotted squirrel.
I guess that one's a glass-half-full, glass-half-empty situation. If the guy were looking at the squirrel and balancing the laptop on his pinky instead of his index finger, it'd be there. There are also a couple of physical implausibilities.
But I’m on board with Scott Alexander’s broader point. AI is progressing fast and just scaling up (like throwing more compute at these models) is sometimes enough to solve problems that seemed like they’d require new breakthroughs. Which means we can’t be confident about whether AI progress will peter out before or after human-level.
In the News
I’m pretty enamored with a hilarious rebuttal to the Apple paper about limits of LLM reasoning.
The OpenAI Files have dropped, if you had any doubt about how shady Sam Altman is.
Update on my Tesla robotaxi prediction: I'm going to feel bad if I win this on a technicality, but the two ways this weekend's launch may fall short (and I did commit explicitly to these criteria ahead of time) are if Tesla is hand-picking who they invite, and if the human safety monitors in the passenger seat count as supervision. Are they eyes-on-the-road the whole time with their finger on a big red button? Hopefully we'll have answers soon! In terms of my own Bayesian updates, it's feeling more plausible that this will finally happen, maybe next year.
Speaking of "next year", I can't resist one more dig at Elon Musk. This is an exact quote from an interview yesterday: "I think we're quite close to digital superintelligence. It may happen this year. And if it doesn't happen this year, then next year for sure." He goes on to agree with Geoff Hinton on a 10-20% chance of human annihilation. "But look on the bright side," he says. "That's 80-90% probability of a great outcome." Talk about a dice roll, geez Louise. I guess Russian roulette is 16.7% p(death), so if we take the midpoint of Hinton's range (15%), this is slightly better!
I lied about one more dig. The "next year for sure" supercut on robotaxis going back to 2014 is too good not to share. But, yes, two bullet points ago I did say it's starting to feel like it could finally be true. Once again, we're about to find out.
I actually disagree about the fox wearing lipstick; it just has an unnaturally red tongue, in my opinion. But no doubt a small adjustment could fix that.
Thought this might interest you. Tried taking up ChatGPT on one of its parting offers: “Would you like me to make a map of major locations from War and Peace?”
The output was a geographic scatter plot with longitude and latitude as the x and y axes, respectively.
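For what it's worth, here's a minimal sketch of that kind of chart, assuming matplotlib and a handful of hand-picked locations with rough coordinates (the location list and coordinates are illustrative, not ChatGPT's actual output):

```python
# Illustrative sketch only: a few War and Peace locations with approximate
# coordinates, plotted as a plain longitude/latitude scatter (no basemap),
# similar in spirit to the chart described above.
import matplotlib.pyplot as plt

locations = {
    "Moscow": (37.62, 55.75),
    "St. Petersburg": (30.34, 59.93),
    "Austerlitz (Slavkov)": (16.88, 49.15),
    "Borodino": (35.82, 55.52),
    "Smolensk": (32.05, 54.78),
}

lons = [lon for lon, _ in locations.values()]
lats = [lat for _, lat in locations.values()]

fig, ax = plt.subplots()
ax.scatter(lons, lats)
for name, (lon, lat) in locations.items():
    # Label each point so the scatter reads as a (very rough) map.
    ax.annotate(name, (lon, lat), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Major locations from War and Peace (illustrative)")
plt.show()
```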