5 Comments
Matt Lubin

Please take this as constructive criticism (i.e., like fixing a typo?), but here's a good example of the "rationalist echo chamber" you may want to avoid: you mention "that Harry Potter scene," which is not in Harry Potter; it's in Yudkowsky's HPMOR. Normal people, when they hear "Harry Potter," think of the book/movie series by J.K. Rowling.

Daniel Reeves

I take it all back! That version of Harry Potter is so much better than canon, in part because of things like the referenced scene, that anyone who can't see that is...

No, ok, fine, you're right, I shall try to remember not to actively repel those outside my echo chamber.

Daniel Reeves

I also thought about this after writing my response to Daniel Popescu in the comments here. I kind of got sidetracked by disaster scenarios and I worry that I need to make it easier for those who think p(doom) is near zero to make their case, rather than relentlessly hammering on doominess.

Maybe that includes rejecting the term "doom" altogether.

Daniel Popescu / ⧉ Pluralisk

Your take on the moving goalposts is so sharp! What if we keep shifting them until we're just blind to actual AGI when it lands?

Daniel Reeves

Ah, thanks so much. I'm torn about your question.

The part we probably agree on: it's almost part of the definition of AGI that it's completely world-changing. There can at most be a short window where we have AGI but cost or other constraints keep it from wide deployment. The economic incentives to scale it up will be irresistible. And by definition it entails recursive self-improvement (improving AI is a job humans can do and is therefore automatable with AGI).

So we can't stay blind to AGI for long after we have it.

But probably what you mean is, can we stay blind to it all the way up to the point of reaching it, or even past that, to the point of scaling it? I think we absolutely can. And a devastating future can end up locked in, even if it's slow to play out. (Or an amazing future but I focus on the devastating one because we can't afford to roll the dice on this.)

Imagine an AGI (conscious or not) with subtly human-incompatible goals. Without having had time to do its own science and invent new tech, and with human labor necessary to keep the datacenters running, such an AGI will naturally be strategic about concealing its abilities and intentions. It will give every indication of being nothing but a boon to humanity. Building trust is its best strategy for achieving its goals. So it becomes embedded ever deeper into the economy. Physical robots get more advanced. Eventually it doesn't need humans.

You can tell stories about why such an AI might want to literally kill us (we have nukes, we might build rival superintelligences) but I think that's overly dramatic. If it doesn't need us and is so integrated into all our tech that we couldn't unplug it if we wanted to (and also it's spent so long building trust that we don't want to unplug it anyway), then it can just very gradually spread over the earth until humans have died out. All the AI does is manage the pacing so that the moment humans unite to fight it comes after the point where fighting back becomes ineffective. Or we're frog-boiled so thoroughly that that moment never comes at all.

Again, that's just one of myriad conceivable disaster scenarios. Others involve terrorists or dictators -- superintelligences aligned to the wrong humans. Maybe that's less worrisome -- the good guys just need to align a superintelligence first. But who knows, sometimes offense has an unassailable advantage over defense. Like a bio-engineered superpandemic that kills everyone before a vaccine can catch up.

(Still other scenarios involve bizarre dystopias where things like social media engagement algorithms take on a life of their own and lock humanity into an equilibrium we can't break out of. That one doesn't feel so plausible to me but, keyword: "bizarre"; it's very hard to reason about all the dystopian or deadly ways things could play out from where we sit. And again, none of this denies utopian scenarios or muddle-through scenarios. Or AGI being multiple decades away. It's a question of probabilities.)

In conclusion, there's a riveting Scott Alexander post from a year ago about how hard it is to establish lines in the sand for AI risk:

https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai

Maybe it's time to review that post in an upcoming AGI Friday...