7 Comments
Neural Foundry

Fantastic walkthrough of the AI coding revolution! The jump from barely usable grids to fully functional apps in under a year is genuinely wild. I've been experimenting with Cursor for about 2 months now and the difference between what it could do then versus now feels like multiple generational leaps compressed into weeks. The part about recursive self-improvement stalling before things get too dangerous is something I don't think many people are taking seriously enough when modeling timelines.

Daniel Reeves

Thanks! I was worried people would dismiss me as too enamored with these new toys to be objective. I'm still extremely grumpy about the status quo with voice interfaces, so perhaps that lends me some credibility: I wouldn't be saying this about vibe-coding if it weren't the real deal.

Re: Recursive self-improvement: I mostly agree that the possibility of stalling between AGI and ASI is underappreciated, with caveats:

1. It makes sense to worry asymmetrically, given what's at stake. Like if you say, "this airplane is more than twice as likely to get us where we're going as it is to explode," then I sure am gonna focus on the "might explode" part.

2. This probably wants to be its own AGI Friday topic, but I kind of want to define AGI as "that which can bootstrap itself to ASI". A supposed AGI that can't recursively self-improve just isn't really the AGI we're worried about.

Daniel Reeves

PS: I added a feature last night to make the URL encode the probability distribution. Here's Nate Silver's:

https://richter.dreev.es/?d=YgDZimq1Fd7d47

Here's what it took to implement that feature:

ME: can we come up with a clever encoding of a probability distribution? that we can put in the URL querystring? what do you suggest?

GEMINI: [describes a perfectly good approach using a 50-character hex string]

ME: how many characters would we need if we used something like base62 encoding? (is 62 the right number? how many characters can cleanly go in a URL without having to percent-encode them? 26 lowercase + 26 uppercase + 10 digits + ...?)

GEMINI: [advocates for base64 encoding with dashes and underscores as well, and suggests ways to otherwise squeeze the encoding as much as possible, which of course I'm immediately nerd-sniped by]

ME: i'm intrigued. can we do even better in the common case by custom-ordering the cells? like the middle 3 cells of row 10-epochal and the first and last cells of 7-decennial are often 0 pips. all cells of 6-annual are almost always 0 pips.

GEMINI: [spits out an implementation plan that I don't read]

ME: it's no harder or messier with base62, right? i think it's a little cleaner looking without "-" and "_"

GEMINI: [stops asking questions and starts writing code]

ME: you forgot to test your code

GEMINI: [fixes its code]

ME: regression: the initial placement of pips has them overlapping

GEMINI: [fixes that]
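
For the curious, here's the gist of the final encoding, reconstructed as a minimal sketch. The function names and the ORDER array are hypothetical illustrations, not the actual richter.dreev.es code:

```typescript
// Minimal sketch: pack per-cell pip counts into one big integer, then render
// it in base 62 using only characters that are URL-safe without percent-encoding.
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = BigInt(ALPHABET.length); // 62

// Hypothetical cell ordering: cells that are usually 0 pips go first, so they
// become leading zeros of the big integer and drop out of the encoded string.
const ORDER = [30, 31, 32, 21, 27, 18, 19, 20 /* ...one index per grid cell */];

// Pack pip counts (each assumed < 10) into a BigInt, then base62-encode.
// Usage: stick the result in the ?d= querystring.
function encodePips(pips: number[]): string {
  let n = 0n;
  for (const i of ORDER) n = n * 10n + BigInt(pips[i]);
  let out = "";
  while (n > 0n) {
    out = ALPHABET[Number(n % BASE)] + out;
    n /= BASE;
  }
  return out || "0";
}

// Inverse: base62-decode, then unpack pip counts, undoing the reordering.
function decodePips(s: string): number[] {
  let n = 0n;
  for (const c of s) n = n * BASE + BigInt(ALPHABET.indexOf(c));
  const pips = new Array<number>(ORDER.length).fill(0);
  for (const i of [...ORDER].reverse()) {
    pips[i] = Number(n % 10n);
    n /= 10n;
  }
  return pips;
}
```

A mixed-radix packing (each cell's true maximum instead of a flat 10 per cell) would squeeze the string further, which is the sort of thing Gemini was suggesting; base 10 per cell keeps the sketch readable.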

Rainbow Roxy

Love this perspective, it's so insightful! I was just thinking the other day how I'd love to build a little tracker for my Pilates routine, and your point about the new models makes me think it's finally within reach without having to argue with the AI.

SorenJ

Grok is always in a weird place. In some sense it's on par with the other three, but I never actually use it.

Daniel Reeves

I'm skeptical that Grok is on par with the big three from Anthropic, Google DeepMind, and OpenAI. What makes you say so? Do you literally never use Grok? (Me neither, to be clear; I'm expressing skepticism, not conviction.)

SorenJ

On benchmarks it's on par, and on LMArena it is as well. I mostly use LLMs for physics/math, and it's pretty good at those. I don't literally never use Grok, but I only rarely use it. The vibes of it are off in a weird way, though, despite it being "smart." It feels like it does a poor job of maintaining a consistent role or sense of self.
