Jevons's Paradox and the Burden of Clarifying Your Ideas
In which I debate with DHH on whether programming is more like agriculture or steam power
A year ago, David Heinemeier Hansson wrote a somewhat fatalistic post about programmers being replaced by AI.1 Maybe, says DHH, programmers are about to be mostly automated away like farmers were during the industrial revolution.
A short history of coal
I know Wikipedia explains this perfectly well, and feel free to skip ahead to the actual AI part, but I’ll set the stage in my own words. In the 1800s, the economist W. S. Jevons wrote a book called “The Coal Question” in which he argued against predictions that demand for coal was about to fall off a cliff. Why did people think that? Because James Watt had invented his steam engine and people figured we’d switch from coal to steam. Just kidding, steam is not a fuel. In fact, steam engines were powered by coal and Watt’s steam engine was drastically more fuel-efficient. Same work, less coal.
(Random aside: Is it “Jevons’s paradox” or “the Jevons paradox”? The internet seems to have converged on the latter but as another person with a name ending in S, this is my grammatical death-hill.)
Jevons pointed out what’s obvious in retrospect: Instead of doing the same work with less coal, people would find more and more uses for the steam engine and use more coal than ever.
An even shorter history of agriculture
Needing less coal may have paradoxically increased demand for coal, but that’s not how it played out with agriculture. Beasts of burden got thoroughly replaced by technology, and even for humans, almost all the farming jobs went away. It used to be that almost everyone had to spend most of their working hours producing food. Now it’s a tiny sliver of the population.
The actual AI part
My contention is that programming is like coal and industrial energy, not horses and farming. In general, what makes Jevons’s paradox apply or not is whether there’s unbounded appetite for the output of a new technology. In the case of food, there’s only so much we can eat (try as we might to push the limits). In the case of energy, we’ll just keep dreaming up new uses forever. The cheaper it is, the more applications become practical. Computer code is clearly in the same category.
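One way to see when the paradox applies (a toy framing of mine, not from the post): it comes down to how strongly demand responds when the effective cost of the output falls. A minimal sketch with made-up numbers:

```python
# Toy illustration (not from the post): whether total coal use rises with efficiency
# depends on how elastic demand is. If a 2x efficiency gain more than doubles the
# amount of "work" people want done, coal consumption goes up despite the savings.

def coal_used(efficiency: float, demand_elasticity: float, baseline_work: float = 100.0) -> float:
    # Effective price per unit of work falls as 1/efficiency; demand responds with
    # constant elasticity: work demanded = baseline * efficiency**elasticity.
    work_demanded = baseline_work * efficiency ** demand_elasticity
    return work_demanded / efficiency  # coal per unit of work falls as 1/efficiency

print(coal_used(2.0, 0.5))  # inelastic, food-like demand: coal use falls (~70.7)
print(coal_used(2.0, 1.5))  # elastic, energy/code-like demand: coal use rises (~141.4)
```

With food-like, inelastic demand the efficiency gain wins and total resource use falls; with energy-like (or code-like) elastic demand, the growth in demand swamps the savings.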
Back in the 1950s we used The Power of Programming to automate away the vast majority of human programming work. Namely, we invented compilers and high-level languages. In fact, as Clive Freeman put it, COBOL was originally intended to be the English-language-like thing which business analysts could write code in — no pesky programmers needed. Alas, it’s like in this classic xkcd:2
In terms of Jevons’s paradox, the easier it gets to write code, the more code the world wants and needs. Every time programmers are able to do more in less time, the demand for them grows.
Until AGI of course, when all bets are off.
One more Jevons example
People were talking about Jevons’s paradox a lot back in January in the context of Nvidia’s stock plummeting the week after DeepSeek launched a language model that seemed to rival the frontier models while needing much less training compute. It was the same point: if we can do more with less compute, that just means Nvidia’s GPUs are that much more valuable. Nvidia’s stock price plummeting made no sense as a reaction to DeepSeek.
Predictions
To be concrete, I’m predicting a few things:
Pre-AGI, the job market for programmers will boom.
Even when LLMs can do basically all the coding, knowing what to ask for and how to shepherd the AI along will be a valuable human skill and the humans doing it will still recognizably be programmers.
When that’s no longer true, that’s AGI, and the world turns completely upside-down.
PS, clarification based on further discussion with DHH:
In the original Jevons paradox it’s the coal itself, not the coal miners, whose demand skyrocketed when much more could be done with much less coal. Coal miners have been automated away the same way assembly-line workers have been.
In the News
A quick update from last week: I reported that Claude was 24% of the way to “stealing your job”, and Manifold traders expect that to climb to 43% within a year:
If you’re into highly technical YouTube videos about AI, Welch Labs has a new one.
Zvi Mowshowitz has fascinating thoughts on cheating at school with AI.
Random tip: If you ask an LLM to, say, extract a bunch of headers from a document for you, you can’t really trust it not to have missed any. But if you say “can you write a Python program to do that” then, gobsmackingly, it will, and the result will be pretty trustworthy. It’s starting to feel like anyone not using these tools on a daily basis is… unserious.
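For concreteness, here’s the kind of script an LLM might hand back for that header-extraction request (a minimal sketch; the Markdown assumption and the file-path argument are mine, not from the tip):

```python
# Minimal sketch of the kind of script an LLM might write for "extract all the
# headers from this document". Assumes a Markdown file; path and names are
# placeholders for illustration.
import re
import sys

def extract_headers(path: str) -> list[str]:
    """Return every Markdown header line (lines starting with 1-6 '#' characters)."""
    headers = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if re.match(r"^#{1,6}\s", line):
                headers.append(line.rstrip())
    return headers

if __name__ == "__main__":
    for header in extract_headers(sys.argv[1]):
        print(header)
```

The point is that the script is deterministic: unlike the LLM’s own extraction, it can’t quietly skip a header.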
Via Christopher Moravec, and relevant to the first prediction we’re waiting on for assessing claims that we could have AGI in 2027: Microsoft is launching AI agents.
OpenAI gives up on its evil plan to murder its nonprofit parent. That might not be the most balanced characterization. I like this one by Scott Alexander:
> One reason I respect Sam Altman is that back in 2016, when he founded an AI charity to bring a positive singularity to the world, he realized that it would later be extraordinarily tempting to turn it into a normal profit-focused company and get rich. So he tied himself to the mast by designing a nonprofit structure capable of thwarting all the machinations his future self could throw at it. A few years later, he gave into temptation, tried to turn it into a normal profit-focused company, and failed, because the structure he designed was really good.
Self-driving trucks are here:
Thanks to David Yang, David Heinemeier Hansson, Clive Freeman, and Christopher Moravec for helpful discussions about Jevons’s paradox.
DHH wrote an even more fatalistic post last month, which I’d like to counterargue in a future AGI Friday. Short version: rage, rage against the dying of the light.
Thanks to Ghillie Dhu for the pointer.
I suggested "Jevons' paradox" by email, but then Danny said:
> Well, there are competing conventions. I strongly prefer the one that doesn't introduce ambiguity. Like are there two people named JEVON (no s) and together they have a paradox? That's when I would write "Jevons' paradox". I especially care about this since Reeve and Reeves are both common names.
I think in that case, if we strive to eliminate all ambiguity(*), I would still prefer "the Jevons paradox", for aesthetic reasons. "Jevons's paradox" just looks a bit inelegant—but grammatically, it would be perfectly sound as far as I can tell. (English isn't my native language.)
(*) I understand where that's coming from, but I personally don't think that unambiguousness should be one of the top 3 priorities of language. MAYBE it should be in the top 5? Definitely somewhere in the top 10.
But I did study literature, so obviously I'm a big fan of ambiguity in language. Not to mention that it will eventually develop on its own, because language and meaning and usage etc. are never static.
> Random tip: If you ask an LLM to, say, extract a bunch of headers from a document for you, you can’t really trust it not to have missed any. But if you say “can you write a Python program to do that” then, gobsmackingly, it will, and the result will be pretty trustworthy. It’s starting to feel like anyone not using these tools on a daily basis is… unserious.
This frustrates me, because this approach (probably?) can’t work as well on arbitrary text like prose or poetry, where you’re judging things like sentence structure or tone compatibility.
However, every LLM manufacturer must know quite precisely at this point the relative accuracy and hallucination rates of running a given chain-of-thought or non-chain-of-thought model at a given context window size.
If I’m not mistaken, current chain-of-thought models try to fit as much as possible into their large context windows, resulting in progressively slower, more expensive, AND less accurate output as the tokens are spat out.
I’m sure SOMEONE has already done this, like most everything I mention in my comments on this blog, but why don’t chain-of-thought models spin up cheap non-chain-of-thought models, each capped at a context length known to still give accurate results, for mission-critical outputs like headers, and treat the main “thread” as a dispatcher for the smaller threads?
Even better, when a “sub-calculation” would clearly fare better in a non-LLM context like R, why not spin up an R script via a plugin to process the data?
Lastly, I don’t see why (there must be a good reason) many chain-of-thought models just let themselves keep getting slower for users by allowing such giant context windows, instead of making a judgment call about whether a given set of tokens actually needs a bigger window, and about how many compute cycles could be saved by using chain-of-thought models only for one-off “calculations” while cheaper models are the ones that actually talk to end users, seeing only the context that pertains to the prompt plus whatever the chain-of-thought model spat out.
Wait, wait, wait, isn’t this how deep research models work? I haven’t looked into their architectures; I never bothered, figuring they’re as black-box as everything else.
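For what it’s worth, here’s a rough sketch of the dispatcher architecture that comment is gesturing at: an expensive reasoning model coordinating, with small, well-specified sub-tasks farmed out to cheaper calls or to plain code. All the function names are hypothetical stand-ins, not any vendor’s real API:

```python
# Hypothetical sketch of the "main thread as dispatcher" idea from the comment above.
# None of these functions correspond to a real vendor API; they are illustrative stubs.
from dataclasses import dataclass

@dataclass
class SubTask:
    kind: str    # "deterministic", "small_context", or "open_ended"
    prompt: str

def run_generated_script(task: SubTask) -> str:
    # Stub: ask a model to write a script for the task, then execute the script.
    return f"[script output for: {task.prompt}]"

def call_cheap_model(prompt: str) -> str:
    # Stub: a small, fast model that only sees the context relevant to this sub-task.
    return f"[cheap-model answer to: {prompt}]"

def call_reasoning_model(prompt: str) -> str:
    # Stub: the expensive chain-of-thought model, reserved for open-ended reasoning.
    return f"[reasoning-model answer to: {prompt}]"

def dispatch(task: SubTask) -> str:
    """Route each sub-task to the cheapest tool likely to handle it reliably."""
    if task.kind == "deterministic":     # e.g. "extract all the headers"
        return run_generated_script(task)
    if task.kind == "small_context":     # fits comfortably in a small model's window
        return call_cheap_model(task.prompt)
    return call_reasoning_model(task.prompt)

if __name__ == "__main__":
    print(dispatch(SubTask(kind="deterministic", prompt="extract all headers")))
```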