65 Comments
Devesh:

The 80% problem maps directly to what we've seen in production. The gap isn't model capability — it's the compound cost of verification.

That last 20% isn't linear. Each incremental % requires exponentially more human oversight, edge case handling, and rollback infrastructure. The economics flip somewhere around 75-85% depending on domain complexity.
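
As a toy model of that flip (the growth rate and value-per-point here are assumptions, not measurements), the arithmetic looks something like this:

    # Toy model: the business value of each extra accuracy point is flat,
    # but the human verification cost per point grows geometrically.
    # All constants are illustrative assumptions.
    VALUE_PER_POINT = 1.0   # value of one more % of accuracy
    BASE_COST = 0.05        # verification cost of the first point
    GROWTH = 1.04           # assumed cost multiplier per point

    def marginal_cost(point: int) -> float:
        return BASE_COST * GROWTH ** point

    # First point where verifying costs more than it is worth.
    flip = next(p for p in range(1, 100) if marginal_cost(p) > VALUE_PER_POINT)
    print(f"economics flip at ~{flip}% accuracy")  # ~77% with these inputs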

Most teams underestimate this until they've built it twice.

Addy Osmani:

Yeah I think what can be particularly insidious is that the first 70-80% feels so effortless that teams underestimate the verification infrastructure needed for the remainder. By the time they realize the exponential cost, they're already committed to the approach.

Building it twice is unfortunately common (I believe) - the first time to learn where the real costs hide.

Devesh:

"Committed to the approach" is the trap. We hit this exact point at Kult around 82% accuracy on shade matching. Leadership had already announced the feature. Engineering had invested months. The remaining 18% became a sunk cost negotiation rather than a design decision.

What finally worked: We stopped trying to close the gap with the same system. Built a hybrid where the AI handles the 80%, flags uncertainty, and routes edge cases to structured human review. The economics shifted overnight.
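
A minimal sketch of that routing pattern - names and the threshold are illustrative, not our actual calibration:

    from dataclasses import dataclass

    @dataclass
    class Prediction:
        label: str
        confidence: float

    REVIEW_THRESHOLD = 0.82  # illustrative cutoff, not a real calibration

    def route(pred: Prediction) -> str:
        # High-confidence predictions ship automatically; anything
        # uncertain is queued for structured human review.
        if pred.confidence >= REVIEW_THRESHOLD:
            return f"auto:{pred.label}"
        return "human_review"

    # e.g. route(Prediction("shade_07", 0.64)) -> "human_review"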

Twice is mercy. Some teams build it three times before they realize the 100% target was never the right goal.

James Williams:

Building the code twice or thrice is what we miss. It underpins the notion that deciding what to do, and what direction to take, matters more than typing code. And it always has.

Esborogardius Antoniopolus:

We should not take the opinion of Andrej too seriously. He is an AI researcher, and most AI researchers are generally at most passable software engineers, if not outright bad ones.

It is the same age-old problem of scientists' code.

Addy Osmani:

Yeah the distinction between "scientist code" and production engineering is fair (this is one reason I included the Claude Code team perspective too), but the patterns he describes - assumption propagation, abstraction bloat, sycophantic agreement - show up regardless of whether you're doing research or building enterprise systems.

Janusz Hain:

Is the Claude Code team trustworthy, though? They are selling the product; they have a reason to hype it up. And what kind of PRs do they deliver? That matters when someone says "we deliver X PRs per day." Are these big new features, small features, or mostly bug fixes?

Mark S. Carroll:

Addy, this nails the shift I keep seeing: the bottleneck didn’t disappear, it moved from typing to comprehension. The scary part is not that agents are wrong. The scary part is that they’re plausibly right at a speed that trains humans to rubber-stamp.

“Comprehension debt” is the exact phrase. The real superpower in 2026 is not generating more code. It’s knowing what good looks like, writing success criteria, and saying “we don’t need this” before the agent ships a thousand lines of confident architecture cosplay.

Addy Osmani:

You nailed the actual risk, Mark - not that agents make obvious mistakes, but that they generate plausible-looking solutions at a pace that conditions teams to skip critical thinking. The superpower is definitely the ability to say "no" before the agent builds it.

Mark S. Carroll:

Appreciate that, Addy. Plausible speed is the trap. Once agents can produce convincing work on demand, the scarce skill is no longer coding. It's discernment. The edge belongs to teams that can set constraints, spot nonsense early, and kill bad ideas before they harden into "strategy." Otherwise we automate our way into self-sabotage.

Jim Hodapp:

I'm mostly worried about the incentives that competitive markets place on engineers as well: to rubber-stamp too much because there's not enough time to slow down and do a more thorough job, or to visit the mental gym often enough to keep one's mental muscles strong. Nobody in the tech industry is stopping to ask why we are constantly trying to move faster. To what end? So we can all be dumb and sick?

Mark S. Carroll:

Well said. Once plausible output gets cheap, the pressure to keep moving gets even more dangerous. Teams need room for review, restraint, and actual judgment, or else “faster” just becomes a fancier word for mass-produced scorched earth.

Jim Hodapp:

Thanks, Mark.

Indeed, why not optimize for human satisfaction and empowerment instead? In the end, if AI ends up replacing us all as professional engineers in the workforce, we’re only going to have each other left.

We need to figure out how to slow down the purely greedy processes of competition. It’s one thing to research and cure cancer at a faster pace, and it’s entirely another thing to go faster relentlessly in the name of getting rich.

The consequences of not figuring this out before we lose our leverage and positions of power, I believe, are catastrophic.

Very Tired:

Confident architecture cosplay lol.

Mark S. Carroll:

Every blue moon or so I have a moment.

Dan:

Subscribed! A very informative, high-quality assessment of where we are with agentic coding at the beginning of 2026. Well written, including some well-placed truisms like:

- "We got faster cars, but the roads got more congested."

- "If your ability to “read” doesn’t scale at the same rate as the agent’s ability to “output,” you aren’t engineering anymore. You’re rubber stamping."

Addy Osmani:

Thanks for subscribing, Dan! Much appreciated :)

The truisms emerged from watching teams hit the same walls repeatedly. The "rubber stamping" moment is where things tend to break down - if review becomes performative rather than substantive, folks end up losing the plot a bit :)

Thomas Junghans:

Thank you for this overview! Excellent read!

Addy Osmani:

Thank you! I'm glad to hear it was helpful!

Harshal Shah:

Great article, a lot of points really resonated with me.

Addy Osmani:

Happy to hear it resonated, Harshal!

TheNeverEndingFall:

What would you tell someone who wants to learn to become a software developer starting today?

(I see three kinds of responses, heavily concentrated around (1) and (3):

1) Don't bother. The job as you know it won't exist in 2–3 years. It's like learning to ride a horse in 1906.

2) Still do it, it's risky like learning to ride horses in 1906. The occupation will massively shrink in quantity and compensation, but you may be one of the ones that make it.

3) The best time to learn coding/software engineering.)

Addy Osmani:

Great question. I'd probably lean toward (2) with optimism skewed toward (3) for the right people.

The fundamentals matter more than ever: understanding systems, architecture, debugging, problem decomposition. AI doesn't eliminate the need for these - it amplifies them. If you learn to code today with AI as a learning accelerator (not a crutch), you can cover more ground faster than previous generations.

But the job is changing imo. If your goal is to write syntax all day, that's disappearing. If your goal is to solve problems with software, we need more people who can do that well - especially people who understand both the technical fundamentals AND how to orchestrate AI effectively.

The field won't disappear, but I do think it will transform (and we don't know exactly what shape that is going to take yet). Learning now means you grow up as a native in this new paradigm rather than having to unlearn old habits later.

Dr Jim Polk:

Brilliant article Addy...gives me lots of food for thought. Thank you.

Addy Osmani:

Happy if it was helpful at all :)

Addy Osmani:

Thanks for sharing!

Paweł Twardziak:

Wow! Extremely interesting to me! I love your essays around AI!

Addy Osmani:

Happy if they are helpful in any way!

Paweł Twardziak:

They are, indeed, undoubtedly!

Kimberley Modeste:

Great article as always, Addy. We constantly see Agentic systems optimize locally, but not structurally. Without deterministic guardrails, context decay turns “correct” diffs into long-term drift. This is why post-hoc review can’t keep up, governance has to run continuously at the point of change. That’s the problem we’ve built Mault to solve. (Mault.ai)
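
To make "governance at the point of change" concrete, here is a minimal generic sketch of a deterministic guardrail that runs on every staged diff - illustrative rules only, not Mault's actual implementation:

    # Point-of-change guardrail: deterministic checks run on every
    # staged diff before the change lands, rather than post hoc.
    # The FORBIDDEN rules are illustrative placeholders.
    import subprocess
    import sys

    FORBIDDEN = ["eval(", "os.system("]

    def added_lines() -> list[str]:
        diff = subprocess.run(["git", "diff", "--cached", "-U0"],
                              capture_output=True, text=True).stdout
        return [line[1:] for line in diff.splitlines()
                if line.startswith("+") and not line.startswith("+++")]

    def main() -> int:
        bad = [line for line in added_lines()
               if any(tok in line for tok in FORBIDDEN)]
        for line in bad:
            print(f"guardrail blocked: {line.strip()}")
        return 1 if bad else 0  # nonzero exit blocks the commit

    if __name__ == "__main__":
        sys.exit(main())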

Addy Osmani:

Yes! "optimize locally, not structurally" - that's a great way to frame it. Context decay is real, especially in longer agentic sessions where early decisions compound into architectural drift.

I think that continuous governance at the point of change probably makes sense.

Post-hoc review alone creates that 91% increase in review times we're seeing. Will check out Mault - interested to see your approach! :)

Kimberley Modeste:

Addy, I’d love to provide you with a free Mault Pro account and get your feedback. I’ll DM you with it shortly.

Nick Coleman:

Thanks Addy, a great read.

Addy Osmani:

Thanks for the kind words, Nick!

Lars Faye:

Anthropic has been all over the place in their releases. One moment they say it's writing 100% of the code, another moment they say that it's writing 20% of the code:

https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic

One moment they say AI is elevating the industry, another moment they're saying it's detrimental to skill development:

https://www.anthropic.com/research/AI-assistance-coding-skills

Going off empirical evidence, combined with all the other studies and research that have come out, I believe these are the rare moments when they are speaking truthfully instead of trying to make headlines.

Rishav Mitra:

Until it deletes everything (Replit XD)

Very Tired:

Based on my experience with the above, I am trying standing prompts that always ask about uncertainties, never assume, and build the minimal solution. Still wrestling with all this.
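
Roughly, the standing instructions I mean look something like this (wording is just my current draft):

    # Standing system prompt prepended to coding requests (illustrative).
    SYSTEM_PROMPT = """\
    Before writing any code:
    1. List every assumption you are making and ask about uncertainties.
    2. If a requirement is ambiguous, ask; never guess silently.
    3. Propose the minimal solution that satisfies the stated requirement.
    4. Flag anything you added that was not explicitly requested.
    """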

Robin Harris:

It's the willingness to sell the solution that's creating this overselling hype, and it will have serious ramifications in ways we cannot comprehend. When we finally realize we have cannibalized human intelligence and abilities for something that was supposed to replace us but cannot, a defect will show up in production that is so pervasive, yet so difficult to fix, because the code was not built for humans to read or maintain.

No good coder ... is just a coder. There are so many nuances to being able to solve real-world problems for HUMANS that cannot be measured. It breaks my heart how greed - and this is about nothing else - has made fundamentals seem optional.