29 Comments
Devesh:

The 80% problem maps directly to what we've seen in production. The gap isn't model capability — it's the compound cost of verification.

That last 20% isn't linear. Each incremental percentage point requires exponentially more human oversight, edge-case handling, and rollback infrastructure. The economics flip somewhere around 75-85%, depending on domain complexity.

Most teams underestimate this until they've built it twice.

Addy Osmani:

Yeah I think what can be particularly insidious is that the first 70-80% feels so effortless that teams underestimate the verification infrastructure needed for the remainder. By the time they realize the exponential cost, they're already committed to the approach.

Building it twice is unfortunately common (I believe) - the first time to learn where the real costs hide.

Devesh:

"Committed to the approach" is the trap. We hit this exact point at Kult around 82% accuracy on shade matching. Leadership had already announced the feature. Engineering had invested months. The remaining 18% became a sunk cost negotiation rather than a design decision.

What finally worked: we stopped trying to close the gap with the same system. We built a hybrid where the AI handles the 80%, flags uncertainty, and routes edge cases to structured human review. The economics shifted overnight.
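Roughly, the routing looks like the sketch below. This is a simplified illustration rather than our actual code; the names and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MatchResult:
    shade_id: Optional[str]   # the model's best guess, if it has one
    confidence: float         # model-reported confidence in [0, 1]

# Illustrative thresholds; in practice these get tuned per domain.
AUTO_ACCEPT = 0.90    # above this, ship the AI's answer directly
NEEDS_REVIEW = 0.60   # between the two, queue for structured human review

def route(result: MatchResult) -> str:
    """Decide how a single prediction is handled."""
    if result.confidence >= AUTO_ACCEPT:
        return "auto_accept"    # the ~80% the AI handles on its own
    if result.confidence >= NEEDS_REVIEW:
        return "human_review"   # flagged uncertainty, routed to a reviewer
    return "show_options"       # too uncertain: present options instead of one answer
```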

Twice is mercy. Some teams build it three times before they realize the 100% target was never the right goal.

Esborogardius Antoniopolus:

We should not take the opinion of Andrej too seriously. He is an AI researcher, and most AI researchers are generally at most passable software engineers, if not outright bad ones.

It is the same age-old problem of scientist code.

Addy Osmani:

Yeah, the distinction between "scientist code" and production engineering is fair (this is one reason I included the Claude Code team perspective too), but the patterns he describes - assumption propagation, abstraction bloat, sycophantic agreement - show up regardless of whether you're doing research or building enterprise systems.

Addy Osmani:

Thanks for sharing!

Thomas Junghans:

Thank you for this overview! Excellent read!

Addy Osmani:

Thank you! I'm glad to hear it was helpful!

Harshal Shah:

Great article, a lot of points really resonated with me.

Addy Osmani:

Happy to hear it resonated, Harshal!

Paweł Twardziak:

Wow! Extremely interesting to me! I love your essays around AI!

Addy Osmani:

Happy if they are helpful in any way!

Paweł Twardziak:

They are, indeed, undoubtedly!

Kimberley Modeste:

Great article as always, Addy. We constantly see agentic systems optimize locally but not structurally. Without deterministic guardrails, context decay turns “correct” diffs into long-term drift. This is why post-hoc review can’t keep up; governance has to run continuously at the point of change. That’s the problem we’ve built Mault to solve. (Mault.ai)

Addy Osmani:

Yes! "optimize locally, not structurally" - that's a great way to frame it. Context decay is real, especially in longer agentic sessions where early decisions compound into architectural drift.

I think that continuous governance at the point of change probably makes sense.

Post-hoc review alone creates that 91% increase in review times we're seeing. Will check out Mault - interested to see your approach! :)

Kimberley Modeste:

Addy, I’d love to provide you with a free Mault Pro account and get your feedback. I’ll DM you with it shortly.

Nick Coleman:

Thanks Addy, a great read.

Addy Osmani:

Thanks for the kind words, Nick!

TheNeverEndingFall:

What would you tell someone who wants to learn to become a software developer starting today?

(I see three kinds of responses, heavily concentrated around (1) and (3):

1) Don't bother. The job as you know it won't exist in 2–3 years. It's like learning to ride a horse in 1906.

2) Still do it, it's risky like learning to ride horses in 1906. The occupation will massively shrink in quantity and compensation, but you may be one of the ones that make it.

3) The best time to learn coding/software engineering.)

Addy Osmani:

Great question. I'd probably lean toward (2) with optimism skewed toward (3) for the right people.

The fundamentals matter more than ever: understanding systems, architecture, debugging, problem decomposition. AI doesn't eliminate the need for these - it amplifies them. If you learn to code today with AI as a learning accelerator (not a crutch), you can cover more ground faster than previous generations.

But the job is changing imo. If your goal is to write syntax all day, that's disappearing. If your goal is to solve problems with software, we need more people who can do that well - especially people who understand both the technical fundamentals AND how to orchestrate AI effectively.

The field won't disappear, but I do think it will transform (and we don't know exactly what shape that is going to take yet). Learning now means you grow up as a native in this new paradigm rather than having to unlearn old habits later.

Rishav Mitra:

Until it deletes everything (Replit XD)

Michael Utz:

One thing I think is overlooked in this discussion is the variety of ways that people read and understand code. I, for instance, have ALWAYS had trouble just looking at a wall of code and conceptualizing what's going on.

I have to step through it. I have to change a boolean value manually and watch the logs. That may mean I'm not a 10x developer, but if industry-wide adoption is the goal for AI, then they need lil' 1x engineers like me to feel like it's making their job easier.

And, frankly, it just isn't. If I can parallelize the work of conceptualizing the code and creating it, I find myself moving faster. The extent to which LLMs can assist me on that journey is the extent to which I have found them valuable or useful.

Danilo Velasquez:

I've been thinking about the same problem a lot. The way most people think about these tools is skewed toward "replace your engineering team" rather than "enhance yours".

Product managers and engineering managers already think like that: how do I make myself clear, how do I delegate, how do I recognise a good fit? Engineers, by contrast, haven't developed this skill.

In a way, we are all becoming platform engineers. We own the platform; AI lives in it. How do we make it thrive?

The Baffled Reader:

Very interesting article, thank you! I understand the concept of running multiple AI agents ("orchestrator" role), but I wonder about the real long-term productivity gains, as multitasking (referred to as ‘micromanagement tax’) is known to be counterproductive.

Devesh:

The commitment trap you mention is real.

We hit this exact pattern with our shade matching system. By the time we realized the confidence scoring was masking failure cases, we'd already integrated it into three different product flows.

The pivot cost wasn't the code — it was recalibrating customer expectations. They'd gotten used to "AI always has an answer." Teaching them (and our CS team) that "I'm not sure, let me show you options" is actually better... that took longer than the rebuild.

The second build was faster but the trust rebuild was slower.

Nick W:

Agentic AI is the gateway to unlimited automation, which is a must if you want to save time and make more profits. See Agentic: https://promptengineer-1.weebly.com/agentic.html

Also, Agentic AI Prompt Vault: https://promptengineer-1.weebly.com/agentic-ai-prompt-vault.html

Francisco d’Anconia:

Whatever it is you’re afraid the agent is going to do, build a workflow that relentlessly tests for those things and rejects them as automatically as you can.
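For instance, here is a minimal sketch of that kind of automated gate, assuming a CI-style script that runs after the agent proposes changes. The specific checks are illustrative; swap in whatever failure modes you actually care about.

```python
import subprocess
import sys

# Each entry is one guardrail the agent's changes must pass.
CHECKS = [
    ["pytest", "-q"],             # behavior: the test suite still passes
    ["ruff", "check", "."],       # style/lint: no new violations
    ["git", "diff", "--check"],   # hygiene: no stray whitespace or conflict markers
]

def reject_if_failing() -> None:
    """Run every check against the agent's changes; reject on the first failure."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"REJECTED by {' '.join(cmd)}:\n{result.stdout}{result.stderr}")
            sys.exit(1)   # non-zero exit blocks the merge / feeds back to the agent
    print("All guardrail checks passed.")

if __name__ == "__main__":
    reject_if_failing()
```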