The 80% problem maps directly to what we've seen in production. The gap isn't model capability — it's the compound cost of verification.
That last 20% isn't linear. Each incremental percentage point requires exponentially more human oversight, edge-case handling, and rollback infrastructure. The economics flip somewhere around 75-85%, depending on domain complexity.
Most teams underestimate this until they've built it twice.
Yeah I think what can be particularly insidious is that the first 70-80% feels so effortless that teams underestimate the verification infrastructure needed for the remainder. By the time they realize the exponential cost, they're already committed to the approach.
Building it twice is unfortunately common (I believe) - the first time to learn where the real costs hide.
"Committed to the approach" is the trap. We hit this exact point at Kult around 82% accuracy on shade matching. Leadership had already announced the feature. Engineering had invested months. The remaining 18% became a sunk cost negotiation rather than a design decision.
What finally worked: We stopped trying to close the gap with the same system. Built a hybrid where the AI handles the 80%, flags uncertainty, and routes edge cases to structured human review. The economics shifted overnight.
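To make "handles the 80%, flags uncertainty, routes edge cases" concrete, here is a minimal sketch of that routing shape in Python. The 0.85 threshold, the MatchResult fields, and the review queue are illustrative assumptions, not Kult's actual system:

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative threshold - tune per domain; the real cutoff is a product decision.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class MatchResult:
    shade_id: str
    confidence: float  # model's self-reported confidence, 0.0-1.0

def route_match(result: MatchResult, review_queue: List[MatchResult]) -> Optional[str]:
    """Serve confident matches automatically; route uncertain ones to humans."""
    if result.confidence >= CONFIDENCE_THRESHOLD:
        return result.shade_id       # the ~80% the AI handles end to end
    review_queue.append(result)      # edge cases go to structured human review
    return None
```

The design point is that uncertainty becomes a first-class output that gets routed, rather than something the system papers over.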
Twice is mercy. Some teams build it three times before they realize the 100% target was never the right goal.
We should not take the opinion of Andrej too seriously. He is an AI researcher, and most AI researchers are generally at most passable software engineers, if not outright bad ones.
It is the same age-old problem of scientists' code.
Yeah, the distinction between "scientist code" and production engineering is fair (this is one reason I included the Claude Code team perspective too), but the patterns he describes - assumption propagation, abstraction bloat, sycophantic agreement - show up regardless of whether you're doing research or building enterprise systems.
Very good read; here is my perspective as well - https://open.substack.com/pub/amitabhsharan/p/the-one-question-that-matters-in
Thanks for sharing!
Thank you for this overview! Excellent read!
Thank you! I'm glad to hear it was helpful!
Great article, a lot of points really resonated with me.
Happy to hear it resonated, Harshal!
Wow! Extremely interesting to me! I love your essays around AI!
Happy if they are helpful in any way!
They are, indeed, undoubtedly!
Great article as always, Addy. We constantly see agentic systems optimize locally, but not structurally. Without deterministic guardrails, context decay turns “correct” diffs into long-term drift. This is why post-hoc review can’t keep up; governance has to run continuously at the point of change. That’s the problem we’ve built Mault to solve. (Mault.ai)
Yes! "optimize locally, not structurally" - that's a great way to frame it. Context decay is real, especially in longer agentic sessions where early decisions compound into architectural drift.
I think that continuous governance at the point of change probably makes sense.
Post-hoc review alone creates that 91% increase in review times we're seeing. Will check out Mault - interested to see your approach! :)
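For anyone wondering what "governance at the point of change" might look like mechanically, here is a minimal sketch: a gate that runs checks on every proposed change before it lands, instead of relying on post-hoc review. The commands and the scripts/check_architecture.py rule set are hypothetical illustrations, not Mault's product or API:

```python
import subprocess
import sys

# Guardrails that run on every proposed change, before it merges - not after the fact.
# The commands are illustrative; scripts/check_architecture.py is hypothetical.
CHECKS = [
    ["ruff", "check", "."],                        # style and lint drift
    ["pytest", "-q"],                              # behavioural regressions
    ["python", "scripts/check_architecture.py"],   # structural / architectural rules
]

def gate_change() -> int:
    """Reject the change at the point it is made if any guardrail fails."""
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"Rejected at point of change: {' '.join(cmd)} failed")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(gate_change())
```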
Addy, I’d love to provide you with a free Mault Pro account and get your feedback. I’ll DM you with it shortly.
Thanks Addy, a great read.
Thanks for the kind words, Nick!
What would you tell someone who wants to learn to become a software developer starting today?
(I see three responses, heavily concentrated around (1) and (3):
1) Don't bother. The job as you know it won't exist in 2–3 years. It's like learning to ride a horse in 1906.
2) Still do it, but it's risky - like learning to ride horses in 1906. The occupation will massively shrink in quantity and compensation, but you may be one of the ones who make it.
3) The best time to learn coding/software engineering.)
Great question. I'd probably lean toward (2) with optimism skewed toward (3) for the right people.
The fundamentals matter more than ever: understanding systems, architecture, debugging, problem decomposition. AI doesn't eliminate the need for these - it amplifies them. If you learn to code today with AI as a learning accelerator (not a crutch), you can cover more ground faster than previous generations.
But the job is changing imo. If your goal is to write syntax all day, that's disappearing. If your goal is to solve problems with software, we need more people who can do that well - especially people who understand both the technical fundamentals AND how to orchestrate AI effectively.
The field won't disappear, but I do think it will transform (and we don't know exactly what shape that is going to take yet). Learning now means you grow up as a native in this new paradigm rather than having to unlearn old habits later.
Until it deletes everything (Replit XD)
One thing I think is overlooked in this discussion is the variety of ways that people read and understand code. I, for instance, have ALWAYS had trouble just looking at a wall of code and conceptualizing what's going on.
I have to step through it. I have to change a boolean value manually and watch the logs. That may mean I'm not a 10x developer, but if industry-wide adoption is the goal for AI, then they need lil' 1x engineers like me to feel like it's making their job easier.
And, frankly, it just isn't. If I can parallelize the work of conceptualizing the code and creating it, I find myself moving faster. The extent to which LLMs can assist me on that journey is the extent to which I have found them valuable or useful.
I've been thinking about the same problem a lot. I think the way most people approach these tools is skewed toward "replace your engineering team" rather than "enhance yours".
Product managers and engineering managers already think like that: how do I make myself clear, how do I delegate, how do I recognise a good fit - whereas engineers haven't developed this skill.
In a way, we are all becoming platform engineers. We own the platform; AI lives in it. How do we make it thrive?
Very interesting article, thank you! I understand the concept of running multiple AI agents ("orchestrator" role), but I wonder about the real long-term productivity gains, as multitasking (referred to as ‘micromanagement tax’) is known to be counterproductive.
The commitment trap you mention is real.
We hit this exact pattern with our shade matching system. By the time we realized the confidence scoring was masking failure cases, we'd already integrated it into three different product flows.
The pivot cost wasn't the code — it was recalibrating customer expectations. They'd gotten used to "AI always has an answer." Teaching them (and our CS team) that "I'm not sure, let me show you options" is actually better... that took longer than the rebuild.
The second build was faster but the trust rebuild was slower.
Agentic AI is the gateway to unlimited automation, which is a must if you want to save time and increase profits. Refer to Agentic: https://promptengineer-1.weebly.com/agentic.html
Also, Agentic AI Prompt Vault: https://promptengineer-1.weebly.com/agentic-ai-prompt-vault.html
Whatever it is that you’re afraid the agent is going to do, build a workflow that relentlessly tests for those things and rejects them as automatically as you can.
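As one concrete way to act on that advice, here is a minimal sketch that encodes two common fears - file deletions and edits to protected paths - as an automatic rejection check on an agent's branch. The PROTECTED paths and the main base branch are assumptions; swap in whatever you are actually afraid of:

```python
import subprocess

# Illustrative list of paths the agent must never touch - adapt to your own fears.
PROTECTED = ("migrations/", ".github/", "infra/")

def diff_is_acceptable(base: str = "main") -> bool:
    """Automatically reject an agent branch that deletes files or edits protected paths."""
    out = subprocess.run(
        ["git", "diff", "--name-status", base],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        status, path = line.split(maxsplit=1)
        if status.startswith("D"):          # the agent deleted a file
            return False
        if path.startswith(PROTECTED):      # the agent touched a protected area
            return False
    return True
```

Wire a check like this into CI so the rejection happens without a human in the loop.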