The reality of AI-assisted software engineering productivity
What the data really shows about AI coding tools in 2025
tl;dr: AI functions as a situational force multiplier, providing modest, uneven boosts that augment rather than transform engineering productivity. Individual developers and those working on “new” projects see speed boosts with AI tools, but these gains aren't (yet) translating to overall team productivity:
AI excels at greenfield projects but struggles with complex legacy codebases
84% of devs use AI tools; only 60% view them favorably, down from 70% in 2023
Studies show 20-30% productivity improvements, far from “10x” claims
Most use basic autocomplete features, not full autonomous coding agents
66% cite AI solutions that are "almost right, but not quite" as their biggest time sink, due to the debugging they require
Adoption soars, trust plummets: the 2025 developer sentiment
AI coding assistants have rapidly become part of the developer toolkit – but confidence in their output has declined.
According to Stack Overflow’s 2025 Developer Survey (49,000+ devs globally), 84% of respondents are using or planning to use AI tools in their development process, up from 76% a year prior. Over half of professional developers now use AI coding tools daily. This represents a remarkable adoption curve – AI pair programmers went from novelty to normalcy in under two years.
Developers are primarily leveraging these tools for help with coding problems and tedious tasks: the survey found the top uses of AI were searching for answers (54% of respondents), generating code or synthetic data (36%), learning new concepts (33%), and even writing documentation (30%). In short, AI is touching many parts of the dev workflow.
Paradoxically, as usage has increased, positive sentiment has fallen. Stack Overflow reports that favorable views of AI tools dropped from over 70% in 2023 to just ~60% in 2025. In practice, 46% of developers say they don’t trust the accuracy of AI output – a sharp rise in skepticism from 31% last year. The data suggests many developers have encountered the limitations and flaws of these tools firsthand.
The number-one frustration, cited by 66% of devs, is AI solutions that are “almost right, but not quite,” which often leads to time-consuming debugging. Another 45% specifically complained that debugging AI-generated code is more work than it’s worth. This sentiment came through loud and clear in the survey and echoes across developer forums: AI helpers often accelerate typing but can inject subtle bugs or nonsense that soak up time.
“AI solutions that are almost right, but not quite, are now my biggest time sink. The code looks plausible but I end up spending more time fixing those ‘helpful’ suggestions.” – Survey respondent, cited by Stack Overflow
Crucially, most developers are not (yet) using AI to fully automate programming or “agentically” build entire applications.
The Stack Overflow survey also asked about “vibe coding” – letting an AI generate whole programs from high-level prompts – and found that nearly 72% of respondents said vibe coding is not part of their professional work, with an additional 5% “emphatically” avoiding it.
In other words, roughly 77% of developers do not generate whole applications on the job. Most are using AI in a more incremental, assistive capacity (code completion, example generation, Q&A), not as an autonomous project-builder.
This aligns with the finding that while 52% say AI “agents” have affected how they work, the primary benefit cited is personal productivity boosts (69% saw an increase in their own throughput) – not fundamental changes to how software is delivered. And despite all the “AI will replace programmers” media chatter, 64% of developers do not see AI as a threat to their jobs (though that’s down slightly from 68% last year, indicating a bit more unease).
In summary, right now AI-assisted coding is mainstream, but wariness is high. Developers appreciate the time-savers but have learned to “trust, but verify” every output. As Stack Overflow’s report put it, more developers are using AI tools, but their trust in those tools is falling. These cracks in the foundation set the stage for the central question: why aren’t these tools living up to the wild productivity promises?
Hype vs reality: Why “10× Engineers” remain unicorns
Amid the exuberance, many experienced engineers have pushed back on the notion that AI is making devs “10× more productive” overnight.
A notable example is Colton Voege’s essay, “No, AI is Not Making Engineers 10× as Productive – Curing Your AI ‘10× Engineer’ Imposter Syndrome.” Voege addresses the anxiety some developers feel seeing social media posts claiming that “real engineers” are now using LLMs to churn out 10–100× more output by spinning up numerous agent instances in parallel. He admits even he momentarily wondered if he was being left behind.
But after deep experimentation with various AI coding approaches, his conclusion was that the 10× claims don’t withstand scrutiny:
“I wouldn’t be surprised to learn AI helps many engineers do certain tasks 20–50% faster, but the nature of software bottlenecks means this doesn’t translate to a 20% productivity increase – and certainly not a 10× increase.”
In other words, AI can speed up coding tasks, but overall engineering outcomes (features delivered, systems deployed) are constrained by many other factors. Writing code is often not the slowest part of software development; tasks like designing architecture, clarifying requirements, code reviewing, testing, fixing bugs, and coordinating with teammates don’t magically compress just because you can generate a function faster.
Voege walks through a simple reality check: “10× productivity means what you used to ship in a quarter you now ship in a week and a half”. That would require every step – product planning, code reviews, QA, deployments – to happen 10× faster, which is implausible in any real-world team. As he dryly notes, “You can’t compress the back-and-forth of 3 months of code review into 1.5 weeks… This simply cannot be done.”
The human processes around coding have not accelerated at anywhere near the rate that AI can spit out code. Pull requests still need careful review (often more so if AI wrote the code), test suites still must run, and users still have evolving needs. A senior engineer on Hacker News echoed this, saying “all the other stuff involved in building software makes the 10× thing unrealistic in most cases.”
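Voege’s back-of-the-envelope logic is essentially Amdahl’s Law applied to software delivery: only the coding share of the work accelerates, so the total gain is capped by everything that doesn’t. A minimal sketch of that arithmetic (the 30% coding share and 2× coding speedup below are illustrative assumptions, not figures from any study cited here):

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's Law: only the coding share of delivery gets faster."""
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

# Suppose coding is 30% of total delivery time and AI doubles coding speed:
print(round(overall_speedup(0.30, 2.0), 2))   # 1.18 -> ~18% overall, not 2x

# Even an infinitely fast code generator caps out at 1 / (1 - 0.30):
print(round(overall_speedup(0.30, 1e9), 2))   # 1.43 -> the hard ceiling
```

With a 30% coding share, no amount of code-generation speed can push overall delivery past about 1.43×, which is why even dramatic task-level wins don’t compound into 10× shipping.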
Importantly, others point out that the loudest “AI makes us 10× faster” claims tend to come from biased sources – tech CEOs, investors, or consultants – rather than rank-and-file developers in the trenches. There are strong incentives for startup founders to over-hype productivity (to attract funding) and for bosses to suggest huge gains (to pressure employees or justify AI investments). Thus a kind of echo chamber can form, detached from ground truth. Meanwhile, front-line engineers’ actual experiences are more “varied and much more muted in their praise” – they see AI as a useful autocomplete and sometimes a “magic” assistant, but also one that often needs you to take the wheel back when it veers off course.
To be clear, developers are seeing meaningful boosts from AI – just not an order of magnitude. Voege concedes that “AI helps with boilerplate” and routine coding, estimating perhaps a 20–50% speed-up on certain sub-tasks for many engineers. Likewise, Simon Willison, a well-known developer and AI blogger, says he is “a huge proponent of AI-assisted development” and finds that LLMs make him 2–5× more productive for the coding portions of his work.
But he immediately qualifies that coding is only a fraction of his job, so the overall productivity gain is much smaller. This sentiment is common: using an AI code editor or vibe-coding tool can often help crank out a unit test file or convert some data format in seconds, which is awesome, but it might shave only an hour off a week-long project that is bottlenecked by design discussions, integration testing, and production debugging.
So, where does this leave us? The hype has been tempered by reality: AI coding tools are best viewed as assistants that save you keystrokes and occasionally supply ideas, not silver bullets that remove engineering toil altogether.
The true value may lie in preventing wasted effort (by quickly retrieving solutions or generating scaffolding) rather than in simply cranking out more questionable code faster.
What the data says: Mixed results from studies and surveys
Beyond anecdotes and surveys, 2024 and 2025 produced some rigorous studies on AI’s impact on developer productivity, ranging from controlled experiments to large-scale data analyses. Keep in mind, however, that results depend on model quality and on when each study was conducted. Let’s break down the key findings:
Controlled trials: modest speed-ups in enterprise settings
Google’s Internal RCT (2024) – Google conducted a randomized controlled trial on ~100 of its own software engineers to measure AI’s impact using multiple in-house AI coding tools (code completion, smart paste, and a natural language-to-code assistant). The task was a realistic, “enterprise-grade” coding assignment integrating with Google’s build and test systems (adding a new logging feature across 10 files, ~474 LOC). The result: developers using AI completed the task ~21% faster on average than those without AI. The AI group finished in ~96 minutes vs 114 minutes for the control group. So, about a one-fifth time savings. Notably, this was less dramatic than some earlier studies in simpler scenarios – a point we’ll revisit. Google’s study also found that, somewhat surprisingly, the senior developers saw slightly larger gains than junior devs in this experiment. They speculate that seniors leveraged the AI more effectively on complex codebase tasks, whereas juniors might have been overwhelmed or not known how to best use it. However, the sample of seniors was small, so that could be noise. The key takeaway is that even in a high-context enterprise environment, AI tools provided a measurable but moderate productivity boost (~20%), not an earth-shattering one. The researchers also emphasized that code quality was not evaluated – so faster doesn’t necessarily mean better code, just that tests passed more quickly.
Multi-Company Industry RCT (2024) – Another large study (published via SSRN) spanned three organizations – Microsoft, Accenture, and a Fortune 100 enterprise – and nearly 5,000 developers, measuring the effect of GitHub Copilot in real work settings. It found an average 26% increase in productivity for developers with Copilot access. In practical terms, the authors frame it as “turning an 8-hour workday into 10 hours of output”. This was determined by metrics like tasks completed (pull requests merged), code written, and build success rates. Importantly, they reported no drop in code quality or increase in errors – Copilot users actually had slightly higher successful build rates, implying the AI suggestions often prevented certain mistakes. However, the benefits were not evenly distributed: “newer, less experienced developers reaped the most benefits,” seeing as high as a 35–39% speed-up, whereas seasoned developers saw smaller (8–16%) improvements. Essentially, Copilot acted like an “always-available mentor” for juniors – helping them write code they might otherwise struggle with – while senior devs used it more selectively for boilerplate and got modest gains. This contrast with Google’s finding about seniors could be due to different tasks or simply that in the wild, junior devs lean on AI more heavily. Regardless, the 26% average boost from this study is often cited as evidence that AI can significantly accelerate coding when integrated well into team workflows. (Caveat: The study was careful to control for many factors over months of usage, but one can imagine enthusiastic participants might also work differently knowing they’re in a trial.)
Upwork Freelancer experiment (2023) – A well-known earlier experiment by Peng et al. (2023) hired 95 freelance programmers on Upwork to build a web server, and found the group with access to Copilot completed the task 55% faster than the control group. Another referenced study even reported “2× faster” completion with Copilot for certain tasks. These results, while impressive, were on relatively contained tasks and often with less experienced coders. They represent something like a best-case scenario (single focused task, no legacy code, motivated participants). In real teams, you might not see such big jumps – which is exactly what the Google and multi-company trials, with more context and longer duration, confirmed (around 20–30% gains, not 50%).
On the whole, controlled experiments suggest AI can provide a notable productivity uplift (roughly 20–30% faster coding) in both enterprise and broad industry settings – when properly used. This is consistent with what I’ve observed at Google.
That’s nothing to scoff at: a quarter more output is significant at scale. But it’s a far cry from 10×, and the effect depends on context. The multi-company study’s authors explicitly note that newcomers benefit more (which makes intuitive sense – AI can help fill knowledge gaps), whereas seasoned devs adopt it more slowly and use it for narrower cases.
Perhaps the most striking controlled study came from a different angle – not showing a boost, but a slowdown:
A reality check: When AI slows experienced devs
In July 2025, a group of researchers (Becker et al. via METR) published results of a randomized controlled trial on 16 experienced open-source developers working on their own large OSS projects. These were devs with years of experience on repos >1M lines, solving real issues from their bug trackers.
The twist: tasks (~2 hours each) were randomly assigned to “AI-allowed” or “AI-disallowed” conditions for each developer, who would either use state-of-the-art AI tools (they mostly used Cursor Pro with Claude 3.5/3.7) or work completely solo on each issue. The outcome was surprising: when using AI, these seasoned devs took 19% longer on average to complete the tasks. In other words, AI made them slower. The authors dubbed it a snapshot of “early-2025 AI capabilities” in a realistic setting – and it wasn’t very flattering.
Even more intriguing was the perception gap: the developers expected AI would speed them up by ~24%, and even after the experiment, they believed they had been faster by ~20% when using AI. In reality, the screen recordings and time logs told a different story – a significant slowdown. This finding generated a lot of discussion. How could skilled devs be slower with an advanced code assistant?
The paper offers some explanations. The authors analyzed 20 factors and identified a few likely causes for the slowdown:
Overhead of integrating AI suggestions: Developers spent extra time verifying, debugging, and adjusting the AI’s output. The AI might produce tangential or incorrect code that then needed human correction. Essentially, “hallucinations” and missteps introduced extra cycles.
Familiarity and inefficiency in use: These devs were relatively new to the specific AI tools. Only one participant had >50 hours experience with Cursor; notably, that one experienced user did see a positive speedup, suggesting a learning curve effect. Others may have used the AI sub-optimally or gotten stuck following it down wrong paths.
Task complexity and context: The issues required deep understanding of a large codebase. AI can struggle with such context unless carefully guided. The devs possibly had to rewrite or heavily edit AI’s code to fit project conventions, eating up time.
Cognitive interruptions: Switching between one’s own thought process and the AI’s suggestions can incur “context switching” overhead. If the AI outputs something that’s partially useful but needs fixes, the dev must reconcile it, which can be slower than writing a correct solution directly (especially for experts who know the codebase).
False sense of security: Developers might accept AI output too readily and then debug for longer when it’s wrong, rather than writing a simpler correct solution. The study noted that even after experiencing the slowdown, devs still felt like AI helped – a kind of cognitive bias because the AI made things feel easier even if it wasn’t actually faster.
The METR authors are careful to say this doesn’t prove “AI never speeds up devs” – just that in this particular realistic scenario, current tools didn’t help. They acknowledge AI is evolving fast and that better prompting or more experienced users might achieve positive results. But the study is a valuable counterpoint to optimistic lab results. It underscores that the effectiveness of AI assistance varies wildly with context. Give it a newbie doing a well-defined task and it shines; give it a veteran in a messy codebase and it might slow them down.
On Hacker News, this report sparked debate. Some engineers remarked that it matched their intuition: “LLM-based coding tools seem to actually hurt programmers’ productivity [in complex scenarios]. ‘Hallucinations’ aren’t going away…they just sometimes happen to generate something usable”. Others pushed back, sharing their personal wins with Copilot or agents for certain languages or simpler tasks.
One commenter wrote: “Whether productivity is tanking or not, I will find it incredibly hard to stop using LLMs… I must note though, it might be too soon to put a mark on productivity – it’s a function of how well new technologies are integrated into processes, which happens over years, not months.”
Another pointed out that familiarity matters: “Our one dev with >50h of Cursor experience saw a speedup – so maybe there’s a high skill ceiling to using these tools effectively”. In essence, early adopters believe things will improve as we learn to co-work with AI, but at least in early 2025, the “AI Productivity Boom” hasn’t universally materialized.
The AI productivity paradox: more code ≠ more productivity
Perhaps the most comprehensive look at AI’s impact on engineering came from the 2025 DORA/Faros “AI Productivity Paradox” report. This research analyzed telemetry from over 10,000 developers across 1,255 teams (using data from source control, task trackers, CI pipelines, etc.) to see how high AI adoption correlates with team and organizational performance. The findings reveal a fundamental mismatch between individual output and organizational outcomes:
Teams with heavy AI tool use completed 21% more tasks and merged 98% more pull requests – confirming that AI users tend to crank out more code and work items. However, their PR review times ballooned by 91%, creating a new bottleneck at the human approval stage. Essentially, AI let devs throw code over the wall faster, but the walls (code review, QA) then piled up higher. The report invokes Amdahl’s Law: the slowest part of the pipeline dictates overall speed. Without speeding up code review and deployment processes, the extra code just queues up waiting for humans.
AI-enabled developers indeed parallelize more: high AI teams saw devs touching 9% more distinct tasks per day and 47% more PRs per day. This indicates more context-switching and multi-threaded work. The report suggests a new “operating model” is emerging where devs orchestrate multiple AI-assisted threads of work rather than focusing on one at a time. This isn’t entirely negative – it might mean devs can juggle more things (review one AI-suggested PR while another runs tests, etc.) – but it challenges the conventional wisdom that context-switching is always bad. In an AI world, some increased context-switching might be normal as devs oversee multiple semi-autonomous efforts. Still, it can also be mentally taxing.
Code quantity vs. quality: The Faros data found that code structure and hygiene might improve with AI (they observed slightly fewer code smells and higher test coverage in some cases), but bug rates actually increased. Specifically, AI adoption was associated with a 9% increase in bugs per developer and a striking 154% increase in average PR size. PRs got much larger when AI was involved, likely because AI can generate big chunks quickly. Larger PRs are harder to review and more bug-prone – and indeed, more bugs slipped through. The tooling may encourage a spray of code that isn’t fully digested by the author, putting more burden on downstream QA.
No overall acceleration at the org level: When looking at big-picture metrics (like DORA’s four key DevOps metrics of deployment frequency, lead time, change fail rate, and MTTR, as well as overall throughput), the analysis found no significant correlation between AI adoption and better outcomes at the company level. In other words, companies with lots of AI usage didn’t ship faster or more reliably than those without, once you aggregate the data. The individual team boosts were getting absorbed by cross-team dependencies and bottlenecks.
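The report’s Amdahl’s Law point can be made concrete with a toy queueing sketch (all numbers below are illustrative, not Faros data): if AI doubles the PRs authored per week but review capacity stays fixed, throughput is pinned to the review stage and the backlog in front of it balloons.

```python
# Toy pipeline model: PRs authored per week vs. PRs a team can review per week.
# The capacities are made-up illustrative numbers, not from the Faros dataset.

def simulate(weeks: int, authored_per_week: int, review_capacity: int):
    """Return (total PRs merged, review backlog) after `weeks` weeks."""
    backlog = 0
    merged = 0
    for _ in range(weeks):
        backlog += authored_per_week          # new PRs enter the review queue
        reviewed = min(backlog, review_capacity)
        merged += reviewed                    # only reviewed PRs actually ship
        backlog -= reviewed
    return merged, backlog

# Before AI: authoring and review are balanced.
print(simulate(12, 10, 10))   # (120, 0)  - everything ships, no queue

# After AI doubles authoring but review capacity is unchanged:
print(simulate(12, 20, 10))   # (120, 120) - same throughput, huge backlog
```

Doubling the input to a fixed-capacity review stage ships exactly as many PRs as before; the “extra productivity” just accumulates as a 91%-style review-time pile-up, which is the paradox in miniature.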
This disconnect is what Faros calls the “AI Productivity Paradox”: AI is everywhere, yet impact isn’t. By 2025, 75% of engineers use AI tools, yet most orgs see no measurable performance gains in delivery. The report offers insightful reasons why these gains haven’t materialized, crystallized into four patterns:
Adoption is very recent: Widespread usage (60%+ of devs using weekly) only took off in the last 2–3 quarters in most companies. The tooling and practices are immature; teams are basically “beta-testing” AI in real time. There hasn’t been enough time to re-engineer processes around it.
Usage is uneven across teams: Even if overall company adoption is high, it varies team by team. Some teams may be “AI super-users” cranking out code, while others are more traditional. Since software delivery is often cross-team, one fast team won’t dramatically speed up a whole project if the adjacent teams or downstream reviewers aren’t equally augmented. It’s the “weakest link” effect again.
Adoption skews toward newer engineers: The data showed new hires and less-tenured engineers use AI the most, whereas many senior engineers and veterans are using it less or not at all. (Not to be confused with age or skill – “tenure” here means time at the company; newcomers often lean on AI to navigate unfamiliar codebases.) Seniors may be more skeptical of AI’s help on complex tasks, or simply creatures of habit. The implication is that the people designing systems and making big architectural decisions (often senior staff) are using AI least, while juniors cranking out code use it most. So the type of work AI is doing is likely more on the periphery (small feature PRs, minor fixes) than at the core architectural level. This limits the impact on major outcomes, at least for now.
Usage remains shallow (autocomplete-overdrive): Most developers are using only the most basic AI capabilities – primarily code autocomplete in the IDE. Advanced uses like integrated AI chat for troubleshooting, AI-assisted code review, or autonomous agents opening merge requests are rare. The report explicitly notes “advanced capabilities… remain largely untapped”. So, despite all the talk of “agentic AI” that can file pull requests or automatically fix bugs, the reality is that here in 2025 the typical dev just has a smarter autocomplete that occasionally writes a function for them. That’s useful, but it’s incremental. The full transformative potential of AI (if it exists) isn’t being realized because the tooling and adoption of those capabilities are nascent.
The Faros report suggests that to get real value, organizations need to deliberately adapt: invest in training developers on effective AI use, update code review practices (maybe even use AI to help review the AI-generated code), improve test automation to catch the extra bugs, and foster knowledge sharing of successful AI workflows.
A handful of “rare companies” were seeing tangible performance gains, and they were the ones that treated AI not as a plug-and-play gadget, but as a strategic initiative with “five enablers – workflow design, governance, infrastructure, training, and cross-functional alignment” backing it. In plain terms: If you don’t change your development process and upskill people, throwing AI in the mix might just create faster chaos, not faster delivery.
Where AI helps the most (and least)
The utility of AI coding assistance can vary dramatically depending on the scenario. Let’s break down use-cases where developers are seeing clear gains versus areas where AI still struggles or even hinders:
✅ Greenfield projects & prototyping: When starting a new app or feature from scratch (with little existing code or legacy constraints), AI can be a turbocharger. Developers often report that “vibe coding” – letting the AI generate a substantial initial codebase or component – works best in greenfield situations or throwaway prototypes. The AI is less likely to conflict with established patterns because there aren’t any yet, and the cost of mistakes is lower. One engineer described the first time using AI on a new project like “sipping rocket fuel” – you get a burst of speed early on. Boilerplate for common frameworks (spinning up a React frontend or a basic Express server) is done in seconds. In hackathons or early-stage startup projects, some developers essentially use AI as an extra pair of hands to churn out a minimum viable product quickly. The benefit here is psychological as much as practical: AI-generated code can help you iterate faster, since you can quickly scaffold something, run it, and then refine. As Voege noted, it’s good at the generic stuff (especially in well-trodden domains like JavaScript/React). So you feel super-productive initially. However, even in greenfield work, AI might not architect the solution optimally – it’s great for stubs and examples, but a human still needs to guide the overall design.
✅ Boilerplate and repetitive code: Perhaps the most agreed-upon strength: AI excels at writing the boring bits. This includes things like: unit tests that follow a pattern, boilerplate CRUD methods, converting one data structure to another, writing serialization/deserialization code, glue code between APIs, etc. If you have examples to imitate, AI will mimic them. David Cramer (engineering leader at Sentry) gave a practical tip – if you have to write new tests similar to existing ones, generate them with AI to save time, then “dive in and change what you need to”. This speeds up the rote parts of coding. Similarly, routine functions (parsers, format converters, simple algorithms) can be knocked out quickly by prompting the AI. The key is that you as the developer know exactly what needs to be done and roughly how – you just let the AI fill in the syntax and edge cases. Many devs report significant time saved not having to search Stack Overflow for that one-off regex or not having to manually write dull boilerplate. It’s like having an encyclopedia and snippet library on tap.
✅ Documentation and learning: An underrated use of AI in development is as a learning and explanation tool. Over 44% of devs learning a new language or tech in the past year used AI help to do so. Tools like Cursor can explain code, translate one language to another, or answer “how do I do X in framework Y” much faster than combing documentation. When encountering a new API, developers often paste an error message or function signature into the AI chat and get a quick explanation or sample usage. This accelerates the “research” phase of coding. Rather than scouring Google and skimming docs for 30 minutes, an AI might give you the gist in 30 seconds (sometimes even with a runnable example). This isn’t direct “code productivity” but it reduces time spent stuck or reading manuals. The Stack Overflow survey indicates searching for answers is the #1 use of AI tools by devs – essentially AI as a smart assistant for Q&A. That said, distrust of AI answers is high (for good reason – they can sound confident but be wrong), so developers are double-checking anything important. But as a supplement to official docs, AI chat can be a great tutor or rubber duck.
✅ Onboarding to codebases: New hires or contributors unfamiliar with a large codebase have found AI assistants helpful in navigating and understanding the code. Because AI models can ingest a lot of context, you can ask things like “what does this module do?”, “summarize how data flows from class A to B”, or “where in the code is the logic for X handled?” and get pointed in the right direction. The Faros report noted that newer engineers lean on AI to navigate unfamiliar code and accelerate early contributions. This suggests a good use-case: easing the steep learning curve of complex systems. Instead of constantly pestering senior team members with questions, a junior dev can ask the AI and often get useful answers (again, with caution about accuracy). Even something like, “generate a quick example usage of internal library Z from our repo” can give a template to work from. AI won’t have the true architectural understanding a senior dev has, but it can index the codebase and surface relevant bits quickly. In essence, it’s like an interactive documentation/search tool for your own code. This can save time and help newer team members become productive faster – a legitimate productivity gain at the team level if it shortens onboarding time.
✅ “Hands-on” debugging aids: We are seeing developers use AI as a debugging assistant in creative ways. For instance, some will paste a stack trace or error log and ask the AI for likely causes or solutions. Others have started to integrate AI into their monitoring – e.g., feeding an issue or bug description to an internal AI agent (like Sentry’s MCP tool that David Cramer mentioned) to get analysis on what might be wrong. One HN user described a workflow: “when I get a well-written bug report or detailed logs, my instinct is to feed it to an agent and let it figure it out in the background while I work on other things”. They claimed this parallel approach often surfaces insights or even fixes. This hints that AI’s value isn’t only in writing new code – it can also help understand existing broken code. These uses can trim down debugging time, which is historically a huge part of development. However, this area is still emerging; AI can misdiagnose issues too, so it’s another tool in the toolbox rather than a magic debugger.
On the flip side, situations where AI assistance struggles or backfires:
❌ Large, complex legacy codebases (brownfield): In mature enterprise codebases with lots of domain-specific context, custom patterns, and interdependent components, AI often flounders. Developers note that an AI might write code that doesn’t fit the existing architecture or misses subtle requirements, causing integration headaches. Colton Voege pointed out that AI “is not good at keeping up with the standards and utilities of your codebase” and tends to fail if you use non-mainstream libraries. It might call APIs that almost do what’s needed but not quite, or use outdated approaches. In such environments, integrating an AI-generated piece can take as long as writing it manually because you must rework it to match the codebase’s idioms. David Cramer’s experiment at Sentry, where he tried to rely 100% on agents to build a real service, ended up confirming this: “you cannot use these agents to build software today… they don’t replace hands-on-keyboards. Most importantly, they don’t replace engineering.” He found that for non-trivial new features in a complex system, the agent kept producing “absolutely unmaintainable” code or got stuck, and he eventually had to “hit eject” and do it the traditional way. The AI could generate lots of code, but it wasn’t the right code. Duplicate code, unused code, and incorrect abstractions were common when the agent tried to extend a complex project. This highlights that in brownfield development, deep understanding of the existing system is critical, and AI doesn’t truly understand – it guesses based on patterns. Until we have AI that can deeply ingest and reason about millions of lines of bespoke code (and maybe have the product context), human engineers will still be needed to ensure coherence and correctness in large systems. Thus, the productivity boost in brownfield scenarios is much smaller – some engineers estimate only a 10–30% speed-up at best in these cases, and sometimes a slowdown if the AI suggestions lead you astray.
❌ “Agentic” autonomous coding: 2025 saw a lot of hype around coding agents – AI that can iterate on its own, e.g. writing code, running tests, reading the results, and refining. In theory, you could tell an agent “build me X feature” and it will write code, compile, test, fix bugs, and so on with minimal intervention. In practice, as Armin Ronacher documented in “Agentic Coding Things That Didn’t Work,” these workflows are fragile and often more trouble than they’re worth. Armin enthusiastically tried features like slash-commands to automate tasks, background hooks, and “YOLO” modes that let the AI run wild on his codebase. He ultimately abandoned most of these complex setups. Why? They didn’t consistently yield good results and added complexity to his workflow. “Most of my attempts didn’t last… I ended up doing the simplest thing: just talk to the machine more, give it more context… That is 95% of my workflow.” In other words, all the fancy autonomous behaviors were less useful than a straightforward interactive chat with the AI. He would dictate or write what he wanted in detail (often via speech-to-text to be more verbose) and guide the AI step by step. The agent features either failed or he forgot to use them. This sentiment is echoed by many: current coding agents are cool demos, but in day-to-day use they can go off the rails and require constant babysitting. They might run the wrong command, misunderstand a test failure, or get stuck in loops. So, fully hands-off coding is not reliable in 2025. It’s still a human-in-the-loop game, where the human provides direction and judgment. The most effective “automation” remains partial – e.g., using AI to autofix simple lint errors or generate a PR draft, but not expecting it to deliver a shippable feature without human oversight. 
As Cramer concluded after his two-month agent experiment, “I wasted three days trying to get the agent to design a feature I could’ve done in an afternoon… what matters is simply: you cannot use these agents to build software today.” He emphasizes that AI in its current form will not replace the keyboard or the need for engineering skill. Instead, its value is in augmenting engineers, not substituting them.
❌ Unvalidated “almost-right” code: This ties to the trust issue. When an AI produces code that looks plausible, there’s a temptation to accept and run with it. But as 66% of devs noted, almost-right can be worse than wrong. An obviously wrong answer you’ll discard immediately, but an almost-correct snippet might slip through and later cause a subtle bug. This leads to scenarios where developers unknowingly introduce issues or technical debt by over-relying on AI output. For example, one might use an AI-generated algorithm that works on typical cases but fails on edge cases – and if the dev doesn’t thoroughly test it (trusting the AI got it right), that bug goes to production. Or AI might use a deprecated function that mostly works but breaks in a future update. Without vigilance, AI can actually decrease code quality. The Faros study’s finding of increased bug density on AI-heavy teams underscores this. So any productivity gain from writing code faster could be wiped out by time spent fixing the resulting bugs. Many teams have learned to treat AI suggestions with the same scrutiny as code from a junior developer: useful, but must be reviewed line by line. This of course eats into the productivity gains. As one senior dev on HN noted, “I wouldn’t use it to write anything lengthy… overall it has improved my productivity, though I could see how it might hurt junior engineers [who trust it too much].” The key is that experience is needed to validate AI output. Inexperienced devs who blindly trust AI may produce code faster, but potentially worse code – leading to a net negative productivity once you account for QA and maintenance. This is why some in the community worry that AI assistance could become a crutch that impedes learning (junior devs might cargo-cult AI code without understanding it). The optimists argue it’s akin to Stack Overflow – you still have to know enough to integrate the answer. 
Regardless, blind trust in AI is a recipe for trouble; successful use requires a healthy dose of skepticism and old-fashioned testing.
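To make the "almost right" failure mode concrete, here is a toy Python sketch (a hypothetical example, not code from any cited study): an AI-style suggestion that works on typical inputs but blows up on an empty list, next to the reviewed version a skeptical engineer would actually ship.

```python
def average_ai(xs):
    # Plausible AI suggestion: correct on typical inputs...
    return sum(xs) / len(xs)  # ...but raises ZeroDivisionError on an empty list

def average_checked(xs):
    # Reviewed version: the empty-input edge case is handled explicitly.
    if not xs:
        return 0.0
    return sum(xs) / len(xs)
```

The first version would sail through a casual review and most happy-path tests; only a deliberate edge-case test catches it before production.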
❌ Tasks requiring creative insight or novel solutions: Language models are fundamentally pattern mimickers. When faced with a truly novel problem that doesn’t map well to known examples, AI tools often flail or produce very generic suggestions. For instance, designing a new algorithm or inventing an architecture for a brand-new paradigm – these high-level creative engineering tasks are not something current AIs excel at. They can help by brainstorming or enumerating options (which might inspire the human), but they are unlikely to produce an innovative solution outright. Thus, for the most intellectually challenging parts of engineering – deciding what to build, why and how at a conceptual level – human engineers are still very much in the driver’s seat. The code assistants come into play more in the later stage of how to implement this logic in syntax. So one could argue AI hasn’t changed the nature of software design; it’s just sped up the mechanical aspects of coding. Real productivity leaps would require AI to contribute at the design/problem-solving level, which we aren’t seeing yet except in trivial ways.
In summary, AI currently shines for well-defined, repetitive, or insulated tasks – writing boilerplate, tests, simple functions, and answering how-to questions. It falters in situations requiring holistic understanding of large systems, creative problem-solving, or strict correctness and maintainability. This delineation suggests why startups and individual projects might feel a bigger benefit (they can afford to move fast and break things with AI-generated code), whereas big mature products can’t tolerate mistakes as easily and thus can’t unleash AI without caution.
Managing developer attention with AI suggestions
A recurring design challenge is managing developer attention. When assistants surface too many suggestions, utility saturates and developers start to ignore them. Chen et al. (2025) observe that higher-frequency suggestion modes led participants to copy suggestions into code less often, despite productivity gains. In program-design workflows, developers also struggled to keep up with LLM-originated changes and experienced information overload, underscoring the need for careful gating of what to show and when (Zamfirescu-Pereira et al., 2025).
Actionably, future agents should time and target interventions based on what the developer is focused on. That includes monitoring context such as current activity in the IDE and recent interactions to offer well-timed, context-aware suggestions rather than a constant stream. Chen et al. (2025) recommend showing contextually relevant information and timing suggestions around the user's workflow. Empirically, Pu et al. (2025) show that subtask-boundary heuristics (program execution, code-block completion, and user comments) were effective triggers, while signals such as idleness and code selection created false positives and disruptions.
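To sketch what such gating might look like in code (the event names and phase labels here are hypothetical, not the actual heuristics from Pu et al.), a proactive assistant could fire only at subtask boundaries, suppress noisy signals, and stay quiet during heads-down implementation:

```python
from dataclasses import dataclass

# Event names are hypothetical; a real IDE integration would define its own.
SUBTASK_BOUNDARIES = {"program_executed", "code_block_completed", "comment_written"}
NOISY_SIGNALS = {"idle", "code_selected"}  # prone to false positives per Pu et al.

@dataclass
class EditorEvent:
    kind: str   # what just happened in the editor
    phase: str  # "implementation", "debugging", "refactoring", ...

def should_surface_suggestion(event: EditorEvent) -> bool:
    """Fire proactive suggestions only at subtask boundaries,
    and hold back during heads-down implementation."""
    if event.kind in NOISY_SIGNALS:
        return False
    if event.kind not in SUBTASK_BOUNDARIES:
        return False
    # Proactivity was rated least disruptive in debugging and refactoring.
    return event.phase in {"debugging", "refactoring"}
```

The point is not these particular rules but the shape of the policy: a cheap, explicit gate between "the model has a suggestion" and "the developer sees it."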
Bottom line: proactivity pays off when it reduces intent-expression and interpretation effort, especially in debugging and refactoring, but it must be attention-aware. Favor interventions at subtask boundaries, let users tune the frequency, and defer to the engineer's flow during implementation.
Adapting workflows: How developers are integrating AI
To harness AI effectively, developers are evolving their workflows and tools. Some notable trends and best practices emerging in 2025:
“AI Pair-programming” via chat: Rather than relying solely on inline code completions, many developers keep an AI chat window open as they work. They treat it like a colleague they can rapidly iterate with. For example, they might paste a function and say “hey, can you refactor this to use approach X” or “find the bug in this code” or “write unit tests for this function.” This interactive approach often yields better results than expecting the AI to do everything autonomously. As Armin Ronacher noted, the simplest and most effective use of these tools is just to talk to them more. He even uses voice input to stream-of-consciousness describe what he wants, because speaking can be faster than typing and encourages providing more context. The AI then responds with code or answers. This conversational coding style is becoming more common, especially with the advent of voice-enabled coding assistants and editor plugins. It’s like having a rubber duck that talks back with suggestions. The benefit is you can guide the AI step by step, rather than letting it guess the whole solution. This mitigates misunderstanding and lets you course-correct in real time. Tools like Cursor or VS Code Copilot Chat allow highlighting code and asking questions or for modifications, which fits naturally into a developer’s flow. The takeaway: treat the AI as a collaborator, not an autonomous coder. Continuous back-and-forth yields better outcomes than one-shot prompts.
Personal prompt libraries & reusable recipes: Developers who regularly use AI are building up a set of “prompt patterns” or semi-structured commands that work well for their needs. For instance, a prompt to “explain this code”, a prompt to “optimize this function without changing its API”, or a prompt to “generate a SQL query for X based on these tables.” Some IDE plugins let you save these as shortcuts (Armin experimented with slash commands for common tasks). While he found many of them ended up unused, the idea of having custom AI commands is still compelling for some – e.g., a /doc command that when you highlight a function, it generates documentation comments for it. Or /tests to scaffold test cases for a given code snippet. Armin discovered limitations in implementation (lack of parameterization, etc., which frustrated him), but the concept may improve as tooling matures. Even without formal slash commands, some devs keep a text snippet of their favorite prompts to copy-paste when needed (for example, the precise phrasing to get an AI to produce output in a desired format). This is analogous to having shell scripts or editor macros – you learn how to “code” the AI with prompts and reuse what works.
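A minimal personal prompt library can be a few lines of Python. The template names and phrasings below are illustrative, not taken from any particular tool:

```python
# Saved prompt templates with parameters. A personal library like this
# plays the role of shell aliases or editor macros for AI prompting.
PROMPTS = {
    "explain": "Explain what the following code does, step by step:\n{code}",
    "optimize": "Optimize this function without changing its public API:\n{code}",
    "doc": "Write a documentation comment for this function:\n{code}",
    "tests": "Write unit tests covering the edge cases of:\n{code}",
}

def render_prompt(name: str, **params: str) -> str:
    """Fill a saved template; raises KeyError for an unknown template name."""
    return PROMPTS[name].format(**params)

# Usage: paste the rendered prompt into whichever chat assistant you use.
prompt = render_prompt("doc", code="def add(a, b): return a + b")
```

Because the templates are plain strings, the same library works across tools, which matters when editors and assistants are churning this fast.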
AI-aware code reviews: One interesting development is engineers beginning to use AI during code review. For instance, if they receive a pull request (AI-generated or not), they might ask an AI to summarize the changes or identify potential issues. GitHub has started previewing an “AI-assisted code review” that will highlight risky code or suggest improvements. While still early, this could help deal with the onslaught of larger PRs from AI-generated code. It’s almost a necessity: if AI doubles the amount of code written, AI will likely need to aid in review too, or review becomes the bottleneck. Some developers already manually use AI chat: paste a diff and say “review this diff for any bugs or style issues.” The AI might catch things or at least provide a second opinion. However, caution is required – an AI code reviewer might miss context or enforce pedantic rules. But this area will likely grow, as it directly addresses the slowest link (human review) that Faros identified. If AI can pre-filter changes and auto-approve trivial ones (some companies already auto-merge PRs below X lines or with low risk), that could free humans to focus on complex changes, boosting throughput.
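For the manual version of this, a small helper can turn a staged git diff into a review prompt for any chat assistant. This is a sketch: the function names and prompt wording are invented for illustration, and it assumes you run it inside a git repository:

```python
import subprocess

def build_review_prompt(diff: str, max_chars: int = 8000) -> str:
    """Wrap a unified diff in a review request.

    Truncates very large diffs, since small incremental changes review best.
    """
    body = diff[:max_chars]
    return (
        "Review this diff for bugs, missing edge cases, and style issues.\n"
        "Flag anything risky; don't rewrite the whole change.\n"
        "--- BEGIN DIFF ---\n"
        + body +
        "\n--- END DIFF ---"
    )

def staged_diff() -> str:
    # Pull the staged changes from git (requires running inside a repo).
    result = subprocess.run(
        ["git", "diff", "--staged"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Truncating the diff is a deliberate nudge: if the change doesn't fit in one prompt, it probably doesn't fit in one reviewer's head either.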
Smaller, incremental changes (batch size adjustments): Some teams are adjusting their practices to better accommodate AI. One tactic is encouraging smaller, more incremental commits/PRs when using AI. Since AI can spew a lot of code quickly, there’s a temptation to do a huge change in one go. But that leads to the 150% larger PRs and long reviews. Instead, savvy devs are learning to break tasks into smaller sub-tasks, get AI to help with each, and commit in pieces. This ties into an old best practice (small batches) that becomes even more important with AI output. Smaller AI contributions are easier to verify and less likely to introduce big bugs. It’s the principle of keeping the human in control by not letting the AI run away. For example, instead of “Implement the payment system” in one shot, do “Implement the payment API client” (review it), then “Implement the payment processing function”, etc. This also helps psychologically; the dev stays engaged and doesn’t lose track of what the AI is doing.
Investing in tests & tooling: A pattern emerging in teams using AI is doubling down on automated testing and CI quality gates. Since AI code may have unknown flaws, having a robust test suite is your safety net. Some companies require that any AI-generated code must come with tests (often AI-written tests!) to ensure it’s exercised. Others run new static analysis or AI-driven analysis on the code to catch issues. Essentially, if AI increases the volume and velocity of code, the verification and validation steps must catch up too. So, teams are adding more linting, more types (in typed languages), and more checks to avoid regressions. This isn’t glamorous productivity stuff, but it’s necessary to actually realize net gains. If AI lets you write code 30% faster, but you have no tests and spend an extra 40% of time debugging in production, you lost the game. Smart teams realize this and shore up their quality pipelines. In a way, AI is forcing better engineering discipline: you can’t rely on intuition that the code is right if you didn’t write it fully yourself – you write tests to be sure.
Knowledge sharing and training: Developers are learning tips and tricks from each other on how to coax the best out of AI. Internal brown-bags or Slack channels dedicated to AI tools are common now in companies. People share prompt techniques (“Ask it like this and it will include import statements correctly” etc.), or warn about pitfalls (“Don’t use it for X, it always messes up thread safety”). Treating AI proficiency as a skill that can be taught and learned is important. Google’s study and others noted how experience with the tool mattered: one person with a lot of AI usage under their belt performed better. So ramping everyone up on AI “literacy” can improve overall team productivity. We’re seeing new roles or informal leads for this – e.g., an “AI champion” on a team who stays updated on the latest features (like VS Code adding a new AI refactoring command) and helps teammates use them.
In essence, teams that get value from AI are those that treat it as an evolving capability to be managed, not a magic box. They iterate on how they integrate AI into their dev process, much like adopting any new tool. We’re at an interesting juncture where even seasoned engineers are somewhat “junior” at using AI tools – there’s a learning curve, and those who climb it reap more benefits.
One overarching theme: keeping the human in charge. All these workflow adaptations – from interactive prompting to extra testing – are about channeling AI’s strengths while mitigating its weaknesses, under human guidance. As David Cramer aptly advised fellow engineers:
“Ignore the claims about vibe coding and claims that you don’t need to know how to write code. Instead look for ways to augment what you do. Those tests you need to write for that new API route? They look awfully similar to those other tests: so generate them.”
He emphasizes that AI won’t replace the craft of engineering, and that’s okay. You still need to understand code – AI just helps you generate and verify it faster. And if using AI makes you miserable (some devs find “vibe coding” dull because it takes the fun out of writing code), it’s okay not to push it to the max. “It’s okay to sacrifice some productivity to make work enjoyable,” Colton Voege reminds, noting that forcing yourself to code in a way you hate (whether that’s writing everything by hand or wrangling an AI for every line) can lead to burnout and worse outcomes. In other words, productivity isn’t everything – developer happiness and creativity matter too, and there’s a balance to strike.
Proactive AI Agents for debugging and refactoring
Recent studies suggest that proactive AI agents are particularly useful during debugging and refactoring. In a CHI 2025 study of a proactive coding assistant, participants engaged with the AI most often during implementation (38.2% of all AI interactions) and debugging (26.4%), with lower rates in the analyze, design, organize, and refactor stages (Pu et al., 2025). This points to a general preference for proactive intervention in the implementation and debugging phases.
Participants also reported that increased AI proactivity led to higher efficiency, while prompt-only tools demanded more effort to use. One participant contrasted the proactive modes with a prompt-only baseline, saying they “had to keep on prompting and asking” (Pu et al., 2025). Independent work from Chen et al. (2025) found similar pain points with purely reactive chat assistants, with participants noting “I really wasn’t sure what to ask for with the non-proactive chat.”
Critically, proactive suggestions were perceived as least disruptive during debugging and refactoring, and most disruptive during implementation. In Pu et al. (2025), disruptions clustered in implementation (32.7% of disruptions) versus far fewer in debugging (7.27%) and refactoring (1.82%), reinforcing that proactivity should be applied thoughtfully: welcome during “fix” or cleanup phases, restrained during heads-down feature work.
What’s the engineering leadership perspective?
The LeadDev AI Impact Report 2025 surveyed 883 engineering leaders across the US, UK, Europe, and beyond to capture how AI tooling is changing teams and process.
While 59% of leaders report feeling more productive with AI tools, the reality is more nuanced. Automated code generation dominates usage at 48%, followed by summarization (39%) and documentation (36%), while critical areas like code review (17%) and testing (7%) lag significantly. The strongest perceived gains appear in very small teams (fewer than 5 engineers), where 59% cite improvements exceeding 10%. However, 60% of organizations cite the lack of clear metrics as their biggest AI-related challenge, with only 18% currently measuring impact systematically.
Conclusion: A clear-eyed, data-driven perspective
By now, the consensus among many software engineers – especially the characteristically skeptical Hacker News crowd – is that AI coding tools provide useful boosts but not miracles. The data backs this up:
Adoption is high and growing, because these tools do help developers work faster or reduce tedious work.
Productivity gains in the 20-30% range are being observed in controlled settings and some real teams. That’s significant, but it’s a linear gain, not exponential.
Individual experiences vary: novices might feel supercharged and reach near-senior output on certain tasks, while some veterans find the AI more distraction than help and proceed cautiously.
Trust is a major issue, with nearly half of developers not trusting AI’s output. This lack of trust is warranted given the tendency of models to err – yet ironically, developers can be overconfident after using AI, as the METR study showed. Navigating between over-reliance and under-utilization is the new skill to master.
Output ≠ Outcome: Without adapting processes, more code faster can just mean more code waiting in review or more bugs to fix. Organizations are learning that they must invest in how AI is rolled out (training people, updating workflows) to see a true productivity payoff.
So, is AI making engineers more productive? Yes, but modestly and unevenly. Forget the splashy “10x” headlines – the reality is a story of incremental improvements and second-order effects. It’s telling that the biggest benefits cited are often qualitative: e.g., developers feeling less mental load on grunt work, being able to focus on higher-level problems while the AI handles boilerplate, or learning new tech quicker with AI assistance. These are real improvements to the developer experience, even if they don’t neatly show up as 1000% more output.
From a manager or CTO perspective, the message is to be realistic. If your engineers report a 30% productivity gain with AI, that’s actually in line with the best studies – be happy with that, and skeptical of any claims far above it without extraordinary proof. Also, look at where that 30% is coming from: it might be 30% more code written, which as we’ve discussed doesn’t automatically equal 30% more value delivered. Monitor your bug rates, review times, and developer satisfaction alongside raw output.
For engineers themselves, the advice is pragmatic: experiment with these tools, keep what works, discard what doesn’t. There’s a lot of noise and “grift” in the AI tooling space, so focus on concrete improvements in your workflow. If using an AI assistant helps you write your code more efficiently or enjoyably, great – use it. If it sometimes slows you down, figure out why (are you trusting it too much? Using it on the wrong problems? Spending too long crafting the perfect prompt?) and adjust accordingly. It’s a learning process for everyone.
Crucially, continue honing core software engineering skills. AI might change the nature of coding over time, but in 2025 it’s clear that understanding how to design a system, how to debug, how to test, and how to maintain code are still vital. In fact, those skills become more important when an AI is doing the easy stuff, because the human must handle the hard stuff. As Cramer wrote, “they don’t replace engineering” – AI won’t turn a bad programmer into a great one, but it can make a good programmer faster. Think of it like power tools: a nail gun lets a skilled carpenter frame a house faster, but an unskilled person with a nail gun can also just make a big dangerous mess faster. The skill and judgment remain paramount.
So to the question likely on every engineer’s mind: Will AI take my job or make my role obsolete? The data so far suggests no – but it might change your job somewhat. Developers are still needed to conceive ideas, break down problems, review AI’s work, and ensure the final product meets real-world requirements. AI is not replacing those creative and analytical parts; it’s just shaving some of the manual labor off the edges. A majority of developers (64%) still feel that AI isn’t a threat to their jobs, though that confidence wavered slightly this year. The best approach is probably to embrace the tools and make them complement your skills. In other words, be the engineer who’s 1.3× as productive with AI, rather than the one who refuses to use it and falls behind. Even a pessimist can appreciate a 30% speed-up on the dull parts of coding.
Finally, a note on mindset: The initial magic of AI coding wears off, and what’s left is figuring out how to incorporate this capability sustainably. We’re past the honeymoon phase now – on the ground, assessing what these tools can really do for us. And the picture is not one of a revolution rendering programmers irrelevant, but of a gradual evolution in how programmers do their work.
The canonical, comprehensive take at this point is: AI coding tools are helpful assistants that, when used wisely, can make engineers moderately more productive and happier by automating some drudgery – but they are not a substitute for human insight, and they introduce new challenges (verification, coordination) that must be managed.
In short: The future of coding is likely human+AI, not AI-alone. Embrace the helper, but keep your hands on the wheel and your engineering fundamentals sharp.
That’s how you’ll truly reap the productivity gains without getting lost in the hype.
I’m excited to share I’m writing a new AI-assisted engineering book with O’Reilly. If you’ve enjoyed my writing here you may be interested in checking it out.