Treat AI-generated code as a draft
Keep human eyes, judgment, and ownership at the center of AI-written code
tl;dr: Treat AI-generated code as a draft. It can write the first version, but never outsource the reading. No human review means no reliable trace from behavior back to intent. When you stop reviewing AI drafts, you stop knowing why the code works at all. Practically, hold AI-written code to the same standards as human teammates.
Never outsource the reading – always review AI’s first draft
AI can write a first version of code, but humans must do the reading and reviewing to ensure intent and quality.
If you stop reviewing AI-generated drafts, you stop knowing why the code works (or if it truly does) – there’s no reliable trace from behavior back to intent. In other words, LLMs don’t ship bad code, teams do. When no one takes responsibility for checking AI-written code, bad code slips through not because the model failed, but because the workflow failed to demand a higher standard [1].
Treat the AI’s output as untrusted input – it might be syntactically correct and even pass tests, but it hasn’t earned your trust until a human verifies it. AI models often produce plausible-looking but subtly flawed code, including hallucinated functions or insecure patterns [2]. So never merge code that hasn’t been read and understood by a human. As one engineer put it, blindly trusting AI output without verification risks immediate bugs and “systematically degrades our ability to catch these errors” because the very skills needed to validate code atrophy from disuse [3].
In short, always insist on a human-in-the-loop: AI can draft, but only a human can ensure the code’s behavior matches the intended purpose.
Blind reliance on AI erodes critical thinking and skills
Engineering leaders worry that developers who blindly accept AI-generated code will lose their critical thinking abilities.
The concern isn’t hypothetical – early research bears it out. Studies have found that heavy use of AI assistants correlates with lower brain engagement and reduced critical thinking performance [4]. In practice, developers dependent on AI may skip fundamental tasks like reading documentation or debugging errors themselves. One veteran engineer confessed that using AI’s instant answers made him “worse at his own craft.”
He stopped reading docs (“why bother when an LLM can explain it instantly?”) and even stopped analyzing errors – instead, he’d copy-paste stack traces into the AI and paste the AI’s answers back into the code. “I’ve become a human clipboard,” he lamented [5]. This kind of cognitive offloading means the developer isn’t reasoning through problems anymore; the AI is doing the thinking, and the human is just transcribing. The result is not only diminished skill, but also less vigilance – if developers assume the AI is always right, they may miss subtle bugs or security issues they would have caught before. In fact, the ease and polish of AI output can lull engineers into a false sense of security, lowering their skepticism during reviews [6].
The irony is that AI was supposed to boost productivity, but over-reliance can make individuals less capable. “We’re not becoming 10× developers with AI, we’re becoming 10× dependent on AI,” as one author observed – trading long-term understanding for short-term speed [7]. The takeaway: to maintain your engineering sharpness, you must stay intellectually engaged with the code. Use AI as a tool, not a crutch – always challenge and verify its solutions rather than accepting them blindly.
Skipping the learning process in favor of speed hurts growth
Many teams have leapt straight into using AI for speed, bypassing the learning and understanding that should accompany its use.
The promise of AI coding tools is high velocity – generate, generate, generate – but this often comes at the expense of developers truly grasping what they’re building. When you rely on AI to write code you don’t fully understand, you are skipping the essential learning process that makes you a better engineer [8].
The mistakes, trial-and-error, and research that traditionally accompany coding aren’t just hurdles – they are the training ground where critical skills develop. By outsourcing the heavy lifting to AI, junior devs in particular may never acquire the depth of knowledge to assess or improve the code being produced.
This creates a vicious cycle: you produce poor code because you use AI without experience, and you never gain experience because you keep using AI [8]. As one commentator bluntly asked, if your role is reduced to just prompting AI for code you don’t understand, what value are you adding? [9]
We’ve largely skipped the phase where AI could be used as a learning aid or tutor, and jumped straight to using it as an auto-coder for output. Ideally, developers would use AI to improve understanding – for example, asking an AI to explain a tricky piece of code, or to suggest why a solution works – and even do a local “self review” with the AI before handing code to others. But in practice, many are just hitting “accept” on suggestions and moving on. This means they might deliver a feature faster, but with only shallow knowledge of how it works or why certain patterns were used.
Over time, that lack of understanding accumulates into a serious skill gap. Senior engineers worry about newcomers who can pump out code with AI assistance yet struggle to debug or extend it, because they never learned the underlying concepts. Indeed, engineering leaders report that while juniors now ship features faster than ever, when something breaks “they struggle to debug code they don’t understand” [10].
The craft of software engineering is about far more than producing code that runs – it’s about knowing why the code is written that way, and how to evolve it. If we sidestep that journey, we risk creating a generation of programmers who can only operate with an AI on autopilot. To counteract this, treat AI output as an opportunity to learn: don’t just copy-paste answers, read them, question them, and ensure you could explain them to a colleague. Use AI to accelerate your work, not bypass your growth as an engineer [11][12].
I’ve heard of seniors sending back PRs where it’s clear AI was used but the person didn’t understand what they were doing. When a junior submits an AI-generated PR, the review becomes the primary venue for mentorship. Ask Socratic questions that force them to explain the AI’s output. This ensures understanding, not just functionality. Reviews become about comprehension, not just correctness.
Code reviews are straining under AI-generated code
Traditional code review practices are struggling to cope with AI-generated code, leaving teams unsure how to maintain quality.
Code reviews have always been the safety net for catching errors and ensuring code quality. But AI assistance changes the game: AI can produce much larger diffs in an instant, often touching many lines or files, which means reviewers face more volume and potentially more complexity in each pull request. In fact, studies found that pull requests heavy with Copilot-generated code take about 26% longer to review on average, because reviewers must untangle unfamiliar patterns and double-check for AI-specific mistakes [13].
Reviewers also report a psychological effect: when examining code they didn’t write, especially if it’s syntactically polished, their confidence drops – they take longer to validate logic and may second-guess their understanding [14]. AI can churn out code that looks clean and modern (consistent naming, proper formatting) which can lower reviewers’ skepticism [6]. It’s easy to assume the code is sound if it “looks professional,” making it more likely that subtle bugs or design flaws slip through.
Another complication is lost intent. In a traditional review, the reviewer can discuss “what the author meant to do” – there’s a human intention to compare against the implementation. With AI-generated code, the code’s author might not fully grasp the intent behind every line, because they didn’t write it in the conventional sense. The original prompt given to the AI is essentially the spec, but reviewers often don’t see that prompt [15]. This means a reviewer is left guessing at the requirements and whether the AI’s solution actually meets them, rather than just reviewing whether the code works.
As one report noted, reviewers are no longer assessing what the developer meant to do, but rather what the model actually did [16]. Traditional code review checklists (focused on style, obvious logic errors, etc.) aren’t enough, because AI code can fail in non-traditional ways – e.g. using an outdated algorithm that a junior dev wouldn’t know, or introducing an edge-case bug that isn’t immediately obvious.
Teams are also encountering review overload. An AI pair programmer can generate code faster than a human, which means a single developer can open very large pull requests or many pull requests in a short time. This “velocity” can overwhelm the team’s capacity to give thorough reviews. It’s akin to slop in code form – flooding the reviewer with so much output that it’s hard to pinpoint the issues [17]. In such cases, some organizations have instituted new policies: for example, if a PR is more than 30% AI-generated (by lines or content), it might trigger a required extra level of review or a more senior reviewer [18].
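A threshold policy like that can be enforced mechanically. Here is a minimal sketch – it assumes a team convention of tagging AI-assisted lines with an `# ai-generated` comment, which is a hypothetical marker invented for this example, not a standard:

```python
# Sketch of a PR gate: flag diffs where more than 30% of added lines
# carry a (hypothetical) team-convention marker for AI-assisted code.
AI_MARKER = "# ai-generated"  # assumed team convention, not a standard


def ai_share(diff_text: str) -> float:
    """Return the fraction of added lines tagged as AI-assisted."""
    added = [
        line[1:] for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    if not added:
        return 0.0
    tagged = sum(1 for line in added if AI_MARKER in line)
    return tagged / len(added)


def needs_extra_review(diff_text: str, threshold: float = 0.30) -> bool:
    """True when the AI-tagged share of a diff exceeds the team's threshold."""
    return ai_share(diff_text) > threshold


diff = """\
+++ b/app.py
+def total(xs):  # ai-generated
+    return sum(xs)  # ai-generated
+print(total([1, 2]))
"""
print(needs_extra_review(diff))  # True: 2 of 3 added lines are tagged
```

A check like this is only as honest as the tagging discipline behind it, which is why the transparency norms discussed below matter as much as the tooling.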
The idea is to acknowledge that AI-heavy code needs different scrutiny levels, not business-as-usual. Another emerging practice is labeling AI contributions: explicitly marking in the pull request or commit message that “this code was assisted by AI.” This can cue reviewers to be extra vigilant. Indeed, experts recommend tagging and tracking AI-generated code for accountability – it helps reviewers know what to look for and helps teams trace bugs later (“was this bug from AI-written code?”) [19].
However, openly tagging AI involvement comes with a cultural challenge: developers must feel psychologically safe to disclose AI usage. If people fear judgment for using AI (“will my team think I’m lazy or less competent?”), they may hide it – and that’s worse for the team. Hidden AI usage means the team doesn’t know where potential risk lies and can’t adjust their reviews accordingly [20]. To counter this, forward-thinking teams encourage transparency without stigma.
Using AI should be treated like using any tool – it’s fine to use it, but you must own the output. As one guide put it, never blame the AI for bugs or quality issues; the engineer who committed the code owns it, period [21]. If everyone embraces that mindset, then saying “I used Cursor to help with this module” is simply a factual statement, not an admission of guilt. It allows the team to collectively ensure the AI-generated sections get proper attention.
Right now, our code review tools and norms are still catching up to these needs. We don’t yet have widespread automated detectors for AI code in PRs, and most diff viewers don’t show the AI’s prompt or reasoning. So, we need to rely on process and team agreements to fill the gap – explicitly calling out AI-written code, reviewing tests more rigorously, and possibly setting size limits on what we’ll accept from an AI without breakpoints for human review.
If questionable code is making it past PR unchallenged, the issue is not just AI – it’s that the review process isn’t robust enough to catch these problems [22]. It’s a call to action that code review practices must evolve alongside AI adoption.
In a Stack Overflow survey, 66% of developers said their most common frustration with AI assistants is that the code is “almost right, but not quite.” In the same survey, 45% of developers reported that time spent debugging AI-generated code was their biggest time sink. Quality gates and validation are now the critical path.
Best practices: treating AI-generated code as a draft
To use AI coding tools effectively, we must adjust our habits and processes. Think of AI output as a first draft from a junior developer – valuable, but in need of careful review and refinement. Here are some pragmatic best practices to ensure that AI-generated code boosts productivity without sacrificing quality or understanding:
Never merge code you don’t understand. If an AI helped produce some code, the onus is on you (the developer) to read every line and make sure you get it. You should be able to explain what the code does and why. If there’s any part of the AI-generated snippet that you can’t follow, treat that as a red flag – either refine the prompt, have the AI explain it, or rewrite that part yourself. Some open-source projects explicitly require that contributors certify they understand the code they submit, even if AI wrote it. In professional settings, the same principle applies: take full ownership of any code you commit, regardless of who (or what) authored it [21]. In practice, this means running the code, writing or reviewing tests for it, and stepping through its logic before it ever hits your team’s repository.
Treat AI code like an intern’s code – don’t trust, verify. AI doesn’t possess context or wisdom; it’s more like a very fast, eager junior developer. It will confidently produce a solution, but that solution might be overly simplistic, miss edge cases, or use patterns that are out of place for your codebase. As a best practice, approach AI contributions with healthy skepticism. Check boundary conditions, look for off-by-one errors, thread safety issues, or other corner cases that a less-experienced coder might overlook [23][24]. Often, AI will do exactly what you asked, not necessarily what you truly need. So cross-verify the output against the requirements. If it’s a complex or critical piece of code, consider manually reimplementing it after seeing the AI’s draft – you might catch nuances the AI missed. Remember the mantra for AI output: “Don’t trust. Verify.” [25]
Use AI as a coding assistant, not an author – incorporate it into your own thinking. Instead of just asking AI to spit out code and blindly pasting it, use it in a conversational, explanatory way. For example, you can ask the AI to explain the code it just suggested, or to generate comments for it. You can have it suggest test cases for the code, which you then run to see if the code truly works. AI can also help by summarizing a large diff or identifying potential problem areas in a PR (some advanced code review tools now offer AI-generated summaries). All these uses keep you, the human, in the driver’s seat. You’re leveraging AI to augment your understanding, not replace it. One recommended practice is to review tests first for AI-generated changes [26] – ensure there’s a solid test suite covering the new code. If tests are weak or missing, that’s your cue to write more before trusting the code. Also, use strict linting and static analysis on AI code: AI might not follow your team’s idioms out-of-the-box, so enforce style and architecture rules with automated tools [27][28]. If the AI suggests something that doesn’t fit your usual patterns, don’t hesitate to refactor it. Essentially, make AI your pair programmer who writes draft code and gives ideas, but you still make all final edits and decisions.
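Team idioms can be enforced on AI drafts with lightweight custom checks alongside a standard linter. As one hedged illustration – the two rules below are examples chosen for this sketch, not a recommended ruleset – Python’s `ast` module can flag patterns that generated code commonly introduces, such as bare `except:` clauses and calls to `eval`:

```python
import ast


def flag_risky_patterns(source: str) -> list:
    """Flag a few patterns AI drafts commonly introduce (example rules only)."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # A bare `except:` swallows every exception, including KeyboardInterrupt.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"line {node.lineno}: bare except")
        # `eval` on arbitrary input is usually a red flag in generated code.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            findings.append(f"line {node.lineno}: call to eval")
    return findings


snippet = """\
try:
    result = eval(user_input)
except:
    result = None
"""
print(flag_risky_patterns(snippet))
```

Wired into CI or a pre-commit hook, a check like this runs on every change, which removes any temptation to skip the scrutiny when the diff “looks professional.”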
Thoroughly test and secure AI-generated code. It’s crucial to apply the same (or higher) level of testing to AI-written code as you would to handmade code. Write unit tests and integration tests to cover the functionality. Specifically look for edge cases and potential failure modes – AI is notorious for handling the “happy path” but ignoring unusual inputs or error handling. Also consider security: common vulnerabilities like SQL injection, XSS, insecure deserialization, etc., might slip in if the AI drew from a code example with a flaw [29][30]. Use security linters or scanners (tools like Semgrep or Bandit can catch obvious issues [31]). If the AI generated any dependency or configuration, ensure you review those for secrets or insecure defaults. Treat the AI’s code as if you hired a contractor whose work you don’t fully trust – double-check everything, because ultimately your team is accountable for any bugs or security holes, no matter who wrote the code.
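To make the “happy path only” failure mode concrete, suppose an assistant drafted a small `chunk` helper (a hypothetical function, written here for illustration). Edge-case tests should probe exactly the inputs a draft tends to ignore – empty input, a size larger than the list, an uneven final chunk, and invalid sizes:

```python
def chunk(items, size):
    """Split items into consecutive chunks of at most `size` elements.
    (Hypothetical AI-drafted helper, used to illustrate edge-case testing.)"""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Happy path - the case AI drafts usually get right.
assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

# Edge cases a draft often misses: probe each one explicitly.
assert chunk([], 3) == []                    # empty input
assert chunk([1, 2], 5) == [[1, 2]]          # size larger than the list
assert chunk([1, 2, 3], 2) == [[1, 2], [3]]  # uneven final chunk
try:
    chunk([1], 0)                            # invalid size must fail loudly
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for size=0")
```

Writing the edge-case assertions yourself, rather than accepting AI-generated tests wholesale, is itself part of the verification: it forces you to enumerate what the code must handle.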
Leverage AI for self-review before seeking peer review. One productive pattern is to ask the AI to critique its own output before you open a pull request. For example, after getting a code suggestion, you might prompt, “What potential issues do you see in this code? Any edge cases or improvements?” The AI might point out a condition you didn’t consider or a more idiomatic approach. It’s like a spell-check for logic – not infallible, but it can catch low-hanging fruit. This doesn’t replace a human review, but it can help you clean up the draft so that your peers aren’t distracted by obvious problems. Think of it as you collaborating with the AI to polish the code, then handing it to your team. This also helps you learn, as the AI’s review comments can highlight areas you need to think about. Just remember to verify any AI feedback; sometimes it might “hallucinate” problems that aren’t real, so use your judgment.
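The self-review step is easy to skip under deadline pressure, so some developers script it. A minimal sketch follows – the question list and prompt wording are assumptions of this example, and the actual model call is omitted because provider APIs vary:

```python
# Example questions for an AI self-review pass; adapt to your team's checklist.
REVIEW_QUESTIONS = [
    "What edge cases does this code miss?",
    "Are there error paths that fail silently?",
    "Is there a more idiomatic approach for this codebase?",
]


def build_self_review_prompt(code: str, context: str = "") -> str:
    """Assemble a critique prompt to send to an assistant before opening a PR."""
    lines = ["Review the following draft code critically."]
    if context:
        lines.append(f"Context: {context}")
    lines.append("CODE:\n" + code)
    lines.extend(f"- {q}" for q in REVIEW_QUESTIONS)
    # Ask the model to surface uncertainty instead of inventing confident answers.
    lines.append("Flag anything you are unsure about rather than guessing.")
    return "\n".join(lines)


prompt = build_self_review_prompt("def f(x): return 1 / x", context="math utils")
print(prompt)
```

The point of scripting it is consistency: every AI-assisted change gets the same critique pass, and the questions encode lessons the team has already learned the hard way.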
If an AI-generated change is too large or confusing, break it down. Don’t let the AI’s speed force you into merging giant, monolithic changes. If Cursor spews out 500 lines of mixed modifications, it might be better to treat that as a prototype. Perhaps run the code to see if the approach works, then reimplement the solution in smaller, comprehensible pieces. One developer likened an initial AI-generated draft to a spike solution – a quick and dirty implementation to prove a concept [32]. You wouldn’t merge a spike into production; you’d refine it. Similarly, take the AI draft and iteratively improve it: maybe split that big PR into multiple commits or pull requests that are easier to review. Often the second draft (written with the insight gained from the first) is much cleaner and more maintainable [32]. This disciplined approach prevents the “gish gallop” effect where the AI dumps so much code that reviewers can’t effectively review it. By breaking it down, you ensure that each piece gets adequate human attention.
Document and label AI contributions when sharing with the team. In your pull request description or code comments, it can be helpful to note which parts were generated by AI or if you relied heavily on an AI for a solution. For example: “Used Gemini/Opus/GPT to generate the initial implementation of this sorting algorithm; reviewed and modified the result.” This kind of transparency helps reviewers know where to focus. It’s not about blaming the AI or you but about context. In fact, marking AI-generated code with clear comments or annotations is encouraged as a way to create accountability and traceability [33]. If an odd bug appears later, the team can trace it back and see, “Oh, this chunk was AI-written based on prompt X” and that might make debugging easier. Of course, do this in a supportive culture (see next section) – the goal is to collectively safeguard quality, not to call someone out. Some teams even keep a log of AI-assisted changes for auditing purposes [33]. At the very least, consider sharing the prompt you used with your reviewers, e.g. in a PR comment. That way the reviewer understands what you asked for and can judge if the AI’s code actually matches the intent [15]. This prompt-as-spec technique can bridge the gap between intention and implementation.
In summary, treating AI code as a draft means applying all the same rigor you would to a human novice’s code: you review it deeply, test it thoroughly, and don’t assume anything is correct until proven. The AI can drastically speed up writing boilerplate and even suggest solutions, but you are the engineer – you must integrate those suggestions into the codebase responsibly.
Establish team agreements for AI-generated code
To successfully integrate AI into development, teams should set clear guidelines – essentially a “contract” – on how to handle AI-generated code. This is a new frontier, and misalignment can cause friction or quality issues. A team working agreement might include rules, responsibilities, and cultural norms around AI usage. Here are some key elements teams are adopting:
Ensure accountability doesn’t lapse. Make it explicit that whoever integrates AI-generated code into the codebase is responsible for it, full stop. No pointing fingers at the AI. If a bug is introduced, it’s treated like any other bug you’d introduce. This principle, supported by industry guides, says developers must take full ownership of any code they commit, regardless of who wrote it, and test AI-generated code as thoroughly as their own [21]. Management should reinforce that using AI is not an excuse for lower quality. Code reviewers and approvers also share responsibility – if you approve a change, you’re vouching for it as usual. Essentially, AI doesn’t change the definition of “code owner.”
Define how and when AI should be used. As a team, discuss what types of tasks are appropriate for AI assistance. For example, you might agree that AI is great for generating unit tests, boilerplate, scaffolding, or exploring multiple approaches – but perhaps you’ll avoid using it for core complex algorithms without additional review. Some teams may forbid AI use for security-sensitive code or critical algorithms, unless a senior engineer supervises closely. Others might say it’s fine to use AI for anything as long as you follow the other rules (understand it, test it, etc.). The key is to set expectations. This also ties into ethical and legal considerations (e.g. ensuring AI output doesn’t include copied licensed code, or doesn’t introduce biases), but that’s another essay in itself. The point is, an agreed policy prevents misunderstandings like one dev merging huge AI-written chunks that others aren’t comfortable with.
Emphasize transparency and psychological safety. The team contract should encourage developers to be open about AI involvement. For instance, a guideline could be: “If AI assisted significantly in a change, mention it in the PR.” Leaders must foster an environment where this admission is seen positively (as due diligence), not negatively. A lack of transparency can lead to “shadow AI” in your codebase – code that is AI-written but nobody realizes it, making debugging and maintenance harder [20]. To avoid that, make transparency the norm. One practice is adding a simple comment in the code like // Code generated with AI assistance or using a tag in PRs. The team might also agree on documenting prompts in the project wiki or in the code review for future reference [33]. If someone feels they don’t fully understand an AI-generated section, they should feel safe to say so and ask for help or extra review [33]. It’s far better to admit “I’m not 100% confident in what Copilot produced here” than to pretend everything is fine. Psychological safety ensures people speak up, which ultimately protects the code quality and the developers’ growth.
Integrate AI-awareness into the review process. Teams should update their code review checklists or definitions-of-done to account for AI. For example, a review checklist might add items like “If code was AI-generated, has the author provided the prompt or described the intent?” or “For AI-generated code, double-check for common issues (edge cases, security, style consistency).” Some organizations formalize this by requiring an extra pair of eyes on AI-heavy code, as noted earlier [34]. Training sessions can help too – a team might do a brownbag meeting on “typical AI mistakes” so all reviewers know what to watch for (e.g. unnecessary complexity, missing null checks, etc.). The team could also adopt tools to assist, like AI-powered code analysis that flags likely problematic code patterns. Ultimately, the whole review culture may shift to treat AI contributions with a bit more rigor. As a shared rule, you might say: No AI-generated code gets merged without thorough human review, no exceptions. It seems obvious, but stating it sets the tone that speed will not trump quality.
Support continuous learning and skill development. To address the critical thinking atrophy issue, a team agreement can explicitly encourage practices that keep skills sharp. For instance, pair programming sessions where one person doesn’t use AI and explains their thought process, or rotations on challenging bug fixes without AI. Or even simply encouraging developers to occasionally implement things “the hard way” first, before using AI to optimize. Some companies have gone as far as tracking how AI impacts debugging time and making sure employees still know how to troubleshoot without the tool [10]. An agreement could be: “We use AI to speed up routine tasks, but we still expect engineers to understand and be able to manually handle the complex parts.” By acknowledging this in your team principles, you validate the importance of human expertise. Leads and managers in particular should lead by example – demonstrating in code reviews that they scrutinize AI-generated code just as they would any code, asking thoughtful questions. Junior devs will take cues from that and learn that AI is not a get-out-of-thinking-free card.
In essence, a team’s AI code agreement is about maintaining quality, clarity, and trust. Everyone should know how AI is being used and agree on the standards its output must meet. This “contract” might be a living document that evolves as you gain experience. The goal is to prevent the scenario where AI quietly degrades your codebase or your engineers’ skills. Instead, with rules in place, AI can be harnessed as a powerful accelerator with guardrails. It forces conversations now about topics that were previously implicit (like “do you understand what you committed?”) – now we make them explicit.
Conclusion: AI is not a replacement for understanding
AI coding tools are here to stay, and they excel at generating drafts – the scaffolding, the boilerplate, even complex code that might take a human much longer to write from scratch. Embracing them can lead to huge gains in productivity and free developers from drudgery. But the moment we start treating AI-generated code as “fire-and-forget,” we undermine the very benefits we seek.
The true value of AI in software engineering comes when we pair its speed with our judgment. That means always reviewing AI output with a critical eye, staying curious about why the code works, and insisting on clarity and correctness. When you treat AI-generated code as a draft, you acknowledge it’s a work in progress – to be massaged and perfected by human insight.
By maintaining high standards for code quality and developer education, we ensure that AI is a tool that augments our capabilities rather than atrophying them. We keep the “why” and “how” in focus even as the “what” is delivered to us on a platter. In practical terms: don’t stop reading code.
Whether written by an intern, an AI, or a seasoned colleague, code must be understood to be trusted. If you never outsource the reading and thinking, you retain the ability to connect the code’s behavior back to the intent behind it – which is the essence of software engineering.
Use AI to move faster, by all means, but keep your hands on the wheel.
The code that lands in production should always have a human’s eyes (and heart) behind it. That way, we get the best of both worlds: the efficiency of AI-generated first drafts and the reliability of human-reviewed, well-understood final code.
I’m excited to share I’ve released a new AI-assisted engineering book with O’Reilly. There are a number of free tips on the book site in case interested.