6 Comments
Colleen Avarene

The parallel review experiment stopped me cold — 93.4% of flagged issues caught by exactly one tool, none by all four. That's not a coverage problem, that's a fundamental argument against single-reviewer trust at any level, human or machine.

The tiered risk framing is the part I wish more people were talking about. I work with small business owners who are just starting to use AI agents in their operations, and the "it depends on who you are" insight is the one that saves them from bad advice written for enterprise teams. A solo photographer's booking agent doesn't need the same review architecture as a bank's transaction system. But the fear-based discourse treats them identically.

One thing I'd push on: the "humans move upstream" conclusion assumes the humans involved have the judgment to know what's load-bearing and what isn't. For teams that were already rubber-stamping before agents arrived, the volume increase just makes the existing gap visible. The constraint was always judgment — agents just removed the camouflage.

Really appreciate the honesty about where we actually are instead of where we wish we were.

Caleb Mellas

Thanks for writing about this, Addy.

I’m seeing the same thing for my org / team. I analyzed our last 90 days of tickets and 68% of the time is spent on review + test + deploy.

The bottleneck is all the other parts of the SDLC besides writing code.

Love the idea of triaging PRs into risk levels, gonna start doing that soon!

Also love the “first time for a human to review” concept, and how we need to move that left, earlier than code review.

I honestly think code review has become the last line of defense for: “does this thing do what it was intended to do, without bringing down our systems or shooting us in the foot later?” That used to work with a couple of handwritten PRs, but it doesn’t hold up anymore, like you said.

I know you didn’t share all the answers, but it really helps to hear what you are seeing, and it confirmed a lot of my thoughts. Appreciate it!

Ankit Jain

Great read. I have a lot of thoughts to put in a comment, but broadly it checks out. I have also written a few pieces in this direction (won't blast them here).

Knowledge sharing is an important piece that separates solo projects from collaborating enterprise engineering teams. Some thoughts I shared with CIO:

https://www.cio.com/article/4179485/ai-killed-the-code-review-what-happens-to-knowledge-sharing.html

Stephen Metcalfe

I published my own booklet on this subject last week, for the same reason. Suffice to say this is becoming a popular subject to discuss.

You make some good points: the whole aim of PRs has changed, and we should be checking intent first. Agents write good-looking code that is fully tested, but that still doesn't mean it is fit for purpose.

Jason

Reviewing code is a chore, and it is also error-prone. If agents can find and call out common mistakes (such as copy-paste errors, buffer overruns, unallocated memory, etc.), developers will save a lot of time to spend on features instead, and the quality of the end product will improve. So long as each of the agent's suggestions is reviewed and understood by a developer, this can only add value and save annoying headaches further down the line. It sits alongside something like IDE compiler warnings.

oleg koval

What you said about the solutions (tiered reviews, small PRs, different tools) is true, but there’s one important thing to consider: the power plays involved in review don’t go away, they merely change venue.

The gatekeeping now moves to whose use of AI is considered acceptable and whose prompts are considered correct. Engineers who’ve built reputations on rejecting work in review now stand exposed and only dig themselves in deeper. Purists derive their identity from rejecting AI, treating that rejection as a marker of quality.

One could solve the process issues you mentioned, but it is humans, after all, who will simply move the battles elsewhere. I wonder whether your tiered review approach considers the implications of that.