Reviewing AI pull requests without rubber-stamping them
A short field guide for senior engineers: what to check first when the diff was written by an agent, and what to ignore.
The first AI-generated pull request I merged without reading carefully introduced a subtle bug in a date comparison. The tests passed. The diff looked clean. The agent had even written a thoughtful PR description. I learned the obvious lesson and a less obvious one: code review for agent output has a different shape from code review for a teammate.
What changes when an agent writes the diff
A human teammate carries context between PRs. They remember the conversation from last week, the half-finished refactor in the other branch, the reason a function is structured oddly. An agent does not, unless you give it that context inside the prompt. The shape of the diff reflects that. Agents reach for plausible patterns. They are confident in tone. They are weakest at the seams, where the change meets existing code that was not in their context window.
Review accordingly. The local logic inside a new file is usually fine. The boundary, where the new code calls into the old code, is where bugs live.
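Here is a minimal sketch of what a seam bug can look like, in Python. Every name in it (session_expiry_ms, is_expired_guessed, is_expired_checked) is hypothetical and made up for illustration. The assumed setup is an existing helper that returns epoch milliseconds and new code that guesses seconds.

```python
from datetime import datetime, timezone


# Hypothetical existing code, written long before the diff.
# Contract: returns the expiry as epoch MILLISECONDS.
def session_expiry_ms(created_at: datetime, ttl_seconds: int) -> int:
    return int((created_at.timestamp() + ttl_seconds) * 1000)


# Hypothetical agent-written code. The local logic is fine, but it
# guesses that the expiry value is in epoch SECONDS.
def is_expired_guessed(expiry: int, now: datetime) -> bool:
    return now.timestamp() >= expiry


# The same function after reading the contract at the seam.
def is_expired_checked(expiry_ms: int, now: datetime) -> bool:
    return now.timestamp() * 1000 >= expiry_ms


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
two_hours_later = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)

# A stubbed boundary that returns seconds agrees with the guess,
# so a test built on it reports the one-hour session as expired.
stubbed_expiry_seconds = int(created.timestamp()) + 3600
stub_result = is_expired_guessed(stubbed_expiry_seconds, two_hours_later)

# Against the real helper, the guessed version never expires the session.
real_expiry = session_expiry_ms(created, ttl_seconds=3600)
guessed_result = is_expired_guessed(real_expiry, two_hours_later)
checked_result = is_expired_checked(real_expiry, two_hours_later)

print(stub_result, guessed_result, checked_result)
```

Neither version of the new function looks wrong in isolation. The difference only shows up when the value comes from the real helper rather than from a stand-in that shares the agent's assumption.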
What I check first
- The seams. Every place the diff touches an existing function signature, an existing type, or an existing table. Did the agent guess at the contract, or did it read it? A quick git blame on the surrounding lines tells you whether it had context.
- Silent fallbacks. Agents love try/catch with a default value. That is sometimes correct and often a swallowed error. Read every catch block in the diff.
- Tests that pass for the wrong reason. A new test that exercises the new code is good. A new test that mocks the boundary you were worried about is a yellow flag. Run the test with the mock removed and see what happens.
- Names that drift. If the agent introduced a new term for a concept the codebase already has a name for, that is a sign it lacked context. Rename now or pay later.
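The silent-fallback item in the list above is easier to spot once you have seen both versions side by side. This is a sketch in Python, where try/catch is spelled try/except. The function names and the retries setting are hypothetical, chosen only for illustration.

```python
import json
import logging

logger = logging.getLogger(__name__)

DEFAULT_RETRIES = 3  # hypothetical default, for illustration only


def load_retries_silent(raw: str) -> int:
    """The pattern to look for: every failure quietly becomes the default."""
    try:
        return int(json.loads(raw)["retries"])
    except Exception:
        # Malformed JSON, a missing key, and a bad value all end up here.
        return DEFAULT_RETRIES


def load_retries_explicit(raw: str) -> int:
    """Fall back only when the key is absent; let bad input raise."""
    config = json.loads(raw)  # json.JSONDecodeError propagates to the caller
    if not isinstance(config, dict):
        raise TypeError("config must be a JSON object")
    if "retries" not in config:
        logger.info("retries not set, using default %d", DEFAULT_RETRIES)
        return DEFAULT_RETRIES
    return int(config["retries"])  # ValueError propagates for a value like "ten"


bad_value = '{"retries": "ten"}'
print(load_retries_silent(bad_value))  # the typo is swallowed
try:
    load_retries_explicit(bad_value)
except ValueError as exc:
    print("explicit version surfaced:", exc)
```

The question to ask of each catch block is which failures the default was meant for. In this sketch only the missing key is a legitimate reason to fall back; the other two are errors someone should hear about.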
What I stop checking
Formatting. Import order. Whether the comment style matches the rest of the file. Those are real concerns and they are also the things linters and formatters were built for. Spending review attention on them is the most expensive way to catch the cheapest bugs.
I also stop rewriting the diff in my head. If the structure is reasonable and the behavior is correct, ship it. The "I would have written this differently" instinct is the same instinct that makes some senior engineers a bottleneck for human teammates. Agents do not need the ego stroke and your future self does not need the merge conflict.
The two-pass rhythm that works for me
First pass, fast: read the PR description, scan the file list, look at the seams. If anything feels wrong, stop and ask the agent to explain its reasoning before reading further. The cost of a question is low. The cost of debugging your own confused review later is high.
Second pass, slow: read the actual logic for the parts that matter. Run the code locally if the change is not trivial. Trust nothing that you have not seen execute.
This is the same rhythm good engineers use on human PRs. The difference is that with an agent, the first pass catches more, because the failure modes are more predictable.
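The "scan the file list" step of the first pass can be roughed out in a few lines. This sketch assumes you feed it the text printed by git diff --name-status; the triage function and its bucket names are my own invention, not part of git or any tool.

```python
def triage(name_status: str) -> dict[str, list[str]]:
    """Split `git diff --name-status` output into seams, new, and deleted files."""
    buckets: dict[str, list[str]] = {"seams": [], "new": [], "deleted": []}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        # Each line is a status code, a tab, then one or two paths.
        status, *paths = line.split("\t")
        if not paths:
            continue  # skip anything that is not in the expected shape
        path = paths[-1]  # for renames and copies, the last field is the new path
        if status.startswith("A"):
            buckets["new"].append(path)
        elif status.startswith("D"):
            buckets["deleted"].append(path)
        else:
            # M, R, C, T: existing code was touched or moved, so read these first.
            buckets["seams"].append(path)
    return buckets


# Hypothetical file list, in the tab-separated shape git prints.
sample = (
    "M\tbilling/invoice.py\n"
    "A\tbilling/export.py\n"
    "R092\tutil/dates.py\tutil/time.py\n"
    "D\tlegacy/old.py\n"
)
result = triage(sample)
print(result["seams"])
```

Assuming the base branch is called main, the input would come from something like git diff --name-status main...HEAD. Modified and renamed files go to the front of the queue because that is where new code meets old code. Deleted files deserve a second look too, since their callers are seams of their own.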
Backlog Viewer is built this way. When the agent edits a BACKLOG.md file on your behalf, we surface the diff before it lands in your repo, so the review pass takes seconds instead of a context switch.