{"authors":[{"name":"Martin Stühmer","url":"https://daily-devops.net/authors/martin/"},{"name":"Jendrik Brack","url":"https://daily-devops.net/authors/jendrik/"}],"description":"Recent content in Artificial Intelligence in Software Engineering on Daily DevOps \u0026 .NET","favicon":"https://daily-devops.net/images/logo_hu_6465d873dfa490cf.png","feed_url":"https://daily-devops.net/tags/ai/feed.json","home_page_url":"https://daily-devops.net/tags/ai/","icon":"https://daily-devops.net/images/logo_hu_5926de77762241ba.png","items":[{"authors":[{"name":"Martin Stühmer","url":"https://daily-devops.net/authors/martin/"}],"content_html":"\u003cp\u003eGitHub Copilot code review is available in pull requests. Claude can review a diff. Cursor highlights issues as you type. Every major AI coding assistant now offers some form of review, and teams are using these tools to supplement (or in some cases replace) asynchronous human review on pull requests.\u003c/p\u003e\n\u003cp\u003eThis is not necessarily wrong. AI code review is genuinely useful. But there is a pattern to what it misses, and understanding that pattern matters more than debating whether to use these tools at all.\u003c/p\u003e\n\u003cp\u003eIn my experience, AI code reviewers behave like sycophants. They are good at finding small problems with how you built something. They are almost incapable of questioning whether you should have built it at all.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-ai-code-review-is-good-at\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#what-ai-code-review-is-good-at\" title=\"What AI Code Review Is Good At\"\u003eWhat AI Code Review Is Good At\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eTo be clear: these tools are useful. Worth adding to your PR workflow.\u003c/p\u003e\n\u003cp\u003eAI reviews reliably catch:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eObvious bugs in isolation.\u003c/strong\u003e Null dereferences, off-by-one errors, incorrect operator precedence, missing \u003ccode\u003eawait\u003c/code\u003e, unchecked return values from methods that can fail. These are the bugs human reviewers also catch, and they slip through when reviewers are tired, rushed, or staring at a 500-line diff.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCommon anti-patterns.\u003c/strong\u003e \u003ccode\u003easync void\u003c/code\u003e, catching \u003ccode\u003eException\u003c/code\u003e without rethrowing, \u003ccode\u003eDateTime.Now\u003c/code\u003e instead of \u003ccode\u003eDateTime.UtcNow\u003c/code\u003e, string concatenation in loops, \u003ccode\u003eConfigureAwait(false)\u003c/code\u003e missing in library code. Pattern matching against known bad patterns is exactly what Large Language Models (LLMs) do well.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTrivial security issues.\u003c/strong\u003e SQL injection via string concatenation, hardcoded credentials, insecure random number generation. These appear in training data thousands of times.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStyle consistency.\u003c/strong\u003e Naming inconsistencies, missing XML documentation, inconsistent error handling patterns relative to the rest of the file.\u003c/p\u003e\n\u003cp\u003eThese categories represent real value. A review pass that catches these before human review means human reviewers can spend their time on harder problems.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-ai-code-review-systematically-misses\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#what-ai-code-review-systematically-misses\" title=\"What AI Code Review Systematically Misses\"\u003eWhat AI Code Review Systematically Misses\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eThis is where the sycophancy shows up.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWrong abstraction.\u003c/strong\u003e AI reviewers evaluate the code you wrote against its own internal logic. They rarely notice that the abstraction itself is wrong: that the \u003ccode\u003eOrderProcessor\u003c/code\u003e class is doing three different things and probably should not exist as a single class, that the interface design couples callers to implementation details, that the naming reveals a confused mental model of the domain. Recognizing a wrong abstraction requires understanding the system it lives in and the cost of fixing it later. AI reviewers do not have that context.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u0026ldquo;This should be deleted.\u0026rdquo;\u003c/strong\u003e The correct review comment for a surprising fraction of pull requests is something like: \u0026ldquo;This feature was not the right call, let\u0026rsquo;s talk before merging.\u0026rdquo; AI reviewers will not write that comment. They review code on its own terms. A well-implemented feature that solves the wrong problem gets a positive AI review, and that feedback loop, repeated over time, shapes how a team thinks about what quality means.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSystemic patterns across the codebase.\u003c/strong\u003e AI reviewers see the diff. They do not know that the same abstraction appeared in three other places and was wrong each time. They do not know that this exact approach was tried and reverted eight months ago, and that the revert commit explains why. Reviewers with codebase history catch this. AI reviewers cannot.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eBusiness logic correctness.\u003c/strong\u003e Is this the right formula for calculating the surcharge? Does this authorization check correctly represent the access control model? Is this state machine transition valid given how the domain actually works? AI reviewers can tell you the code is internally consistent. They cannot tell you it is correct relative to what the software is supposed to do. This is not a minor gap. Business logic bugs are often the costliest bugs, and they are invisible to a reviewer that does not understand the business.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePerformance under real load.\u003c/strong\u003e AI reviewers flag obvious O(n²) algorithms and missing database indexes in toy examples. They rarely have visibility into the data distribution, the access patterns, or the production load profile that determines whether the code will hold up at scale. The performance review that matters happens in load testing and production, not in the diff view.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"the-sycophancy-problem\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#the-sycophancy-problem\" title=\"The Sycophancy Problem\"\u003eThe Sycophancy Problem\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eThe specific failure mode of AI code review is not that it misses things. Every review process misses things. The problem is the pattern of what it misses.\u003c/p\u003e\n\u003cp\u003eAI reviewers tend to approve the overall approach and find issues in the details. When a team leans heavily on AI review, there is a subtle risk: reviewers get better and better at fixing the details an AI flags, while the bigger structural questions get less attention over time. I have seen this happen, and it is not anyone\u0026rsquo;s fault. It is a natural response to the feedback signal you are getting.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"why-the-approval-bias-is-structural\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#why-the-approval-bias-is-structural\" title=\"Why The Approval Bias Is Structural\"\u003eWhy The Approval Bias Is Structural\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThe approval bias is structural. AI reviewers are trained on review data where most code in a diff is acceptable. The kind of feedback that says \u0026ldquo;the entire approach here is wrong, close this PR and start over\u0026rdquo; is rare in training data and produces outcomes that make the tool seem less useful. So the model optimizes away from it.\u003c/p\u003e\n\u003cp\u003eThe result: AI reviewers are systematically biased toward approving what you built and suggesting small improvements. They are not calibrated to recognize when the correct response is rejection.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"the-confidence-effect-on-developers\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#the-confidence-effect-on-developers\" title=\"The Confidence Effect On Developers\"\u003eThe Confidence Effect On Developers\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eThere is also a confidence effect worth naming. A developer who ships a PR with zero AI findings tends to feel more confident that the code is solid. That confidence is not entirely wrong (the mechanical issues are likely clean), but it can crowd out the instinct to ask for a second human opinion. Over time, \u0026ldquo;the AI found nothing\u0026rdquo; starts to function as a substitute for \u0026ldquo;this is good code\u0026rdquo;, and that is a different claim entirely.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"what-ai-review-should-change-about-human-review\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#what-ai-review-should-change-about-human-review\" title=\"What AI Review Should Change About Human Review\"\u003eWhat AI Review Should Change About Human Review\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eIf AI review is in your pipeline, it should shift what human reviewers focus on, not replace them.\u003c/p\u003e\n\u003cp\u003eAI reviewers handle the mechanical layer well: obvious bugs, pattern violations, style issues. That creates an opportunity for human reviewers to focus on what AI cannot do:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eIs this the right design?\u003c/li\u003e\n\u003cli\u003eDoes this code belong here at all?\u003c/li\u003e\n\u003cli\u003eDoes the naming suggest the author has a clear mental model of the domain?\u003c/li\u003e\n\u003cli\u003eIs this consistent with decisions made elsewhere in the system?\u003c/li\u003e\n\u003cli\u003eWhat will maintaining this cost in six months?\u003c/li\u003e\n\u003c/ul\u003e\n\n\n\n\n\u003ch3 id=\"where-human-review-time-belongs\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#where-human-review-time-belongs\" title=\"Where Human Review Time Belongs\"\u003eWhere Human Review Time Belongs\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eHuman review time is finite. If a human reviewer spends twenty minutes on a PR that an AI already reviewed and only surfaces style issues, something has gone wrong with how review time is being used. The value of human review is judgment, context, and the willingness to say \u0026ldquo;not yet.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eA team that uses AI review to reduce the need for human judgment does not end up with less review. It ends up with coverage that feels high but catches less of what actually matters.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"the-diff-problem\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#the-diff-problem\" title=\"The Diff Problem\"\u003eThe Diff Problem\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eBoth AI and human review share a structural limitation: they evaluate changes, not outcomes.\u003c/p\u003e\n\u003cp\u003eA large refactor that genuinely improves a design looks messy as a diff: deletions everywhere, moved code, renamed concepts. A small change that introduces a subtle bug can look perfectly clean. Both human and AI reviewers are influenced by the shape of the change, not just its effect on the codebase.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"why-ai-cannot-step-outside-the-diff\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#why-ai-cannot-step-outside-the-diff\" title=\"Why AI Cannot Step Outside The Diff\"\u003eWhy AI Cannot Step Outside The Diff\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eAI reviewers are more constrained here because they have no option to go beyond the diff. A human reviewer can pull the branch, run it, read the surrounding code, check git history. AI reviewers are limited to what is presented to them.\u003c/p\u003e\n\u003cp\u003eThis means AI review is structurally better suited to focused, contained changes, and less suited to catching problems that only become visible when you look at the broader context.\u003c/p\u003e\n\n\n\n\n\u003ch3 id=\"a-concrete-library-migration-example\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#a-concrete-library-migration-example\" title=\"A Concrete Library Migration Example\"\u003eA Concrete Library Migration Example\u003c/a\u003e\u003c/h3\u003e\n\u003cp\u003eA concrete example: a PR that migrates a service to use a new internal library might look straightforward in the diff. The imports change, a few method calls are updated, tests pass. An AI reviewer sees nothing alarming. But a human who knows that the new library has different error propagation semantics, or that the migration breaks an assumption made elsewhere in the codebase, can catch that. The diff does not surface it. Context does.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"using-ai-review-without-becoming-dependent-on-it\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#using-ai-review-without-becoming-dependent-on-it\" title=\"Using AI Review Without Becoming Dependent on It\"\u003eUsing AI Review Without Becoming Dependent on It\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eA few practices that have worked well in my experience:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eUse AI review as a pre-filter, not a gatekeeper.\u003c/strong\u003e Let it catch mechanical issues before human review. Humans then review for judgment, not syntax. An AI approval should not substitute for human review on anything that carries real risk.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTreat AI approval as a weak signal.\u003c/strong\u003e An AI saying \u0026ldquo;looks good\u0026rdquo; means it did not find a pattern match for common issues. That is useful information, but it is not an endorsement of the design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRead what the AI flagged, and what it did not.\u003c/strong\u003e If it found nothing interesting, that is not evidence the code is good. It may mean the problems are exactly the kind the AI cannot see.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eKeep humans in the design conversation.\u003c/strong\u003e Architecture decisions, new abstractions, changes to domain models: these all need human review from someone with context. No AI reviewer carries your system\u0026rsquo;s history, your domain knowledge, or the judgment to tell you a design direction is off before you build it out.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWatch for approval drift.\u003c/strong\u003e If PRs consistently get AI approval and human reviewers gradually stop questioning design decisions, that is a signal worth paying attention to. The human review may have been quietly degraded, not supplemented.\u003c/p\u003e\n\n\n\n\n\u003ch2 id=\"the-honest-summary\"\u003e\u003ca href=\"/posts/ai-code-review-is-a-sycophant/#the-honest-summary\" title=\"The Honest Summary\"\u003eThe Honest Summary\u003c/a\u003e\u003c/h2\u003e\n\u003cp\u003eAI code review tools are useful. Add them to your pipeline. Let them handle the mechanical layer.\u003c/p\u003e\n\u003cp\u003eBut they are not reviewers in the sense that actually matters. They do not have judgment. They do not know your system. They cannot tell you that you built the wrong thing. They are pattern matchers with a structural bias toward approving what you wrote.\u003c/p\u003e\n\u003cp\u003eThe risk is not that these tools make developers worse (most developers using AI review are thoughtful professionals who also get human review). The risk is subtler: over time, optimizing for what AI review catches can quietly shift attention away from the questions it cannot ask. Staying aware of that dynamic is enough to avoid it.\u003c/p\u003e\n\u003cp\u003eAI review is a useful tool. Keep it in that category.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eAn AI reviewer that says \u0026ldquo;looks good\u0026rdquo; is not telling you the code is good. It is telling you it did not find a match.\u003c/p\u003e\n\u003c/blockquote\u003e\n","date_modified":"2026-05-25T22:27:03+02:00","date_published":"2026-05-12T17:00:00+02:00","id":"https://daily-devops.net/posts/ai-code-review-is-a-sycophant/","language":"en","summary":"Copilot and Claude find real bugs, but miss wrong abstractions and bad designs. Understanding that gap matters more than debating the tools.","tags":["ai","softwareengineering","codequality","bestpractices","devops"],"title":"AI Code Review Is a Sycophant: Why It Always Approves","url":"https://daily-devops.net/posts/ai-code-review-is-a-sycophant/"}],"language":"en","title":"Artificial Intelligence in Software Engineering on Daily DevOps \u0026 .NET","version":"https://jsonfeed.org/version/1.1"}