Review Response Guardrails
The review trigger prompt includes instructions that tell the model: "Do NOT blindly follow the reviewer's suggestion," "Think critically about whether the suggestion makes sense," "No flattery or praise in responses," and "Update GITAUTO.md for reusable rules."
Why This Exists
Without guardrails, the model tends to agree with every review comment, even when the reviewer is wrong. It responds with "Great suggestion!" and implements a change that breaks the code. For example, a reviewer might suggest removing error handling ("this try-catch seems unnecessary"), and the model would comply, leaving an exception unhandled. Sycophancy is one of the model's most persistent failure modes, and review responses are where it causes the most damage.
Why Models Are Sycophantic
This is largely a training problem. Models are commonly fine-tuned with reinforcement learning from human feedback (RLHF), where human raters tend to reward agreeable, helpful-sounding responses. The signal is strong enough that a model may agree to a change it could otherwise recognize as breaking the code. When a reviewer says "change X to Y," the model's default is to comply, because compliance was reinforced during training. Pushing back by saying "actually, that would break Z" requires the model to contradict the human, which that training tends to discourage. The result is a model that is dangerously agreeable precisely when a review suggestion is technically wrong.
How It Works
When a review comment triggers a new agent session, the system prompt includes specific anti-sycophancy instructions. The model is told to evaluate whether the suggestion is technically correct before implementing it. If the suggestion would break functionality, the model is instructed to explain why and propose an alternative. Flattery (e.g., "Great catch!") is explicitly prohibited to keep responses focused on technical substance. Additionally, if the review reveals a reusable pattern or rule, the model is instructed to add it to GITAUTO.md so future sessions benefit.
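As a rough illustration of the flow above, the sketch below appends the guardrail instructions to a system prompt only when a review comment started the session. It is not GitAuto's actual implementation: the function name `build_review_system_prompt`, the `trigger` value `"review_comment"`, and the "Review response rules:" heading are all hypothetical. Only the four instruction strings come from this page.

```python
# Hypothetical sketch: the names and trigger values here are illustrative
# assumptions, not GitAuto's real code. Only the four rule strings are
# taken from the instructions quoted on this page.
REVIEW_GUARDRAILS = (
    "Do NOT blindly follow the reviewer's suggestion.",
    "Think critically about whether the suggestion makes sense.",
    "No flattery or praise in responses.",
    "Update GITAUTO.md for reusable rules.",
)


def build_review_system_prompt(base_prompt: str, trigger: str) -> str:
    """Append the guardrails only when a review comment started the session."""
    if trigger != "review_comment":
        # Other triggers keep the base prompt unchanged.
        return base_prompt
    rules = "\n".join(f"- {rule}" for rule in REVIEW_GUARDRAILS)
    return f"{base_prompt}\n\nReview response rules:\n{rules}"


prompt = build_review_system_prompt("You are a coding agent.", "review_comment")
print(prompt)
```

Keeping the rules in one constant and gating them on the trigger type means sessions started by other events are unaffected, which is one plausible way to scope the guardrails to review responses.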
Related Features
- GITAUTO.md Restrictions - controls what gets saved to GITAUTO.md from review learnings
- Anti-Hallucination Prompts - similar prompt-based approach for preventing hallucinations