Blog Posts

Your Prompt Is Not Enough. Add a Gate.

Apr 17, 2026

A single instruction in a prompt is easily outweighed by the bias of everything else in it. Programmatic quality gates enforce what prompts can only suggest.

Ask a File, Don't Read the Whole Thing

Apr 13, 2026

How routing file reads through a cheaper model saves input tokens when your AI agent only needs a quick answer, not the full source code.

How We Cut CI git merge Fetch from 137s to 1s

Apr 12, 2026

Shallow clones break git merge. We replaced --unshallow with the GitHub Compare API and exponential --deepen, cutting fetch time from 137s to under 1s.
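
As a rough illustration of the "exponential --deepen" idea only (the Compare API step is left out), here is a minimal Python sketch. The function names, the starting step, and the cap are assumptions made for this example, not details from the post. The loop doubles the deepen step until a merge base becomes reachable.

```python
import subprocess
from typing import Callable


def deepen_until(
    has_merge_base: Callable[[], bool],
    deepen: Callable[[int], None],
    start: int = 1,
    max_step: int = 1024,
) -> bool:
    """Deepen a shallow clone in doubling steps until a merge base exists.

    Returns False if the step size exceeds max_step before one is found.
    """
    step = start
    while not has_merge_base():
        if step > max_step:
            return False
        deepen(step)
        step *= 2
    return True


def git_has_merge_base(a: str, b: str) -> bool:
    # `git merge-base` exits non-zero when no common ancestor is reachable.
    result = subprocess.run(["git", "merge-base", a, b], capture_output=True)
    return result.returncode == 0


def git_deepen(n: int, remote: str = "origin") -> None:
    # `git fetch --deepen=<n>` extends the shallow boundary by n commits.
    subprocess.run(["git", "fetch", f"--deepen={n}", remote], check=True)
```

The two callables are injected so the loop can be exercised without a repository. In a real checkout they would wrap `git_has_merge_base` and `git_deepen` with the branch names in question.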

Let Your AI Agent Forget on Purpose

Apr 11, 2026

Why we built a tool that lets an AI agent explicitly drop file contents from its own context window, and the token economics behind it.

Runtime Downloads Are Hidden CI Costs

Apr 11, 2026

MongoMemoryServer downloads a 100MB binary on every CI run. Pre-caching the correct platform-specific archive to S3 eliminates that hidden latency.

One Web Fetch Ate 28% of Our PR Cost

Apr 10, 2026

One web fetch compounded across 22 agent turns and ate 28% of the entire PR's cost. We added Haiku as a summarization filter to stop the bleed.

Why PR Bodies Should Tell the Story

Apr 9, 2026

AI agents create PRs but leave reviewers guessing what happened. We added Claude-generated sections summarizing work done, bugs found, and trade-offs.

Why Retargeting a PR Explodes the Diff

Apr 8, 2026

GitHub's change-base-branch API only updates metadata. When sibling release branches are involved, the PR diff explodes with unrelated files.

Our Agent Had the Checklist and Ignored It

Apr 8, 2026

An AI agent had a 41-check quality checklist but kept making cosmetic edits instead of addressing failures. The fix was application-layer forcing.

Zero Changes Passed Our Quality Gate

Apr 7, 2026

Our verification gate let zero-change PRs pass because there were no files to check. The quality checks we built were never enforced.
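
One way a gap like this can arise is vacuous truth: a gate that passes when every changed file passes also passes when there are no changed files. A minimal Python sketch follows, assuming hypothetical `naive_gate` and `strict_gate` functions and a per-file `check` callable, none of which come from the post.

```python
from typing import Callable, Iterable


def naive_gate(changed_files: Iterable[str], check: Callable[[str], bool]) -> bool:
    # all() over an empty iterable is True, so a zero-change PR passes.
    return all(check(path) for path in changed_files)


def strict_gate(changed_files: Iterable[str], check: Callable[[str], bool]) -> bool:
    files = list(changed_files)
    # Treat "nothing changed" as a failure instead of a vacuous pass.
    if not files:
        return False
    return all(check(path) for path in files)
```

Whether an empty change set should fail outright or be routed to a separate check is a design choice. The sketch only shows why the empty case needs an explicit decision.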

Tests Can Go Stale at 100% Coverage

Apr 2, 2026

When source code changes but tests don't, the quality evaluation is outdated. We detect this automatically using content hashes from data we already have.

How We Finally Solved Test Discovery

Apr 1, 2026

A 5-tier scoring system validated against 17 real repos replaced flat scoring that broke with real data. Content-aware matching fills the gaps.

What 100% Test Coverage Can't Measure

Mar 31, 2026

100% line coverage means every line was executed, not that meaningful scenarios were tested. Quality checks fill the gap after 100%.

Test File Discovery Is Still Unsolved

Mar 30, 2026

Given an implementation file, which test files are relevant? We tried stem matching, content grepping, and hybrid discovery. Each fails differently.

39 Duplicate Jest Errors Cost Us $300

Mar 29, 2026

A single PR cost $300 because 39 identical Jest TypeErrors survived our log cleaning pipeline. Here's the root cause, the fix, and how we prevent it.
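
The summary does not give the root cause or the fix, so the following is only a generic Python sketch of collapsing identical log lines before they reach a model. `collapse_duplicates` and the "repeated N times" suffix are hypothetical and not the pipeline described in the post.

```python
from collections import Counter


def collapse_duplicates(lines: list[str]) -> list[str]:
    """Keep the first occurrence of each line and annotate repeats with a count."""
    counts = Counter(lines)
    seen: set[str] = set()
    collapsed: list[str] = []
    for line in lines:
        if line in seen:
            continue
        seen.add(line)
        n = counts[line]
        collapsed.append(line if n == 1 else f"{line} (repeated {n} times)")
    return collapsed
```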

Vanilla Claude vs GitAuto Test Generation

Mar 28, 2026

We tested a calculator with vanilla Claude and GitAuto. Claude wrote 19 happy-path tests. GitAuto wrote 41 including adversarial edge cases.

Can You Guess What Tests a Calculator Needs?

Mar 27, 2026

A 40-line Python calculator. How many tests would you write? Most say 10-15. GitAuto generated 41. Here's what you missed.

What Are Adversarial Tests and Why Run Them

Mar 26, 2026

Adversarial tests probe code with unexpected inputs: infinity, NaN, type mismatches, duck typing. Learn why they catch bugs happy-path tests miss.
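
For a sense of what such inputs look like, here is a small Python sketch against a hypothetical `divide` function written for this example. It feeds in infinity, NaN, and a mismatched type instead of ordinary numbers.

```python
import math


def divide(a: float, b: float) -> float:
    # Hypothetical function under test.
    if b == 0:
        raise ZeroDivisionError("division by zero")
    return a / b


def run_adversarial_checks() -> None:
    # Infinity propagates through division.
    assert math.isinf(divide(math.inf, 2))
    # NaN never compares equal to anything, so it needs math.isnan.
    assert math.isnan(divide(math.nan, 2))
    # A string operand is a type mismatch and should raise TypeError.
    try:
        divide("6", 2)
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for a string operand")
```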

Requesting Changes to GitAuto's Pull Requests

Mar 2, 2026

A real-world example of how GitAuto handles review comments and iterates on its pull requests based on user feedback.

How We Reached 92% Coverage with GitAuto

Oct 9, 2025

We dogfooded GitAuto on our own codebase and reached 92% line coverage, 96% function coverage, and 85% branch coverage.

9 Best Unit Test Agents 2025 Compared

Aug 8, 2025

Compare 9 leading AI-powered unit test agents and automated testing tools in 2025. Find the right solution for your development team.

Why Unit Tests Feel Like a Waste of Time

Aug 7, 2025

A developer's controversial take on unit testing sparked debate. Here's what the community really thinks and how AI changes the equation.

Unit vs End-to-End Testing: QA Focus Guide

Aug 7, 2025

Explore the key differences between unit and E2E testing, when each approach delivers maximum value, and how to build a comprehensive testing strategy.

Why You Need Both Manual and Automated Tests

Aug 5, 2025

Manual testing costs are spiraling and teams are drowning in repetitive tasks. Here's how to break the bottleneck without losing coverage.

What Are DORA Metrics and Why They Matter

Nov 24, 2024

DORA (DevOps Research and Assessment) metrics are a key benchmark for measuring software delivery performance.