Quality Check Scoring
GitAuto evaluates test quality beyond raw coverage numbers. After tests pass and coverage targets are met, a separate evaluation scores the tests across multiple quality categories. Each source file and its test files are analyzed together to identify gaps that coverage metrics alone cannot detect - missing edge cases, security vulnerabilities, performance regressions, and more.
Why Coverage Is Not Enough
100% line coverage means every line executes during tests. It does not mean the tests verify correct behavior. A function that parses user input can achieve full coverage with a single valid input string, while missing SQL injection, XSS, null bytes, Unicode edge cases, and boundary values entirely. The lines run, but the dangerous paths are never tested.
Quality check scoring fills this gap by evaluating what the tests actually verify, not just what code they happen to execute. A test file that covers 100% of lines but ignores adversarial inputs scores poorly on the adversarial category, signaling that the tests need strengthening.
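As a hypothetical illustration of this gap (not GitAuto code), the parser below reaches 100% line coverage from a single happy-path test, while every adversarial input goes unverified:

```python
# Illustrative only: full line coverage without meaningful verification.

def parse_user_id(raw: str) -> int:
    """Parse a user-supplied ID string into an integer."""
    return int(raw.strip())

def test_happy_path():
    # This single test executes every line of parse_user_id: 100% line coverage.
    assert parse_user_id(" 42 ") == 42

# Inputs an adversarial check would flag; the coverage number says
# nothing about how (or whether) these are handled:
adversarial_inputs = ["", "abc", "1.5", "42; DROP TABLE users", "\x00"]

for raw in adversarial_inputs:
    try:
        parse_user_id(raw)
    except ValueError:
        pass  # each raises ValueError, but no test pinned that behavior down
```

A quality check would score this file poorly on the adversarial category despite its perfect coverage.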
The 7 Quality Categories
Each source file is evaluated against checks organized into these categories:
- Adversarial - null/undefined inputs, empty strings and arrays, boundary values, type coercion, large inputs, race conditions, Unicode and special characters
- Security - XSS, SQL injection, command injection, code injection, CSRF, auth bypass, sensitive data exposure, input sanitization, open redirects, path traversal
- Performance - quadratic algorithms, heavy synchronous operations, N+1 queries, large imports, redundant computation
- Memory - event listener cleanup, subscription and timer cleanup, circular references, closure retention
- Error Handling - graceful degradation, user-facing error messages
- Accessibility - ARIA attributes, keyboard navigation, screen reader support, focus management
- SEO - meta tags, semantic HTML, heading hierarchy, alt text
Not every check applies to every file. A backend utility function has no accessibility or SEO concerns. The evaluator marks inapplicable checks accordingly so they do not penalize the score.
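One way to picture applicability-aware scoring is a sketch like the following. The category names come from the list above; the data shapes and the averaging rule are assumptions for illustration, not GitAuto's actual schema:

```python
# Sketch: average pass rate over applicable categories only, so
# inapplicable checks (e.g. accessibility on a backend file) never
# penalize the score. Shapes here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CategoryResult:
    category: str
    passed: int
    failed: int
    applicable: bool = True

def score(results: list[CategoryResult]) -> float:
    scored = [r for r in results if r.applicable and (r.passed + r.failed) > 0]
    if not scored:
        return 1.0  # nothing applicable to check
    return sum(r.passed / (r.passed + r.failed) for r in scored) / len(scored)

results = [
    CategoryResult("adversarial", passed=5, failed=1),
    CategoryResult("security", passed=4, failed=0),
    CategoryResult("accessibility", passed=0, failed=0, applicable=False),  # backend file
]
print(round(score(results), 3))
```

The key design point is that an inapplicable category is excluded from the denominator entirely rather than counted as a failure.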
Change Detection with Blob SHA
Quality evaluation uses Claude to analyze source and test files together, which costs tokens and takes time. Running it on every PR for every file would be wasteful when most files have not changed. GitAuto uses Git blob SHAs to detect changes efficiently.
Three values are tracked per source file: the implementation file's blob SHA, the test file's blob SHA, and a hash of the quality checklist itself. Re-evaluation triggers only when at least one of these changes. If the source code changes, the existing tests may no longer cover the new logic. If the test file changes, the quality scores may have improved or regressed. If the checklist adds new check items, previously passing files need scoring against the expanded criteria.
When none of the three values change, the stored quality scores are reused. This makes quality checking practical even for repositories with thousands of source files - only modified files pay the evaluation cost.
How Scoring Drives PR Creation
When quality gaps are found, GitAuto creates PRs to strengthen the tests. The evaluation results identify specific categories where tests are weak - for example, a file with no adversarial tests or missing error handling coverage. These specific gaps become the scope of the generated PR, so the agent knows exactly what types of tests to add rather than guessing.
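A minimal sketch of turning evaluation results into a PR scope might look like this; the threshold value and function name are hypothetical:

```python
# Hypothetical: categories scoring below a threshold become the
# generated PR's explicit scope, so the agent targets known gaps.
def build_pr_scope(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    return sorted(cat for cat, s in scores.items() if s < threshold)

scope = build_pr_scope({"adversarial": 0.0, "security": 0.9, "error_handling": 0.5})
print(scope)
```

Passing only the weak categories keeps the agent's task narrow: add adversarial and error-handling tests, not rewrite the whole suite.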
Quality Gate Enforcement
Creating a PR is not enough - the agent must actually improve the tests. When the agent declares the task complete, a quality gate verifies the work. For quality-focused PRs (created by the scheduler or dashboard), this gate has two layers:
- Zero-change rejection - if the agent made no changes to the test file, the task is rejected immediately. The scheduler already determined the tests were weak when it created the PR, so "no changes needed" is not a valid completion.
- Post-change evaluation - after the agent makes changes and all other checks pass (linting, type checking, test execution), the quality evaluation runs again on the updated test files. If the tests still fail quality checks, the agent must iterate.
The post-change evaluation runs last to avoid wasting an LLM call when the agent will need to retry anyway due to lint or test failures. If the agent genuinely cannot improve the tests after a retry, the system allows completion to prevent infinite loops.
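The gate's ordering can be sketched as a simple decision function; the helper names, return strings, and retry handling here are illustrative placeholders, not GitAuto's actual API:

```python
# Sketch of the two-layer gate described above.
def quality_gate(test_file_changed: bool, other_checks_pass: bool,
                 quality_passes: bool, retries_left: int) -> str:
    # Layer 1: zero-change rejection.
    if not test_file_changed:
        return "rejected: no changes to test file"
    # Cheaper checks (lint, types, test run) gate first, so a failing
    # build never triggers the more expensive LLM quality evaluation.
    if not other_checks_pass:
        return "retry: fix lint/type/test failures"
    # Layer 2: post-change quality evaluation runs last.
    if quality_passes:
        return "complete"
    # Allow completion once retries are exhausted, preventing infinite loops.
    return "retry: strengthen tests" if retries_left > 0 else "complete"
```

The ordering encodes the cost trade-off from the text: the cheap deterministic checks run before the expensive evaluation, and the retry budget bounds how long the agent can loop.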
Related Features
- Quality Checklist - the full list of 44 checks across 9 categories used during scoring
- What 100% Test Coverage Can't Measure - blog post on why quality checks exist beyond coverage
- Tests Can Go Stale at 100% Coverage - blog post on the content hashing pattern used for change detection
- Coverage Enforcement - enforces line, branch, and function coverage targets before quality scoring begins
- Untestable Detection - identifies code that cannot be meaningfully tested, preventing false quality gaps
Need Help?
Have questions or suggestions? We're here to help you get the most out of GitAuto.
Contact us with your questions or feedback!