CI Log Deduplication
When a CI run fails, the error log is included in every API call so the model always has the failure context. If the log contains duplicate errors (e.g., 39 test files all failing with the same TypeError), those duplicates multiply token costs across every iteration. GitAuto deduplicates identical errors before sending them to the model and saves oversized logs to disk for on-demand reading.
Why This Exists
Jest and similar test runners execute each test file independently. When a shared module has an error, every test file that imports it produces an identical failure with the same stack trace. A repo with 39 test files importing one broken module generates 39 copies of the same error. Our log cleaning pipeline already stripped ANSI codes, removed node_modules from stack traces, and extracted the Jest summary section, but it treated each copy as unique. The result: a 390K character log embedded in every API call, costing hundreds of dollars on a single PR.
How It Works
The deduplication pipeline has three stages, each in its own module:
- Extract Jest summary section - pulls out the "Summary of all failing tests" block and header commands, discarding verbose per-test output
- Strip node_modules from stack traces - removes internal framework lines that add characters without useful information
- Deduplicate identical errors - groups failures by their error message and stack trace content. When multiple test files produce the same error, only one example is kept with a count (e.g., "39 tests failed with this same error")
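The last two stages can be illustrated with a minimal Python sketch. This is not GitAuto's actual code: the function names, the `(test_file, error_text)` input shape, and the exact header wording are assumptions made for illustration, and the step that parses the Jest summary into those pairs is omitted.

```python
def strip_node_modules(error_text: str) -> str:
    """Drop stack-trace lines that point into node_modules."""
    return "\n".join(
        line for line in error_text.splitlines() if "node_modules" not in line
    )


def dedupe_failures(failures: list[tuple[str, str]]) -> str:
    """Group (test_file, error_text) pairs by cleaned error text.

    Hypothetical input shape: one pair per failing test file.
    Keeps one copy of each distinct error plus a count of affected files.
    """
    groups: dict[str, list[str]] = {}
    for test_file, error_text in failures:
        # The cleaned message + stack trace is the grouping key.
        key = strip_node_modules(error_text).strip()
        groups.setdefault(key, []).append(test_file)

    sections = []
    for error_text, files in groups.items():
        if len(files) > 1:
            header = (
                f"{len(files)} tests failed with this same error "
                f"(example: {files[0]})"
            )
        else:
            header = f"1 test failed: {files[0]}"
        sections.append(f"{header}\n{error_text}")
    return "\n\n".join(sections)
```

Grouping on the cleaned text rather than the raw text means two failures that differ only in node_modules frames still collapse into one entry.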
For logs that are still large after cleaning (over 50K characters), the full log is saved to a file in the cloned repository. A 5K character preview is included in the initial message with a pointer to the full file. The agent can then read or search the file on demand instead of carrying the entire log in every API call.
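A rough sketch of that fallback, using the 50K and 5K thresholds from the text. The function name, the default file name `ci_error_log.txt`, and the wording of the pointer message are hypothetical; the real implementation may differ.

```python
from pathlib import Path

MAX_INLINE_CHARS = 50_000  # logs above this size are saved to disk
PREVIEW_CHARS = 5_000      # size of the preview kept in the message


def prepare_log_for_message(
    cleaned_log: str,
    repo_dir: str,
    filename: str = "ci_error_log.txt",  # hypothetical file name
) -> str:
    """Return the text to embed in the initial message."""
    if len(cleaned_log) <= MAX_INLINE_CHARS:
        # Small enough to carry in full.
        return cleaned_log

    # Save the full log inside the cloned repository for on-demand reads.
    log_path = Path(repo_dir) / filename
    log_path.write_text(cleaned_log, encoding="utf-8")

    preview = cleaned_log[:PREVIEW_CHARS]
    return (
        f"{preview}\n\n"
        f"[Log truncated: showing the first {PREVIEW_CHARS} of "
        f"{len(cleaned_log)} characters. Full log saved to {filename}; "
        f"read or search that file for details.]"
    )
```

Only the preview and the pointer are repeated in later API calls, so the per-call cost stays bounded no matter how large the full log is.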
Impact
In the case that triggered this feature, 39 identical Jest TypeErrors inflated a CI log to 390K characters (roughly 100K tokens). After deduplication, the same log is under 10K characters. Over 8 retry iterations, this saves millions of input tokens per PR.
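The arithmetic behind these figures can be checked with a back-of-the-envelope sketch. The 4-characters-per-token ratio is a rough rule of thumb, not an exact tokenizer figure, and the number of API calls per iteration is a hypothetical parameter that the text does not specify.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary


def tokens_saved(chars_before: int, chars_after: int, api_calls: int) -> int:
    """Estimate input tokens saved when the log is resent on every API call."""
    saved_per_call = (chars_before - chars_after) // CHARS_PER_TOKEN
    return saved_per_call * api_calls


# 390K characters before, at most 10K after: about 95K tokens saved per call.
per_call = tokens_saved(390_000, 10_000, api_calls=1)

# Hypothetical: 8 iterations with 3 API calls each (24 calls in total).
per_pr = tokens_saved(390_000, 10_000, api_calls=8 * 3)

print(per_call)  # 95000
print(per_pr)    # 2280000
```

Because the log rides along on every API call, the savings grow with the number of calls, not just the number of iterations.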
Related Features
- CI Log Cleaning - the upstream cleaning pipeline that removes ANSI codes and extracts summaries
- Token Trimming - removes oldest messages when the conversation exceeds the context window