CI Log Cleaning
GitAuto runs a multi-stage pipeline to clean CI logs before feeding them to the model. The pipeline removes non-diagnostic output from test runners, deduplicates repetitive linter warnings, strips ANSI escape codes, and reduces log verbosity. A 10,000-line raw log might shrink to 200 lines of actionable information.
Why This Exists
Raw CI logs are extraordinarily noisy. They contain ANSI color codes (\x1b[31m sequences), thousands of lines of passing test output, the same linter warning repeated for every file, progress bars, download indicators, and framework boilerplate. Cleaning the logs first means the model receives a focused signal: just the errors, relevant warnings, and failure context. This dramatically improves fix accuracy and reduces the number of iterations needed to resolve CI failures.
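ANSI sequences like `\x1b[31m` have a regular shape: ESC, `[`, optional numeric parameters, then a final letter. That means a single regular expression removes most of them. Below is a minimal Python sketch, not GitAuto's actual code. The pattern covers CSI sequences such as colors and cursor movement but not rarer forms like OSC hyperlinks.

```python
import re

# CSI sequences: ESC "[" + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F)
# + one final byte (0x40-0x7E). Covers colors ("\x1b[31m"), resets ("\x1b[0m"),
# and cursor/erase codes ("\x1b[K"). Assumption: OSC and other rarer forms are out of scope.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from a log string."""
    return ANSI_CSI.sub("", text)


raw = "\x1b[31mFAILED\x1b[0m tests/test_api.py::test_login"
print(strip_ansi(raw))  # FAILED tests/test_api.py::test_login
```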
Why Models Struggle With Raw Logs
A 10,000-line log full of ANSI codes, progress bars, and passing test output fills the context window with noise, leaving less room for useful information. The real failure might be on line 8,743, but the model has to process thousands of irrelevant tokens to reach it. Worse, repetitive patterns (the same linter warning 50 times) can mislead the model into over-prioritizing the repeated issue, because it cannot easily tell 50 copies of one warning from 50 distinct problems. Benchmarks typically evaluate models on clean, pre-processed inputs, so there is little pressure during model development to get good at pulling the relevant lines out of noisy logs.
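One remedy for the repeated-warning problem is to collapse duplicates by rule. The sketch below assumes flake8's default `path:row:col: CODE message` output format. `dedupe_lint_warnings` is a hypothetical helper, not GitAuto's implementation. It keeps the first warning for each rule code in its original position and appends a count of the remaining occurrences.

```python
import re

# Assumption: flake8's default output format, "path:row:col: CODE message".
LINT_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<msg>.*)$"
)


def dedupe_lint_warnings(lines: list[str]) -> list[str]:
    """Keep the first warning per rule code, annotated with how many more were dropped."""
    out: list[str] = []
    first_index: dict[str, int] = {}  # rule code -> position of its example in `out`
    counts: dict[str, int] = {}  # rule code -> total occurrences
    for line in lines:
        match = LINT_LINE.match(line)
        if match is None:
            out.append(line)  # non-lint lines pass through unchanged
            continue
        code = match["code"]
        counts[code] = counts.get(code, 0) + 1
        if code not in first_index:
            first_index[code] = len(out)
            out.append(line)
    for code, index in first_index.items():
        extra = counts[code] - 1
        if extra:
            out[index] += f"  (+{extra} more {code} warnings)"
    return out


log = [
    "src/a.py:1:1: F401 'os' imported but unused",
    "src/b.py:3:1: F401 'sys' imported but unused",
    "src/c.py:9:80: E501 line too long (95 > 79 characters)",
    "src/d.py:2:1: F401 're' imported but unused",
]
for line in dedupe_lint_warnings(log):
    print(line)
```

Grouping by rule code rather than exact text is the key choice. The F401 messages differ per import (`'os'`, `'sys'`), but for the model they describe one problem. Paths containing colons (for example, Windows drive letters) would need a different pattern.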
How It Works
The cleaning pipeline has 4 stages:
- Test runner noise removal - strips non-diagnostic output like warnings summaries, passed test listings, progress indicators, and timing details, keeping only failure summaries and stack traces.
- Linter warning deduplication - when the same lint rule fires on 50 files, the log keeps one example and a count instead of repeating the full warning 50 times.
- ANSI code stripping - removes all terminal escape sequences so the model sees clean text instead of interleaved control characters.
- Verbosity reduction - collapses repetitive output patterns like dependency installation logs, download progress bars, and framework boilerplate into compact summaries.
Each stage runs sequentially, and the pipeline can be extended to support new CI systems and frameworks.
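Running stages sequentially while keeping the pipeline extensible can be as simple as an ordered list of functions, each taking and returning a list of lines. In this sketch, the `stage` registry decorator, the noise patterns (pip install chatter and tqdm-style `NN%|` progress bars), and the blank-line stage are all hypothetical. The collapsing stage illustrates verbosity reduction.

```python
import re
from typing import Callable

Stage = Callable[[list[str]], list[str]]

# Hypothetical registry: stages run in the order they are registered.
PIPELINE: list[Stage] = []


def stage(fn: Stage) -> Stage:
    """Register a cleaning stage so the runner picks it up automatically."""
    PIPELINE.append(fn)
    return fn


# Assumption: example noise patterns (pip install output, tqdm-style "45%|" progress bars).
NOISE = re.compile(
    r"^\s*(?:Collecting |Downloading |Using cached |Requirement already satisfied"
    r"|Installing collected packages|Successfully installed |.*\d{1,3}%\|)"
)


@stage
def collapse_verbose_runs(lines: list[str]) -> list[str]:
    """Replace each run of 2+ consecutive noisy lines with a one-line summary."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        if len(run) > 1:
            out.append(f"[{len(run)} lines of install/progress output collapsed]")
        else:
            out.extend(run)  # a single noisy line is cheap enough to keep
        run.clear()

    for line in lines:
        if NOISE.match(line):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return out


@stage
def drop_blank_lines(lines: list[str]) -> list[str]:
    """Hypothetical extra stage: remove empty or whitespace-only lines."""
    return [line for line in lines if line.strip()]


def clean_log(text: str) -> str:
    """Feed the log through every registered stage, in order."""
    lines = text.splitlines()
    for fn in PIPELINE:
        lines = fn(lines)
    return "\n".join(lines)
```

Under this design, supporting a new CI system or framework means registering another function, and `clean_log` itself does not change. Because each stage sees the previous stage's output, registration order matters.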
Related Features
- Error Baselines - separates pre-existing errors from new ones, another denoising technique
- Token Trimming - manages overall context window size after logs are cleaned