fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence

When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.
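
The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the real shouldIndexFile(): its body, the prefix-matching rule, and the shape of RepoConfig (including `rules` as a string array) are assumptions made for illustration only.

```typescript
// Hypothetical, simplified config shape; the real RepoConfig may differ.
interface RepoConfig {
  folders?: string[];
  rules?: string[];
}

// Assumed behaviour: with a non-empty `folders` allowlist, only paths under
// one of the listed prefixes are kept.
function shouldIndexFile(relPath: string, config: RepoConfig): boolean {
  const folders = config.folders;
  if (!folders || folders.length === 0) return true;
  return folders.some((prefix) => relPath.startsWith(prefix));
}

const exampleConfig: RepoConfig = { folders: ["src/"], rules: ["Prefer named exports"] };

// trueref.json sits at the repo root, so the allowlist filters it out along
// with every other root-level file.
const kept = ["trueref.json", "src/index.ts"].filter((p) =>
  shouldIndexFile(p, exampleConfig)
);
```

With this allowlist, `kept` contains only `src/index.ts`, so a later lookup of trueref.json in the crawled files comes back empty.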

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario
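
The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this commit: resolveRepoConfig() is a hypothetical helper name, and CrawledFile plus the fields of RepoConfig are assumed shapes. Only `CrawlResult.config` and the preference for it over `files` come from the change itself.

```typescript
// Assumed shapes for illustration; only `config?: RepoConfig` on CrawlResult
// is taken from the commit.
interface RepoConfig {
  folders?: string[];
  rules?: string[];
}

interface CrawledFile {
  path: string;
  content: string;
}

interface CrawlResult {
  files: CrawledFile[];
  config?: RepoConfig;
}

// Hypothetical helper: prefer the pre-parsed config, and fall back to
// scanning the crawled files only when the crawler did not supply one.
function resolveRepoConfig(result: CrawlResult): RepoConfig | undefined {
  if (result.config) return result.config;

  const file = result.files.find((f) => f.path === "trueref.json");
  if (!file) return undefined;

  try {
    return JSON.parse(file.content) as RepoConfig;
  } catch {
    // Malformed JSON is treated as "no config" in this sketch.
    return undefined;
  }
}
```

Keeping the file scan as a fallback means crawl results that do not populate `config` would still resolve their rules the old way; whether the real pipeline keeps that fallback is not stated here.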

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Giancarmine Salucci
2026-03-28 17:27:53 +01:00
parent 666ec7d55f
commit cd4ea7112c
5 changed files with 157 additions and 11 deletions

@@ -230,7 +230,11 @@ export class LocalCrawler {
       totalFiles: filteredPaths.length,
       skippedFiles: allRelPaths.length - filteredPaths.length,
       branch,
-      commitSha
+      commitSha,
+      // Surface the pre-parsed config so the indexing pipeline can read rules
+      // without needing to find trueref.json inside crawledFiles (which fails
+      // when a `folders` allowlist excludes the repo root).
+      config: config ?? undefined
     };
   }