4 Commits

Author SHA1 Message Date
Giancarmine Salucci
cd4ea7112c fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 17:27:53 +01:00
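
A minimal TypeScript sketch of the failure and the fix described in cd4ea7112c. The names CrawlResult, RepoConfig, shouldIndexFile and trueref.json come from the commit message. Everything else is an assumption made for illustration: the field shapes (including `rules`), the prefix-based allowlist check, and the helpers crawl() and resolveConfig(). The real definitions live in crawler/types.ts and the indexing pipeline and may differ.

```typescript
// Hypothetical, simplified shapes; not copied from crawler/types.ts.
interface RepoConfig {
  folders?: string[]; // allowlist of folder prefixes, e.g. ["src/"]
  rules?: string[];   // assumed shape of the rules that get persisted
}

interface CrawledFile {
  path: string;
  content: string;
}

interface CrawlResult {
  files: CrawledFile[];
  // The field added by the fix: the config exactly as the crawler parsed it.
  config?: RepoConfig;
}

// Simplified stand-in for shouldIndexFile(): with an allowlist present,
// only paths under a listed folder pass. A root-level trueref.json does not.
function shouldIndexFile(path: string, config: RepoConfig): boolean {
  if (!config.folders || config.folders.length === 0) return true;
  return config.folders.some((folder) => path.startsWith(folder));
}

// Stand-in for the crawler: parse the config first, then filter the files,
// and return the parsed config alongside the filtered list.
function crawl(allFiles: CrawledFile[]): CrawlResult {
  const configFile = allFiles.find((f) => f.path === "trueref.json");
  const config: RepoConfig | undefined = configFile
    ? JSON.parse(configFile.content)
    : undefined;
  const files = allFiles.filter((f) => shouldIndexFile(f.path, config ?? {}));
  return { files, config };
}

// Stand-in for the pipeline side: prefer the pre-parsed config and only
// fall back to scanning files[] when the crawler did not provide one.
function resolveConfig(result: CrawlResult): RepoConfig | undefined {
  if (result.config) return result.config;
  const fallback = result.files.find((f) => f.path === "trueref.json");
  return fallback ? (JSON.parse(fallback.content) as RepoConfig) : undefined;
}
```

With a `folders: ["src/"]` allowlist, trueref.json is absent from `files`, so a lookup that only scans `files` finds no config; reading `result.config` first avoids that.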
Giancarmine Salucci
5a3c27224d chore(FEEDBACK-0001): linting 2026-03-27 02:23:01 +01:00
Giancarmine Salucci
59628dd408 feat(crawler): ignore .gitignore files and folders, fall back to common ignored deps 2026-03-25 15:10:44 +01:00
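
One possible reading of 59628dd408, sketched in TypeScript: use the entries of a .gitignore when one exists, otherwise fall back to a built-in list of dependency directories. The commit message does not say which directories the fallback covers or how patterns are matched, so the list and the helper names ignoredDirs() and isIgnored() below are assumptions. The sketch also handles only plain directory or file names and deliberately skips globs, negations and nested paths, which real .gitignore matching supports.

```typescript
// Assumed fallback list; the commit does not enumerate the actual entries.
const FALLBACK_IGNORED_DIRS: string[] = ["node_modules", "vendor", ".venv"];

// Derive a list of ignored names from .gitignore text, or use the fallback
// when no .gitignore was found (signalled here by `undefined`).
function ignoredDirs(gitignoreText: string | undefined): string[] {
  if (gitignoreText === undefined) return FALLBACK_IGNORED_DIRS;
  return gitignoreText
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Drop blank lines and comments.
    .filter((line) => line !== "" && !line.startsWith("#"))
    // Skip negations and glob patterns: out of scope for this sketch.
    .filter((line) => !line.startsWith("!") && !/[*?\[]/.test(line))
    // Strip leading and trailing slashes ("dist/" and "/coverage").
    .map((line) => line.replace(/^\/+|\/+$/g, ""))
    // Skip nested paths such as "build/output".
    .filter((line) => !line.includes("/"));
}

// A path is ignored when any of its segments equals an ignored name.
function isIgnored(path: string, dirs: string[]): boolean {
  return path.split("/").some((segment) => dirs.includes(segment));
}
```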
Giancarmine Salucci
1c15d6c474 feat(TRUEREF-0003-0004): implement GitHub and local filesystem crawlers
- GitHub crawler with rate limiting, semaphore concurrency, retry logic
- File filtering by extension, size, and trueref.json rules
- Local filesystem crawler with SHA-256 checksums and progress callbacks
- Shared types and file filter logic between both crawlers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:07 +01:00
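
The "file filtering by extension, size" step listed in 1c15d6c474 might look roughly like the TypeScript sketch below. The option names, the extension list and the size limit are hypothetical; the commit message gives no concrete values, and the filter shared by the two crawlers also applies trueref.json rules, which are omitted here.

```typescript
// Hypothetical shapes for illustration only.
interface FileEntry {
  path: string;
  sizeBytes: number;
}

interface FilterOptions {
  allowedExtensions: string[]; // lowercase, with leading dot, e.g. ".md"
  maxSizeBytes: number;
}

// Return the lowercase extension of the final path segment, or "" when
// there is none. Dotfiles such as ".gitignore" count as having no extension.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? "" : name.slice(dot).toLowerCase();
}

// A file passes when its extension is allowed and it is not too large.
function passesFilter(file: FileEntry, options: FilterOptions): boolean {
  return (
    options.allowedExtensions.includes(extensionOf(file.path)) &&
    file.sizeBytes <= options.maxSizeBytes
  );
}
```

Keeping the check as a pure function of path and size is one way to let a GitHub crawler and a local crawler share the same logic, since neither needs file contents to decide.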