TRUEREF-0023 rewrite indexing pipeline - parallel reads - serialized writes

Giancarmine Salucci
2026-04-02 09:49:38 +02:00
parent 9525c58e9a
commit f86be4106b
68 changed files with 5042 additions and 3131 deletions


@@ -47,8 +47,8 @@ Executed in `IndexingPipeline.run()` before the crawl, when the job has a `versi
containing shell metacharacters).
3. **Path partitioning**: The changed-file list is split into `changedPaths` (added + modified
-   + renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
-   `ancestorFilePaths - changedPaths - deletedPaths`.
+   - renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
+   `ancestorFilePaths - changedPaths - deletedPaths`.
4. **Guard**: Returns `null` when no indexed ancestor exists, when the ancestor has no indexed
documents, or when all files changed (nothing to clone).
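
The path partitioning and guard steps in this hunk can be sketched roughly as follows. This is an illustrative sketch only, not the code from `pipeline/differential-strategy.ts`: the `ChangedFile` fields, the `PathPartition` type, and the `partitionPaths` helper are hypothetical names, and treating a rename's source path as deleted is an assumption the text does not confirm. The "no indexed ancestor exists" guard is left out because it depends on version lookup that is not shown here.

```typescript
// Hypothetical shapes: the real `ChangedFile` in crawler/types.ts may differ.
type ChangeStatus = "added" | "modified" | "removed" | "renamed";

interface ChangedFile {
  path: string; // for renames, the destination path
  status: ChangeStatus;
  previousPath?: string; // for renames, the source path (assumed field)
}

interface PathPartition {
  changedPaths: string[];
  deletedPaths: string[];
  unchangedPaths: string[];
}

function partitionPaths(
  ancestorFilePaths: string[],
  changes: ChangedFile[]
): PathPartition | null {
  // Guard: the ancestor has no indexed documents, so there is nothing to clone.
  if (ancestorFilePaths.length === 0) return null;

  const changed = new Set<string>();
  const deleted = new Set<string>();

  changes.forEach((change) => {
    if (change.status === "removed") {
      deleted.add(change.path);
      return;
    }
    // added + modified + renamed-destination
    changed.add(change.path);
    // Assumption: the old name of a renamed file no longer exists.
    if (change.status === "renamed" && change.previousPath !== undefined) {
      deleted.add(change.previousPath);
    }
  });

  // unchangedPaths = ancestorFilePaths - changedPaths - deletedPaths
  const unchangedPaths = ancestorFilePaths.filter(
    (path) => !changed.has(path) && !deleted.has(path)
  );

  // Guard: every ancestor file changed, so a differential plan saves nothing.
  if (unchangedPaths.length === 0) return null;

  return {
    changedPaths: Array.from(changed),
    deletedPaths: Array.from(deleted),
    unchangedPaths,
  };
}
```

Returning `null` rather than an empty plan lets the caller fall back to a full crawl with a single check, which matches the guard behaviour described in step 4.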
@@ -74,18 +74,18 @@ matching files are returned. This minimises GitHub API requests and local I/O.
## API Surface Changes
-| Symbol | Location | Change |
-|---|---|---|
-| `buildDifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — async function |
-| `DifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — interface |
-| `findBestAncestorVersion` | `utils/tag-order.ts` | **New** — pure function |
-| `fetchGitHubChangedFiles` | `crawler/github-compare.ts` | **New** — async function |
-| `getChangedFilesBetweenRefs` | `utils/git.ts` | **New** — sync function (uses `execFileSync`) |
-| `ChangedFile` | `crawler/types.ts` | **New** — interface |
-| `CrawlOptions.allowedPaths` | `crawler/types.ts` | **New** — optional field |
-| `IndexingPipeline.crawl()` | `pipeline/indexing.pipeline.ts` | **Modified** — added `allowedPaths` param |
-| `IndexingPipeline.cloneFromAncestor()` | `pipeline/indexing.pipeline.ts` | **New** — private method |
-| `IndexingPipeline.run()` | `pipeline/indexing.pipeline.ts` | **Modified** — Stage 0 added |
+| Symbol                                 | Location                            | Change                                        |
+| -------------------------------------- | ----------------------------------- | --------------------------------------------- |
+| `buildDifferentialPlan`                | `pipeline/differential-strategy.ts` | **New** — async function                      |
+| `DifferentialPlan`                     | `pipeline/differential-strategy.ts` | **New** — interface                           |
+| `findBestAncestorVersion`              | `utils/tag-order.ts`                | **New** — pure function                       |
+| `fetchGitHubChangedFiles`              | `crawler/github-compare.ts`         | **New** — async function                      |
+| `getChangedFilesBetweenRefs`           | `utils/git.ts`                      | **New** — sync function (uses `execFileSync`) |
+| `ChangedFile`                          | `crawler/types.ts`                  | **New** — interface                           |
+| `CrawlOptions.allowedPaths`            | `crawler/types.ts`                  | **New** — optional field                      |
+| `IndexingPipeline.crawl()`             | `pipeline/indexing.pipeline.ts`     | **Modified** — added `allowedPaths` param     |
+| `IndexingPipeline.cloneFromAncestor()` | `pipeline/indexing.pipeline.ts`     | **New** — private method                      |
+| `IndexingPipeline.run()`               | `pipeline/indexing.pipeline.ts`     | **Modified** — Stage 0 added                  |
---