- Move IndexingPipeline.run() into Worker Threads via WorkerPool
- Add dedicated embedding worker thread with single model instance
- Add stage/stageDetail columns to indexing_jobs schema
- Create ProgressBroadcaster for SSE channel management
- Add SSE endpoints: GET /api/v1/jobs/:id/stream, GET /api/v1/jobs/stream
- Replace UI polling with EventSource on repo detail and admin pages
- Add concurrency settings UI and API endpoint
- Build worker entries separately via esbuild
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.
Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.
Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.
- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add migration 0003: recreate repository_configs with nullable version_id
column and two partial unique indexes (repo-wide: version_id IS NULL,
per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.
Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.
Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.
Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire local embedding provider as the default on startup when no profile is configured
- Refactor embedding settings into dedicated service, DTOs, mappers and models
- Rebuild settings page with profile management UI and live test feedback
- Expose index summary (indexed versions + embedding count) on repo endpoints
- Harden indexing pipeline and context search with additional test coverage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace ad-hoc inline row casting (snake_case → camelCase) spread across
services, routes, and the indexing pipeline with explicit model classes
(Repository, IndexingJob, RepositoryVersion, Snippet, SearchResult) and
dedicated mapper classes that own the DB → domain conversion.
- Add src/lib/server/models/ with typed model classes for all domain entities
- Add src/lib/server/mappers/ with mapper classes per entity
- Remove duplicated RawRow interfaces and inline map functions from
job-queue, repository.service, indexing.pipeline, and all API routes
- Add dtoJsonResponse helper to standardise JSON responses via SvelteKit json()
- Add api-contract.integration.test.ts as a regression baseline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- computeDiff classifies files into added/modified/deleted/unchanged buckets
- Only changed and new files are parsed and re-embedded on re-runs
- Deleted files removed atomically from DB
- Progress counts all files including unchanged for accurate reporting
- ~20x speedup for re-indexing large repositories with few changes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the end-to-end indexing pipeline with a SQLite-backed job
queue, startup recovery, and REST API endpoints for job status.
- IndexingPipeline: orchestrates crawl → parse → atomic replace → embed
→ repo stats update with progress tracking at each stage
- JobQueue: sequential SQLite-backed queue (no external broker), deduplicates
active jobs per repository, drains queued jobs on startup
- startup.ts: stale job recovery (running→failed), repo state reset, singleton
initialization wired from hooks.server.ts
- GET /api/v1/jobs with repositoryId/status/limit filtering
- GET /api/v1/jobs/[id] single job lookup
- hooks.server.ts: initializes DB and pipeline on server start
- 18 unit tests covering queue, pipeline stages, recovery, and atomicity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>