Compare commits

...

19 Commits

Author SHA1 Message Date
b386904303 Merge branch 'feature/TRUEREF-0023_libsql_vector_search' 2026-04-02 09:50:25 +02:00
Giancarmine Salucci
f86be4106b TRUEREF-0023 rewrite indexing pipeline - parallel reads - serialized writes 2026-04-02 09:49:38 +02:00
Giancarmine Salucci
9525c58e9a feat(TRUEREF-0023): add sqlite-vec search pipeline 2026-04-01 14:09:19 +02:00
0752636847 feature/TRUEREF-0022_worker_thread_indexing 2026-03-30 20:04:50 +02:00
Giancarmine Salucci
ec6140e3bb TRUEREF-0022 fix, more tests 2026-03-30 19:11:09 +02:00
Giancarmine Salucci
6297edf109 chore(TRUEREF-0022): fix lint errors and update architecture docs
- Fix 15 ESLint errors across pipeline workers, SSE endpoints, and UI
- Replace explicit any with proper entity types in worker entries
- Remove unused imports and variables (basename, SSEEvent, getBroadcasterFn, seedRules)
- Use empty catch clauses instead of unused error variables
- Use SvelteSet for reactive Set state in repository page
- Fix operator precedence in nullish coalescing expression
- Replace $state+$effect with $derived for concurrency input
- Use resolve() directly in href for navigation lint rule
- Update ARCHITECTURE.md and FINDINGS.md for worker-thread architecture
2026-03-30 17:28:38 +02:00
Giancarmine Salucci
7630740403 feat(TRUEREF-0022): complete iteration 0 — worker-thread indexing, parallel jobs, SSE progress
- Move IndexingPipeline.run() into Worker Threads via WorkerPool
- Add dedicated embedding worker thread with single model instance
- Add stage/stageDetail columns to indexing_jobs schema
- Create ProgressBroadcaster for SSE channel management
- Add SSE endpoints: GET /api/v1/jobs/:id/stream, GET /api/v1/jobs/stream
- Replace UI polling with EventSource on repo detail and admin pages
- Add concurrency settings UI and API endpoint
- Build worker entries separately via esbuild
2026-03-30 17:08:23 +02:00
U811073
6f3f4db19b fix(TRUEREF-0021): reduce event loop blocking, add busy_timeout, and add TRUEREF-0022 PRD 2026-03-30 15:46:15 +02:00
U811073
f4fe8c6043 feat(TRUEREF-0021): implement differential tag indexing 2026-03-30 15:46:15 +02:00
Giancarmine Salucci
e63279fcf6 improve readme, untrack agents 2026-03-29 18:35:47 +02:00
Giancarmine Salucci
a426f4305c Merge branch 'fix/MULTIVERSION-0001-trueref-config-crawl-result' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
23ea8f2b4b Merge branch 'fix/MULTIVERSION-0001-multi-version-indexing' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
0bf01e3057 last fix 2026-03-29 12:44:06 +02:00
Giancarmine Salucci
09c6f9f7c1 fix(MULTIVERSION-0001): eliminate NULL-row contamination in getRules
When a versioned query is made, getRules() now returns only the
version-specific repository_configs row. The NULL (HEAD/repo-wide)
row is no longer merged in, preventing v4 rules from bleeding into
v1/v2/v3 versioned context responses.

Tests updated to assert the isolation: versioned queries return only
their own rules row; a new test verifies that a version with no
config row returns an empty rules array even when a NULL row exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 11:47:31 +02:00
Giancarmine Salucci
bbc67f8064 fix(MULTIVERSION-0001): prevent version jobs from overwriting repo-wide NULL rules entry
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.

Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 01:15:58 +01:00
Giancarmine Salucci
cd4ea7112c fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 17:27:53 +01:00
Giancarmine Salucci
666ec7d55f feat(MULTIVERSION-0001): wire trueref.json into pipeline + per-version rules
- Add migration 0003: recreate repository_configs with nullable version_id
  column and two partial unique indexes (repo-wide: version_id IS NULL,
  per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
  partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
  excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
  version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
  glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
  with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
  per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:44:30 +01:00
Giancarmine Salucci
255838dcc0 fix(MULTIVERSION-0001): fix version isolation, 404 on unknown version, commit-hash lookup, and searchModeUsed
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.

Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.

Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.

Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:31:15 +01:00
Giancarmine Salucci
417c6fd072 fix(MULTIVERSION-0001): fix version indexing pipeline state and UI reactivity
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:03:44 +01:00
102 changed files with 13435 additions and 1212 deletions

.github/agents vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/agents

.github/schemas vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/schemas

.github/skills vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/skills

.gitignore vendored

@@ -36,3 +36,8 @@ docs/docs_cache_state.yaml
# Claude Code — ignore local/machine-specific settings, keep project rules
.claude/
!.claude/rules/
# Github Copilot
.github/agents
.github/schemas
.github/skills

README.md

@@ -16,9 +16,12 @@ The goal is straightforward: give your assistants accurate, current, version-awa
- Stores metadata in SQLite.
- Supports keyword search out of the box with SQLite FTS5.
- Supports semantic and hybrid search when an embedding provider is configured.
- Supports multi-version indexing: index specific git tags independently, query a version by appending it to the library ID.
- Discovers available git tags from local repositories automatically.
- Stores per-version rules from `trueref.json` and prepends them to every `query-docs` response.
- Exposes REST endpoints for library discovery, documentation retrieval, and version management.
- Exposes an MCP server over stdio and HTTP for AI clients.
- Provides a SvelteKit web UI for repository management, version management, search, indexing jobs, and embedding settings.
- Supports repository-level configuration through `trueref.json` or `context7.json`.
## Project status
@@ -28,10 +31,12 @@ TrueRef is under active development. The current codebase already includes:
- repository management
- indexing jobs and recovery on restart
- local and GitHub crawling
- multi-version indexing with git tag isolation
- automatic tag discovery for local git repositories
- per-version rules from `trueref.json` prepended to context responses
- context7-compatible API endpoints
- MCP stdio and HTTP transports
- configurable embedding providers (none / OpenAI-compatible / local ONNX)
## Architecture
@@ -66,7 +71,15 @@ Each indexed repository becomes a library with an ID such as `/facebook/react`.
### Versions
Libraries can register version tags. Each version is indexed independently so snippets from different releases never mix.
Query a specific version by appending the tag to the library ID:
```
/facebook/react/v18.3.0
```
For local repositories, TrueRef can discover all available git tags automatically via the versions/discover endpoint. Tags can be added through the UI on the repository detail page or via the REST API.
### Snippets
@@ -76,6 +89,8 @@ Documents are split into code and info snippets. These snippets are what search
Repository rules defined in `trueref.json` are prepended to `query-docs` responses so assistants get usage constraints along with the retrieved content.
Rules are stored per version when a version-specific config is found during indexing, so different releases can carry different usage guidance.
## Requirements
- Node.js 20+
@@ -153,6 +168,12 @@ Use the main page to:
- delete an indexed repository
- monitor active indexing jobs
Open a repository's detail page to:
- view registered version tags
- discover available git tags (local repositories)
- trigger version-specific indexing jobs
### Search
Use the Search page to:
@@ -175,21 +196,40 @@ If no embedding provider is configured, TrueRef still works with FTS5-only searc
## Repository configuration
You can place a `trueref.json` file at the **root** of any repository you index. TrueRef reads it during every indexing run to control what gets indexed and what gets shown to AI assistants.
For backward compatibility with repositories that already have a `context7.json`, that file is also supported. When both files are present, `trueref.json` takes precedence.
### Where to place it
```
my-library/
├── trueref.json ← here, at the repository root
├── src/
├── docs/
└── ...
```
For GitHub repositories, TrueRef fetches the file from the default branch root. For local repositories, it reads it from the filesystem root of the indexed folder.
### Fields
| Field | Type | Required | Description |
| ------------------ | -------- | -------- | ------------------------------------------------------------------------------------------------- |
| `$schema` | string | No | URL to the live JSON Schema for editor validation |
| `projectTitle` | string | No | Display name override (max 100 chars) |
| `description` | string | No | Library description used for search ranking (10–500 chars) |
| `folders` | string[] | No | Path prefixes or regex strings to **include** (max 50 items). If absent, all folders are included |
| `excludeFolders` | string[] | No | Path prefixes or regex strings to **exclude** after the `folders` allowlist (max 50 items) |
| `excludeFiles` | string[] | No | Exact filenames to skip — no path, no glob (max 100 items) |
| `rules` | string[] | No | Best-practice rules prepended to every `query-docs` response (max 20 rules, 5–500 chars each) |
| `previousVersions` | object[] | No | Version tags to register when the repository is indexed (max 50 entries) |
`previousVersions` entries each require a `tag` (e.g. `"v1.2.3"`) and a `title` (e.g. `"Version 1.2.3"`).
The parser is intentionally lenient: unknown keys are silently ignored, mistyped values are skipped with a warning, and oversized strings or arrays are truncated. Only invalid JSON or a non-object root is a hard error.
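The lenient policy above can be sketched like this; `parseConfig` and its rules-only return value are illustrative, not TrueRef's real parser, but the behavior mirrors the documented rules: unknown keys are ignored, mistyped entries are dropped, arrays are truncated to their documented maximum, and only invalid JSON or a non-object root throws.

```typescript
function parseConfig(text: string): { rules: string[] } {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    throw new Error("invalid JSON"); // hard error
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config root must be an object"); // hard error
  }
  const obj = raw as Record<string, unknown>;
  const rules = Array.isArray(obj.rules)
    ? obj.rules
        .filter((r): r is string => typeof r === "string") // skip mistyped entries
        .slice(0, 20) // truncate to the documented max of 20 rules
    : []; // a mistyped `rules` value is skipped entirely
  return { rules };
}
```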
### Full example
```json
{
@@ -197,30 +237,76 @@ For compatibility with existing context7-style repositories, `context7.json` is
"projectTitle": "My Internal SDK",
"description": "Internal SDK for billing, auth, and event ingestion.",
"folders": ["src/", "docs/"],
"excludeFolders": ["tests/", "fixtures/", "node_modules/", "__mocks__/"],
"excludeFiles": ["CHANGELOG.md", "jest.config.ts"],
"rules": [
"Prefer named imports over wildcard imports.",
"Use the async client API for all network calls.",
"Never import from internal sub-paths — use the package root only."
],
"previousVersions": [
{ "tag": "v2.0.0", "title": "Version 2.0.0" },
{ "tag": "v1.2.3", "title": "Version 1.2.3 (legacy)" }
]
}
```
### How `folders` and `excludeFolders` are matched
Both fields accept strings that are matched against the full relative file path within the repository. A string is treated as a path prefix unless it starts with `^`, in which case it is compiled as a regex:
```json
{
"folders": ["src/", "docs/", "^packages/core/"],
"excludeFolders": ["src/internal/", "__tests__"]
}
```
- `"src/"` — includes any file whose path starts with `src/`
- `"^packages/core/"` — regex, includes `packages/core` but not `packages/core-utils`
`excludeFolders` is applied **after** the `folders` allowlist, so you can narrow a broad include with a targeted exclude.
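A minimal sketch of this two-step matching, assuming helpers shaped like the description above (`matchesFolder` and `shouldIndex` are illustrative names, not TrueRef's actual API):

```typescript
function matchesFolder(relPath: string, patterns: string[]): boolean {
  return patterns.some((p) =>
    // A leading ^ switches from path-prefix matching to regex matching.
    p.startsWith("^") ? new RegExp(p).test(relPath) : relPath.startsWith(p)
  );
}

function shouldIndex(
  relPath: string,
  folders?: string[],
  excludeFolders?: string[]
): boolean {
  // `folders` is an allowlist; when absent, every folder is included.
  if (folders && folders.length > 0 && !matchesFolder(relPath, folders)) return false;
  // `excludeFolders` is applied after the allowlist.
  if (excludeFolders && matchesFolder(relPath, excludeFolders)) return false;
  return true;
}
```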
### How `rules` are used
Rules are stored in the database at index time and automatically prepended to every `query-docs` response for that library (and version). This means AI assistants receive them alongside the retrieved snippets without any extra configuration.
When a version is indexed, the rules from the config found at that version's checkout are stored separately. Different version tags can therefore carry different rules.
Example context response with rules prepended:
```
RULES:
- Prefer named imports over wildcard imports.
- Use the async client API for all network calls.
LIBRARY DOCUMENTATION:
...
```
### How `previousVersions` works
When TrueRef indexes a repository and finds `previousVersions`, it registers those tags in the versions table. The tags are then available for version-specific indexing and queries without any further manual registration.
This is useful when you want all historical releases available from a fresh TrueRef setup without manually triggering one indexing job per version.
### JSON Schema for editor support
TrueRef serves a live JSON Schema at:
```
http://localhost:5173/api/v1/schema/trueref-config.json
```
Add it to your `trueref.json` via the `$schema` field to get inline validation and autocomplete in VS Code, IntelliJ, and any other editor that supports JSON Schema Draft 07:
```json
{
"$schema": "http://localhost:5173/api/v1/schema/trueref-config.json"
}
```
If you are running TrueRef on a server, replace `localhost:5173` with your actual host and port. The schema endpoint always reflects the version of TrueRef you are running.
## REST API
@@ -299,6 +385,36 @@ curl "http://localhost:5173/api/v1/jobs"
curl "http://localhost:5173/api/v1/jobs/<job-id>"
```
### Version management
List registered versions for a library:
```sh
curl "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions"
```
Index a specific version tag:
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions/v18.3.0/index"
```
Discover available git tags (local repositories only):
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Fpath%2Fto%2Fmy-lib/versions/discover"
```
Returns `{ "tags": [{ "tag": "v1.0.0", "commitHash": "abc123" }, ...] }`; for GitHub repositories the `tags` array is empty.
### Version-targeted context retrieval
Append the version tag to the library ID to retrieve snippets from a specific indexed version:
```sh
curl "http://localhost:5173/api/v1/context?libraryId=/facebook/react/v18.3.0&query=how%20to%20use%20useEffect&type=txt"
```
### Response formats
The two search endpoints support:


@@ -1,141 +1,168 @@
# Architecture
Last Updated: 2026-04-01T12:05:23.000Z
## Overview
TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a worker-threaded indexing pipeline backed by SQLite via better-sqlite3, Drizzle ORM, FTS5, and sqlite-vec.
- Primary language: TypeScript (147 `.ts` files) with a small amount of JavaScript configuration and build code (2 `.js` files), excluding generated output and dependencies
- Application type: Full-stack SvelteKit application with server-side indexing, retrieval, and MCP integration
- Runtime framework: SvelteKit with adapter-node
- Storage: SQLite in WAL mode with Drizzle-managed relational schema, FTS5 full-text indexes, and sqlite-vec virtual tables for vector lookup
- Concurrency: Node.js `worker_threads` for parse, embed, and auxiliary write-worker infrastructure
- Testing: Vitest for unit and integration coverage
## Project Structure
- `src/routes`: SvelteKit pages and HTTP endpoints, including the public UI and `/api/v1` surface
- `src/lib/server`: Backend implementation grouped by concern: `api`, `config`, `crawler`, `db`, `embeddings`, `mappers`, `models`, `parser`, `pipeline`, `search`, `services`, `utils`
- `src/mcp`: Standalone MCP server entry point, client, tests, and tool handlers
- `scripts`: Build helpers, including worker bundling
- `static`: Static assets such as `robots.txt`
- `docs/features`: Feature-level implementation notes and product documentation
- `build`: Generated SvelteKit output and bundled worker entrypoints
## Key Directories
### `src/routes`
Contains the UI entry points and API routes. The API tree under `src/routes/api/v1` is the public HTTP contract for repository management, version discovery, indexing jobs, search/context retrieval, embedding settings, indexing settings, filesystem browsing, worker-status inspection, and SSE progress streaming.
### `src/lib/server/db`
Owns SQLite schema definitions, relational migrations, connection bootstrapping, and sqlite-vec loading. Database startup goes through `initializeDatabase()` and `getClient()`, both of which configure WAL-mode pragmas and ensure sqlite-vec is loaded on each connection before vector-backed queries run.
### `src/lib/server/search`
Implements keyword, vector, and hybrid retrieval. Keyword search uses SQLite FTS5 and BM25-style ranking. Vector search uses `SqliteVecStore` to maintain per-profile sqlite-vec `vec0` tables plus rowid mapping tables, and hybrid search blends FTS and vector candidates through reciprocal rank fusion.
### `src/lib/server/pipeline`
Coordinates crawl, diff, parse, store, embed, and job-state broadcasting. The pipeline module consists of:
- `IndexingPipeline`: orchestrates crawl, diff, parse, transactional replacement, optional embedding generation, and repository statistics updates
- `WorkerPool`: manages parse workers, an optional embed worker, an optional write worker, per-repository-and-version serialization, worker respawn, and runtime concurrency changes
- `worker-entry.ts`: parse worker that opens its own `better-sqlite3` connection, runs the indexing pipeline, and reports progress back to the parent
- `embed-worker-entry.ts`: embedding worker that loads the active profile, creates an `EmbeddingService`, and generates vectors after parse completion
- `write-worker-entry.ts`: batch-write worker with a `write`/`write_ack`/`write_error` message protocol for document and snippet persistence
- `progress-broadcaster.ts`: server-side pub/sub for per-job, per-repository, global, and worker-status SSE streams
- `startup.ts`: recovers stale jobs, constructs singleton queue/pipeline/pool/broadcaster instances, loads concurrency settings, and drains queued work after restart
- `worker-types.ts`: shared TypeScript discriminated unions for parse, embed, and write worker protocols
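The discriminated-union protocols that `worker-types.ts` describes can be sketched as follows. The `write`/`write_ack`/`write_error` type names come from the list above; the payload fields (`batchId`, `rows`, `written`, `message`) are illustrative assumptions.

```typescript
type WriteRequest = { type: "write"; batchId: number; rows: string[] };
type WriteAck = { type: "write_ack"; batchId: number; written: number };
type WriteError = { type: "write_error"; batchId: number; message: string };
type WriteWorkerMessage = WriteRequest | WriteAck | WriteError;

// Switching on the `type` tag narrows the union, so each branch sees only
// its own fields and the compiler checks exhaustiveness.
function describeMessage(msg: WriteWorkerMessage): string {
  switch (msg.type) {
    case "write":
      return `batch ${msg.batchId}: ${msg.rows.length} rows queued`;
    case "write_ack":
      return `batch ${msg.batchId}: ${msg.written} rows written`;
    case "write_error":
      return `batch ${msg.batchId} failed: ${msg.message}`;
  }
}
```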
### `src/lib/server/crawler` and `src/lib/server/parser`
Convert GitHub repositories and local folders into normalized snippet records. Crawlers fetch repository contents and configuration, parsers split Markdown, code, config, HTML-like, and plain-text files into searchable snippet records, and downstream services persist searchable content and embeddings.
### `src/mcp`
Provides a thin compatibility layer over the HTTP API. The MCP server exposes `resolve-library-id` and `query-docs` over stdio or HTTP and forwards work to local handlers that reuse the application retrieval stack.
## Design Patterns
- **Service layer**: business logic lives in classes such as `RepositoryService`, `VersionService`, `SearchService`, `HybridSearchService`, and `EmbeddingService`
- **Factory pattern**: embedding providers are created from persisted profile records through registry/factory helpers
- **Mapper/entity separation**: mappers translate between raw database rows and domain entities such as `RepositoryEntity`, `RepositoryVersionEntity`, and `EmbeddingProfileEntity`
- **Module-level singletons**: pipeline startup owns lifecycle for `JobQueue`, `IndexingPipeline`, `WorkerPool`, and `ProgressBroadcaster`, with accessor functions for route handlers
- **Pub/sub**: `ProgressBroadcaster` maintains job, repository, global, and worker-status subscriptions for SSE delivery
- **Discriminated unions**: worker message protocols use a `type` field for type-safe parent/worker communication
## Key Components
### SvelteKit server bootstrap
`src/hooks.server.ts` initializes the relational database, opens the shared raw SQLite client, loads the default embedding profile, creates the optional `EmbeddingService`, reads indexing concurrency from the `settings` table, and initializes the queue/pipeline/worker infrastructure.
### Database layer
`src/lib/server/db/schema.ts` defines repositories, repository versions, documents, snippets, embedding profiles, relational embedding metadata, indexing jobs, repository configs, and generic settings. Relational embedding rows keep canonical model metadata and raw float buffers, while sqlite-vec virtual tables are managed separately per profile through `SqliteVecStore`.
### sqlite-vec integration
`src/lib/server/db/sqlite-vec.ts` centralizes sqlite-vec loading and deterministic per-profile table naming. `SqliteVecStore` creates `vec0` tables plus rowid mapping tables, backfills missing rows from `snippet_embeddings`, removes stale vector references, and executes nearest-neighbor queries constrained by repository, optional version, and profile.
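A hedged sketch of the SQL a `SqliteVecStore`-style helper might issue. The per-profile table name `vec_profile_<id>` is an assumed naming scheme, not the real convention; the `embedding MATCH ? AND k = ?` form follows sqlite-vec's documented `vec0` KNN query shape.

```typescript
function vecTableDdl(profileId: number, dims: number): string {
  return (
    `CREATE VIRTUAL TABLE IF NOT EXISTS vec_profile_${profileId} ` +
    `USING vec0(embedding float[${dims}])`
  );
}

function knnQuery(profileId: number, k: number): string {
  // The caller binds the query vector to the single ? placeholder.
  return (
    `SELECT rowid, distance FROM vec_profile_${profileId} ` +
    `WHERE embedding MATCH ? AND k = ${k} ORDER BY distance`
  );
}
```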
### Retrieval API
`src/routes/api/v1/context/+server.ts` validates input, resolves repository and optional version scope, chooses keyword, semantic, or hybrid retrieval, applies token budgeting, and formats JSON or text responses. `/api/v1/libs/search` handles repository-level lookup, while MCP tool handlers expose the same retrieval behavior over stdio or HTTP transports.
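The token-budgeting step skips an oversized snippet instead of stopping at it, so later, smaller snippets can still fill the budget. A sketch of that behavior (the `tokens` field and `fitToBudget` name are illustrative):

```typescript
function fitToBudget<T extends { tokens: number }>(snippets: T[], budget: number): T[] {
  const picked: T[] = [];
  let used = 0;
  for (const snippet of snippets) {
    if (used + snippet.tokens > budget) continue; // skip, don't break
    picked.push(snippet);
    used += snippet.tokens;
  }
  return picked;
}
```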
### Search engine
`SearchService` preprocesses raw user input into FTS5-safe expressions before keyword search. `HybridSearchService` supports explicit keyword, semantic, and hybrid modes, falls back to vector retrieval when keyword search yields no candidates and an embedding provider is configured, and uses reciprocal rank fusion to merge ranked lists. `VectorSearch` delegates KNN execution to `SqliteVecStore` instead of doing brute-force in-memory cosine scoring.
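Reciprocal rank fusion itself is compact enough to sketch; `k = 60` is the conventional RRF constant, and `HybridSearchService`'s actual signature and any weighting it applies are not shown here.

```typescript
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // Each list contributes 1 / (k + rank + 1) for every item it ranks,
      // so items ranked highly in several lists accumulate the most score.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```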
### Repository and version management
`RepositoryService` and `VersionService` provide CRUD, indexing-status, cleanup, and statistics logic for indexed repositories and tagged versions, including sqlite-vec cleanup when repository-scoped or version-scoped content is removed.
### Worker-threaded indexing
The active indexing path is parse-worker-first: queued jobs are dispatched to parse workers, progress is written to SQLite and broadcast over SSE, and a successful parse can enqueue embedding work on the dedicated embed worker. The worker pool also exposes status snapshots through `/api/v1/workers`. Write-worker infrastructure exists and is bundled at build time, but the parse/embed flow remains the primary live path described by `IndexingPipeline` and `WorkerPool`.
### SSE streaming and job control
`progress-broadcaster.ts` provides real-time Server-Sent Event streaming of indexing progress. Route handlers under `/api/v1/jobs/stream` and `/api/v1/jobs/[id]/stream` expose SSE endpoints, and `/api/v1/workers` exposes worker-pool status. Job control endpoints support pause, resume, and cancel transitions backed by SQLite job state.
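A hedged sketch of handling one frame from the job SSE stream. The payload shape (`{ progress?: number }`) is an assumption for illustration, not the documented wire format; in a browser you would wire this into `new EventSource("/api/v1/jobs/<job-id>/stream").onmessage`.

```typescript
function parseProgressEvent(data: string): number | null {
  try {
    const parsed = JSON.parse(data) as { progress?: number };
    return typeof parsed.progress === "number" ? parsed.progress : null;
  } catch {
    return null; // ignore malformed frames instead of crashing the UI
  }
}
```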
### Indexing settings
`/api/v1/settings/indexing` exposes GET and PUT for indexing concurrency. The value is persisted in the `settings` table and applied live to the `WorkerPool` through `setMaxConcurrency()`.
## Dependencies
### Production
- `@modelcontextprotocol/sdk`: MCP server transport and protocol types
- `@xenova/transformers`: local embedding support
- `better-sqlite3`: synchronous SQLite driver used by the main app and workers
- `sqlite-vec`: SQLite vector extension used for `vec0` storage and nearest-neighbor queries
- `zod`: runtime validation for MCP tools and server helpers
### Development
- `@sveltejs/kit` and `@sveltejs/adapter-node`: application framework and Node deployment target
- `drizzle-kit` and `drizzle-orm`: schema management and typed database access
- `esbuild`: worker entrypoint bundling into `build/workers`
- `vite` and `@tailwindcss/vite`: application bundling and Tailwind integration
- `vitest` and `@vitest/browser-playwright`: server and browser test execution
- `eslint`, `typescript-eslint`, `eslint-plugin-svelte`, `prettier`, `prettier-plugin-svelte`, `prettier-plugin-tailwindcss`: linting and formatting
- `typescript` and `@types/node`: type-checking and Node typings
## Module Organization
The backend is organized by responsibility rather than by route. HTTP handlers under `src/routes/api/v1` are intentionally thin and delegate to modules in `src/lib/server`. Within `src/lib/server`, concerns are separated into:
- `models` and `mappers` for entity translation
- `services` for repository/version operations
- `search` for keyword, vector, and hybrid retrieval strategies
- `crawler` and `parser` for indexing input transformation
- `pipeline` for orchestration, workers, and job execution
- `embeddings` for provider abstraction and vector generation
- `db`, `api`, `config`, and `utils` for persistence, response formatting, validation, and shared helpers
The frontend and backend live in the same SvelteKit repository, but most non-UI behavior is implemented on the server side.
## Data Flow
### Indexing flow
1. Server startup runs database initialization, opens the shared client, loads sqlite-vec, and initializes the pipeline singletons.
2. Startup recovery marks interrupted jobs as failed, resets repositories stuck in `indexing`, reads persisted concurrency settings, and drains queued jobs.
3. `JobQueue` dispatches eligible work to the `WorkerPool`, which serializes by `(repositoryId, versionId)` and posts jobs to idle parse workers.
4. Each parse worker opens its own SQLite connection, crawls the source, computes differential work, parses files into snippets, and persists replacement data through the indexing pipeline.
5. The parent thread updates job progress in SQLite and broadcasts SSE progress and worker-status events.
6. If an embedding provider is configured, the completed parse job triggers embed work that stores canonical embedding blobs and synchronizes sqlite-vec profile tables for nearest-neighbor lookup.
7. Repository/version statistics and job status are finalized in SQLite, and control endpoints can pause, resume, or cancel subsequent queued work.
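The `(repositoryId, versionId)` serialization in step 3 can be sketched as a keyed in-flight set: at most one job per scope key runs at a time. The names below are illustrative, not the real `WorkerPool` API.

```typescript
// Serialize jobs per (repositoryId, versionId): a job is dispatched only when
// no other job with the same scope key is running. Illustrative sketch only.
type Job = { repositoryId: string; versionId: string | null };

const inFlight = new Set<string>();

function scopeKey(job: Job): string {
	return `${job.repositoryId}::${job.versionId ?? ''}`;
}

function tryDispatch(job: Job): boolean {
	const key = scopeKey(job);
	if (inFlight.has(key)) return false; // a job for this scope is already running
	inFlight.add(key);
	return true;
}

function complete(job: Job): void {
	inFlight.delete(scopeKey(job));
}
```

Jobs for different versions of the same repository can run in parallel, while two jobs for the same version queue behind each other.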
### Retrieval flow
1. Clients call `/api/v1/libs/search`, `/api/v1/context`, or the MCP tools.
2. Route handlers validate input and use the shared SQLite client.
3. Keyword search uses FTS5 through `SearchService`; semantic search uses sqlite-vec KNN through `VectorSearch`; hybrid search merges both paths with reciprocal rank fusion.
4. Retrieval is scoped by repository and optional version, and semantic/hybrid paths can fall back when keyword search yields no usable candidates.
5. Token budgeting selects ranked snippets for the response formatter, which emits repository-aware JSON or text payloads.
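The budget walk in step 5 can be sketched as below. The skip-and-continue behavior (an over-budget snippet is skipped so later, smaller matches can still be returned) is described in this document; the type and helper names are illustrative.

```typescript
// Walk ranked snippets in order, skipping any single snippet that would
// overflow the remaining budget so later matches can still fit.
type Snippet = { id: string; tokenCount: number };

function selectWithinBudget(ranked: Snippet[], budget: number): Snippet[] {
	const selected: Snippet[] = [];
	let used = 0;
	for (const snippet of ranked) {
		if (used + snippet.tokenCount > budget) continue; // skip, keep walking
		selected.push(snippet);
		used += snippet.tokenCount;
	}
	return selected;
}
```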
## Build System
- Build command: `npm run build` (runs `vite build` then `node scripts/build-workers.mjs`)
- Worker bundling: `scripts/build-workers.mjs` uses esbuild to compile `worker-entry.ts`, `embed-worker-entry.ts`, and `write-worker-entry.ts` into `build/workers/` as ESM bundles
- Test command: `npm test`
- Primary local run command: `npm run dev`
- MCP entry points: `npm run mcp:start` and `npm run mcp:http`

# Findings
Last Updated: 2026-04-01T12:05:23.000Z
## Initializer Summary
- JIRA: TRUEREF-0023
- Refresh mode: REFRESH_IF_REQUIRED
- Result: Refreshed ARCHITECTURE.md and FINDINGS.md. CODE_STYLE.md remained trusted — sqlite-vec, worker-status, and write-worker additions follow the established conventions already documented.
## Research Performed
- Counted 149 TypeScript/JavaScript source files in the repository-wide scan and verified the live, non-generated source mix as 147 `.ts` files and 2 `.js` files.
- Read `package.json`, `.prettierrc`, and `eslint.config.js` to verify dependencies, formatting rules, and linting conventions.
- Read `sqlite-vec.ts`, `sqlite-vec.store.ts`, `vector.search.ts`, `hybrid.search.service.ts`, `schema.ts`, `client.ts`, and startup wiring to verify the accepted sqlite-vec implementation and current retrieval architecture.
- Read `worker-pool.ts`, `worker-types.ts`, `write-worker-entry.ts`, and `/api/v1/workers/+server.ts` to verify the current worker topology and status surface.
- Compared `docs/docs_cache_state.yaml` against the live docs and codebase to identify stale cache evidence and architecture drift.
- Confirmed `CODE_STYLE.md` still matches the codebase: tabs, single quotes, `trailingComma: none`, ESM imports with `node:` built-ins, flat ESLint config, and descriptive PascalCase/camelCase naming remain consistent.
## Open Questions For Planner
- Verify whether the write-worker protocol should become part of the active indexing flow or remain documented as optional infrastructure only.
- Verify whether worker-status and SSE event payloads should be documented in a dedicated API reference for external consumers.
- Verify whether sqlite-vec operational details such as per-profile vec-table lifecycle and backfill behavior should move into a separate persistence document if the subsystem grows further.
- Assess whether the WorkerPool fallback mode (main-thread execution when worker scripts are missing) still belongs in the runtime contract or should be removed in favour of a hard build requirement.
## Planner Notes Template
- Findings:
- Risks / follow-ups:
### 2026-04-01 — TRUEREF-0023 initializer refresh audit
- Task: Refresh only stale or invalid documentation after the accepted sqlite-vec implementation.
- Files inspected:
- `docs/docs_cache_state.yaml`
- `docs/ARCHITECTURE.md`
- `docs/CODE_STYLE.md`
- `docs/FINDINGS.md`
- `package.json`
- `.prettierrc`
- `eslint.config.js`
- `src/hooks.server.ts`
- `src/lib/server/db/client.ts`
- `src/lib/server/db/schema.ts`
- `src/lib/server/db/sqlite-vec.ts`
- `src/lib/server/search/sqlite-vec.store.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/pipeline/startup.ts`
- `src/lib/server/pipeline/worker-pool.ts`
- `src/lib/server/pipeline/worker-types.ts`
- `src/lib/server/pipeline/write-worker-entry.ts`
- `src/routes/api/v1/workers/+server.ts`
- `scripts/build-workers.mjs`
- Findings:
- The trusted cache metadata was no longer reliable as evidence for planning: `docs/docs_cache_state.yaml` still referenced 2026-03-27 hashes while `ARCHITECTURE.md` and `FINDINGS.md` had been edited later.
- `ARCHITECTURE.md` was stale. It still described only parse and embed worker concurrency, omitted the `sqlite-vec` production dependency, and did not document the current per-profile vec-table storage layer, worker-status endpoint, or write-worker infrastructure.
- The current retrieval stack uses sqlite-vec concretely: `loadSqliteVec()` bootstraps connections, `SqliteVecStore` manages vec0 tables plus rowid mapping tables, and `VectorSearch` delegates nearest-neighbor lookup to that store instead of brute-force scoring.
- The worker architecture now includes parse, embed, and write worker protocols in `worker-types.ts`, build-time bundling for all three entries, and a `/api/v1/workers` route that returns `WorkerPool` status snapshots.
- `CODE_STYLE.md` remained valid and did not require refresh. The observed source and config files still use tabs, single quotes, `trailingComma: none`, flat ESLint config, ESM imports, PascalCase class names, and camelCase helpers exactly as already documented.
- `FINDINGS.md` itself was stale because the initializer summary still referred to `TRUEREF-0022` instead of the requested `TRUEREF-0023` refresh.
- Risks / follow-ups:
- The write-worker protocol exists and is bundled, but the active indexing path is still centered on parse plus optional embed flow. Future documentation should keep distinguishing implemented infrastructure from the currently exercised path.
- Cache validity should continue to be driven by deterministic hash evidence rather than document timestamps or trust text alone.
### 2026-03-27 — FEEDBACK-0001 initializer refresh audit
- Task: Refresh only stale documentation after changes to retrieval, formatters, token budgeting, and parser behavior.
- Risks / follow-ups:
- The fix should preserve the existing `/repos/[id]` route shape instead of redesigning it to a rest route unless a broader navigation contract change is explicitly requested.
- Any normalization helper introduced for the repo detail page should be reused consistently across server load and client event handlers to avoid mixed encoded and decoded repository IDs during navigation and fetches.
### 2026-04-01 — TRUEREF-0023 sqlite-vec replanning research
- Task: Replan the rejected libSQL-native vector iteration around sqlite-vec using the current worktree and verified runtime constraints.
- Files inspected:
- `package.json`
- `docs/docs_cache_state.yaml`
- `prompts/TRUEREF-0023/prompt.yaml`
- `prompts/TRUEREF-0023/progress.yaml`
- `prompts/TRUEREF-0023/iteration_0/review_report.yaml`
- `prompts/TRUEREF-0023/iteration_0/plan.md`
- `prompts/TRUEREF-0023/iteration_0/tasks.yaml`
- `src/lib/server/db/client.ts`
- `src/lib/server/db/index.ts`
- `src/lib/server/db/schema.ts`
- `src/lib/server/db/fts.sql`
- `src/lib/server/db/vectors.sql`
- `src/lib/server/db/schema.test.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/embeddings/embedding.service.ts`
- `src/lib/server/embeddings/embedding.service.test.ts`
- `src/lib/server/pipeline/job-queue.ts`
- `src/lib/server/pipeline/progress-broadcaster.ts`
- `src/lib/server/pipeline/progress-broadcaster.test.ts`
- `src/lib/server/pipeline/worker-pool.ts`
- `src/lib/server/pipeline/worker-entry.ts`
- `src/lib/server/pipeline/embed-worker-entry.ts`
- `src/lib/server/pipeline/worker-types.ts`
- `src/lib/server/pipeline/startup.ts`
- `src/lib/server/pipeline/indexing.pipeline.ts`
- `src/lib/server/pipeline/indexing.pipeline.test.ts`
- `src/routes/api/v1/jobs/+server.ts`
- `src/routes/api/v1/jobs/stream/+server.ts`
- `src/routes/api/v1/jobs/[id]/stream/+server.ts`
- `src/routes/api/v1/sse-and-settings.integration.test.ts`
- `src/routes/admin/jobs/+page.svelte`
- `src/lib/components/IndexingProgress.svelte`
- `src/lib/components/admin/JobStatusBadge.svelte`
- `src/lib/components/admin/JobSkeleton.svelte`
- `src/lib/components/admin/Toast.svelte`
- `src/lib/components/admin/WorkerStatusPanel.svelte`
- `scripts/build-workers.mjs`
- `node_modules/libsql/types/index.d.ts`
- `node_modules/libsql/index.js`
- Findings:
- Iteration 0 already changed the workspace materially: direct DB imports were switched from `better-sqlite3` to `libsql`, the extra WAL-related pragmas were added in the main DB clients and embed worker, composite indexes plus `vec_embedding` were added to the Drizzle schema and migration metadata, `IndexingProgress.svelte` now uses SSE, the admin jobs page was overhauled, and `WorkerPool` now serializes on `(repositoryId, versionId)` instead of repository only.
- The rejected vector implementation is still invalid in the current tree. `src/lib/server/db/vectors.sql` contains the rejected libSQL-native assumptions, including a dangling `USING libsql_vector_idx(...)` clause with no valid `CREATE INDEX` statement, and `src/lib/server/search/vector.search.ts` still performs full-table JS cosine scoring over `snippet_embeddings` instead of true in-database KNN.
- `sqlite-vec` is not currently present in `package.json` or the lockfile, and there is no existing `sqliteVec.load(...)`, `db.loadExtension(...)`, `vec0`, or extension bootstrap code anywhere under `src/`.
- Context7 sqlite-vec docs confirm the supported Node integration path is `import * as sqliteVec from 'sqlite-vec'; sqliteVec.load(db);`, storing vectors in a `vec0` virtual table and querying with `WHERE embedding MATCH ? ORDER BY distance LIMIT ?`. The docs also show vec0 metadata columns can be filtered directly, which fits the repositoryId, versionId, and profileId requirements.
- Context7 `better-sqlite3` v12.6.2 docs confirm `db.loadExtension(path)` exists. The installed `libsql` package in this workspace also exposes `loadExtension(path): this` in `node_modules/libsql/types/index.d.ts` and `loadExtension(...args)` in `node_modules/libsql/index.js`, so extension loading is not obviously blocked by the driver API surface alone.
- The review report remains the only verified runtime evidence for the current libsql path: `vector_from_float32(...)` is unavailable and `libsql_vector_idx` DDL is rejected in this environment. That invalidates the original native-vector approach but does not by itself prove sqlite-vec extension loading succeeds through the current `libsql` package alias, so the replan must include explicit connection-bootstrap and test coverage for real extension loading on the main DB client and worker-owned connections.
- Two iteration-0 deliverables referenced in the rejected plan do not exist in the current worktree: `src/lib/server/pipeline/write-worker-entry.ts` and `src/routes/api/v1/workers/+server.ts`. `scripts/build-workers.mjs` and the admin `WorkerStatusPanel.svelte` already reference those missing paths, so iteration 1 must either create them or revert those dangling references as part of a consistent plan.
- The existing admin/SSE work is largely salvageable. `src/routes/api/v1/jobs/stream/+server.ts`, `src/routes/api/v1/jobs/[id]/stream/+server.ts`, `src/lib/server/pipeline/progress-broadcaster.ts`, `src/lib/components/IndexingProgress.svelte`, `src/lib/components/admin/JobSkeleton.svelte`, `src/lib/components/admin/Toast.svelte`, and `src/lib/components/admin/WorkerStatusPanel.svelte` provide a usable foundation, but `src/routes/admin/jobs/+page.svelte` still contains `confirm(...)` and the queue API still only supports exact `repository_id = ?` and single-status filtering.
- The existing tests still encode the rejected pre-sqlite-vec model: `embedding.service.test.ts`, `schema.test.ts`, `hybrid.search.service.test.ts`, and `indexing.pipeline.test.ts` seed and assert against `snippet_embeddings.embedding` blobs only. The sqlite-vec replan therefore needs new DB bootstrap helpers, vec-table lifecycle assertions, and vector-search tests that validate actual vec0 writes and filtered KNN queries.
- Risks / follow-ups:
- The current worktree is dirty with iteration-0 partial changes and generated migration metadata, so iteration-1 tasks must explicitly distinguish keep/revise/revert work to avoid sibling tasks fighting over the same files.
- Because the current `libsql` package appears to expose `loadExtension`, the replan should avoid assuming an immediate full revert to upstream `better-sqlite3`; instead it should sequence a driver/bootstrap compatibility decision around actual sqlite-vec extension loading behavior with testable acceptance criteria.
### 2026-04-01 — TRUEREF-0023 iteration-2 current-worktree verification
- Task: Replan iteration 2 against the post-iteration-1 workspace state so the first validation unit no longer leaves a known vec_embedding mismatch behind.
- Files inspected:
- `package.json`
- `package-lock.json`
- `scripts/build-workers.mjs`
- `src/lib/server/db/client.ts`
- `src/lib/server/db/index.ts`
- `src/lib/server/db/schema.ts`
- `src/lib/server/db/vectors.sql`
- `src/lib/server/db/migrations/0006_yielding_centennial.sql`
- `src/lib/server/db/schema.test.ts`
- `src/lib/server/embeddings/embedding.service.ts`
- `src/lib/server/embeddings/embedding.service.test.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/pipeline/job-queue.ts`
- `src/lib/server/pipeline/worker-pool.ts`
- `src/lib/server/pipeline/worker-entry.ts`
- `src/lib/server/pipeline/embed-worker-entry.ts`
- `src/lib/server/pipeline/worker-types.ts`
- `src/lib/server/pipeline/indexing.pipeline.ts`
- `src/lib/server/pipeline/indexing.pipeline.test.ts`
- `src/lib/server/pipeline/startup.ts`
- `src/lib/server/pipeline/progress-broadcaster.ts`
- `src/routes/api/v1/jobs/+server.ts`
- `src/routes/api/v1/jobs/stream/+server.ts`
- `src/routes/api/v1/sse-and-settings.integration.test.ts`
- `src/routes/admin/jobs/+page.svelte`
- `src/lib/components/IndexingProgress.svelte`
- `src/lib/components/admin/JobStatusBadge.svelte`
- `src/lib/components/admin/JobSkeleton.svelte`
- `src/lib/components/admin/Toast.svelte`
- `src/lib/components/admin/WorkerStatusPanel.svelte`
- `prompts/TRUEREF-0023/iteration_1/plan.md`
- `prompts/TRUEREF-0023/iteration_1/tasks.yaml`
- Findings:
- Iteration 1 already completed the direct-driver reset in the working tree: `package.json` and `package-lock.json` now contain real `better-sqlite3` plus `sqlite-vec`, and the current production/test files read in this pass import `better-sqlite3`, not `libsql`.
- The remaining failing intermediate state is exactly the schema/write mismatch called out in the review report: `src/lib/server/db/schema.ts` and `src/lib/server/db/migrations/0006_yielding_centennial.sql` still declare `vec_embedding`, `src/lib/server/db/index.ts` still executes `vectors.sql`, and `src/lib/server/embeddings/embedding.service.ts` still inserts `(embedding, vec_embedding)` into `snippet_embeddings`.
- `src/lib/server/db/vectors.sql` is still invalid startup SQL. It contains a dangling `USING libsql_vector_idx(...)` clause with no enclosing `CREATE INDEX`, so leaving it in the initialization path keeps the rejected libSQL-native design alive.
- The first iteration-1 task boundary was therefore wrong for the current baseline: the package/import reset is already present, but it only becomes a valid foundation once the relational `vec_embedding` artifacts and `EmbeddingService` insert path are cleaned up in the same validation unit.
- The current search path is still the pre-sqlite-vec implementation. `src/lib/server/search/vector.search.ts` reads every candidate embedding blob and scores in JavaScript; no `vec0`, `sqliteVec.load(db)`, or sqlite-vec KNN query exists anywhere under `src/` yet.
- The write worker and worker-status backend are still missing in the live tree even though they are already referenced elsewhere: `scripts/build-workers.mjs` includes `src/lib/server/pipeline/write-worker-entry.ts`, `src/lib/components/admin/WorkerStatusPanel.svelte` fetches `/api/v1/workers`, and `src/routes/api/v1/jobs/stream/+server.ts` currently has no worker-status event source.
- The admin jobs page remains incomplete but salvageable: `src/routes/admin/jobs/+page.svelte` still uses `confirm(...)` and `alert(...)`, while `JobSkeleton.svelte`, `Toast.svelte`, `WorkerStatusPanel.svelte`, `JobStatusBadge.svelte`, and `IndexingProgress.svelte` already provide the intended UI foundation.
- `src/lib/server/pipeline/job-queue.ts` still only supports exact `repository_id = ?` and single `status = ?` filtering, so API-side filter work remains a separate backend task and does not need to block the vector-storage implementation.
- Risks / follow-ups:
- Iteration 2 task decomposition must treat the current dirty code files from iterations 0 and 1 as the validation baseline, otherwise the executor will keep rediscovering pre-existing worktree drift instead of new task deltas.
- The sqlite-vec bootstrap helper and the relational cleanup should be planned as one acceptance unit before any downstream vec0, worker-status, or admin-page tasks, because that is the smallest unit that removes the known broken intermediate state.
### 2026-04-01T00:00:00.000Z — TRUEREF-0023 iteration 3 navbar follow-up planning research
- Task: Plan the accepted follow-up request to add an admin route to the main navbar.
- Files inspected:
- `prompts/TRUEREF-0023/progress.yaml`
- `prompts/TRUEREF-0023/iteration_2/review_report.yaml`
- `prompts/TRUEREF-0023/prompt.yaml`
- `package.json`
- `src/routes/+layout.svelte`
- `src/routes/admin/jobs/+page.svelte`
- Findings:
- The accepted iteration-2 workspace is green: `review_report.yaml` records passing build, passing tests, and no workspace diagnostics, so this request is a narrow additive follow-up rather than a rework of the sqlite-vec/admin jobs implementation.
- The main navbar is defined entirely in `src/routes/+layout.svelte` and already uses base-aware SvelteKit navigation via `resolve as resolveRoute` from `$app/paths` for the existing `Repositories`, `Search`, and `Settings` links.
- The existing admin surface already lives at `src/routes/admin/jobs/+page.svelte`, which sets the page title to `Job Queue - TrueRef Admin`; adding a navbar entry can therefore target `/admin/jobs` directly without introducing new routes, loaders, or components.
- Repository findings from the earlier lint planning work already confirm the codebase expectation to avoid root-relative internal navigation in SvelteKit pages and components, so the new navbar link should follow the existing `resolveRoute('/...')` anchor pattern.
- No dedicated test file currently covers the shared navbar. The appropriate validation for this follow-up remains repository-level `npm run build` and `npm test` after the single layout edit.
- Risks / follow-ups:
- The follow-up navigation request should stay isolated to the shared layout so it does not reopen the accepted sqlite-vec implementation surface.
- Build and test validation remain the appropriate regression checks because no dedicated navbar test currently exists.
### 2026-04-01T12:05:23.000Z — TRUEREF-0023 iteration 5 tabs filter and bulk reprocess planning research
- Task: Plan the follow-up repo-detail UI change to filter version rows in the tabs/tags view and add a bulk action that reprocesses all errored tags without adding a new backend endpoint.
- Files inspected:
- `prompts/TRUEREF-0023/progress.yaml`
- `prompts/TRUEREF-0023/prompt.yaml`
- `prompts/TRUEREF-0023/iteration_2/plan.md`
- `prompts/TRUEREF-0023/iteration_2/tasks.yaml`
- `src/routes/repos/[id]/+page.svelte`
- `src/routes/api/v1/libs/[id]/versions/[tag]/index/+server.ts`
- `src/routes/api/v1/api-contract.integration.test.ts`
- `package.json`
- Findings:
- The relevant UI surface is entirely in `src/routes/repos/[id]/+page.svelte`; the page already loads `versions`, renders per-version state badges, and exposes per-tag `Index` and `Remove` buttons.
- Version states are concretely `pending`, `indexing`, `indexed`, and `error`, and the page already centralizes their labels and color classes in `stateLabels` and `stateColors`.
- Existing per-tag reprocessing is implemented by `handleIndexVersion(tag)`, which POSTs to `/api/v1/libs/:id/versions/:tag/index`; the corresponding backend route exists and returns a queued job DTO with status `202`.
- No bulk reprocess endpoint exists, so the lowest-risk implementation is a UI-only bulk action that iterates the existing per-tag route.
- The page already contains a bounded batching pattern in `handleRegisterSelected()` with `BATCH_SIZE = 5`, which provides a concrete local precedent for bulk tag operations without inventing a new concurrency model.
- There is no existing page-component or browser test targeting `src/routes/repos/[id]/+page.svelte`; nearby automated coverage is API-contract focused, so this iteration should rely on `npm run build` and `npm test` regression validation unless a developer discovers an existing Svelte page harness during implementation.
- Context7 lookup for Svelte and SvelteKit could not be completed in this environment because the configured API key is invalid; planning therefore relied on installed versions from `package.json` (`svelte` `^5.51.0`, `@sveltejs/kit` `^2.50.2`) and the live page patterns already present in the repository.
- Risks / follow-ups:
- Bulk reprocessing must avoid queuing duplicate jobs for tags already shown as `indexing` or already tracked in `activeVersionJobs`.
- Filter state should be implemented as local UI state only and must not disturb the existing `onMount(loadVersions)` fetch path or the SSE job-progress flow.

# TRUEREF-0021 — Differential Tag Indexing
**Priority:** P1
**Status:** Implemented
**Depends On:** TRUEREF-0014, TRUEREF-0017, TRUEREF-0019
**Blocks:**
---
## Problem Statement
Repositories with many version tags (e.g. hundreds or thousands, as seen in projects like RWC
UXFramework) make full re-indexing prohibitively expensive. Between consecutive semver tags the
overwhelming majority of files are unchanged — often only dependency manifests (`package.json`,
`*.lock`) differ. Indexing the complete file tree for every tag wastes compute time, GitHub API
quota, and embedding credits.
---
## Solution
Differential tag indexing detects when an already-indexed ancestor version exists for a given
target tag, determines exactly which files changed, and:
1. **Clones** unchanged document rows, snippet rows, and embedding rows from the ancestor version
into the target version in a single SQLite transaction (`cloneFromAncestor`).
2. **Crawls** only the changed (added / modified) files, parses and embeds them normally.
3. **Skips** deleted files (not cloned, not crawled).
4. **Falls back** silently to a full crawl when no indexed ancestor can be found or any step fails.
---
## Algorithm
### Stage 0 — Differential Plan (`buildDifferentialPlan`)
Executed in `IndexingPipeline.run()` before the crawl, when the job has a `versionId`:
1. **Ancestor selection** (`findBestAncestorVersion` in `tag-order.ts`): Loads all `indexed`
versions for the repository, parses their tags as semver, and returns the closest predecessor
to the target tag. Falls back to creation-timestamp ordering for non-semver tags.
2. **Changed-file detection**: For GitHub repositories, calls the GitHub Compare API
(`fetchGitHubChangedFiles` in `github-compare.ts`). For local repositories, uses
`git diff --name-status` via `getChangedFilesBetweenRefs` in `git.ts` (implemented with
`execFileSync` — not `execSync` — to prevent shell-injection attacks on branch/tag names
containing shell metacharacters).
3. **Path partitioning**: The changed-file list is split into `changedPaths` (added + modified +
   renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
   `ancestorFilePaths − changedPaths − deletedPaths`.
4. **Guard**: Returns `null` when no indexed ancestor exists, when the ancestor has no indexed
documents, or when all files changed (nothing to clone).
### Stage 0.5 — Clone Unchanged Files (`cloneFromAncestor`)
When `buildDifferentialPlan` returns a non-null plan with `unchangedPaths.size > 0`:
- Fetches ancestor `documents` rows for the unchanged paths using a parameterised
`IN (?, ?, …)` query (no string interpolation of path values → no SQL injection).
- Inserts new `documents` rows for each, with new UUIDs and `version_id = targetVersionId`.
- Fetches ancestor `snippets` rows for those document IDs; inserts clones with new IDs.
- Fetches ancestor `snippet_embeddings` rows; inserts clones pointing to the new snippet IDs.
- The entire operation runs inside a single `this.db.transaction(…)()` call for atomicity.
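The parameterised `IN (?, ?, …)` query above can be built with a small helper that emits one placeholder per value, so path strings are only ever bound, never interpolated. The helper name is illustrative.

```typescript
// Build an IN (?, ?, ...) clause with one placeholder per value.
// Values are returned separately for binding, never spliced into the SQL text.
function buildInClause(
	column: string,
	values: readonly string[]
): { sql: string; params: string[] } {
	const placeholders = values.map(() => '?').join(', ');
	return { sql: `${column} IN (${placeholders})`, params: [...values] };
}
```

A caller would then prepare something like `SELECT * FROM documents WHERE version_id = ? AND ${sql}` and pass the ancestor version ID followed by `params` as bound arguments.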
### Stage 1 — Partial Crawl
`IndexingPipeline.crawl()` accepts an optional third argument `allowedPaths?: Set<string>`.
When provided (set to `differentialPlan.changedPaths`), the crawl result is filtered so only
matching files are returned. This minimises GitHub API requests and local I/O.
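The filtering itself is a one-liner over the crawl result; a sketch under assumed result shapes:

```typescript
interface CrawledFile { path: string; content: string }

// When allowedPaths is provided, keep only files on the changed-path list;
// otherwise the crawl is unrestricted (full-index behaviour).
function filterCrawl(files: CrawledFile[], allowedPaths?: Set<string>): CrawledFile[] {
  if (!allowedPaths) return files;
  return files.filter((f) => allowedPaths.has(f.path));
}
```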
---
## API Surface Changes
| Symbol | Location | Change |
| -------------------------------------- | ----------------------------------- | --------------------------------------------- |
| `buildDifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — async function |
| `DifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — interface |
| `findBestAncestorVersion` | `utils/tag-order.ts` | **New** — pure function |
| `fetchGitHubChangedFiles` | `crawler/github-compare.ts` | **New** — async function |
| `getChangedFilesBetweenRefs` | `utils/git.ts` | **New** — sync function (uses `execFileSync`) |
| `ChangedFile` | `crawler/types.ts` | **New** — interface |
| `CrawlOptions.allowedPaths` | `crawler/types.ts` | **New** — optional field |
| `IndexingPipeline.crawl()` | `pipeline/indexing.pipeline.ts` | **Modified** — added `allowedPaths` param |
| `IndexingPipeline.cloneFromAncestor()` | `pipeline/indexing.pipeline.ts` | **New** — private method |
| `IndexingPipeline.run()` | `pipeline/indexing.pipeline.ts` | **Modified** — Stage 0 added |
---
## Correctness Properties
- **Atomicity**: `cloneFromAncestor` wraps all inserts in one SQLite transaction; a failure
leaves the target version with no partially-cloned data.
- **Idempotency (fallback)**: If the clone or plan step fails for any reason, the pipeline
catches the error, logs a warning, and continues with a full crawl. No data loss occurs.
- **No shell injection**: `getChangedFilesBetweenRefs` uses `execFileSync` with an argument
array rather than `execSync` with a template-literal string.
- **No SQL injection**: Path values are never interpolated into SQL strings; only `?`
placeholders are used.
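The placeholder construction behind the parameterised `IN` query can be sketched with a hypothetical helper (not the actual implementation):

```typescript
// Build "col IN (?, ?, ...)" with one placeholder per value; the values travel
// as bind parameters and are never interpolated into the SQL string.
function inClause(column: string, values: readonly string[]): { sql: string; params: string[] } {
  const placeholders = values.map(() => '?').join(', ');
  return { sql: `${column} IN (${placeholders})`, params: [...values] };
}
```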
---
## Fallback Conditions
The differential plan returns `null` (triggering a full crawl) when:
- No versions for this repository have `state = 'indexed'`.
- The best ancestor has no indexed documents.
- All files changed between ancestor and target (`unchangedPaths.size === 0`).
- The GitHub Compare API call or `git diff` call throws an error.
- Any unexpected exception inside `buildDifferentialPlan`.

View File

@@ -0,0 +1,454 @@
# TRUEREF-0022 — Worker-Thread Indexing, Parallel Job Execution, and Real-Time Progress Streaming
**Priority:** P1
**Status:** Draft
**Depends On:** TRUEREF-0009, TRUEREF-0014, TRUEREF-0017, TRUEREF-0021
**Blocks:**
---
## Overview
The indexing pipeline currently runs on the same Node.js event loop as the HTTP and MCP servers. Because `better-sqlite3` is synchronous and `parseFile` is CPU-bound, a single indexing job can starve the event loop for seconds at a time, making the web UI completely unresponsive during indexing. With hundreds of version tags queued simultaneously, the problem compounds: the UI cannot navigate, poll for progress, or serve MCP requests while work is in flight.
This feature fixes the root cause by moving all indexing work into a dedicated Node.js Worker Thread, enabling controlled parallel execution of multiple jobs, and replacing the polling-based progress model with a real-time push mechanism — either Server-Sent Events (SSE) or a lightweight WebSocket channel.
---
## Problem Statement
### 1. Event Loop Starvation
`IndexingPipeline.run()` spends the bulk of its time in two blocking operations:
- `parseFile(file, ...)` — CPU-bound text/AST parsing, fully synchronous
- `better-sqlite3` writes — synchronous I/O by design
Neither yields to the event loop. A repository with 2 000 files produces ~2 000 consecutive blocking micro-tasks with no opportunity for the HTTP server to process any incoming request in between. The current mitigation (yielding every 20 files via `setImmediate`) reduces the freeze to sub-second intervals but does not eliminate it — it is a band-aid, not a structural fix.
### 2. Sequential-only Queue
`JobQueue` serialises all jobs one at a time to avoid SQLite write contention. This is correct for a single-writer model, but it means:
- Indexing `/my-lib/v3.0.0` blocks `/other-lib` from starting, even though they write to entirely disjoint rows.
- With hundreds of version tags registered from Discover Tags, a user must wait for every previous tag to finish before the next one starts — typically hours for a large monorepo.
### 3. Polling Overhead and Lag
The UI currently polls `GET /api/v1/jobs?repositoryId=...` every 2 seconds. This means:
- Progress updates are always up to 2 seconds stale.
- Each poll is a full DB read regardless of whether anything changed.
- The polling interval itself adds load during the highest-contention window.
---
## Goals
1. Move `IndexingPipeline.run()` into a Node.js Worker Thread so the HTTP event loop is never blocked by indexing work.
2. Support configurable parallel job execution (default: 2 concurrent workers, max: N where N is the number of CPU cores minus 1).
3. Replace polling with Server-Sent Events (SSE) for real-time per-job progress streaming.
4. Keep a single SQLite file as the persistence layer — no external message broker.
5. Detailed progress: expose current stage name (crawl / diff / parse / store / embed), not just a percentage.
6. Remain backward-compatible: the existing `GET /api/v1/jobs/{id}` REST endpoint continues to work unchanged.
---
## Non-Goals
- Moving to a multi-process (fork) architecture.
- External queue systems (Redis, BullMQ, etc.).
- Distributed or cluster execution across multiple machines.
- Resumable indexing (pause mid-parse and continue after restart).
- Changing the SQLite storage backend.
---
## Architecture
### Worker Thread Model
```
┌─────────────────────────────────────────────────────────────────┐
│ Main Thread (SvelteKit / HTTP / MCP) │
│ │
│ WorkerPool JobQueue (SQLite) │
│ ┌────────────┐ ┌──────────────────────────────────────┐ │
│ │ Worker 0 │◄────►│ indexing_jobs (queued/running/done) │ │
│ │ Worker 1 │ └──────────────────────────────────────┘ │
│ │ Worker N │ │
│ └────────────┘ ProgressBroadcaster │
│ │ ┌──────────────────────────────────────┐ │
│ └─────────────►│ SSE channels (Map<jobId, Response>) │ │
│ postMessage └──────────────────────────────────────┘ │
│ { type: 'progress', jobId, stage, ... } │
└─────────────────────────────────────────────────────────────────┘
```
#### Worker Thread lifecycle
Each worker is a long-lived `node:worker_threads` `Worker` instance that:
1. Opens its own `better-sqlite3` connection to the same database file.
2. Listens for `{ type: 'run', jobId }` messages from the main thread.
3. Runs `IndexingPipeline.run(job)`, emitting `postMessage` progress events at each stage boundary and every N files.
4. Posts `{ type: 'done', jobId }` or `{ type: 'failed', jobId, error }` when finished.
5. Is reused for subsequent jobs (no spawn-per-job overhead).
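The main thread's handling of these messages can be sketched as a pure mapping from worker messages to the status update it persists (message shapes are assumptions based on the lifecycle above):

```typescript
type WorkerMessage =
  | { type: 'progress'; jobId: string; progress: number }
  | { type: 'done'; jobId: string }
  | { type: 'failed'; jobId: string; error: string };

// Translate a worker message into the indexing_jobs update the main thread writes.
function toStatusUpdate(msg: WorkerMessage): { jobId: string; status: string; progress?: number } {
  switch (msg.type) {
    case 'progress':
      return { jobId: msg.jobId, status: 'running', progress: msg.progress };
    case 'done':
      return { jobId: msg.jobId, status: 'done', progress: 100 };
    case 'failed':
      return { jobId: msg.jobId, status: 'failed' };
  }
}
```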
#### WorkerPool (main thread)
Manages a pool of `concurrency` workers.
```typescript
interface WorkerPoolOptions {
concurrency: number; // 1–Math.max(1, os.cpus().length - 1); default 2 (from settings)
workerScript: string; // absolute path to the compiled worker entry
}
class WorkerPool {
private workers: Worker[];
private idle: Worker[];
enqueue(jobId: string): void;
private dispatch(worker: Worker, jobId: string): void;
private onWorkerMessage(msg: WorkerMessage): void;
private onWorkerExit(worker: Worker, code: number): void;
}
```
Workers are kept alive across jobs. If a worker crashes (non-zero exit), the pool spawns a replacement and marks any in-flight job as `failed`.
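The crash-recovery rule can be sketched as pure pool-state logic (shapes are illustrative; the real `WorkerPool` additionally spawns the replacement `Worker`):

```typescript
type JobState = 'queued' | 'running' | 'failed';

interface PoolState {
  inFlight: Map<number, string>; // workerId → jobId
  jobs: Map<string, JobState>;
}

// Non-zero exit: mark the in-flight job failed, drop the worker from the pool,
// and signal that a replacement must be spawned.
function onWorkerExit(state: PoolState, workerId: number, code: number): { respawn: boolean } {
  if (code === 0) return { respawn: false };
  const jobId = state.inFlight.get(workerId);
  if (jobId) state.jobs.set(jobId, 'failed');
  state.inFlight.delete(workerId);
  return { respawn: true };
}
```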
#### Parallelism and write contention
With WAL mode enabled (already the case), SQLite supports:
- **One concurrent writer** (the transaction lock)
- **Many concurrent readers**
The `replaceSnippets` transaction for different repositories never contends — they write different rows. The `cloneFromAncestor` operation writes to the same tables but different `version_id` values, so WAL checkpoint logic keeps them non-overlapping at the page level.
Two jobs on the **same repository** (e.g. `/my-lib/v1.0.0` and `/my-lib/v2.0.0`) can run in parallel because:
- Differential indexing (TRUEREF-0021) ensures `v2.0.0` reads from `v1.0.0`'s already-committed rows.
- The write transactions for each version touch disjoint `version_id` partitions.
If write contention still occurs under parallel load, `busy_timeout = 5000` (already set) absorbs transient waits.
#### Concurrency limit per repository
To prevent a user from queuing 500 tags and overwhelming the worker pool, the pool enforces:
- **Max 1 running job per repository** for the default branch (re-index).
- **Max `concurrency` total running jobs** across all repositories.
- Version jobs for the same repository are serialised within the pool (the queue picks the oldest queued version job for a given repo only when no other version job for that repo is running).
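The queue-pick rule implied by these constraints can be sketched as (hypothetical shapes; the real queue reads from `indexing_jobs`):

```typescript
interface QueuedJob { id: string; repositoryId: string; queuedAt: number }

// Oldest queued job whose repository has no job currently running; total
// concurrency is enforced by the caller via the pool's idle-worker count.
function pickNext(queued: QueuedJob[], runningRepos: Set<string>): QueuedJob | null {
  const eligible = queued
    .filter((j) => !runningRepos.has(j.repositoryId))
    .sort((a, b) => a.queuedAt - b.queuedAt);
  return eligible[0] ?? null;
}
```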
---
## Progress Model
### Stage Enumeration
Replace the opaque integer progress with a structured stage model:
```typescript
type IndexingStage =
| 'queued'
| 'differential' // computing ancestor diff
| 'crawling' // fetching files from GitHub or local FS
| 'cloning' // cloning unchanged files from ancestor (differential only)
| 'parsing' // parsing files into snippets
| 'storing' // writing documents + snippets to DB
| 'embedding' // generating vector embeddings
| 'done'
| 'failed';
```
### Extended Job Schema
```sql
ALTER TABLE indexing_jobs ADD COLUMN stage TEXT NOT NULL DEFAULT 'queued';
ALTER TABLE indexing_jobs ADD COLUMN stage_detail TEXT; -- e.g. "42 / 200 files"
```
The `progress` column (0–100) is retained for backward compatibility and overall bar rendering.
### Worker → Main thread progress message
```typescript
interface ProgressMessage {
type: 'progress';
jobId: string;
stage: IndexingStage;
stageDetail?: string; // human-readable detail for the current stage
progress: number; // 0–100 overall
processedFiles: number;
totalFiles: number;
}
```
Workers emit this message:
- On every stage transition (crawl start, parse start, store start, embed start).
- Every `PROGRESS_EMIT_EVERY = 10` files during the parse loop.
- On job completion or failure.
The main thread receives these messages and does two things:
1. Writes the update to `indexing_jobs` in SQLite (batched — one write per message, not per file).
2. Pushes the payload to any open SSE channels for that jobId.
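The emission cadence can be captured in a small predicate (assuming a simple modulo check for the every-N-files rule):

```typescript
const PROGRESS_EMIT_EVERY = 10;

// Emit on every stage transition, and every PROGRESS_EMIT_EVERY files within a stage.
function shouldEmit(prevStage: string, stage: string, processedFiles: number): boolean {
  if (stage !== prevStage) return true;
  return processedFiles % PROGRESS_EMIT_EVERY === 0;
}
```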
---
## Server-Sent Events API
### `GET /api/v1/jobs/:id/stream`
Opens an SSE connection for a specific job. The server:
1. Sends the current job state as the first event immediately (no initial lag).
2. Pushes `ProgressMessage` events as the worker emits them.
3. Sends a final `event: done` or `event: failed` event, then closes the connection.
4. Accepts `Last-Event-ID` header for reconnect support — replays the last known state.
```
GET /api/v1/jobs/abc-123/stream HTTP/1.1
Accept: text/event-stream
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no
id: 1
event: progress
data: {"stage":"crawling","progress":0,"processedFiles":0,"totalFiles":0}
id: 2
event: progress
data: {"stage":"parsing","progress":12,"processedFiles":240,"totalFiles":2000}
id: 47
event: done
data: {"stage":"done","progress":100,"processedFiles":2000,"totalFiles":2000}
```
The connection is automatically closed by the server after `event: done` or `event: failed`. If the client disconnects and reconnects with `Last-Event-ID: 47`, the server replays the last cached event (only the most recent event per job is cached in memory).
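The wire format shown above can be produced by a small frame serialiser; a sketch (not the actual endpoint code, which also manages the `ReadableStream` lifecycle):

```typescript
// Serialize one SSE frame: id / event / data lines terminated by a blank line,
// per the text/event-stream format illustrated above.
function sseFrame(id: number, event: string, data: unknown): string {
  return `id: ${id}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```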
### `GET /api/v1/jobs/stream` (batch)
A second endpoint streams progress for all active jobs of a repository:
```
GET /api/v1/jobs/stream?repositoryId=/my-lib HTTP/1.1
Accept: text/event-stream
```
Events are multiplexed in the same stream:
```
event: job-progress
data: {"jobId":"abc-123","stage":"parsing","progress":34,...}
event: job-progress
data: {"jobId":"def-456","stage":"crawling","progress":5,...}
```
This replaces the current single-interval `GET /api/v1/jobs?repositoryId=...` poll on the repository detail page entirely.
---
## UI Changes
### Repository detail page (`/repos/[id]`)
- Replace the `$effect` poll for version jobs with a single `EventSource` connection to `GET /api/v1/jobs/stream?repositoryId=...`.
- Replace the inline progress bar markup with a refined component that shows stage name + file count + percentage.
- Show a compact "N jobs in queue" badge when jobs are queued but not yet running.
### Admin jobs page (`/admin/jobs`)
- Replace the `setInterval(fetchJobs, 3000)` poll with an `EventSource` on `GET /api/v1/jobs/stream` (all jobs, no repositoryId filter).
### Progress component
```
┌──────────────────────────────────────────────────────────────┐
│ v2.1.0 Parsing 42 / 200 21% │
│ ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
└──────────────────────────────────────────────────────────────┘
```
Stage label transitions: `Queued → Diff → Crawling → Cloning → Parsing → Storing → Embedding → Done`
---
## Configuration
Expose via the settings table (key `indexing.concurrency`):
```typescript
interface IndexingSettings {
concurrency: number; // 1–max(cpus-1, 1); default 2
}
```
Surfaced in the settings UI (`/settings`) alongside the embedding provider config.
---
## Schema Migration
```sql
-- Migration: add stage columns to indexing_jobs
ALTER TABLE indexing_jobs ADD COLUMN stage TEXT NOT NULL DEFAULT 'queued';
ALTER TABLE indexing_jobs ADD COLUMN stage_detail TEXT;
```
The `progress`, `processedFiles`, and `totalFiles` columns are retained. The `status` column (`queued / running / paused / cancelled / done / failed`) is also retained. `stage` provides sub-status granularity within `running`.
---
## Acceptance Criteria
### Worker Thread
- [ ] `IndexingPipeline.run()` executes entirely inside a Worker Thread — zero `parseFile` / `replaceSnippets` calls on the main thread during indexing
- [ ] Worker crashes are detected: the pool spawns a replacement and marks the in-flight job as `failed`
- [ ] Worker pool concurrency is configurable via settings (min 1, max `cpus - 1`, default 2)
- [ ] Restarting the server cleans up stale `running` jobs and re-queues them (existing behaviour preserved)
### Parallel Execution
- [ ] Two jobs for two different repositories run concurrently when `concurrency ≥ 2`
- [ ] Two version jobs for the same repository are serialised (at most one per repo at a time)
- [ ] Main-branch re-index job and version jobs for the same repo are serialised
- [ ] Admin jobs page shows parallel running jobs simultaneously
### Progress Streaming
- [ ] `GET /api/v1/jobs/:id/stream` returns `text/event-stream` with stage + progress events
- [ ] `GET /api/v1/jobs/stream?repositoryId=...` multiplexes all active jobs for a repo
- [ ] First event is sent immediately (no wait for the first stage transition)
- [ ] SSE connection closes automatically after `done` / `failed`
- [ ] `Last-Event-ID` reconnect replays the last cached event
- [ ] Existing `GET /api/v1/jobs/:id` REST endpoint still works (no breaking change)
### Stage Detail
- [ ] `stage` column in `indexing_jobs` reflects the current pipeline stage
- [ ] UI shows stage label next to the progress bar
- [ ] Stage transitions: `queued → differential → crawling → cloning → parsing → storing → embedding → done`
### UI Responsiveness
- [ ] Navigating between pages while indexing is in progress has < 200 ms response time
- [ ] MCP `query-docs` calls resolve correctly while indexing is running in parallel
- [ ] No `SQLITE_BUSY` errors under concurrent indexing + read load
---
## Implementation Order
1. **Schema migration** — add `stage` and `stage_detail` columns (non-breaking, backward-compatible defaults)
2. **Worker entry point** (`src/lib/server/pipeline/worker.ts`) — thin wrapper that receives `run` messages and calls `IndexingPipeline.run()`
3. **WorkerPool** (`src/lib/server/pipeline/worker-pool.ts`) — pool management, message routing, crash recovery
4. **ProgressBroadcaster** (`src/lib/server/pipeline/progress-broadcaster.ts`) — in-memory SSE channel registry, last-event cache
5. **SSE endpoints** (`src/routes/api/v1/jobs/[id]/stream/+server.ts` and `src/routes/api/v1/jobs/stream/+server.ts`)
6. **JobQueue update** — replace `processNext`'s direct `pipeline.run()` call with `workerPool.enqueue(jobId)`; enforce per-repo serialisation
7. **Pipeline stage reporting** — add `this.reportStage(stage, detail)` calls at each stage boundary in `IndexingPipeline.run()`
8. **UI: SSE client on repository page** — replace `$effect` poll with `EventSource`
9. **UI: SSE client on admin/jobs page** — replace `setInterval` with `EventSource`
10. **Settings UI** — add concurrency slider to `/settings`
11. **Integration tests** — parallel execution, crash recovery, SSE event sequence
---
## Dedicated Embedding Worker
The embedding stage must **not** run inside the same Worker Thread as the crawl/parse/store pipeline. The reasons are structural:
### Why a dedicated embedding worker
| Concern | Per-parse-worker model | Dedicated embedding worker |
| ------------------ | ------------------------------------------------------ | -------------------------------------------------------------------------------- |
| Memory | N × ~100 MB (model weights + WASM heap) per worker | 1 × ~100 MB regardless of concurrency |
| Model warm-up | Paid once per worker spawn; cold starts slow | Paid once at server startup |
| Batch size | Each worker batches only its own job's snippets | All in-flight jobs queue to one worker → larger batches → higher WASM throughput |
| Provider migration | Must update every worker | Update one file |
| API rate limiting | N parallel streams to the same API → N×rate-limit hits | One serial stream, naturally throttled |
With `Xenova/all-MiniLM-L6-v2`, the WASM model and weight files occupy ~90–120 MB of heap. Running three parse workers with embedded model loading costs ~300–360 MB of resident memory that can never be freed while the server is alive. A dedicated worker keeps that cost fixed at one instance.
Batch efficiency matters too: `embedSnippets` already uses `BATCH_SIZE = 50`. A single embedding worker receiving snippet batches from multiple concurrently completing parse jobs can saturate its WASM batch budget (50 texts) far more consistently than individual workers whose parse jobs complete asynchronously.
### Architecture
The embedding worker is a **separate, long-lived worker thread** distinct from the parse worker pool:
```
┌─────────────────────────────────────────────────────────────┐
│ Main Thread │
│ │
│ WorkerPool (parse workers, concurrency N) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Worker 0 │ │ Worker 1 │ │ Worker N │ │
│ │ crawl │ │ crawl │ │ crawl │ │
│ │ diff │ │ diff │ │ diff │ │
│ │ parse │ │ parse │ │ parse │ │
│ │ store │ │ store │ │ store │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ notify │ notify │ notify │
│ └──────────────┴─────────────┘ │
│ │ (via main thread broadcast) │
│ ┌──────▼───────┐ │
│ │ Embed Worker │ ← single instance │
│ │ loads model once │
│ │ drains snippet_embeddings deficit │
│ │ writes embeddings to DB │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Communication protocol
Parse workers do **not** send snippet content to the embedding worker over IPC — that would serialise potentially megabytes of text per job and negate the bandwidth savings of the deficit-drain pattern.
Instead, the existing `findSnippetIdsMissingEmbeddings` query is the handshake:
1. Parse worker completes stage `storing` and posts `{ type: 'snippets-ready', repositoryId, versionId }` to the main thread.
2. Main thread forwards this to the embedding worker.
3. Embedding worker calls `findSnippetIdsMissingEmbeddings(repositoryId, versionId)` on its own DB connection, then runs `embedSnippets()` as it does today.
4. Embedding worker posts `{ type: 'embed-progress', jobId, done, total }` back to the main thread at each batch boundary.
5. Main thread routes this to the SSE broadcaster → UI updates the embedding progress slice.
This means:
- The embedding worker reads snippet text from the DB itself (no IPC serialisation of content).
- The model is loaded once, stays warm, and processes batches from all repositories in FIFO order.
- Parse workers are never blocked waiting for embeddings — they complete their job stages and exit immediately.
### Embedding worker message contract
```typescript
// Main → Embedding worker
type EmbedRequest =
| { type: 'embed'; jobId: string; repositoryId: string; versionId: string | null }
| { type: 'shutdown' };
// Embedding worker → Main
type EmbedResponse =
| { type: 'embed-progress'; jobId: string; done: number; total: number }
| { type: 'embed-done'; jobId: string }
| { type: 'embed-failed'; jobId: string; error: string }
| { type: 'ready' }; // emitted once after model warm-up completes
```
The `ready` message allows the server startup sequence to defer routing any embed requests until the model is loaded, preventing a race on first-run.
---
## Open Questions
1. Should `cloneFromAncestor` (TRUEREF-0021) remain synchronous on the parse worker, or be split into its own `cloning` stage with explicit SSE events and a progress count? Given that cloning is a bulk DB operation rather than a per-file loop, a single stage-transition event (`stage: 'cloning'`, no per-row progress) is sufficient.
2. Does the `busy_timeout = 5000` setting need to increase under high-concurrency parallel writes, or is 5 s sufficient? Empirically test with `concurrency = 4` + the embedding worker all writing simultaneously before increasing it.
3. Should the embedding worker support priority queueing — e.g. embedding the most recently completed parse job first — or is strict FIFO sufficient? FIFO is simpler and correct for the current use case.

# TRUEREF-0023 — libSQL Migration, Native Vector Search, Parallel Tag Indexing, and Performance Hardening
**Priority:** P1
**Status:** Draft
**Depends On:** TRUEREF-0001, TRUEREF-0022
**Blocks:**
---
## Overview
TrueRef currently uses `better-sqlite3` for all database access. This creates three compounding performance problems:
1. **Vector search does not scale.** `VectorSearch.vectorSearch()` loads the entire `snippet_embeddings` table for a repository into Node.js memory and computes cosine similarity in a JavaScript loop. A repository with 100k snippets at 1536 OpenAI dimensions allocates ~600 MB per query and ties up the worker thread for seconds before returning results.
2. **Missing composite indexes cause table scans on every query.** The schema defines FK columns used in every search and embedding filter, but declares zero composite or covering indexes on them. Every call to `searchSnippets`, `findSnippetIdsMissingEmbeddings`, and `cloneFromAncestor` performs full or near-full table scans.
3. **SQLite connection is under-configured.** Critical pragmas (`synchronous`, `cache_size`, `mmap_size`, `temp_store`) are absent, leaving significant I/O throughput on the table.
The solution is to replace `better-sqlite3` with `@libsql/better-sqlite3` — an embeddable, drop-in synchronous replacement that is a superset of the better-sqlite3 API and exposes libSQL's native vector index (`libsql_vector_idx`). Because the API is identical, no service layer or ORM code changes are needed beyond import statements and the vector search implementation.
Two additional structural improvements are delivered in the same feature:
4. **Per-repo job serialization is too coarse.** `WorkerPool` prevents any two jobs sharing the same `repositoryId` from running in parallel. This means indexing 200 tags of a single library is fully sequential — one tag at a time — even though different tags write to entirely disjoint row sets. The constraint should track `(repositoryId, versionId)` pairs instead.
5. **Write lock contention under parallel indexing.** When multiple parse workers flush parsed snippets simultaneously they all compete for the SQLite write lock, spending most of their time in `busy_timeout` back-off. A single dedicated write worker eliminates this: parse workers become pure CPU workers (crawl → parse → send batches over `postMessage`) and the write worker is the sole DB writer.
6. **Admin UI is unusable under load.** The job queue page has no status or repository filters, no worker status panel, no skeleton loading, uses blocking `alert()` / `confirm()` dialogs, and `IndexingProgress` still polls every 2 seconds instead of consuming the existing SSE stream.
---
## Goals
1. Replace `better-sqlite3` with `@libsql/better-sqlite3` with minimal code churn — import paths only.
2. Add a libSQL vector index on `snippet_embeddings` so that KNN queries execute inside SQLite instead of in a JavaScript loop.
3. Add the six composite and covering indexes required by the hot query paths.
4. Tune the SQLite pragma configuration for I/O performance.
5. Eliminate the leading cause of OOM risk during semantic search.
6. Keep a single embedded database file — no external server, no network.
7. Allow multiple tags of the same repository to index in parallel (unrelated version rows, no write conflict).
8. Eliminate write-lock contention between parallel parse workers by introducing a single dedicated write worker.
9. Rebuild the admin jobs page with full filtering (status, repository, free-text), a live worker status panel, skeleton loading on initial fetch, per-action inline spinners, non-blocking toast notifications, and SSE-driven real-time updates throughout.
---
## Non-Goals
- Migrating to the async `@libsql/client` package (HTTP/embedded-replica mode).
- Changing the Drizzle ORM adapter (`drizzle-orm/better-sqlite3` stays unchanged).
- Changing `drizzle.config.ts` dialect (`sqlite` is still correct for embedded libSQL).
- Adding hybrid/approximate indexing beyond the default HNSW strategy provided by `libsql_vector_idx`.
- Parallelizing embedding batches across providers (separate feature).
- Horizontally scaling across processes.
- Allowing more than one job for the exact same `(repositoryId, versionId)` pair to run concurrently (still serialized — duplicate detection in `JobQueue` is unchanged).
- A full admin authentication system (out of scope).
- Mobile-responsive redesign of the entire admin section (out of scope).
---
## Problem Detail
### 1. Vector Search — Full Table Scan in JavaScript
**File:** `src/lib/server/search/vector.search.ts`
```typescript
// Current: no LIMIT, loads ALL embeddings for repo into memory
const rows = this.db.prepare<unknown[], RawEmbeddingRow>(sql).all(...params);
const scored: VectorSearchResult[] = rows.map((row) => {
const embedding = new Float32Array(
row.embedding.buffer,
row.embedding.byteOffset,
row.embedding.byteLength / 4
);
return { snippetId: row.snippet_id, score: cosineSimilarity(queryEmbedding, embedding) };
});
return scored.sort((a, b) => b.score - a.score).slice(0, limit);
```
For a repo with N snippets and D dimensions, this allocates `N × D × 4` bytes per query. At N=100k and D=1536, that is ~600 MB allocated synchronously. The result is sorted entirely in JS before the top-k is returned. With a native vector index, SQLite returns only the top-k rows.
### 2. Missing Composite Indexes
The `snippets`, `documents`, and `snippet_embeddings` tables are queried with multi-column WHERE predicates in every hot path, but no composite indexes exist:
| Table | Filter columns | Used in |
| -------------------- | ----------------------------- | ---------------------------------------------- |
| `snippets` | `(repository_id, version_id)` | All search, diff, clone |
| `snippets` | `(repository_id, type)` | Type-filtered queries |
| `documents` | `(repository_id, version_id)` | Diff strategy, clone |
| `snippet_embeddings` | `(profile_id, snippet_id)` | `findSnippetIdsMissingEmbeddings` LEFT JOIN |
| `repositories` | `(state)` | `searchRepositories` WHERE `state = 'indexed'` |
| `indexing_jobs` | `(repository_id, status)` | Job status lookups |
Without these indexes, SQLite performs a B-tree scan of the primary key and filters rows in memory. On a 500k-row `snippets` table this is the dominant cost of every search.
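A sketch of the six indexes implied by the table above (index names are illustrative; the actual migration may differ):

```sql
CREATE INDEX idx_snippets_repo_version      ON snippets(repository_id, version_id);
CREATE INDEX idx_snippets_repo_type         ON snippets(repository_id, type);
CREATE INDEX idx_documents_repo_version     ON documents(repository_id, version_id);
CREATE INDEX idx_embeddings_profile_snippet ON snippet_embeddings(profile_id, snippet_id);
CREATE INDEX idx_repositories_state         ON repositories(state);
CREATE INDEX idx_jobs_repo_status           ON indexing_jobs(repository_id, status);
```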
### 3. Under-configured SQLite Connection
**File:** `src/lib/server/db/client.ts` and `src/lib/server/db/index.ts`
Current pragmas:
```typescript
client.pragma('journal_mode = WAL');
client.pragma('foreign_keys = ON');
client.pragma('busy_timeout = 5000');
```
Missing:
- `synchronous = NORMAL` — halves fsync overhead vs the default FULL; safe with WAL
- `cache_size = -65536` — 64 MB page cache; the default is 2 MB
- `temp_store = MEMORY` — temp tables and sort spills stay in RAM
- `mmap_size = 268435456` — 256 MB memory-mapped read path; bypasses system-call overhead for reads
- `wal_autocheckpoint = 1000` — more frequent checkpoints prevent WAL growth
### 4. Admin UI — Current Problems
**File:** `src/routes/admin/jobs/+page.svelte`, `src/lib/components/IndexingProgress.svelte`
| Problem                                                        | Location                                            | Impact                                                       |
| -------------------------------------------------------------- | --------------------------------------------------- | ------------------------------------------------------------ |
| `IndexingProgress` polls every 2 s via `setInterval` + `fetch` | `IndexingProgress.svelte`                           | Constant HTTP traffic; progress lags by up to 2 s            |
| No status or repository filter controls                        | `admin/jobs/+page.svelte`                           | With 200 tag jobs, finding a specific one requires scrolling |
| No worker status panel                                         | — (no endpoint exists)                              | Operator cannot see which workers are busy or idle           |
| `alert()` for errors, `confirm()` for cancel                   | `admin/jobs/+page.svelte` (no `showToast()` helper) | Blocks the entire browser tab; unusable under parallel jobs  |
| `actionInProgress` is a single string, not per-job             | `admin/jobs/+page.svelte`                           | Pausing job A disables buttons on all other jobs             |
| No skeleton loading — blank + spinner on first load            | `admin/jobs/+page.svelte`                           | Layout shift; no structural preview while data loads         |
| Hard-coded `limit=50` query, no pagination                     | `admin/jobs/+page.svelte:fetchJobs()`               | Page truncates silently for large queues                     |
---
## Architecture
### Drop-In Replacement: `@libsql/better-sqlite3`
`@libsql/better-sqlite3` is published by Turso and implemented as a Node.js native addon wrapping the libSQL embedded engine. The exported class is API-compatible with `better-sqlite3`:
```typescript
// before
import Database from 'better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);
// after — identical code
import Database from '@libsql/better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);
```
All of the following continue to work unchanged:
- `drizzle-orm/better-sqlite3` adapter and `migrate` helper
- `drizzle-kit` with `dialect: 'sqlite'`
- Prepared statements, transactions, WAL pragmas, foreign keys
- Worker thread per-thread connections (`worker-entry.ts`, `embed-worker-entry.ts`)
- All `type Database from 'better-sqlite3'` type imports (replaced in lock-step)
### Vector Index Design
libSQL provides `libsql_vector_idx()` — a virtual index type stored in a shadow table alongside the main table. Once indexed, KNN queries use a SQL `vector_top_k()` function:
```sql
-- KNN: return top-k snippet IDs closest to the query vector
SELECT snippet_id
FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?)
```
`vector_from_float32(blob)` accepts the same raw little-endian Float32 bytes currently stored in the `embedding` blob column. **No data migration is needed** — the existing blob column can be re-indexed with `libsql_vector_idx` pointing at the bytes-stored column.
The index strategy:
1. Add a generated `vec_embedding` column of type `F32_BLOB(dimensions)` to `snippet_embeddings`, populated from the existing `embedding` blob via a migration trigger.
2. Create the vector index: `CREATE INDEX idx_snippet_embeddings_vec ON snippet_embeddings(libsql_vector_idx(vec_embedding))`. Note that libSQL expresses the vector index as a function-call index expression, not a `USING` clause.
3. Rewrite `VectorSearch.vectorSearch()` to use `vector_top_k()` with a two-step join instead of the in-memory loop.
4. Update `EmbeddingService.embedSnippets()` to write `vec_embedding` on insert.
Dimensions are profile-specific. Because the index is per-column, a separate index is needed per embedding dimensionality. For v1, a single index covering the default profile's dimensions is sufficient; multi-profile KNN can be handled by post-filtering the `vector_top_k` candidates with `WHERE profile_id = ?` (over-fetching candidates to compensate).
### Updated Vector Search Query
```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;
// Encode query vector as raw bytes (same format as stored blobs)
const queryBytes = Buffer.from(queryEmbedding.buffer);
  // Use libSQL vector_top_k for ANN: returns ordered (rowid, distance) pairs
  let sql = `
    SELECT se.snippet_id,
           vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS distance
    FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
    JOIN snippet_embeddings se ON se.rowid = knn.id
    JOIN snippets s ON s.id = se.snippet_id
    WHERE s.repository_id = ?
      AND se.profile_id = ?
  `;
  // Over-fetch 4x so post-filtering on repository/profile still leaves `limit` rows
  const params: unknown[] = [queryBytes, queryBytes, limit * 4, repositoryId, profileId];
  if (versionId) {
    sql += ' AND s.version_id = ?';
    params.push(versionId);
  }
  sql += ' ORDER BY distance ASC LIMIT ?';
  params.push(limit);
  return this.db
    .prepare<unknown[], { snippet_id: string; distance: number }>(sql)
    .all(...params)
    .map((row) => ({ snippetId: row.snippet_id, score: 1 - row.distance }));
}
```
`vector_distance_cos` returns cosine distance (0 = identical), so `1 - distance` recovers the cosine similarity the old in-memory loop computed (non-negative for typical embedding pairs), preserving the existing `VectorSearchResult.score` contract.
---
## Implementation Plan
### Phase 1 — Package Swap (no logic changes)
**Files touched:** `package.json`, all `.ts` files that import `better-sqlite3`
1. In `package.json`:
- Remove `"better-sqlite3": "^12.6.2"` from `dependencies`
- Add `"@libsql/better-sqlite3": "^0.4.0"` to `dependencies`
- Remove `"@types/better-sqlite3": "^7.6.13"` from `devDependencies`
- `@libsql/better-sqlite3` ships its own TypeScript declarations
2. Replace all import statements (35 occurrences across 19 files):
| Old import | New import |
| --------------------------------------------------------------- | ---------------------------------------------------- |
| `import Database from 'better-sqlite3'` | `import Database from '@libsql/better-sqlite3'` |
| `import type Database from 'better-sqlite3'` | `import type Database from '@libsql/better-sqlite3'` |
| `import { drizzle } from 'drizzle-orm/better-sqlite3'` | unchanged |
| `import { migrate } from 'drizzle-orm/better-sqlite3/migrator'` | unchanged |
Affected production files:
- `src/lib/server/db/index.ts`
- `src/lib/server/db/client.ts`
- `src/lib/server/embeddings/embedding.service.ts`
- `src/lib/server/pipeline/indexing.pipeline.ts`
- `src/lib/server/pipeline/job-queue.ts`
- `src/lib/server/pipeline/startup.ts`
- `src/lib/server/pipeline/worker-entry.ts`
- `src/lib/server/pipeline/embed-worker-entry.ts`
- `src/lib/server/pipeline/differential-strategy.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/search/search.service.ts`
- `src/lib/server/services/repository.service.ts`
- `src/lib/server/services/version.service.ts`
- `src/lib/server/services/embedding-settings.service.ts`
Affected test files (same mechanical replacement):
- `src/routes/api/v1/api-contract.integration.test.ts`
- `src/routes/api/v1/sse-and-settings.integration.test.ts`
- `src/routes/settings/page.server.test.ts`
- `src/lib/server/db/schema.test.ts`
- `src/lib/server/embeddings/embedding.service.test.ts`
- `src/lib/server/pipeline/indexing.pipeline.test.ts`
- `src/lib/server/pipeline/differential-strategy.test.ts`
- `src/lib/server/search/search.service.test.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/services/repository.service.test.ts`
- `src/lib/server/services/version.service.test.ts`
- `src/routes/api/v1/settings/embedding/server.test.ts`
- `src/routes/api/v1/libs/[id]/index/server.test.ts`
- `src/routes/api/v1/libs/[id]/versions/discover/server.test.ts`
3. Run all tests — they should pass with zero logic changes: `npm test`
### Phase 2 — Pragma Hardening
**Files touched:** `src/lib/server/db/client.ts`, `src/lib/server/db/index.ts`
Add the following pragmas to both connection factories (raw client and `initializeDatabase()`):
```typescript
client.pragma('synchronous = NORMAL');
client.pragma('cache_size = -65536'); // 64 MB
client.pragma('temp_store = MEMORY');
client.pragma('mmap_size = 268435456'); // 256 MB
client.pragma('wal_autocheckpoint = 1000');
```
Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) open their own connections — apply the same pragmas there.
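To keep the two connection factories and the worker entries from drifting, the pragma set can live in one shared helper. A minimal sketch; the `applyPerformancePragmas` name and the structural `PragmaCapable` type are assumptions, not existing code:

```typescript
// Hypothetical shared helper so the main-thread factories and the worker
// entries apply an identical pragma set. Structurally typed so it accepts
// any better-sqlite3-compatible connection.
interface PragmaCapable {
  pragma(statement: string): unknown;
}

export function applyPerformancePragmas(db: PragmaCapable): void {
  db.pragma('journal_mode = WAL');
  db.pragma('synchronous = NORMAL');
  db.pragma('cache_size = -65536'); // negative value = KiB, i.e. 64 MB
  db.pragma('temp_store = MEMORY');
  db.pragma('mmap_size = 268435456'); // 256 MB
  db.pragma('wal_autocheckpoint = 1000');
}
```

Each factory then calls the helper right after opening its connection, so a future pragma change lands everywhere at once.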
### Phase 3 — Composite Indexes (Drizzle migration)
**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL file
Add indexes in `schema.ts` using Drizzle's `index()` helper:
```typescript
// snippets table
export const snippets = sqliteTable(
'snippets',
{
/* unchanged */
},
(t) => [
index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
index('idx_snippets_repo_type').on(t.repositoryId, t.type)
]
);
// documents table
export const documents = sqliteTable(
'documents',
{
/* unchanged */
},
(t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
);
// snippet_embeddings table
export const snippetEmbeddings = sqliteTable(
'snippet_embeddings',
{
/* unchanged */
},
(table) => [
primaryKey({ columns: [table.snippetId, table.profileId] }), // unchanged
index('idx_embeddings_profile').on(table.profileId, table.snippetId)
]
);
// repositories table
export const repositories = sqliteTable(
'repositories',
{
/* unchanged */
},
(t) => [index('idx_repositories_state').on(t.state)]
);
// indexing_jobs table
export const indexingJobs = sqliteTable(
'indexing_jobs',
{
/* unchanged */
},
(t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
);
```
Generate and apply migration: `npm run db:generate && npm run db:migrate`
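For reference, the generated migration should contain plain `CREATE INDEX` statements along these lines (identifier quoting per drizzle-kit's SQLite output; the snake_case column names are an assumption based on the existing schema conventions):

```sql
CREATE INDEX `idx_snippets_repo_version` ON `snippets` (`repository_id`, `version_id`);
CREATE INDEX `idx_snippets_repo_type` ON `snippets` (`repository_id`, `type`);
CREATE INDEX `idx_documents_repo_version` ON `documents` (`repository_id`, `version_id`);
CREATE INDEX `idx_embeddings_profile` ON `snippet_embeddings` (`profile_id`, `snippet_id`);
CREATE INDEX `idx_repositories_state` ON `repositories` (`state`);
CREATE INDEX `idx_jobs_repo_status` ON `indexing_jobs` (`repository_id`, `status`);
```

If the generated file differs materially from this, inspect it before applying — drizzle-kit sometimes proposes table rebuilds where a bare index creation suffices.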
### Phase 4 — Vector Column and Index (Drizzle migration)
**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL, `src/lib/server/search/vector.search.ts`, `src/lib/server/embeddings/embedding.service.ts`
#### 4a. Schema: add `vec_embedding` column
Add `vec_embedding` to `snippet_embeddings`. Drizzle has no built-in `F32_BLOB` column type helper, so define one with `customType`:
```typescript
import { customType } from 'drizzle-orm/sqlite-core';
const f32Blob = (name: string, dimensions: number) =>
customType<{ data: Buffer }>({
dataType() {
return `F32_BLOB(${dimensions})`;
}
})(name);
export const snippetEmbeddings = sqliteTable(
'snippet_embeddings',
{
snippetId: text('snippet_id')
.notNull()
.references(() => snippets.id, { onDelete: 'cascade' }),
profileId: text('profile_id')
.notNull()
.references(() => embeddingProfiles.id, { onDelete: 'cascade' }),
model: text('model').notNull(),
dimensions: integer('dimensions').notNull(),
embedding: blob('embedding').notNull(), // existing blob — kept for backward compat
vecEmbedding: f32Blob('vec_embedding', 1536), // libSQL vector column (nullable during migration fill)
createdAt: integer('created_at').notNull()
},
(table) => [
primaryKey({ columns: [table.snippetId, table.profileId] }),
index('idx_embeddings_profile').on(table.profileId, table.snippetId)
]
);
```
Because dimensionality is fixed per model, `F32_BLOB(1536)` covers OpenAI `text-embedding-3-small` (1536 dimensions); `text-embedding-3-large` emits 3072 dimensions by default and would need its own column and index. A follow-up can parameterize the dimensionality per profile.
#### 4b. Migration SQL: populate `vec_embedding` from existing `embedding` blob and create the vector index
The vector index cannot be expressed in Drizzle's schema DSL, so it must be applied in the FTS-style custom SQL file (`src/lib/server/db/fts.sql` or an equivalent `vectors.sql`):
```sql
-- Backfill vec_embedding from existing raw blob data
UPDATE snippet_embeddings
SET vec_embedding = vector_from_float32(embedding)
WHERE vec_embedding IS NULL AND embedding IS NOT NULL;
-- Create the vector index (libSQL's DiskANN-based index; function-call syntax)
CREATE INDEX IF NOT EXISTS idx_snippet_embeddings_vec
  ON snippet_embeddings(libsql_vector_idx(vec_embedding, 'metric=cosine', 'compress_neighbors=float8', 'max_neighbors=20'));
```
Add a call to this SQL in `initializeDatabase()` alongside the existing `fts.sql` execution:
```typescript
const vectorSql = readFileSync(join(__dirname, 'vectors.sql'), 'utf-8');
client.exec(vectorSql);
```
#### 4c. Update `EmbeddingService.embedSnippets()`
When inserting a new embedding, write both the blob and the vec column:
```typescript
const insert = this.db.prepare<[string, string, string, number, Buffer, Buffer]>(`
INSERT OR REPLACE INTO snippet_embeddings
(snippet_id, profile_id, model, dimensions, embedding, vec_embedding, created_at)
VALUES (?, ?, ?, ?, ?, vector_from_float32(?), unixepoch())
`);
// inside the transaction:
insert.run(
snippet.id,
this.profileId,
embedding.model,
embedding.dimensions,
embeddingBuffer,
embeddingBuffer // same bytes — vector_from_float32() interprets them
);
```
#### 4d. Rewrite `VectorSearch.vectorSearch()`
Replace the full-scan JS loop with `vector_top_k()`:
```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;
const queryBytes = Buffer.from(queryEmbedding.buffer);
const candidatePool = limit * 4; // over-fetch for post-filter
let sql = `
SELECT se.snippet_id,
vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS distance
FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
JOIN snippet_embeddings se ON se.rowid = knn.id
JOIN snippets s ON s.id = se.snippet_id
WHERE s.repository_id = ?
AND se.profile_id = ?
`;
const params: unknown[] = [queryBytes, queryBytes, candidatePool, repositoryId, profileId];
if (versionId) {
sql += ' AND s.version_id = ?';
params.push(versionId);
}
sql += ' ORDER BY distance ASC LIMIT ?';
params.push(limit);
return this.db
.prepare<unknown[], { snippet_id: string; distance: number }>(sql)
.all(...params)
.map((row) => ({ snippetId: row.snippet_id, score: 1 - row.distance }));
}
```
The `score` contract is preserved (1 = identical, 0 = orthogonal). The `cosineSimilarity` helper function is no longer called at runtime but can be kept for unit tests.
### Phase 5 — Per-Job Serialization Key Fix
**Files touched:** `src/lib/server/pipeline/worker-pool.ts`
The current serialization guard uses a bare `repositoryId`:
```typescript
// current
private runningRepoIds = new Set<string>();
// blocks any job whose repositoryId is already in the set
const jobIdx = this.jobQueue.findIndex((j) => !this.runningRepoIds.has(j.repositoryId));
```
Different tags of the same repository write to completely disjoint rows (`version_id`-partitioned documents, snippets, and embeddings). The only genuine conflict is two jobs for the same `(repositoryId, versionId)` pair, which `JobQueue.enqueue()` already prevents via the `status IN ('queued', 'running')` deduplication check.
Change the guard to key on the compound pair:
```typescript
// still a Set<string>, but now keyed on the compound (repositoryId, versionId) pair
private runningJobKeys = new Set<string>();
private jobKey(repositoryId: string, versionId?: string | null): string {
return `${repositoryId}|${versionId ?? ''}`;
}
```
Update all four sites that read/write `runningRepoIds`:
| Location | Old | New |
| ------------------------------------ | ----------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `dispatch()` find | `!this.runningRepoIds.has(j.repositoryId)` | `!this.runningJobKeys.has(this.jobKey(j.repositoryId, j.versionId))` |
| `dispatch()` add | `this.runningRepoIds.add(job.repositoryId)` | `this.runningJobKeys.add(this.jobKey(job.repositoryId, job.versionId))` |
| `onWorkerMessage` done/failed delete | `this.runningRepoIds.delete(runningJob.repositoryId)` | `this.runningJobKeys.delete(this.jobKey(runningJob.repositoryId, runningJob.versionId))` |
| `onWorkerExit` delete | same | same |
The `QueuedJob` and `RunningJob` interfaces already carry `versionId` — no type changes needed.
The only case that remains serialized is two default-branch re-index jobs (`versionId = null`) for the same repository: both map to the stable key `"repositoryId|"`, so they are correctly deduplicated.
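Put together, the guarded dispatch can be sketched in isolation. The `DispatchGuard` wrapper below is hypothetical (in the real `WorkerPool` these are private members), but the key logic matches the table above:

```typescript
interface QueuedJob {
  id: string;
  repositoryId: string;
  versionId?: string | null;
}

// Hypothetical standalone wrapper around the Phase 5 guard logic.
class DispatchGuard {
  private runningJobKeys = new Set<string>();

  private jobKey(repositoryId: string, versionId?: string | null): string {
    return `${repositoryId}|${versionId ?? ''}`;
  }

  // Pick the first queued job whose (repo, version) pair is not already running.
  next(queue: QueuedJob[]): QueuedJob | undefined {
    const job = queue.find(
      (j) => !this.runningJobKeys.has(this.jobKey(j.repositoryId, j.versionId))
    );
    if (job) this.runningJobKeys.add(this.jobKey(job.repositoryId, job.versionId));
    return job;
  }

  // Called on job completion, failure, or worker exit.
  release(job: QueuedJob): void {
    this.runningJobKeys.delete(this.jobKey(job.repositoryId, job.versionId));
  }
}
```

Two tags of the same repository dispatch concurrently; a duplicate `(repositoryId, versionId)` pair stays queued until `release()`.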
---
### Phase 6 — Dedicated Write Worker (Single-Writer Pattern)
**Files touched:** `src/lib/server/pipeline/worker-types.ts`, `src/lib/server/pipeline/write-worker-entry.ts` (new), `src/lib/server/pipeline/worker-entry.ts`, `src/lib/server/pipeline/worker-pool.ts`
#### Motivation
With Phase 5 in place, N tags of the same library can index in parallel. Each parse worker currently opens its own DB connection and holds the write lock while storing parsed snippets. Under N concurrent writers, each worker spends the majority of its wall-clock time waiting in `busy_timeout` back-off. The fix is the single-writer pattern: one dedicated write worker owns the only writable DB connection; parse workers become stateless CPU workers that send write batches over `postMessage`.
```
Parse Worker 1 ──┐ WriteRequest (docs[], snippets[]) ┌── WriteAck
Parse Worker 2 ──┼─────────────────────────────────────► Write Worker (sole DB writer)
Parse Worker N ──┘ └── single better-sqlite3 connection
```
#### New message types (`worker-types.ts`)
```typescript
export interface WriteRequest {
type: 'write';
jobId: string;
documents: SerializedDocument[];
snippets: SerializedSnippet[];
}
export interface WriteAck {
type: 'write_ack';
jobId: string;
documentCount: number;
snippetCount: number;
}
export interface WriteError {
type: 'write_error';
jobId: string;
error: string;
}
// SerializedDocument / SerializedSnippet mirror the DB column shapes
// (plain objects, safe to transfer via structured clone)
```
#### Write worker (`write-worker-entry.ts`)
The write worker:
- Opens its own `Database` connection (WAL mode, all pragmas from Phase 2)
- Listens for `WriteRequest` messages
- Wraps each batch in a single transaction
- Posts `WriteAck` or `WriteError` back to the parent, which forwards the ack to the originating parse worker by `jobId`
```typescript
import Database from '@libsql/better-sqlite3';
import { workerData, parentPort } from 'node:worker_threads';
import type { WorkerInitData, WriteRequest, WriteAck, WriteError } from './worker-types.js';
const db = new Database((workerData as WorkerInitData).dbPath);
db.pragma('journal_mode = WAL');
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('foreign_keys = ON');
const insertDoc = db.prepare(`INSERT OR REPLACE INTO documents (...) VALUES (...)`);
const insertSnippet = db.prepare(`INSERT OR REPLACE INTO snippets (...) VALUES (...)`);
const writeBatch = db.transaction((req: WriteRequest) => {
for (const doc of req.documents) insertDoc.run(doc);
for (const snip of req.snippets) insertSnippet.run(snip);
});
parentPort!.on('message', (req: WriteRequest) => {
try {
writeBatch(req);
const ack: WriteAck = {
type: 'write_ack',
jobId: req.jobId,
documentCount: req.documents.length,
snippetCount: req.snippets.length
};
parentPort!.postMessage(ack);
} catch (err) {
const fail: WriteError = { type: 'write_error', jobId: req.jobId, error: String(err) };
parentPort!.postMessage(fail);
}
});
```
#### Parse worker changes (`worker-entry.ts`)
Parse workers lose their DB connection. `IndexingPipeline` receives a `sendWrite` callback instead of a `db` instance. After parsing each file batch, the worker calls `sendWrite({ type: 'write', jobId, documents, snippets })` and awaits the `WriteAck` before continuing. This keeps back-pressure: a slow write worker naturally throttles the parse workers without additional semaphores.
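The ack-await handshake on the parse-worker side could be sketched as follows. This is a port-injected sketch for testability; `createWriteSender` is a hypothetical name, and the parent is assumed to relay `write_ack`/`write_error` messages to this worker:

```typescript
type WriteOutcome =
  | { type: 'write_ack'; jobId: string }
  | { type: 'write_error'; jobId: string; error: string };

interface PortLike {
  postMessage(value: unknown): void;
  on(event: 'message', listener: (value: WriteOutcome) => void): void;
}

interface WriteRequestMsg {
  type: 'write';
  jobId: string;
  documents: unknown[];
  snippets: unknown[];
}

// Route each write batch through the port; resolve only when the matching
// ack arrives, which gives the back-pressure described above.
export function createWriteSender(port: PortLike) {
  const pending = new Map<string, { resolve: () => void; reject: (e: Error) => void }>();

  port.on('message', (msg) => {
    const waiter = pending.get(msg.jobId);
    if (!waiter) return; // outcome for a job this worker does not own
    pending.delete(msg.jobId);
    if (msg.type === 'write_ack') waiter.resolve();
    else waiter.reject(new Error(msg.error));
  });

  return function sendWrite(req: WriteRequestMsg): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      pending.set(req.jobId, { resolve, reject });
      port.postMessage(req); // the parent relays this to the write worker
    });
  };
}
```

In the real worker entry, `port` would be `parentPort` from `node:worker_threads`; injecting it keeps the logic unit-testable without spawning threads.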
#### WorkerPool changes
- Spawn one write worker at startup (always, regardless of embedding config)
- Route incoming `write_ack` / `write_error` messages to the correct waiting parse worker via a `Map<jobId, resolve>` promise registry
- The write worker is separate from the embed worker — embed writes (`snippet_embeddings`) can still go through the write worker by adding an `EmbedWriteRequest` message type, or remain in the embed worker since embedding runs after parsing completes (no lock contention with active parse jobs)
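The ack-routing registry on the pool side can be sketched as follows. The `WriteRouter` name is hypothetical; it only maps `jobId` to the owning parse worker's message port:

```typescript
interface PostTarget {
  postMessage(value: unknown): void;
}

// Hypothetical pool-side registry: remembers which parse worker owns each job
// so write outcomes coming back from the write worker reach the right recipient.
export class WriteRouter {
  private owners = new Map<string, PostTarget>();

  claim(jobId: string, parseWorker: PostTarget): void {
    this.owners.set(jobId, parseWorker);
  }

  // Called from the write worker's 'message' handler on the main thread.
  // Outcomes for unknown jobs (e.g. already-cancelled) are silently dropped.
  route(msg: { type: 'write_ack' | 'write_error'; jobId: string }): void {
    this.owners.get(msg.jobId)?.postMessage(msg);
  }

  release(jobId: string): void {
    this.owners.delete(jobId);
  }
}
```

`claim()` would be called in `dispatch()` and `release()` on job completion, alongside the Phase 5 key bookkeeping.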
#### Conflict analysis with Phase 5
Phases 5 and 6 compose cleanly:
- Phase 5 allows multiple `(repo, versionId)` jobs to run concurrently
- Phase 6 ensures all those concurrent jobs share a single write path — contention is eliminated by design
- The write worker is stateless with respect to job identity; it just executes batches in arrival order within a FIFO message queue (Node.js `postMessage` is ordered)
- The embed worker remains a separate process (it runs after parse completes, so it never overlaps with active parse writes for the same job)
---
### Phase 7 — Admin UI Overhaul
**Files touched:**
- `src/routes/admin/jobs/+page.svelte` — rebuilt
- `src/routes/api/v1/workers/+server.ts` — new endpoint
- `src/lib/components/admin/JobStatusBadge.svelte` — extend with spinner variant
- `src/lib/components/admin/JobSkeleton.svelte` — new
- `src/lib/components/admin/WorkerStatusPanel.svelte` — new
- `src/lib/components/admin/Toast.svelte` — new
- `src/lib/components/IndexingProgress.svelte` — switch to SSE
#### 7a. New API endpoint: `GET /api/v1/workers`
The `WorkerPool` singleton tracks running jobs in `runningJobs: Map<Worker, RunningJob>` and idle workers in `idleWorkers: Worker[]`. Expose this state as a lightweight REST snapshot:
```typescript
// GET /api/v1/workers
// Response shape:
interface WorkersResponse {
concurrency: number; // configured max workers
active: number; // workers with a running job
idle: number; // workers waiting for work
workers: WorkerStatus[]; // one entry per spawned parse worker
}
interface WorkerStatus {
index: number; // worker slot (0-based)
state: 'idle' | 'running'; // current state
jobId: string | null; // null when idle
repositoryId: string | null;
versionId: string | null;
}
```
The route handler calls `getPool().getStatus()` — add a `getStatus(): WorkersResponse` method to `WorkerPool` that reads `runningJobs` and `idleWorkers` without any DB call. This is read-only and runs on the main thread.
The SSE stream at `/api/v1/jobs/stream` should emit a new `worker-status` event type whenever a worker transitions idle ↔ running (on `dispatch()` and job completion). This allows the worker panel to update in real-time without polling the REST endpoint.
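A sketch of the snapshot assembly: `buildStatus` is a hypothetical free function standing in for the `getStatus()` body, and it assumes busy workers are listed in slot order:

```typescript
// Pure function over in-memory pool state — no DB access.
// RunningJob mirrors the fields named in this document.
interface RunningJob {
  jobId: string;
  repositoryId: string;
  versionId: string | null;
}

export function buildStatus(concurrency: number, running: RunningJob[], idleCount: number) {
  const workers = [
    ...running.map((job, index) => ({
      index,
      state: 'running' as const,
      jobId: job.jobId,
      repositoryId: job.repositoryId,
      versionId: job.versionId
    })),
    ...Array.from({ length: idleCount }, (_, i) => ({
      index: running.length + i,
      state: 'idle' as const,
      jobId: null,
      repositoryId: null,
      versionId: null
    }))
  ];
  return { concurrency, active: running.length, idle: idleCount, workers };
}
```

The real method would derive `running` from `runningJobs` and `idleCount` from `idleWorkers.length`.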
#### 7b. `GET /api/v1/jobs` — add `repositoryId` free-text and multi-status filter
The existing endpoint already accepts `repositoryId` (exact match) and `status` (single value). Extend:
- `repositoryId` to also support prefix match (e.g. `?repositoryId=/facebook` returns all `/facebook/*` repos)
- `status` to accept comma-separated values: `?status=queued,running`
- `page` and `pageSize` query params (default pageSize=50, max 200) in addition to `limit` for backwards compat
Return `{ jobs, total, page, pageSize }` with `total` always reflecting the unfiltered-by-page count.
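Parameter parsing for the extended endpoint might look like this. A sketch: `parseJobsQuery` is a hypothetical helper, and the status vocabulary is assumed from the values listed elsewhere in this document:

```typescript
const VALID_STATUSES = new Set(['queued', 'running', 'paused', 'cancelled', 'done', 'failed']);

export function parseJobsQuery(params: URLSearchParams) {
  // Comma-separated multi-status filter; unknown values are dropped.
  const statuses = (params.get('status') ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => VALID_STATUSES.has(s));
  // Clamp pagination: page >= 1, 1 <= pageSize <= 200, default 50.
  const page = Math.max(1, Number(params.get('page') ?? '1') || 1);
  const pageSize = Math.min(200, Math.max(1, Number(params.get('pageSize') ?? '50') || 50));
  return {
    repositoryPrefix: params.get('repositoryId') ?? undefined, // matched as LIKE 'prefix%'
    statuses, // empty array = no status filter
    page,
    pageSize,
    offset: (page - 1) * pageSize
  };
}
```

The route handler would then build the `WHERE` clause from `repositoryPrefix` and `statuses`, and run a second `COUNT(*)` query with the same filters for `total`.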
#### 7c. New component: `JobSkeleton.svelte`
A set of skeleton rows matching the job table structure. Shown during the initial fetch before any data arrives. Uses Tailwind `animate-pulse`:
```svelte
<!-- renders N skeleton rows -->
<script lang="ts">
let { rows = 5 }: { rows?: number } = $props();
</script>
{#each Array(rows) as _, i (i)}
<tr>
<td class="px-6 py-4">
<div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>
<div class="mt-1 h-3 w-24 animate-pulse rounded bg-gray-100"></div>
</td>
<td class="px-6 py-4">
<div class="h-5 w-16 animate-pulse rounded-full bg-gray-200"></div>
</td>
<td class="px-6 py-4">
<div class="h-4 w-20 animate-pulse rounded bg-gray-200"></div>
</td>
<td class="px-6 py-4">
<div class="h-2 w-32 animate-pulse rounded-full bg-gray-200"></div>
</td>
<td class="px-6 py-4">
<div class="h-4 w-28 animate-pulse rounded bg-gray-200"></div>
</td>
<td class="px-6 py-4 text-right">
<div class="ml-auto h-7 w-20 animate-pulse rounded bg-gray-200"></div>
</td>
</tr>
{/each}
```
#### 7d. New component: `Toast.svelte`
Replaces all `alert()` / `console.log()` calls in the jobs page. Renders a fixed-position stack in the bottom-right corner. Each toast auto-dismisses after 4 seconds and can be manually closed:
```svelte
<!-- Usage: bind a toasts array and call push({ message, type }) -->
<script lang="ts">
export interface ToastItem {
id: string;
message: string;
type: 'success' | 'error' | 'info';
}
let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();
function dismiss(id: string) {
toasts = toasts.filter((t) => t.id !== id);
}
</script>
<div class="fixed right-4 bottom-4 z-50 flex flex-col gap-2">
{#each toasts as toast (toast.id)}
<!-- color by type, close button, auto-dismiss via onmount timer -->
{/each}
</div>
```
The jobs page replaces `showToast()` with pushing onto the bound `toasts` array. The `confirm()` for cancel is replaced with an inline confirmation state per job (`pendingCancelId`) that shows "Confirm cancel?" / "Yes" / "No" buttons inside the row.
#### 7e. New component: `WorkerStatusPanel.svelte`
A compact panel displayed above the job table showing the worker pool health. Subscribes to the `worker-status` SSE events and falls back to polling `GET /api/v1/workers` every 5 s on SSE error:
```
┌─────────────────────────────────────────────────────────┐
│ Workers [2 / 4 active] ████░░░░ 50% │
│ Worker 0 ● running /facebook/react / v18.3.0 │
│ Worker 1 ● running /facebook/react / v17.0.2 │
│ Worker 2 ○ idle │
│ Worker 3 ○ idle │
└─────────────────────────────────────────────────────────┘
```
Each worker row shows: slot index, status dot (animated green pulse for running), repository ID, version tag, and a link to the job row in the table below.
#### 7f. Filter bar on the jobs page
Add a filter strip between the page header and the table:
```
[ Repository: _______________ ] [ Status: ▾ all ] [ 🔍 Apply ] [ ↺ Reset ]
```
- **Repository field**: free-text input, matches `repositoryId` prefix (e.g. `/facebook` shows all `/facebook/*`)
- **Status dropdown**: multi-select checkboxes for `queued`, `running`, `paused`, `cancelled`, `done`, `failed`; default = all
- Filters are applied client-side against the loaded `jobs` array for instant feedback, and also re-fetched from the API on Apply to get the correct total count
- Filter state is mirrored to URL search params (`?repo=...&status=...`) so the view is bookmarkable and survives refresh
#### 7g. Per-job action spinner and disabled state
Replace the single `actionInProgress: string | null` with a per-job map. A plain `Map` inside `$state` does not trigger updates on `.set()`/`.delete()`; use `SvelteMap` from `svelte/reactivity` (the same approach already used for `SvelteSet` on the repository page):
```typescript
import { SvelteMap } from 'svelte/reactivity';
const actionInProgress = new SvelteMap<string, 'pausing' | 'resuming' | 'cancelling'>();
```
Each action button shows an inline spinner (a small `animate-spin` circle) and is disabled only for its own row; other rows remain fully interactive during the action. On completion the entry is deleted from the map.
#### 7h. `IndexingProgress.svelte` — switch from polling to SSE
The component currently uses `setInterval + fetch` at 2 s. Replace with the per-job SSE stream already available at `/api/v1/jobs/{id}/stream`:
```typescript
// replace the $effect body
$effect(() => {
job = null;
const es = new EventSource(`/api/v1/jobs/${jobId}/stream`);
es.addEventListener('job-progress', (event) => {
const data = JSON.parse(event.data);
job = { ...job, ...data };
});
es.addEventListener('job-done', () => {
void fetch(`/api/v1/jobs/${jobId}`)
.then((r) => r.json())
.then((d) => {
job = d.job;
oncomplete?.();
});
es.close();
});
es.addEventListener('job-failed', (event) => {
const data = JSON.parse(event.data);
job = { ...job, status: 'failed', error: data.error };
oncomplete?.();
es.close();
});
es.onerror = () => {
// on SSE failure fall back to a single fetch to get current state
es.close();
void fetch(`/api/v1/jobs/${jobId}`)
.then((r) => r.json())
.then((d) => {
job = d.job;
});
};
return () => es.close();
});
```
This reduces network traffic from 1 request/2 s to zero requests during active indexing — updates arrive as server-push events.
#### 7i. Pagination on the jobs page
Replace the hard-coded `?limit=50` fetch with paginated requests:
```typescript
let currentPage = $state(1);
const PAGE_SIZE = 50;
async function fetchJobs() {
const params = new URLSearchParams({
page: String(currentPage),
pageSize: String(PAGE_SIZE),
...(filterRepo ? { repositoryId: filterRepo } : {}),
...(filterStatuses.length ? { status: filterStatuses.join(',') } : {})
});
const data = await fetch(`/api/v1/jobs?${params}`).then((r) => r.json());
jobs = data.jobs;
total = data.total;
}
```
Render a simple `« Prev Page N of M Next »` control below the table, hidden when `total <= PAGE_SIZE`.
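The pager visibility rule can be captured in a small derived-state helper (a sketch; `pagerState` is a hypothetical name):

```typescript
// Derive pager controls from the API's `total` and the page size.
export function pagerState(total: number, page: number, pageSize: number) {
  const totalPages = Math.max(1, Math.ceil(total / pageSize));
  return {
    totalPages,
    show: total > pageSize, // hidden when everything fits on one page
    hasPrev: page > 1,
    hasNext: page < totalPages
  };
}
```

In the Svelte page this would be a `$derived` of `total` and `currentPage`, with the Prev/Next handlers re-running `fetchJobs()`.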
---
## Acceptance Criteria
- [ ] `npm install` with `@libsql/better-sqlite3` succeeds; `better-sqlite3` is absent from `node_modules`
- [ ] All existing unit and integration tests pass after Phase 1 import swap
- [ ] `npm run db:migrate` applies the composite index migration cleanly against an existing database
- [ ] `npm run db:migrate` applies the vector column migration cleanly; `sql> SELECT vec_embedding FROM snippet_embeddings LIMIT 1` returns a non-NULL value for any previously-embedded snippet
- [ ] `GET /api/v1/context?libraryId=...&query=...` with a semantic-mode or hybrid-mode request returns results in ≤ 200 ms on a repository with 50k+ snippets (vs previous multi-second response)
- [ ] A memory profile captured during a `/context` request shows no allocation spike proportional to repository size
- [ ] `EXPLAIN QUERY PLAN` on the `snippets` search query shows `SCAN snippets USING INDEX idx_snippets_repo_version` instead of `SCAN snippets`
- [ ] Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) start and complete an indexing job successfully after the package swap
- [ ] `drizzle-kit studio` connects and browses the migrated database
- [ ] Re-indexing a repository after the migration correctly populates `vec_embedding` on all new snippets
- [ ] `cosineSimilarity` unit tests still pass (function is kept)
- [ ] Starting two indexing jobs for different tags of the same repository simultaneously results in both jobs reaching `running` state concurrently (not one waiting for the other)
- [ ] Starting two indexing jobs for the **same** `(repositoryId, versionId)` pair returns the existing job (deduplication unchanged)
- [ ] With 4 parse workers and 4 concurrent tag jobs, zero `SQLITE_BUSY` errors appear in logs
- [ ] Write worker is present in the process list during active indexing (`worker_threads` inspector shows `write-worker-entry`)
- [ ] A `WriteError` from the write worker marks the originating job as `failed` with the error message propagated to the SSE stream
- [ ] `GET /api/v1/workers` returns a `WorkersResponse` JSON object with correct `active`, `idle`, and `workers[]` fields while jobs are in-flight
- [ ] The `worker-status` SSE event is emitted by `/api/v1/jobs/stream` whenever a worker transitions state
- [ ] The admin jobs page shows skeleton rows (not a blank screen) during the initial `fetchJobs()` call
- [ ] No `alert()` or `confirm()` calls exist in `admin/jobs/+page.svelte` after this change; all notifications go through `Toast.svelte`
- [ ] Pausing job A while job B is also in progress does not disable job B's action buttons
- [ ] The status filter multi-select correctly restricts the visible job list; the URL updates to reflect the filter state
- [ ] The repository prefix filter `?repositoryId=/facebook` returns all jobs whose `repositoryId` starts with `/facebook`
- [ ] Paginating past page 1 fetches the next batch from the API, not from the client-side array
- [ ] `IndexingProgress.svelte` has no `setInterval` call; it uses `EventSource` for progress updates
- [ ] The `WorkerStatusPanel` shows the correct number of running workers live during a multi-tag indexing run
- [ ] Refreshing the jobs page with `?repo=/facebook/react&status=running` pre-populates the filters and fetches with those params
---
## Migration Safety
### Backward Compatibility
The `embedding` blob column is kept. The `vec_embedding` column is nullable during the backfill window and becomes populated as:
1. The `UPDATE` in `vectors.sql` fills all existing rows on startup
2. New embeddings populate it at insert time
If `vec_embedding IS NULL` for a row (e.g., a row inserted before the migration runs), the vector index silently omits that row from results. The fallback in `HybridSearchService` to FTS-only mode still applies when no embeddings exist, so degraded-but-correct behavior is preserved.
### Rollback
Rollback before Phase 4 (vector column): remove `@libsql/better-sqlite3`, restore `better-sqlite3`, restore imports. No schema changes have been made.
Rollback after Phase 4: schema now has `vec_embedding` column. Drop the column with a migration reversal and restore imports. The `embedding` blob is intact throughout — no data loss.
### SQLite File Compatibility
libSQL embedded mode reads and writes standard SQLite 3 files. The WAL file, page size, and encoding are unchanged. An existing production database opened with `@libsql/better-sqlite3` is fully readable and writable. The vector index is stored in a shadow table `idx_snippet_embeddings_vec_shadow` which better-sqlite3 would ignore if rolled back (it is a regular table with a special name).
---
## Dependencies
| Package | Action | Reason |
| ------------------------ | ----------------------------- | ----------------------------------------------- |
| `better-sqlite3` | Remove from `dependencies` | Replaced |
| `@types/better-sqlite3` | Remove from `devDependencies` | `@libsql/better-sqlite3` ships own types |
| `@libsql/better-sqlite3` | Add to `dependencies` | Drop-in libSQL node addon |
| `drizzle-orm` | No change | `better-sqlite3` adapter works unchanged |
| `drizzle-kit` | No change | `dialect: 'sqlite'` correct for embedded libSQL |
No new runtime dependencies beyond the package replacement.
---
## Testing Strategy
### Unit Tests
- `src/lib/server/search/vector.search.ts`: add test asserting KNN results are correct for a seeded 3-vector table; verify memory is not proportional to table size (mock `db.prepare` to assert no unbounded `.all()` is called)
- `src/lib/server/embeddings/embedding.service.ts`: existing tests cover insert round-trips; verify `vec_embedding` column is non-NULL after `embedSnippets()`
### Integration Tests
- `api-contract.integration.test.ts`: existing tests already use `new Database(':memory:')` — these continue to work with `@libsql/better-sqlite3` because the in-memory path is identical
- Add one test to `api-contract.integration.test.ts`: seed a repository + multiple embeddings, call `/api/v1/context` in semantic mode, assert non-empty results and response time < 500ms on in-memory DB
### UI Tests
- `src/routes/admin/jobs/+page.svelte`: add Vitest browser tests (Playwright) verifying:
- Skeleton rows appear before the first fetch resolves (mock `fetch` to delay 200 ms)
- Status filter restricts displayed rows; URL param updates
- Pausing job A leaves job B's buttons enabled
- Toast appears and auto-dismisses on successful pause
- Cancel confirm flow shows inline confirmation, not `window.confirm`
- `src/lib/components/IndexingProgress.svelte`: unit test that no `setInterval` is created; verify `EventSource` is opened with the correct URL
### Performance Regression Gate
Add a benchmark script `scripts/bench-vector-search.mjs` that:
1. Creates an in-memory libSQL database
2. Seeds 10000 snippet embeddings (random Float32Array, 1536 dims)
3. Runs 100 `vectorSearch()` calls
4. Asserts p99 < 50 ms
This gates the CI check on Phase 4 correctness and speed.
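The p99 gate in step 4 reduces to a percentile over the 100 recorded latencies. A minimal sketch using the nearest-rank method (an assumption; the actual script may interpolate differently):

```typescript
// Nearest-rank percentile over recorded latencies (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// With 100 vectorSearch() timings, p99 is the 99th sorted sample.
const latencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms, illustrative
const p99 = percentile(latencies, 99);
console.log(p99); // 99

const withinBudget = p99 < 50; // the script would process.exit(1) when false
```

Nearest-rank keeps the gate conservative: a single slow outlier among 100 calls is enough to fail CI, which is the intended behaviour for a regression guard.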

package-lock.json generated

@@ -11,6 +11,7 @@
"@modelcontextprotocol/sdk": "^1.27.1",
"@xenova/transformers": "^2.17.2",
"better-sqlite3": "^12.6.2",
"sqlite-vec": "^0.1.9",
"zod": "^4.3.6"
},
"devDependencies": {
@@ -25,6 +26,7 @@
"@vitest/browser-playwright": "^4.1.0",
"drizzle-kit": "^0.31.8",
"drizzle-orm": "^0.45.1",
"esbuild": "^0.24.0",
"eslint": "^9.39.2",
"eslint-config-prettier": "^10.1.8",
"eslint-plugin-svelte": "^3.14.0",
@@ -494,9 +496,9 @@
}
},
"node_modules/@esbuild/aix-ppc64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.25.12.tgz",
"integrity": "sha512-Hhmwd6CInZ3dwpuGTF8fJG6yoWmsToE+vYgD4nytZVxcu1ulHpUQRAB1UJ8+N1Am3Mz4+xOByoQoSZf4D+CpkA==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.24.2.tgz",
"integrity": "sha512-thpVCb/rhxE/BnMLQ7GReQLLN8q9qbHmI55F4489/ByVg2aQaQ6kbcLb6FHkocZzQhxc4gx0sCk0tJkKBFzDhA==",
"cpu": [
"ppc64"
],
@@ -511,9 +513,9 @@
}
},
"node_modules/@esbuild/android-arm": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.25.12.tgz",
"integrity": "sha512-VJ+sKvNA/GE7Ccacc9Cha7bpS8nyzVv0jdVgwNDaR4gDMC/2TTRc33Ip8qrNYUcpkOHUT5OZ0bUcNNVZQ9RLlg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.24.2.tgz",
"integrity": "sha512-tmwl4hJkCfNHwFB3nBa8z1Uy3ypZpxqxfTQOcHX+xRByyYgunVbZ9MzUUfb0RxaHIMnbHagwAxuTL+tnNM+1/Q==",
"cpu": [
"arm"
],
@@ -528,9 +530,9 @@
}
},
"node_modules/@esbuild/android-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.25.12.tgz",
"integrity": "sha512-6AAmLG7zwD1Z159jCKPvAxZd4y/VTO0VkprYy+3N2FtJ8+BQWFXU+OxARIwA46c5tdD9SsKGZ/1ocqBS/gAKHg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.24.2.tgz",
"integrity": "sha512-cNLgeqCqV8WxfcTIOeL4OAtSmL8JjcN6m09XIgro1Wi7cF4t/THaWEa7eL5CMoMBdjoHOTh/vwTO/o2TRXIyzg==",
"cpu": [
"arm64"
],
@@ -545,9 +547,9 @@
}
},
"node_modules/@esbuild/android-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.25.12.tgz",
"integrity": "sha512-5jbb+2hhDHx5phYR2By8GTWEzn6I9UqR11Kwf22iKbNpYrsmRB18aX/9ivc5cabcUiAT/wM+YIZ6SG9QO6a8kg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.24.2.tgz",
"integrity": "sha512-B6Q0YQDqMx9D7rvIcsXfmJfvUYLoP722bgfBlO5cGvNVb5V/+Y7nhBE3mHV9OpxBf4eAS2S68KZztiPaWq4XYw==",
"cpu": [
"x64"
],
@@ -562,9 +564,9 @@
}
},
"node_modules/@esbuild/darwin-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.25.12.tgz",
"integrity": "sha512-N3zl+lxHCifgIlcMUP5016ESkeQjLj/959RxxNYIthIg+CQHInujFuXeWbWMgnTo4cp5XVHqFPmpyu9J65C1Yg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.24.2.tgz",
"integrity": "sha512-kj3AnYWc+CekmZnS5IPu9D+HWtUI49hbnyqk0FLEJDbzCIQt7hg7ucF1SQAilhtYpIujfaHr6O0UHlzzSPdOeA==",
"cpu": [
"arm64"
],
@@ -579,9 +581,9 @@
}
},
"node_modules/@esbuild/darwin-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.25.12.tgz",
"integrity": "sha512-HQ9ka4Kx21qHXwtlTUVbKJOAnmG1ipXhdWTmNXiPzPfWKpXqASVcWdnf2bnL73wgjNrFXAa3yYvBSd9pzfEIpA==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.24.2.tgz",
"integrity": "sha512-WeSrmwwHaPkNR5H3yYfowhZcbriGqooyu3zI/3GGpF8AyUdsrrP0X6KumITGA9WOyiJavnGZUwPGvxvwfWPHIA==",
"cpu": [
"x64"
],
@@ -596,9 +598,9 @@
}
},
"node_modules/@esbuild/freebsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.25.12.tgz",
"integrity": "sha512-gA0Bx759+7Jve03K1S0vkOu5Lg/85dou3EseOGUes8flVOGxbhDDh/iZaoek11Y8mtyKPGF3vP8XhnkDEAmzeg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.24.2.tgz",
"integrity": "sha512-UN8HXjtJ0k/Mj6a9+5u6+2eZ2ERD7Edt1Q9IZiB5UZAIdPnVKDoG7mdTVGhHJIeEml60JteamR3qhsr1r8gXvg==",
"cpu": [
"arm64"
],
@@ -613,9 +615,9 @@
}
},
"node_modules/@esbuild/freebsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.25.12.tgz",
"integrity": "sha512-TGbO26Yw2xsHzxtbVFGEXBFH0FRAP7gtcPE7P5yP7wGy7cXK2oO7RyOhL5NLiqTlBh47XhmIUXuGciXEqYFfBQ==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.24.2.tgz",
"integrity": "sha512-TvW7wE/89PYW+IevEJXZ5sF6gJRDY/14hyIGFXdIucxCsbRmLUcjseQu1SyTko+2idmCw94TgyaEZi9HUSOe3Q==",
"cpu": [
"x64"
],
@@ -630,9 +632,9 @@
}
},
"node_modules/@esbuild/linux-arm": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.25.12.tgz",
"integrity": "sha512-lPDGyC1JPDou8kGcywY0YILzWlhhnRjdof3UlcoqYmS9El818LLfJJc3PXXgZHrHCAKs/Z2SeZtDJr5MrkxtOw==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.24.2.tgz",
"integrity": "sha512-n0WRM/gWIdU29J57hJyUdIsk0WarGd6To0s+Y+LwvlC55wt+GT/OgkwoXCXvIue1i1sSNWblHEig00GBWiJgfA==",
"cpu": [
"arm"
],
@@ -647,9 +649,9 @@
}
},
"node_modules/@esbuild/linux-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.25.12.tgz",
"integrity": "sha512-8bwX7a8FghIgrupcxb4aUmYDLp8pX06rGh5HqDT7bB+8Rdells6mHvrFHHW2JAOPZUbnjUpKTLg6ECyzvas2AQ==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.24.2.tgz",
"integrity": "sha512-7HnAD6074BW43YvvUmE/35Id9/NB7BeX5EoNkK9obndmZBUk8xmJJeU7DwmUeN7tkysslb2eSl6CTrYz6oEMQg==",
"cpu": [
"arm64"
],
@@ -664,9 +666,9 @@
}
},
"node_modules/@esbuild/linux-ia32": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.25.12.tgz",
"integrity": "sha512-0y9KrdVnbMM2/vG8KfU0byhUN+EFCny9+8g202gYqSSVMonbsCfLjUO+rCci7pM0WBEtz+oK/PIwHkzxkyharA==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.24.2.tgz",
"integrity": "sha512-sfv0tGPQhcZOgTKO3oBE9xpHuUqguHvSo4jl+wjnKwFpapx+vUDcawbwPNuBIAYdRAvIDBfZVvXprIj3HA+Ugw==",
"cpu": [
"ia32"
],
@@ -681,9 +683,9 @@
}
},
"node_modules/@esbuild/linux-loong64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.25.12.tgz",
"integrity": "sha512-h///Lr5a9rib/v1GGqXVGzjL4TMvVTv+s1DPoxQdz7l/AYv6LDSxdIwzxkrPW438oUXiDtwM10o9PmwS/6Z0Ng==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.24.2.tgz",
"integrity": "sha512-CN9AZr8kEndGooS35ntToZLTQLHEjtVB5n7dl8ZcTZMonJ7CCfStrYhrzF97eAecqVbVJ7APOEe18RPI4KLhwQ==",
"cpu": [
"loong64"
],
@@ -698,9 +700,9 @@
}
},
"node_modules/@esbuild/linux-mips64el": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.25.12.tgz",
"integrity": "sha512-iyRrM1Pzy9GFMDLsXn1iHUm18nhKnNMWscjmp4+hpafcZjrr2WbT//d20xaGljXDBYHqRcl8HnxbX6uaA/eGVw==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.24.2.tgz",
"integrity": "sha512-iMkk7qr/wl3exJATwkISxI7kTcmHKE+BlymIAbHO8xanq/TjHaaVThFF6ipWzPHryoFsesNQJPE/3wFJw4+huw==",
"cpu": [
"mips64el"
],
@@ -715,9 +717,9 @@
}
},
"node_modules/@esbuild/linux-ppc64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.25.12.tgz",
"integrity": "sha512-9meM/lRXxMi5PSUqEXRCtVjEZBGwB7P/D4yT8UG/mwIdze2aV4Vo6U5gD3+RsoHXKkHCfSxZKzmDssVlRj1QQA==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.24.2.tgz",
"integrity": "sha512-shsVrgCZ57Vr2L8mm39kO5PPIb+843FStGt7sGGoqiiWYconSxwTiuswC1VJZLCjNiMLAMh34jg4VSEQb+iEbw==",
"cpu": [
"ppc64"
],
@@ -732,9 +734,9 @@
}
},
"node_modules/@esbuild/linux-riscv64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.25.12.tgz",
"integrity": "sha512-Zr7KR4hgKUpWAwb1f3o5ygT04MzqVrGEGXGLnj15YQDJErYu/BGg+wmFlIDOdJp0PmB0lLvxFIOXZgFRrdjR0w==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.24.2.tgz",
"integrity": "sha512-4eSFWnU9Hhd68fW16GD0TINewo1L6dRrB+oLNNbYyMUAeOD2yCK5KXGK1GH4qD/kT+bTEXjsyTCiJGHPZ3eM9Q==",
"cpu": [
"riscv64"
],
@@ -749,9 +751,9 @@
}
},
"node_modules/@esbuild/linux-s390x": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.25.12.tgz",
"integrity": "sha512-MsKncOcgTNvdtiISc/jZs/Zf8d0cl/t3gYWX8J9ubBnVOwlk65UIEEvgBORTiljloIWnBzLs4qhzPkJcitIzIg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.24.2.tgz",
"integrity": "sha512-S0Bh0A53b0YHL2XEXC20bHLuGMOhFDO6GN4b3YjRLK//Ep3ql3erpNcPlEFed93hsQAjAQDNsvcK+hV90FubSw==",
"cpu": [
"s390x"
],
@@ -766,9 +768,9 @@
}
},
"node_modules/@esbuild/linux-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.25.12.tgz",
"integrity": "sha512-uqZMTLr/zR/ed4jIGnwSLkaHmPjOjJvnm6TVVitAa08SLS9Z0VM8wIRx7gWbJB5/J54YuIMInDquWyYvQLZkgw==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.24.2.tgz",
"integrity": "sha512-8Qi4nQcCTbLnK9WoMjdC9NiTG6/E38RNICU6sUNqK0QFxCYgoARqVqxdFmWkdonVsvGqWhmm7MO0jyTqLqwj0Q==",
"cpu": [
"x64"
],
@@ -783,9 +785,9 @@
}
},
"node_modules/@esbuild/netbsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.25.12.tgz",
"integrity": "sha512-xXwcTq4GhRM7J9A8Gv5boanHhRa/Q9KLVmcyXHCTaM4wKfIpWkdXiMog/KsnxzJ0A1+nD+zoecuzqPmCRyBGjg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.24.2.tgz",
"integrity": "sha512-wuLK/VztRRpMt9zyHSazyCVdCXlpHkKm34WUyinD2lzK07FAHTq0KQvZZlXikNWkDGoT6x3TD51jKQ7gMVpopw==",
"cpu": [
"arm64"
],
@@ -800,9 +802,9 @@
}
},
"node_modules/@esbuild/netbsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.25.12.tgz",
"integrity": "sha512-Ld5pTlzPy3YwGec4OuHh1aCVCRvOXdH8DgRjfDy/oumVovmuSzWfnSJg+VtakB9Cm0gxNO9BzWkj6mtO1FMXkQ==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.24.2.tgz",
"integrity": "sha512-VefFaQUc4FMmJuAxmIHgUmfNiLXY438XrL4GDNV1Y1H/RW3qow68xTwjZKfj/+Plp9NANmzbH5R40Meudu8mmw==",
"cpu": [
"x64"
],
@@ -817,9 +819,9 @@
}
},
"node_modules/@esbuild/openbsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.25.12.tgz",
"integrity": "sha512-fF96T6KsBo/pkQI950FARU9apGNTSlZGsv1jZBAlcLL1MLjLNIWPBkj5NlSz8aAzYKg+eNqknrUJ24QBybeR5A==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.24.2.tgz",
"integrity": "sha512-YQbi46SBct6iKnszhSvdluqDmxCJA+Pu280Av9WICNwQmMxV7nLRHZfjQzwbPs3jeWnuAhE9Jy0NrnJ12Oz+0A==",
"cpu": [
"arm64"
],
@@ -834,9 +836,9 @@
}
},
"node_modules/@esbuild/openbsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.25.12.tgz",
"integrity": "sha512-MZyXUkZHjQxUvzK7rN8DJ3SRmrVrke8ZyRusHlP+kuwqTcfWLyqMOE3sScPPyeIXN/mDJIfGXvcMqCgYKekoQw==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.24.2.tgz",
"integrity": "sha512-+iDS6zpNM6EnJyWv0bMGLWSWeXGN/HTaF/LXHXHwejGsVi+ooqDfMCCTerNFxEkM3wYVcExkeGXNqshc9iMaOA==",
"cpu": [
"x64"
],
@@ -868,9 +870,9 @@
}
},
"node_modules/@esbuild/sunos-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.25.12.tgz",
"integrity": "sha512-3wGSCDyuTHQUzt0nV7bocDy72r2lI33QL3gkDNGkod22EsYl04sMf0qLb8luNKTOmgF/eDEDP5BFNwoBKH441w==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.24.2.tgz",
"integrity": "sha512-hTdsW27jcktEvpwNHJU4ZwWFGkz2zRJUz8pvddmXPtXDzVKTTINmlmga3ZzwcuMpUvLw7JkLy9QLKyGpD2Yxig==",
"cpu": [
"x64"
],
@@ -885,9 +887,9 @@
}
},
"node_modules/@esbuild/win32-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.25.12.tgz",
"integrity": "sha512-rMmLrur64A7+DKlnSuwqUdRKyd3UE7oPJZmnljqEptesKM8wx9J8gx5u0+9Pq0fQQW8vqeKebwNXdfOyP+8Bsg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.24.2.tgz",
"integrity": "sha512-LihEQ2BBKVFLOC9ZItT9iFprsE9tqjDjnbulhHoFxYQtQfai7qfluVODIYxt1PgdoyQkz23+01rzwNwYfutxUQ==",
"cpu": [
"arm64"
],
@@ -902,9 +904,9 @@
}
},
"node_modules/@esbuild/win32-ia32": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.25.12.tgz",
"integrity": "sha512-HkqnmmBoCbCwxUKKNPBixiWDGCpQGVsrQfJoVGYLPT41XWF8lHuE5N6WhVia2n4o5QK5M4tYr21827fNhi4byQ==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.24.2.tgz",
"integrity": "sha512-q+iGUwfs8tncmFC9pcnD5IvRHAzmbwQ3GPS5/ceCyHdjXubwQWI12MKWSNSMYLJMq23/IUCvJMS76PDqXe1fxA==",
"cpu": [
"ia32"
],
@@ -919,9 +921,9 @@
}
},
"node_modules/@esbuild/win32-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.25.12.tgz",
"integrity": "sha512-alJC0uCZpTFrSL0CCDjcgleBXPnCrEAhTBILpeAp7M/OFgoqtAetfBzX0xM00MUsVVPpVjlPuMbREqnZCXaTnA==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.24.2.tgz",
"integrity": "sha512-7VTgWzgMGvup6aSqDPLiW5zHaxYJGTO4OokMjIlrCtf+VpEL+cXKtCvg723iguPYI5oaUNdS+/V7OU2gvXVWEg==",
"cpu": [
"x64"
],
@@ -3537,6 +3539,473 @@
"drizzle-kit": "bin.cjs"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/aix-ppc64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.25.12.tgz",
"integrity": "sha512-Hhmwd6CInZ3dwpuGTF8fJG6yoWmsToE+vYgD4nytZVxcu1ulHpUQRAB1UJ8+N1Am3Mz4+xOByoQoSZf4D+CpkA==",
"cpu": [
"ppc64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"aix"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/android-arm": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.25.12.tgz",
"integrity": "sha512-VJ+sKvNA/GE7Ccacc9Cha7bpS8nyzVv0jdVgwNDaR4gDMC/2TTRc33Ip8qrNYUcpkOHUT5OZ0bUcNNVZQ9RLlg==",
"cpu": [
"arm"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"android"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/android-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.25.12.tgz",
"integrity": "sha512-6AAmLG7zwD1Z159jCKPvAxZd4y/VTO0VkprYy+3N2FtJ8+BQWFXU+OxARIwA46c5tdD9SsKGZ/1ocqBS/gAKHg==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"android"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/android-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.25.12.tgz",
"integrity": "sha512-5jbb+2hhDHx5phYR2By8GTWEzn6I9UqR11Kwf22iKbNpYrsmRB18aX/9ivc5cabcUiAT/wM+YIZ6SG9QO6a8kg==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"android"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/darwin-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.25.12.tgz",
"integrity": "sha512-N3zl+lxHCifgIlcMUP5016ESkeQjLj/959RxxNYIthIg+CQHInujFuXeWbWMgnTo4cp5XVHqFPmpyu9J65C1Yg==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/darwin-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.25.12.tgz",
"integrity": "sha512-HQ9ka4Kx21qHXwtlTUVbKJOAnmG1ipXhdWTmNXiPzPfWKpXqASVcWdnf2bnL73wgjNrFXAa3yYvBSd9pzfEIpA==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/freebsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.25.12.tgz",
"integrity": "sha512-gA0Bx759+7Jve03K1S0vkOu5Lg/85dou3EseOGUes8flVOGxbhDDh/iZaoek11Y8mtyKPGF3vP8XhnkDEAmzeg==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"freebsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/freebsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.25.12.tgz",
"integrity": "sha512-TGbO26Yw2xsHzxtbVFGEXBFH0FRAP7gtcPE7P5yP7wGy7cXK2oO7RyOhL5NLiqTlBh47XhmIUXuGciXEqYFfBQ==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"freebsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-arm": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.25.12.tgz",
"integrity": "sha512-lPDGyC1JPDou8kGcywY0YILzWlhhnRjdof3UlcoqYmS9El818LLfJJc3PXXgZHrHCAKs/Z2SeZtDJr5MrkxtOw==",
"cpu": [
"arm"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.25.12.tgz",
"integrity": "sha512-8bwX7a8FghIgrupcxb4aUmYDLp8pX06rGh5HqDT7bB+8Rdells6mHvrFHHW2JAOPZUbnjUpKTLg6ECyzvas2AQ==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-ia32": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.25.12.tgz",
"integrity": "sha512-0y9KrdVnbMM2/vG8KfU0byhUN+EFCny9+8g202gYqSSVMonbsCfLjUO+rCci7pM0WBEtz+oK/PIwHkzxkyharA==",
"cpu": [
"ia32"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-loong64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.25.12.tgz",
"integrity": "sha512-h///Lr5a9rib/v1GGqXVGzjL4TMvVTv+s1DPoxQdz7l/AYv6LDSxdIwzxkrPW438oUXiDtwM10o9PmwS/6Z0Ng==",
"cpu": [
"loong64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-mips64el": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.25.12.tgz",
"integrity": "sha512-iyRrM1Pzy9GFMDLsXn1iHUm18nhKnNMWscjmp4+hpafcZjrr2WbT//d20xaGljXDBYHqRcl8HnxbX6uaA/eGVw==",
"cpu": [
"mips64el"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-ppc64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.25.12.tgz",
"integrity": "sha512-9meM/lRXxMi5PSUqEXRCtVjEZBGwB7P/D4yT8UG/mwIdze2aV4Vo6U5gD3+RsoHXKkHCfSxZKzmDssVlRj1QQA==",
"cpu": [
"ppc64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-riscv64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.25.12.tgz",
"integrity": "sha512-Zr7KR4hgKUpWAwb1f3o5ygT04MzqVrGEGXGLnj15YQDJErYu/BGg+wmFlIDOdJp0PmB0lLvxFIOXZgFRrdjR0w==",
"cpu": [
"riscv64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-s390x": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.25.12.tgz",
"integrity": "sha512-MsKncOcgTNvdtiISc/jZs/Zf8d0cl/t3gYWX8J9ubBnVOwlk65UIEEvgBORTiljloIWnBzLs4qhzPkJcitIzIg==",
"cpu": [
"s390x"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/linux-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.25.12.tgz",
"integrity": "sha512-uqZMTLr/zR/ed4jIGnwSLkaHmPjOjJvnm6TVVitAa08SLS9Z0VM8wIRx7gWbJB5/J54YuIMInDquWyYvQLZkgw==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/netbsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.25.12.tgz",
"integrity": "sha512-xXwcTq4GhRM7J9A8Gv5boanHhRa/Q9KLVmcyXHCTaM4wKfIpWkdXiMog/KsnxzJ0A1+nD+zoecuzqPmCRyBGjg==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"netbsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/netbsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.25.12.tgz",
"integrity": "sha512-Ld5pTlzPy3YwGec4OuHh1aCVCRvOXdH8DgRjfDy/oumVovmuSzWfnSJg+VtakB9Cm0gxNO9BzWkj6mtO1FMXkQ==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"netbsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/openbsd-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.25.12.tgz",
"integrity": "sha512-fF96T6KsBo/pkQI950FARU9apGNTSlZGsv1jZBAlcLL1MLjLNIWPBkj5NlSz8aAzYKg+eNqknrUJ24QBybeR5A==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"openbsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/openbsd-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.25.12.tgz",
"integrity": "sha512-MZyXUkZHjQxUvzK7rN8DJ3SRmrVrke8ZyRusHlP+kuwqTcfWLyqMOE3sScPPyeIXN/mDJIfGXvcMqCgYKekoQw==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"openbsd"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/sunos-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.25.12.tgz",
"integrity": "sha512-3wGSCDyuTHQUzt0nV7bocDy72r2lI33QL3gkDNGkod22EsYl04sMf0qLb8luNKTOmgF/eDEDP5BFNwoBKH441w==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"sunos"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/win32-arm64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.25.12.tgz",
"integrity": "sha512-rMmLrur64A7+DKlnSuwqUdRKyd3UE7oPJZmnljqEptesKM8wx9J8gx5u0+9Pq0fQQW8vqeKebwNXdfOyP+8Bsg==",
"cpu": [
"arm64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/win32-ia32": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.25.12.tgz",
"integrity": "sha512-HkqnmmBoCbCwxUKKNPBixiWDGCpQGVsrQfJoVGYLPT41XWF8lHuE5N6WhVia2n4o5QK5M4tYr21827fNhi4byQ==",
"cpu": [
"ia32"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/@esbuild/win32-x64": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.25.12.tgz",
"integrity": "sha512-alJC0uCZpTFrSL0CCDjcgleBXPnCrEAhTBILpeAp7M/OFgoqtAetfBzX0xM00MUsVVPpVjlPuMbREqnZCXaTnA==",
"cpu": [
"x64"
],
"dev": true,
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">=18"
}
},
"node_modules/drizzle-kit/node_modules/esbuild": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.25.12.tgz",
"integrity": "sha512-bbPBYYrtZbkt6Os6FiTLCTFxvq4tt3JKall1vRwshA3fdVztsLAatFaZobhkBC8/BrPetoa0oksYoKXoG4ryJg==",
"dev": true,
"hasInstallScript": true,
"license": "MIT",
"bin": {
"esbuild": "bin/esbuild"
},
"engines": {
"node": ">=18"
},
"optionalDependencies": {
"@esbuild/aix-ppc64": "0.25.12",
"@esbuild/android-arm": "0.25.12",
"@esbuild/android-arm64": "0.25.12",
"@esbuild/android-x64": "0.25.12",
"@esbuild/darwin-arm64": "0.25.12",
"@esbuild/darwin-x64": "0.25.12",
"@esbuild/freebsd-arm64": "0.25.12",
"@esbuild/freebsd-x64": "0.25.12",
"@esbuild/linux-arm": "0.25.12",
"@esbuild/linux-arm64": "0.25.12",
"@esbuild/linux-ia32": "0.25.12",
"@esbuild/linux-loong64": "0.25.12",
"@esbuild/linux-mips64el": "0.25.12",
"@esbuild/linux-ppc64": "0.25.12",
"@esbuild/linux-riscv64": "0.25.12",
"@esbuild/linux-s390x": "0.25.12",
"@esbuild/linux-x64": "0.25.12",
"@esbuild/netbsd-arm64": "0.25.12",
"@esbuild/netbsd-x64": "0.25.12",
"@esbuild/openbsd-arm64": "0.25.12",
"@esbuild/openbsd-x64": "0.25.12",
"@esbuild/openharmony-arm64": "0.25.12",
"@esbuild/sunos-x64": "0.25.12",
"@esbuild/win32-arm64": "0.25.12",
"@esbuild/win32-ia32": "0.25.12",
"@esbuild/win32-x64": "0.25.12"
}
},
"node_modules/drizzle-orm": {
"version": "0.45.1",
"resolved": "https://registry.npmjs.org/drizzle-orm/-/drizzle-orm-0.45.1.tgz",
@@ -3753,9 +4222,9 @@
}
},
"node_modules/esbuild": {
"version": "0.25.12",
"resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.25.12.tgz",
"integrity": "sha512-bbPBYYrtZbkt6Os6FiTLCTFxvq4tt3JKall1vRwshA3fdVztsLAatFaZobhkBC8/BrPetoa0oksYoKXoG4ryJg==",
"version": "0.24.2",
"resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.24.2.tgz",
"integrity": "sha512-+9egpBW8I3CD5XPe0n6BfT5fxLzxrlDzqydF3aviG+9ni1lDC/OvMHcxqEFV0+LANZG5R1bFMWfUrjVsdwxJvA==",
"dev": true,
"hasInstallScript": true,
"license": "MIT",
@@ -3766,32 +4235,31 @@
"node": ">=18"
},
"optionalDependencies": {
"@esbuild/aix-ppc64": "0.25.12",
"@esbuild/android-arm": "0.25.12",
"@esbuild/android-arm64": "0.25.12",
"@esbuild/android-x64": "0.25.12",
"@esbuild/darwin-arm64": "0.25.12",
"@esbuild/darwin-x64": "0.25.12",
"@esbuild/freebsd-arm64": "0.25.12",
"@esbuild/freebsd-x64": "0.25.12",
"@esbuild/linux-arm": "0.25.12",
"@esbuild/linux-arm64": "0.25.12",
"@esbuild/linux-ia32": "0.25.12",
"@esbuild/linux-loong64": "0.25.12",
"@esbuild/linux-mips64el": "0.25.12",
"@esbuild/linux-ppc64": "0.25.12",
"@esbuild/linux-riscv64": "0.25.12",
"@esbuild/linux-s390x": "0.25.12",
"@esbuild/linux-x64": "0.25.12",
"@esbuild/netbsd-arm64": "0.25.12",
"@esbuild/netbsd-x64": "0.25.12",
"@esbuild/openbsd-arm64": "0.25.12",
"@esbuild/openbsd-x64": "0.25.12",
"@esbuild/openharmony-arm64": "0.25.12",
"@esbuild/sunos-x64": "0.25.12",
"@esbuild/win32-arm64": "0.25.12",
"@esbuild/win32-ia32": "0.25.12",
"@esbuild/win32-x64": "0.25.12"
"@esbuild/aix-ppc64": "0.24.2",
"@esbuild/android-arm": "0.24.2",
"@esbuild/android-arm64": "0.24.2",
"@esbuild/android-x64": "0.24.2",
"@esbuild/darwin-arm64": "0.24.2",
"@esbuild/darwin-x64": "0.24.2",
"@esbuild/freebsd-arm64": "0.24.2",
"@esbuild/freebsd-x64": "0.24.2",
"@esbuild/linux-arm": "0.24.2",
"@esbuild/linux-arm64": "0.24.2",
"@esbuild/linux-ia32": "0.24.2",
"@esbuild/linux-loong64": "0.24.2",
"@esbuild/linux-mips64el": "0.24.2",
"@esbuild/linux-ppc64": "0.24.2",
"@esbuild/linux-riscv64": "0.24.2",
"@esbuild/linux-s390x": "0.24.2",
"@esbuild/linux-x64": "0.24.2",
"@esbuild/netbsd-arm64": "0.24.2",
"@esbuild/netbsd-x64": "0.24.2",
"@esbuild/openbsd-arm64": "0.24.2",
"@esbuild/openbsd-x64": "0.24.2",
"@esbuild/sunos-x64": "0.24.2",
"@esbuild/win32-arm64": "0.24.2",
"@esbuild/win32-ia32": "0.24.2",
"@esbuild/win32-x64": "0.24.2"
}
},
"node_modules/escape-html": {
@@ -6527,6 +6995,84 @@
"source-map": "^0.6.0"
}
},
"node_modules/sqlite-vec": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec/-/sqlite-vec-0.1.9.tgz",
"integrity": "sha512-L7XJWRIBNvR9O5+vh1FQ+IGkh/3D2AzVksW5gdtk28m78Hy8skFD0pqReKH1Yp0/BUKRGcffgKvyO/EON5JXpA==",
"license": "MIT OR Apache",
"optionalDependencies": {
"sqlite-vec-darwin-arm64": "0.1.9",
"sqlite-vec-darwin-x64": "0.1.9",
"sqlite-vec-linux-arm64": "0.1.9",
"sqlite-vec-linux-x64": "0.1.9",
"sqlite-vec-windows-x64": "0.1.9"
}
},
"node_modules/sqlite-vec-darwin-arm64": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec-darwin-arm64/-/sqlite-vec-darwin-arm64-0.1.9.tgz",
"integrity": "sha512-jSsZpE42OfBkGL/ItyJTVCUwl6o6Ka3U5rc4j+UBDIQzC1ulSSKMEhQLthsOnF/MdAf1MuAkYhkdKmmcjaIZQg==",
"cpu": [
"arm64"
],
"license": "MIT OR Apache",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/sqlite-vec-darwin-x64": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec-darwin-x64/-/sqlite-vec-darwin-x64-0.1.9.tgz",
"integrity": "sha512-KDlVyqQT7pnOhU1ymB9gs7dMbSoVmKHitT+k1/xkjarcX8bBqPxWrGlK/R+C5WmWkfvWwyq5FfXfiBYCBs6PlA==",
"cpu": [
"x64"
],
"license": "MIT OR Apache",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/sqlite-vec-linux-arm64": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec-linux-arm64/-/sqlite-vec-linux-arm64-0.1.9.tgz",
"integrity": "sha512-5wXVJ9c9kR4CHm/wVqXb/R+XUHTdpZ4nWbPHlS+gc9qQFVHs92Km4bPnCKX4rtcPMzvNis+SIzMJR1SCEwpuUw==",
"cpu": [
"arm64"
],
"license": "MIT OR Apache",
"optional": true,
"os": [
"linux"
]
},
"node_modules/sqlite-vec-linux-x64": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec-linux-x64/-/sqlite-vec-linux-x64-0.1.9.tgz",
"integrity": "sha512-w3tCH8xK2finW8fQJ/m8uqKodXUZ9KAuAar2UIhz4BHILfpE0WM/MTGCRfa7RjYbrYim5Luk3guvMOGI7T7JQA==",
"cpu": [
"x64"
],
"license": "MIT OR Apache",
"optional": true,
"os": [
"linux"
]
},
"node_modules/sqlite-vec-windows-x64": {
"version": "0.1.9",
"resolved": "https://registry.npmjs.org/sqlite-vec-windows-x64/-/sqlite-vec-windows-x64-0.1.9.tgz",
"integrity": "sha512-y3gEIyy/17bq2QFPQOWLE68TYWcRZkBQVA2XLrTPHNTOp55xJi/BBBmOm40tVMDMjtP+Elpk6UBUXdaq+46b0Q==",
"cpu": [
"x64"
],
"license": "MIT OR Apache",
"optional": true,
"os": [
"win32"
]
},
"node_modules/stackback": {
"version": "0.0.2",
"resolved": "https://registry.npmjs.org/stackback/-/stackback-0.0.2.tgz",


@@ -5,7 +5,7 @@
"type": "module",
"scripts": {
"dev": "vite dev",
"build": "vite build",
"build": "vite build && node scripts/build-workers.mjs",
"preview": "vite preview",
"prepare": "svelte-kit sync || echo ''",
"check": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json",
@@ -34,6 +34,7 @@
"@vitest/browser-playwright": "^4.1.0",
"drizzle-kit": "^0.31.8",
"drizzle-orm": "^0.45.1",
"esbuild": "^0.24.0",
"eslint": "^9.39.2",
"eslint-config-prettier": "^10.1.8",
"eslint-plugin-svelte": "^3.14.0",
@@ -55,6 +56,7 @@
"@modelcontextprotocol/sdk": "^1.27.1",
"@xenova/transformers": "^2.17.2",
"better-sqlite3": "^12.6.2",
"sqlite-vec": "^0.1.9",
"zod": "^4.3.6"
}
}

scripts/build-workers.mjs Normal file

@@ -0,0 +1,39 @@
import * as esbuild from 'esbuild';
import { existsSync } from 'node:fs';
const entries = [
'src/lib/server/pipeline/worker-entry.ts',
'src/lib/server/pipeline/embed-worker-entry.ts',
'src/lib/server/pipeline/write-worker-entry.ts'
];
try {
const existing = entries.filter((e) => existsSync(e));
if (existing.length === 0) {
console.log('[build-workers] No worker entry files found yet, skipping.');
process.exit(0);
}
await esbuild.build({
entryPoints: existing,
bundle: true,
platform: 'node',
target: 'node20',
format: 'esm',
outdir: 'build/workers',
outExtension: { '.js': '.mjs' },
alias: {
$lib: './src/lib',
'$lib/server': './src/lib/server'
},
external: ['better-sqlite3', '@xenova/transformers'],
banner: {
js: "import { createRequire } from 'module'; const require = createRequire(import.meta.url);"
}
});
console.log(`[build-workers] Compiled ${existing.length} worker(s) to build/workers/`);
} catch (err) {
console.error('[build-workers] Error:', err);
process.exit(1);
}
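The compiled workers land in `build/workers/` with a `.mjs` extension (per `outdir` and `outExtension` above), so at runtime a pool can resolve and spawn them via `node:worker_threads`. A minimal sketch under those path assumptions; `workerPath` and the `workerData` payload shape are illustrative, not the pool's actual API:

```typescript
import { join } from 'node:path';
import { Worker } from 'node:worker_threads';

// Resolve the esbuild output for a given worker entry; mirrors the
// outdir/outExtension settings in scripts/build-workers.mjs.
export function workerPath(entry: string, root = process.cwd()): string {
  return join(root, 'build', 'workers', `${entry}.mjs`);
}

// Spawn a compiled worker and hand it a job id via workerData.
// The payload shape is hypothetical; the real pool defines its own protocol.
export function spawnIndexingWorker(jobId: string): Worker {
  return new Worker(workerPath('worker-entry'), { workerData: { jobId } });
}
```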

View File

@@ -16,6 +16,7 @@ import {
type EmbeddingProfileEntityProps
} from '$lib/server/models/embedding-profile.js';
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
import { env } from '$env/dynamic/private';
import type { Handle } from '@sveltejs/kit';
// ---------------------------------------------------------------------------
@@ -24,12 +25,18 @@ import type { Handle } from '@sveltejs/kit';
try {
initializeDatabase();
} catch (err) {
console.error('[hooks.server] FATAL: database initialisation failed:', err);
process.exit(1);
}
try {
const db = getClient();
const activeProfileRow = db
-.prepare<[], EmbeddingProfileEntityProps>(
-'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
-)
+.prepare<
+[],
+EmbeddingProfileEntityProps
+>('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
.get();
let embeddingService: EmbeddingService | null = null;
@@ -42,11 +49,35 @@ try {
embeddingService = new EmbeddingService(db, provider, activeProfile.id);
}
-initializePipeline(db, embeddingService);
+// Read database path from environment
+const dbPath = env.DATABASE_URL;
+// Read indexing concurrency setting from database
+let concurrency = 2; // default
+if (dbPath) {
+const concurrencyRow = db
+.prepare<
+[],
+{ value: string }
+>("SELECT value FROM settings WHERE key = 'indexing.concurrency' LIMIT 1")
+.get();
+if (concurrencyRow) {
+try {
+const parsed = JSON.parse(concurrencyRow.value);
+concurrency = parsed.value ?? 2;
+} catch {
+// If parsing fails, use default
+concurrency = 2;
+}
+}
+}
+initializePipeline(db, embeddingService, { concurrency, dbPath });
console.log('[hooks.server] Indexing pipeline initialised.');
} catch (err) {
console.error(
-`[hooks.server] Failed to initialise server: ${err instanceof Error ? err.message : String(err)}`
+'[hooks.server] Failed to initialise pipeline:',
+err instanceof Error ? err.message : String(err)
);
}
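The concurrency lookup above expects the settings row to hold JSON of the shape `{"value": n}` and falls back to 2 on any parse failure. The same logic as a standalone helper (a sketch for illustration, not the actual hooks code):

```typescript
// Hypothetical helper mirroring the settings parse in hooks.server.ts:
// the value column stores JSON like {"value": 4}; anything unparseable
// or non-numeric falls back to the default of 2.
export function readConcurrency(raw: string | null | undefined, fallback = 2): number {
  if (!raw) return fallback;
  try {
    const parsed = JSON.parse(raw) as { value?: unknown };
    return typeof parsed.value === 'number' ? parsed.value : fallback;
  } catch {
    return fallback;
  }
}
```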

View File

@@ -1,40 +1,47 @@
<script lang="ts">
import type { IndexingJob } from '$lib/types';
-let { jobId }: { jobId: string } = $props();
+let { jobId, oncomplete }: { jobId: string; oncomplete?: () => void } = $props();
let job = $state<IndexingJob | null>(null);
$effect(() => {
job = null;
-let stopped = false;
+const es = new EventSource(`/api/v1/jobs/${jobId}/stream`);
-async function poll() {
-if (stopped) return;
-try {
-const res = await fetch(`/api/v1/jobs/${jobId}`);
-if (res.ok) {
-const data = await res.json();
-job = data.job;
-}
-} catch {
-// ignore transient errors
-}
-}
+es.addEventListener('job-progress', (event) => {
+const data = JSON.parse(event.data);
+job = { ...job, ...data } as IndexingJob;
+});
-void poll();
-const interval = setInterval(() => {
-if (job?.status === 'done' || job?.status === 'failed') {
-clearInterval(interval);
-return;
-}
-void poll();
-}, 2000);
+es.addEventListener('job-done', () => {
+void fetch(`/api/v1/jobs/${jobId}`)
+.then((r) => r.json())
+.then((d) => {
+job = d.job;
+oncomplete?.();
+});
+es.close();
+});
-return () => {
-stopped = true;
-clearInterval(interval);
+es.addEventListener('job-failed', (event) => {
+const data = JSON.parse(event.data);
+if (job)
+job = { ...job, status: 'failed', error: data.error ?? 'Unknown error' } as IndexingJob;
+oncomplete?.();
+es.close();
+});
+es.onerror = () => {
+es.close();
+void fetch(`/api/v1/jobs/${jobId}`)
+.then((r) => r.json())
+.then((d) => {
+job = d.job;
+});
+};
+return () => es.close();
});
const progress = $derived(job?.progress ?? 0);
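The SSE listeners above treat each `job-progress` payload as a partial patch spread over the current job, while `job-failed` forces a terminal state. That merge logic, extracted framework-free (field names assumed from the component; a sketch, not its actual code):

```typescript
type JobStatus = 'queued' | 'running' | 'paused' | 'cancelled' | 'done' | 'failed';
interface Job {
  id: string;
  status: JobStatus;
  progress: number;
  error?: string;
}

// A 'job-progress' event patches whatever fields it carries onto the job.
export function applyProgress(job: Job, patch: Partial<Job>): Job {
  return { ...job, ...patch };
}

// A 'job-failed' event forces status and fills a default error message.
export function applyFailure(job: Job, error?: string): Job {
  return { ...job, status: 'failed', error: error ?? 'Unknown error' };
}
```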

View File

@@ -1,5 +1,5 @@
<script lang="ts">
-import { resolve as resolveRoute } from '$app/paths';
+import { resolve } from '$app/paths';
type RepositoryCardRepo = {
id: string;
@@ -38,10 +38,6 @@
error: 'Error'
};
-const detailsHref = $derived(
-resolveRoute('/repos/[id]', { id: encodeURIComponent(repo.id) })
-);
const totalSnippets = $derived(repo.totalSnippets ?? 0);
const trustScore = $derived(repo.trustScore ?? 0);
const embeddingCount = $derived(repo.embeddingCount ?? 0);
@@ -112,7 +108,7 @@
{repo.state === 'indexing' ? 'Indexing...' : 'Re-index'}
</button>
<a
-href={detailsHref}
+href={resolve('/repos/[id]', { id: encodeURIComponent(repo.id) })}
class="rounded-lg border border-gray-200 px-3 py-1.5 text-sm text-gray-700 hover:bg-gray-50"
>
Details

View File

@@ -5,7 +5,7 @@ import RepositoryCard from './RepositoryCard.svelte';
describe('RepositoryCard.svelte', () => {
it('encodes slash-bearing repository ids in the details href', async () => {
-render(RepositoryCard, {
+const { container } = await render(RepositoryCard, {
repo: {
id: '/facebook/react',
title: 'React',
@@ -26,7 +26,8 @@ describe('RepositoryCard.svelte', () => {
.element(page.getByRole('link', { name: 'Details' }))
.toHaveAttribute('href', '/repos/%2Ffacebook%2Freact');
-await expect.element(page.getByText('1,200 embeddings')).toBeInTheDocument();
-await expect.element(page.getByText('Indexed: main, v18.3.0')).toBeInTheDocument();
+const text = container.textContent ?? '';
+expect(text).toMatch(/1[,.\u00a0\u202f]?200 embeddings/);
+expect(text).toContain('Indexed: main, v18.3.0');
});
});
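The regex-based assertion above tolerates whichever grouping separator the component's number formatting emits (comma, dot, no-break space, or narrow no-break space), keeping the test stable across ICU locales. For instance:

```typescript
// The same character class the test uses for the thousands separator.
const pattern = /1[,.\u00a0\u202f]?200/;

// Intl.NumberFormat picks the separator per locale; all variants match.
const samples = [
  new Intl.NumberFormat('en-US').format(1200), // "1,200"
  new Intl.NumberFormat('de-DE').format(1200), // "1.200" with full ICU
  new Intl.NumberFormat('fr-FR').format(1200)  // a non-breaking-space variant
];
const allMatch = samples.every((s) => pattern.test(s));
```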

View File

@@ -0,0 +1,20 @@
<script lang="ts">
let { rows = 5 }: { rows?: number } = $props();
const rowIndexes = $derived(Array.from({ length: rows }, (_, index) => index));
</script>
{#each rowIndexes as i (i)}
<tr>
<td class="px-6 py-4">
<div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>
<div class="mt-1 h-3 w-24 animate-pulse rounded bg-gray-100"></div>
</td>
<td class="px-6 py-4"><div class="h-5 w-16 animate-pulse rounded-full bg-gray-200"></div></td>
<td class="px-6 py-4"><div class="h-4 w-20 animate-pulse rounded bg-gray-200"></div></td>
<td class="px-6 py-4"><div class="h-2 w-32 animate-pulse rounded-full bg-gray-200"></div></td>
<td class="px-6 py-4"><div class="h-4 w-28 animate-pulse rounded bg-gray-200"></div></td>
<td class="px-6 py-4 text-right"
><div class="ml-auto h-7 w-20 animate-pulse rounded bg-gray-200"></div></td
>
</tr>
{/each}

View File

@@ -1,9 +1,10 @@
<script lang="ts">
interface Props {
status: 'queued' | 'running' | 'paused' | 'cancelled' | 'done' | 'failed';
spinning?: boolean;
}
-let { status }: Props = $props();
+let { status, spinning = false }: Props = $props();
const statusConfig: Record<typeof status, { bg: string; text: string; label: string }> = {
queued: { bg: 'bg-blue-100', text: 'text-blue-800', label: 'Queued' },
@@ -21,4 +22,9 @@
class="inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-medium {config.bg} {config.text}"
>
{config.label}
{#if spinning}
<span
class="ml-1 inline-block h-3 w-3 animate-spin rounded-full border-2 border-current border-r-transparent"
></span>
{/if}
</span>

View File

@@ -0,0 +1,77 @@
<script lang="ts">
import { onDestroy } from 'svelte';
import { SvelteMap } from 'svelte/reactivity';
export interface ToastItem {
id: string;
message: string;
type: 'success' | 'error' | 'info';
}
let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();
const timers = new SvelteMap<string, ReturnType<typeof setTimeout>>();
$effect(() => {
for (const toast of toasts) {
if (timers.has(toast.id)) {
continue;
}
const timer = setTimeout(() => {
dismiss(toast.id);
}, 4000);
timers.set(toast.id, timer);
}
for (const [id, timer] of timers.entries()) {
if (toasts.some((toast) => toast.id === id)) {
continue;
}
clearTimeout(timer);
timers.delete(id);
}
});
onDestroy(() => {
for (const timer of timers.values()) {
clearTimeout(timer);
}
timers.clear();
});
function dismiss(id: string) {
const timer = timers.get(id);
if (timer) {
clearTimeout(timer);
timers.delete(id);
}
toasts = toasts.filter((toast: ToastItem) => toast.id !== id);
}
</script>
<div class="fixed right-4 bottom-4 z-50 flex flex-col gap-2">
{#each toasts as toast (toast.id)}
<div
role="status"
aria-live="polite"
class="flex items-center gap-3 rounded-lg px-4 py-3 shadow-lg {toast.type === 'error'
? 'bg-red-600 text-white'
: toast.type === 'info'
? 'bg-blue-600 text-white'
: 'bg-green-600 text-white'}"
>
<span class="text-sm">{toast.message}</span>
<button
type="button"
aria-label="Dismiss notification"
onclick={() => dismiss(toast.id)}
class="ml-2 text-xs opacity-70 hover:opacity-100"
>
x
</button>
</div>
{/each}
</div>
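The `$effect` above reconciles the timer map in both directions: start a timer for any toast that lacks one, and clear timers whose toast has been removed. That bookkeeping as a pure function (a hypothetical extraction for illustration, not the component's export):

```typescript
// Given the current toast ids and the ids that already have timers,
// compute which timers to start and which stale ones to clear.
export function diffTimers(
  toastIds: string[],
  timerIds: string[]
): { toStart: string[]; toClear: string[] } {
  const toasts = new Set(toastIds);
  const timers = new Set(timerIds);
  return {
    toStart: toastIds.filter((id) => !timers.has(id)),
    toClear: timerIds.filter((id) => !toasts.has(id))
  };
}
```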

View File

@@ -0,0 +1,81 @@
<script lang="ts">
interface WorkerStatus {
index: number;
state: 'idle' | 'running';
jobId: string | null;
repositoryId: string | null;
versionId: string | null;
}
interface WorkersResponse {
concurrency: number;
active: number;
idle: number;
workers: WorkerStatus[];
}
let status = $state<WorkersResponse>({ concurrency: 0, active: 0, idle: 0, workers: [] });
let pollInterval: ReturnType<typeof setInterval> | null = null;
async function fetchStatus() {
try {
const res = await fetch('/api/v1/workers');
if (res.ok) status = await res.json();
} catch {
/* ignore */
}
}
$effect(() => {
void fetchStatus();
const es = new EventSource('/api/v1/jobs/stream');
es.addEventListener('worker-status', (event) => {
try {
status = JSON.parse(event.data);
} catch {
/* ignore */
}
});
es.onerror = () => {
es.close();
if (!pollInterval) {
pollInterval = setInterval(() => void fetchStatus(), 5000);
}
};
return () => {
es.close();
if (pollInterval) {
clearInterval(pollInterval);
pollInterval = null;
}
};
});
</script>
{#if status.concurrency > 0}
<div class="mb-4 rounded-lg border border-gray-200 bg-white p-4 shadow-sm">
<div class="mb-2 flex items-center justify-between">
<h3 class="text-sm font-semibold text-gray-700">Workers</h3>
<span class="text-xs text-gray-500">{status.active} / {status.concurrency} active</span>
</div>
<div class="space-y-1">
{#each status.workers as worker (worker.index)}
<div class="flex items-center gap-2 text-xs">
<span
class="flex h-2 w-2 rounded-full {worker.state === 'running'
? 'animate-pulse bg-green-500'
: 'bg-gray-300'}"
></span>
<span class="text-gray-600">Worker {worker.index}</span>
{#if worker.state === 'running' && worker.repositoryId}
<span class="truncate text-gray-400"
>{worker.repositoryId}{worker.versionId ? ' / ' + worker.versionId : ''}</span
>
{:else}
<span class="text-gray-400">idle</span>
{/if}
</div>
{/each}
</div>
</div>
{/if}
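The `active`/`idle` counts in `WorkersResponse` are derivable from the `workers` array itself. A sketch of the aggregation the `/api/v1/workers` endpoint presumably performs (`summarize` is illustrative, not the endpoint's actual code):

```typescript
interface WorkerStatus {
  index: number;
  state: 'idle' | 'running';
  jobId: string | null;
}

// Count running vs idle workers the way the payload reports them.
export function summarize(workers: WorkerStatus[]): { active: number; idle: number } {
  const active = workers.filter((w) => w.state === 'running').length;
  return { active, idle: workers.length - active };
}
```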

View File

@@ -87,6 +87,7 @@ function makeMetadata(overrides: Partial<ContextResponseMetadata> = {}): Context
return {
localSource: false,
resultCount: 1,
searchModeUsed: 'vector',
repository: {
id: '/facebook/react',
title: 'React',

View File

@@ -143,6 +143,9 @@ export function formatContextTxt(
}
noResults.push(`Result count: ${metadata?.resultCount ?? 0}`);
if (metadata?.searchModeUsed) {
noResults.push(`Search mode: ${metadata.searchModeUsed}`);
}
parts.push(noResults.join('\n'));
return parts.join('\n\n');

View File

@@ -0,0 +1,175 @@
/**
* Unit tests for GitHub Compare API client (TRUEREF-0021).
*/
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { fetchGitHubChangedFiles } from './github-compare.js';
import { GitHubApiError } from './github-tags.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function mockFetch(status: number, body: unknown): void {
vi.spyOn(global, 'fetch').mockResolvedValueOnce(new Response(JSON.stringify(body), { status }));
}
beforeEach(() => {
vi.restoreAllMocks();
});
// ---------------------------------------------------------------------------
// fetchGitHubChangedFiles
// ---------------------------------------------------------------------------
describe('fetchGitHubChangedFiles', () => {
it('maps added status correctly', async () => {
mockFetch(200, {
status: 'ahead',
files: [{ filename: 'src/new.ts', status: 'added', sha: 'abc123' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result).toHaveLength(1);
expect(result[0]).toMatchObject({ path: 'src/new.ts', status: 'added', sha: 'abc123' });
});
it('maps modified status correctly', async () => {
mockFetch(200, {
status: 'ahead',
files: [{ filename: 'src/index.ts', status: 'modified', sha: 'def456' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result[0]).toMatchObject({ path: 'src/index.ts', status: 'modified' });
});
it('maps removed status correctly and omits sha', async () => {
mockFetch(200, {
status: 'ahead',
files: [{ filename: 'src/old.ts', status: 'removed', sha: '000000' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result[0]).toMatchObject({ path: 'src/old.ts', status: 'removed' });
expect(result[0].sha).toBeUndefined();
});
it('maps renamed status and sets previousPath', async () => {
mockFetch(200, {
status: 'ahead',
files: [
{
filename: 'src/renamed.ts',
status: 'renamed',
sha: 'ghi789',
previous_filename: 'src/original.ts'
}
]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result[0]).toMatchObject({
path: 'src/renamed.ts',
status: 'renamed',
previousPath: 'src/original.ts',
sha: 'ghi789'
});
});
it('returns empty array when compare status is identical', async () => {
mockFetch(200, { status: 'identical', files: [] });
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.0.0');
expect(result).toEqual([]);
});
it('returns empty array when compare status is behind', async () => {
mockFetch(200, {
status: 'behind',
files: [{ filename: 'src/index.ts', status: 'modified', sha: 'abc' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.1.0', 'v1.0.0');
expect(result).toEqual([]);
});
it('throws GitHubApiError on 401 unauthorized', async () => {
mockFetch(401, { message: 'Unauthorized' });
await expect(
fetchGitHubChangedFiles('owner', 'private-repo', 'v1.0.0', 'v1.1.0')
).rejects.toThrow(GitHubApiError);
});
it('throws GitHubApiError on 404 not found', async () => {
mockFetch(404, { message: 'Not Found' });
await expect(
fetchGitHubChangedFiles('owner', 'missing-repo', 'v1.0.0', 'v1.1.0')
).rejects.toThrow(GitHubApiError);
});
it('throws GitHubApiError on 422 unprocessable entity', async () => {
mockFetch(422, { message: 'Unprocessable Entity' });
await expect(fetchGitHubChangedFiles('owner', 'repo', 'bad-ref', 'v1.1.0')).rejects.toThrow(
GitHubApiError
);
});
it('returns empty array when files property is missing', async () => {
mockFetch(200, { status: 'ahead' });
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result).toEqual([]);
});
it('returns empty array when files array is empty', async () => {
mockFetch(200, { status: 'ahead', files: [] });
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result).toEqual([]);
});
it('maps copied status to modified', async () => {
mockFetch(200, {
status: 'ahead',
files: [{ filename: 'src/copy.ts', status: 'copied', sha: 'jkl012' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result[0]).toMatchObject({ path: 'src/copy.ts', status: 'modified' });
});
it('maps changed status to modified', async () => {
mockFetch(200, {
status: 'ahead',
files: [{ filename: 'src/changed.ts', status: 'changed', sha: 'mno345' }]
});
const result = await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect(result[0]).toMatchObject({ path: 'src/changed.ts', status: 'modified' });
});
it('sends Authorization header when token is provided', async () => {
const fetchSpy = vi
.spyOn(global, 'fetch')
.mockResolvedValueOnce(
new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
);
await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0', 'my-token');
const callArgs = fetchSpy.mock.calls[0];
const headers = (callArgs[1] as RequestInit).headers as Record<string, string>;
expect(headers['Authorization']).toBe('Bearer my-token');
});
it('does not send Authorization header when no token provided', async () => {
const fetchSpy = vi
.spyOn(global, 'fetch')
.mockResolvedValueOnce(
new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
);
await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
const callArgs = fetchSpy.mock.calls[0];
const headers = (callArgs[1] as RequestInit).headers as Record<string, string>;
expect(headers['Authorization']).toBeUndefined();
});
it('throws GitHubApiError with correct status code', async () => {
mockFetch(403, { message: 'Forbidden' });
try {
await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
expect.fail('should have thrown');
} catch (e) {
expect(e).toBeInstanceOf(GitHubApiError);
expect((e as GitHubApiError).status).toBe(403);
}
});
});

View File

@@ -0,0 +1,104 @@
/**
* GitHub Compare API client for differential tag indexing (TRUEREF-0021).
*
* Uses GET /repos/{owner}/{repo}/compare/{base}...{head} to determine
* which files changed between two refs without downloading full trees.
*/
import { GitHubApiError } from './github-tags.js';
import type { ChangedFile } from './types.js';
const GITHUB_API = 'https://api.github.com';
interface GitHubCompareFile {
filename: string;
status: 'added' | 'modified' | 'removed' | 'renamed' | 'copied' | 'changed' | 'unchanged';
sha: string;
previous_filename?: string;
}
interface GitHubCompareResponse {
status: 'diverged' | 'ahead' | 'behind' | 'identical';
files?: GitHubCompareFile[];
}
/**
* Fetch changed files between two GitHub refs using the Compare API.
*
* @param owner GitHub owner/org
* @param repo GitHub repository name
* @param base Base ref (tag, branch, or commit SHA)
* @param head Head ref (tag, branch, or commit SHA)
* @param token Optional PAT for private repos
* @returns Array of ChangedFile objects; empty array when refs are identical or head is behind base
*/
export async function fetchGitHubChangedFiles(
owner: string,
repo: string,
base: string,
head: string,
token?: string
): Promise<ChangedFile[]> {
const url = `${GITHUB_API}/repos/${owner}/${repo}/compare/${base}...${head}?per_page=300`;
const headers: Record<string, string> = {
Accept: 'application/vnd.github+json',
'X-GitHub-Api-Version': '2022-11-28',
'User-Agent': 'TrueRef/1.0'
};
if (token) headers['Authorization'] = `Bearer ${token}`;
const response = await fetch(url, { headers });
if (!response.ok) {
throw new GitHubApiError(response.status);
}
const data = (await response.json()) as GitHubCompareResponse;
// Identical or behind means no relevant changes to index
if (data.status === 'identical' || data.status === 'behind') {
return [];
}
if (!data.files || data.files.length === 0) {
return [];
}
return data.files.map((file): ChangedFile => {
let status: ChangedFile['status'];
switch (file.status) {
case 'added':
status = 'added';
break;
case 'removed':
status = 'removed';
break;
case 'renamed':
status = 'renamed';
break;
case 'modified':
case 'copied':
case 'changed':
case 'unchanged':
default:
status = 'modified';
break;
}
const result: ChangedFile = {
path: file.filename,
status
};
if (status === 'renamed' && file.previous_filename) {
result.previousPath = file.previous_filename;
}
if (status !== 'removed') {
result.sha = file.sha;
}
return result;
});
}
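The switch inside `fetchGitHubChangedFiles` collapses GitHub's seven compare statuses into the four the pipeline tracks; `copied`, `changed`, and `unchanged` all fold into `modified` since the pipeline re-reads the file either way. Isolated as a pure function (a sketch, not an actual export of this module):

```typescript
type GitHubStatus =
  | 'added' | 'modified' | 'removed' | 'renamed' | 'copied' | 'changed' | 'unchanged';
type ChangedStatus = 'added' | 'modified' | 'removed' | 'renamed';

// Same mapping as the switch in fetchGitHubChangedFiles: pass the three
// statuses the pipeline distinguishes through, fold everything else to modified.
export function mapCompareStatus(status: GitHubStatus): ChangedStatus {
  switch (status) {
    case 'added':
    case 'removed':
    case 'renamed':
      return status;
    default:
      return 'modified';
  }
}
```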

View File

@@ -413,6 +413,59 @@ describe('LocalCrawler.crawl() — config file detection', () => {
const result = await crawlRoot();
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
});
it('populates CrawlResult.config with the parsed trueref.json even when folders allowlist excludes the root', async () => {
// Regression test for MULTIVERSION-0001:
// When folders: ["src/"] is set, trueref.json at the root is excluded from
// files[] by shouldIndexFile(). The config must still be returned in
// CrawlResult.config so the indexing pipeline can persist rules.
root = await makeTempRepo({
'trueref.json': JSON.stringify({
folders: ['src/'],
rules: ['Always document public APIs.']
}),
'src/index.ts': 'export {};',
'docs/guide.md': '# Guide'
});
const result = await crawlRoot();
// trueref.json must NOT appear in files (excluded by folders allowlist).
expect(result.files.some((f) => f.path === 'trueref.json')).toBe(false);
// docs/guide.md must NOT appear (outside src/).
expect(result.files.some((f) => f.path === 'docs/guide.md')).toBe(false);
// src/index.ts must appear (inside src/).
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
// CrawlResult.config must carry the parsed config.
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Always document public APIs.']);
});
it('populates CrawlResult.config with the parsed context7.json', async () => {
root = await makeTempRepo({
'context7.json': JSON.stringify({ rules: ['Rule from context7.'] }),
'src/index.ts': 'export {};'
});
const result = await crawlRoot();
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Rule from context7.']);
});
it('CrawlResult.config is undefined when no config file is present', async () => {
root = await makeTempRepo({ 'src/index.ts': 'export {};' });
const result = await crawlRoot();
expect(result.config).toBeUndefined();
});
it('CrawlResult.config is undefined when caller supplies config (caller-provided takes precedence, no auto-detect)', async () => {
root = await makeTempRepo({
'trueref.json': JSON.stringify({ rules: ['From file.'] }),
'src/index.ts': 'export {};'
});
// Caller-supplied config prevents auto-detection; CrawlResult.config
// should carry the caller config (not the file content).
const result = await crawlRoot({ config: { rules: ['From caller.'] } });
expect(result.config?.rules).toEqual(['From caller.']);
});
});
// ---------------------------------------------------------------------------

View File

@@ -230,7 +230,11 @@ export class LocalCrawler {
totalFiles: filteredPaths.length,
skippedFiles: allRelPaths.length - filteredPaths.length,
branch,
-commitSha
+commitSha,
+// Surface the pre-parsed config so the indexing pipeline can read rules
+// without needing to find trueref.json inside crawledFiles (which fails
+// when a `folders` allowlist excludes the repo root).
+config: config ?? undefined
};
}

View File

@@ -35,6 +35,13 @@ export interface CrawlResult {
branch: string;
/** HEAD commit SHA */
commitSha: string;
/**
* Pre-parsed trueref.json / context7.json configuration found at the repo
* root during crawling. Carried here so the indexing pipeline can consume it
* directly without having to locate the config file in `files` — which fails
* when a `folders` allowlist excludes the repo root.
*/
config?: RepoConfig;
}
export interface CrawlOptions {
@@ -48,6 +55,21 @@ export interface CrawlOptions {
config?: RepoConfig;
/** Progress callback invoked after each file is processed */
onProgress?: (processed: number, total: number) => void;
/**
* When provided, the crawler must restrict returned files to only these paths.
* Used by the differential indexing pipeline to skip unchanged files.
*/
allowedPaths?: Set<string>;
}
export interface ChangedFile {
/** Path of the file in the new version (head). For renames, this is the destination path. */
path: string;
status: 'added' | 'modified' | 'removed' | 'renamed';
/** Previous path, only set when status === 'renamed' */
previousPath?: string;
/** Blob SHA of the file content in the head ref (omitted for removed files) */
sha?: string;
}
// ---------------------------------------------------------------------------
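Given a `ChangedFile` list from the Compare API, the `allowedPaths` set the crawler accepts can be derived by dropping `removed` entries, since deleted files have nothing left to index. A hypothetical helper (`toAllowedPaths` is not part of the source):

```typescript
type ChangedStatus = 'added' | 'modified' | 'removed' | 'renamed';
interface ChangedFile {
  path: string;
  status: ChangedStatus;
  previousPath?: string;
  sha?: string;
}

// Build the differential crawl allowlist: keep every surviving head path,
// skip removed files (they are handled by deletion, not re-indexing).
export function toAllowedPaths(changes: ChangedFile[]): Set<string> {
  const allowed = new Set<string>();
  for (const change of changes) {
    if (change.status !== 'removed') allowed.add(change.path);
  }
  return allowed;
}
```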

View File

@@ -4,6 +4,8 @@
*/
import Database from 'better-sqlite3';
import { env } from '$env/dynamic/private';
import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';
let _client: Database.Database | null = null;
@@ -11,8 +13,8 @@ export function getClient(): Database.Database {
if (!_client) {
if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');
_client = new Database(env.DATABASE_URL);
-_client.pragma('journal_mode = WAL');
-_client.pragma('foreign_keys = ON');
+applySqlitePragmas(_client);
+loadSqliteVec(_client);
}
return _client;
}

View File

@@ -0,0 +1,14 @@
import type Database from 'better-sqlite3';
export const SQLITE_BUSY_TIMEOUT_MS = 30000;
export function applySqlitePragmas(db: Database.Database): void {
db.pragma('journal_mode = WAL');
db.pragma('foreign_keys = ON');
db.pragma(`busy_timeout = ${SQLITE_BUSY_TIMEOUT_MS}`);
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('temp_store = MEMORY');
db.pragma('mmap_size = 268435456');
db.pragma('wal_autocheckpoint = 1000');
}
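Two of the pragma values above are easy to misread: a negative `cache_size` is a size in KiB (so `-65536` is a 64 MiB page cache), while a positive value is a page count, and `mmap_size = 268435456` is 256 MiB. The `cache_size` arithmetic, assuming SQLite's default 4096-byte page size:

```typescript
// SQLite semantics: negative cache_size = KiB of cache, positive = number
// of pages (4096-byte pages by default). -65536 therefore means 64 MiB.
export function cacheSizeBytes(cacheSize: number, pageSize = 4096): number {
  return cacheSize < 0 ? -cacheSize * 1024 : cacheSize * pageSize;
}
```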

View File

@@ -5,16 +5,16 @@ import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { join, dirname } from 'node:path';
import * as schema from './schema';
import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';
import { env } from '$env/dynamic/private';
if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');
const client = new Database(env.DATABASE_URL);
-// Enable WAL mode for better concurrent read performance.
-client.pragma('journal_mode = WAL');
-// Enforce foreign key constraints.
-client.pragma('foreign_keys = ON');
+applySqlitePragmas(client);
+loadSqliteVec(client);
export const db = drizzle(client, { schema });
@@ -30,6 +30,7 @@ const __dirname = dirname(fileURLToPath(import.meta.url));
*/
export function initializeDatabase(): void {
const migrationsFolder = join(__dirname, 'migrations');
console.log(`[db] Running migrations from ${migrationsFolder}...`);
migrate(db, { migrationsFolder });
// Apply FTS5 virtual table and trigger DDL (not expressible via Drizzle).

View File

@@ -0,0 +1,30 @@
PRAGMA foreign_keys=OFF;
--> statement-breakpoint
CREATE TABLE `__new_repository_configs` (
`repository_id` text NOT NULL,
`version_id` text,
`project_title` text,
`description` text,
`folders` text,
`exclude_folders` text,
`exclude_files` text,
`rules` text,
`previous_versions` text,
`updated_at` integer NOT NULL,
FOREIGN KEY (`repository_id`) REFERENCES `repositories`(`id`) ON UPDATE no action ON DELETE cascade
);
--> statement-breakpoint
INSERT INTO `__new_repository_configs`
(repository_id, version_id, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at)
SELECT repository_id, NULL, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at
FROM `repository_configs`;
--> statement-breakpoint
DROP TABLE `repository_configs`;
--> statement-breakpoint
ALTER TABLE `__new_repository_configs` RENAME TO `repository_configs`;
--> statement-breakpoint
PRAGMA foreign_keys=ON;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_base` ON `repository_configs` (`repository_id`) WHERE `version_id` IS NULL;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_version` ON `repository_configs` (`repository_id`, `version_id`) WHERE `version_id` IS NOT NULL;

View File

@@ -0,0 +1,23 @@
PRAGMA foreign_keys=OFF;--> statement-breakpoint
CREATE TABLE `__new_repository_configs` (
`repository_id` text NOT NULL,
`version_id` text,
`project_title` text,
`description` text,
`folders` text,
`exclude_folders` text,
`exclude_files` text,
`rules` text,
`previous_versions` text,
`updated_at` integer NOT NULL,
FOREIGN KEY (`repository_id`) REFERENCES `repositories`(`id`) ON UPDATE no action ON DELETE cascade
);
--> statement-breakpoint
INSERT INTO `__new_repository_configs`("repository_id", "version_id", "project_title", "description", "folders", "exclude_folders", "exclude_files", "rules", "previous_versions", "updated_at") SELECT "repository_id", "version_id", "project_title", "description", "folders", "exclude_folders", "exclude_files", "rules", "previous_versions", "updated_at" FROM `repository_configs`;--> statement-breakpoint
DROP TABLE `repository_configs`;--> statement-breakpoint
ALTER TABLE `__new_repository_configs` RENAME TO `repository_configs`;--> statement-breakpoint
PRAGMA foreign_keys=ON;--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_base` ON `repository_configs` (`repository_id`) WHERE "repository_configs"."version_id" IS NULL;--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_version` ON `repository_configs` (`repository_id`,`version_id`) WHERE "repository_configs"."version_id" IS NOT NULL;--> statement-breakpoint
ALTER TABLE `indexing_jobs` ADD `stage` text DEFAULT 'queued' NOT NULL;--> statement-breakpoint
ALTER TABLE `indexing_jobs` ADD `stage_detail` text;

View File

@@ -0,0 +1,6 @@
-- Backfill stage column for historical jobs whose stage was frozen at 'queued'
-- by the DEFAULT in migration 0004. Jobs that completed or failed before
-- TRUEREF-0022 never received stage updates via the worker thread, so their
-- stage column reflects the migration default rather than actual progress.
UPDATE indexing_jobs SET stage = 'done' WHERE status = 'done' AND stage = 'queued';--> statement-breakpoint
UPDATE indexing_jobs SET stage = 'failed' WHERE status = 'failed' AND stage = 'queued';

View File

@@ -0,0 +1,6 @@
CREATE INDEX `idx_embeddings_profile` ON `snippet_embeddings` (`profile_id`,`snippet_id`);--> statement-breakpoint
CREATE INDEX `idx_documents_repo_version` ON `documents` (`repository_id`,`version_id`);--> statement-breakpoint
CREATE INDEX `idx_jobs_repo_status` ON `indexing_jobs` (`repository_id`,`status`);--> statement-breakpoint
CREATE INDEX `idx_repositories_state` ON `repositories` (`state`);--> statement-breakpoint
CREATE INDEX `idx_snippets_repo_version` ON `snippets` (`repository_id`,`version_id`);--> statement-breakpoint
CREATE INDEX `idx_snippets_repo_type` ON `snippets` (`repository_id`,`type`);

View File

@@ -0,0 +1,848 @@
{
"version": "6",
"dialect": "sqlite",
"id": "c326dcbe-1771-4a90-a566-0ebd1eca47ec",
"prevId": "31531dab-a199-4fc5-a889-1884940039cd",
"tables": {
"documents": {
"name": "documents",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"file_path": {
"name": "file_path",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"checksum": {
"name": "checksum",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"documents_repository_id_repositories_id_fk": {
"name": "documents_repository_id_repositories_id_fk",
"tableFrom": "documents",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"documents_version_id_repository_versions_id_fk": {
"name": "documents_version_id_repository_versions_id_fk",
"tableFrom": "documents",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"embedding_profiles": {
"name": "embedding_profiles",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"provider_kind": {
"name": "provider_kind",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"enabled": {
"name": "enabled",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": true
},
"is_default": {
"name": "is_default",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"config": {
"name": "config",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"indexing_jobs": {
"name": "indexing_jobs",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"progress": {
"name": "progress",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_files": {
"name": "total_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"processed_files": {
"name": "processed_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stage": {
"name": "stage",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"stage_detail": {
"name": "stage_detail",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"error": {
"name": "error",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"started_at": {
"name": "started_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"completed_at": {
"name": "completed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"indexing_jobs_repository_id_repositories_id_fk": {
"name": "indexing_jobs_repository_id_repositories_id_fk",
"tableFrom": "indexing_jobs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repositories": {
"name": "repositories",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"source": {
"name": "source",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"source_url": {
"name": "source_url",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"branch": {
"name": "branch",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": "'main'"
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_tokens": {
"name": "total_tokens",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"trust_score": {
"name": "trust_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"benchmark_score": {
"name": "benchmark_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stars": {
"name": "stars",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"github_token": {
"name": "github_token",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"last_indexed_at": {
"name": "last_indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_configs": {
"name": "repository_configs",
"columns": {
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"project_title": {
"name": "project_title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"folders": {
"name": "folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_folders": {
"name": "exclude_folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_files": {
"name": "exclude_files",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"rules": {
"name": "rules",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"previous_versions": {
"name": "previous_versions",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"uniq_repo_config_base": {
"name": "uniq_repo_config_base",
"columns": ["repository_id"],
"isUnique": true,
"where": "\"repository_configs\".\"version_id\" IS NULL"
},
"uniq_repo_config_version": {
"name": "uniq_repo_config_version",
"columns": ["repository_id", "version_id"],
"isUnique": true,
"where": "\"repository_configs\".\"version_id\" IS NOT NULL"
}
},
"foreignKeys": {
"repository_configs_repository_id_repositories_id_fk": {
"name": "repository_configs_repository_id_repositories_id_fk",
"tableFrom": "repository_configs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_versions": {
"name": "repository_versions",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"tag": {
"name": "tag",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"commit_hash": {
"name": "commit_hash",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"repository_versions_repository_id_repositories_id_fk": {
"name": "repository_versions_repository_id_repositories_id_fk",
"tableFrom": "repository_versions",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"settings": {
"name": "settings",
"columns": {
"key": {
"name": "key",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"value": {
"name": "value",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippet_embeddings": {
"name": "snippet_embeddings",
"columns": {
"snippet_id": {
"name": "snippet_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"profile_id": {
"name": "profile_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"embedding": {
"name": "embedding",
"type": "blob",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippet_embeddings_snippet_id_snippets_id_fk": {
"name": "snippet_embeddings_snippet_id_snippets_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "snippets",
"columnsFrom": ["snippet_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippet_embeddings_profile_id_embedding_profiles_id_fk": {
"name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "embedding_profiles",
"columnsFrom": ["profile_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {
"snippet_embeddings_snippet_id_profile_id_pk": {
"columns": ["snippet_id", "profile_id"],
"name": "snippet_embeddings_snippet_id_profile_id_pk"
}
},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippets": {
"name": "snippets",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"document_id": {
"name": "document_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"type": {
"name": "type",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"content": {
"name": "content",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"breadcrumb": {
"name": "breadcrumb",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippets_document_id_documents_id_fk": {
"name": "snippets_document_id_documents_id_fk",
"tableFrom": "snippets",
"tableTo": "documents",
"columnsFrom": ["document_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_repository_id_repositories_id_fk": {
"name": "snippets_repository_id_repositories_id_fk",
"tableFrom": "snippets",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_version_id_repository_versions_id_fk": {
"name": "snippets_version_id_repository_versions_id_fk",
"tableFrom": "snippets",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
}
},
"views": {},
"enums": {},
"_meta": {
"schemas": {},
"tables": {},
"columns": {}
},
"internal": {
"indexes": {}
}
}

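The `snippet_embeddings.embedding` column in the snapshot above is a raw `blob`. As a minimal sketch (an assumption about the serialization, not this repository's actual code), round-tripping a vector through the little-endian float32 layout commonly used with sqlite-vec looks like this:

```typescript
// Hedged sketch: encode/decode an embedding vector as a packed float32
// blob, the layout commonly used with sqlite-vec. Function names and the
// exact layout are assumptions, not taken from this repository.
function encodeEmbedding(vector: number[]): Buffer {
  // Float32Array uses the platform's native endianness (little-endian on
  // all common deployment targets); Buffer.from(ArrayBuffer) is zero-copy.
  return Buffer.from(new Float32Array(vector).buffer);
}

function decodeEmbedding(blob: Buffer): number[] {
  // Respect byteOffset: a Buffer may be a view into a larger ArrayBuffer.
  const floats = new Float32Array(
    blob.buffer,
    blob.byteOffset,
    blob.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
  return Array.from(floats);
}
```

Each vector element occupies 4 bytes, so a 384-dimension embedding serializes to a 1536-byte blob.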

@@ -0,0 +1,883 @@
{
"version": "6",
"dialect": "sqlite",
"id": "b8998bda-f89b-41bc-b923-3f676d153c79",
"prevId": "c326dcbe-1771-4a90-a566-0ebd1eca47ec",
"tables": {
"documents": {
"name": "documents",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"file_path": {
"name": "file_path",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"checksum": {
"name": "checksum",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"idx_documents_repo_version": {
"name": "idx_documents_repo_version",
"columns": ["repository_id", "version_id"],
"isUnique": false
}
},
"foreignKeys": {
"documents_repository_id_repositories_id_fk": {
"name": "documents_repository_id_repositories_id_fk",
"tableFrom": "documents",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"documents_version_id_repository_versions_id_fk": {
"name": "documents_version_id_repository_versions_id_fk",
"tableFrom": "documents",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"embedding_profiles": {
"name": "embedding_profiles",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"provider_kind": {
"name": "provider_kind",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"enabled": {
"name": "enabled",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": true
},
"is_default": {
"name": "is_default",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"config": {
"name": "config",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"indexing_jobs": {
"name": "indexing_jobs",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"progress": {
"name": "progress",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_files": {
"name": "total_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"processed_files": {
"name": "processed_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stage": {
"name": "stage",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"stage_detail": {
"name": "stage_detail",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"error": {
"name": "error",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"started_at": {
"name": "started_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"completed_at": {
"name": "completed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"idx_jobs_repo_status": {
"name": "idx_jobs_repo_status",
"columns": ["repository_id", "status"],
"isUnique": false
}
},
"foreignKeys": {
"indexing_jobs_repository_id_repositories_id_fk": {
"name": "indexing_jobs_repository_id_repositories_id_fk",
"tableFrom": "indexing_jobs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repositories": {
"name": "repositories",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"source": {
"name": "source",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"source_url": {
"name": "source_url",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"branch": {
"name": "branch",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": "'main'"
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_tokens": {
"name": "total_tokens",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"trust_score": {
"name": "trust_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"benchmark_score": {
"name": "benchmark_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stars": {
"name": "stars",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"github_token": {
"name": "github_token",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"last_indexed_at": {
"name": "last_indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"idx_repositories_state": {
"name": "idx_repositories_state",
"columns": ["state"],
"isUnique": false
}
},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_configs": {
"name": "repository_configs",
"columns": {
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"project_title": {
"name": "project_title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"folders": {
"name": "folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_folders": {
"name": "exclude_folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_files": {
"name": "exclude_files",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"rules": {
"name": "rules",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"previous_versions": {
"name": "previous_versions",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"uniq_repo_config_base": {
"name": "uniq_repo_config_base",
"columns": ["repository_id"],
"isUnique": true,
"where": "\"repository_configs\".\"version_id\" IS NULL"
},
"uniq_repo_config_version": {
"name": "uniq_repo_config_version",
"columns": ["repository_id", "version_id"],
"isUnique": true,
"where": "\"repository_configs\".\"version_id\" IS NOT NULL"
}
},
"foreignKeys": {
"repository_configs_repository_id_repositories_id_fk": {
"name": "repository_configs_repository_id_repositories_id_fk",
"tableFrom": "repository_configs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_versions": {
"name": "repository_versions",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"tag": {
"name": "tag",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"commit_hash": {
"name": "commit_hash",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"repository_versions_repository_id_repositories_id_fk": {
"name": "repository_versions_repository_id_repositories_id_fk",
"tableFrom": "repository_versions",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"settings": {
"name": "settings",
"columns": {
"key": {
"name": "key",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"value": {
"name": "value",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippet_embeddings": {
"name": "snippet_embeddings",
"columns": {
"snippet_id": {
"name": "snippet_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"profile_id": {
"name": "profile_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"embedding": {
"name": "embedding",
"type": "blob",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"idx_embeddings_profile": {
"name": "idx_embeddings_profile",
"columns": ["profile_id", "snippet_id"],
"isUnique": false
}
},
"foreignKeys": {
"snippet_embeddings_snippet_id_snippets_id_fk": {
"name": "snippet_embeddings_snippet_id_snippets_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "snippets",
"columnsFrom": ["snippet_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippet_embeddings_profile_id_embedding_profiles_id_fk": {
"name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "embedding_profiles",
"columnsFrom": ["profile_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {
"snippet_embeddings_snippet_id_profile_id_pk": {
"columns": ["snippet_id", "profile_id"],
"name": "snippet_embeddings_snippet_id_profile_id_pk"
}
},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippets": {
"name": "snippets",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"document_id": {
"name": "document_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"type": {
"name": "type",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"content": {
"name": "content",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"breadcrumb": {
"name": "breadcrumb",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"idx_snippets_repo_version": {
"name": "idx_snippets_repo_version",
"columns": ["repository_id", "version_id"],
"isUnique": false
},
"idx_snippets_repo_type": {
"name": "idx_snippets_repo_type",
"columns": ["repository_id", "type"],
"isUnique": false
}
},
"foreignKeys": {
"snippets_document_id_documents_id_fk": {
"name": "snippets_document_id_documents_id_fk",
"tableFrom": "snippets",
"tableTo": "documents",
"columnsFrom": ["document_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_repository_id_repositories_id_fk": {
"name": "snippets_repository_id_repositories_id_fk",
"tableFrom": "snippets",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_version_id_repository_versions_id_fk": {
"name": "snippets_version_id_repository_versions_id_fk",
"tableFrom": "snippets",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
}
},
"views": {},
"enums": {},
"_meta": {
"schemas": {},
"tables": {},
"columns": {}
},
"internal": {
"indexes": {}
}
}

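For reference on what the sqlite-vec virtual tables accelerate over the `snippet_embeddings` rows above: the brute-force baseline is a cosine-similarity scan of every candidate vector. A sketch under assumed names (the real search path goes through the per-profile vec tables, not this loop):

```typescript
// Hedged sketch: rank candidate embeddings against a query vector by
// cosine similarity. This is the O(n) baseline that a sqlite-vec index
// replaces; identifiers here are illustrative, not from the codebase.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankBySimilarity(
  query: number[],
  candidates: Array<{ snippetId: string; embedding: number[] }>,
  limit: number
): Array<{ snippetId: string; score: number }> {
  return candidates
    .map((c) => ({ snippetId: c.snippetId, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit);
}
```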

@@ -22,6 +22,34 @@
"when": 1774461897742,
"tag": "0002_silky_stellaris",
"breakpoints": true
},
{
"idx": 3,
"version": "6",
"when": 1743155877000,
"tag": "0003_multiversion_config",
"breakpoints": true
},
{
"idx": 4,
"version": "6",
"when": 1774880275833,
"tag": "0004_complete_sentry",
"breakpoints": true
},
{
"idx": 5,
"version": "6",
"when": 1774890536284,
"tag": "0005_fix_stage_defaults",
"breakpoints": true
},
{
"idx": 6,
"version": "6",
"when": 1775038799913,
"tag": "0006_yielding_centennial",
"breakpoints": true
}
]
}


@@ -6,6 +6,7 @@ import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import { eq } from 'drizzle-orm';
import * as schema from './schema';
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from './sqlite-vec';
import {
repositories,
repositoryVersions,
@@ -24,6 +25,7 @@ import {
function createTestDb() {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const db = drizzle(client, { schema });
@@ -266,10 +268,11 @@ describe('snippets table', () => {
describe('snippet_embeddings table', () => {
let db: ReturnType<typeof createTestDb>['db'];
let client: Database.Database;
let snippetId: string;
beforeEach(() => {
({ db } = createTestDb());
({ db, client } = createTestDb());
db.insert(repositories).values(makeRepo()).run();
const docId = crypto.randomUUID();
db.insert(documents)
@@ -344,6 +347,30 @@ describe('snippet_embeddings table', () => {
const result = db.select().from(snippetEmbeddings).all();
expect(result).toHaveLength(0);
});
it('keeps the relational schema free of vec_embedding and retains the profile index', () => {
const columns = client.prepare("PRAGMA table_info('snippet_embeddings')").all() as Array<{
name: string;
}>;
expect(columns.map((column) => column.name)).not.toContain('vec_embedding');
const indexes = client.prepare("PRAGMA index_list('snippet_embeddings')").all() as Array<{
name: string;
}>;
expect(indexes.map((index) => index.name)).toContain('idx_embeddings_profile');
});
it('loads sqlite-vec idempotently and derives deterministic per-profile table names', () => {
expect(() => loadSqliteVec(client)).not.toThrow();
const tableName = sqliteVecTableName('local-default');
const rowidTableName = sqliteVecRowidTableName('local-default');
expect(tableName).toMatch(/^snippet_embeddings_vec_local_default_[0-9a-f]{8}$/);
expect(rowidTableName).toMatch(/^snippet_embeddings_vec_rowids_local_default_[0-9a-f]{8}$/);
expect(sqliteVecTableName('local-default')).toBe(tableName);
expect(sqliteVecRowidTableName('local-default')).toBe(rowidTableName);
expect(sqliteVecTableName('local-default')).not.toBe(sqliteVecTableName('openai/custom'));
});
});
describe('indexing_jobs table', () => {

View File

@@ -1,9 +1,21 @@
import { blob, integer, primaryKey, real, sqliteTable, text } from 'drizzle-orm/sqlite-core';
import { sql } from 'drizzle-orm';
import {
blob,
index,
integer,
primaryKey,
real,
sqliteTable,
text,
uniqueIndex
} from 'drizzle-orm/sqlite-core';
// ---------------------------------------------------------------------------
// repositories
// ---------------------------------------------------------------------------
export const repositories = sqliteTable('repositories', {
export const repositories = sqliteTable(
'repositories',
{
id: text('id').primaryKey(), // e.g. "/facebook/react" or "/local/my-sdk"
title: text('title').notNull(),
description: text('description'),
@@ -25,7 +37,9 @@ export const repositories = sqliteTable('repositories', {
lastIndexedAt: integer('last_indexed_at', { mode: 'timestamp' }),
createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
},
(t) => [index('idx_repositories_state').on(t.state)]
);
// ---------------------------------------------------------------------------
// repository_versions
@@ -51,7 +65,9 @@ export const repositoryVersions = sqliteTable('repository_versions', {
// ---------------------------------------------------------------------------
// documents
// ---------------------------------------------------------------------------
export const documents = sqliteTable('documents', {
export const documents = sqliteTable(
'documents',
{
id: text('id').primaryKey(), // UUID
repositoryId: text('repository_id')
.notNull()
@@ -63,12 +79,16 @@ export const documents = sqliteTable('documents', {
tokenCount: integer('token_count').default(0),
checksum: text('checksum').notNull(), // SHA-256 of file content
indexedAt: integer('indexed_at', { mode: 'timestamp' }).notNull()
});
},
(t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
);
// ---------------------------------------------------------------------------
// snippets
// ---------------------------------------------------------------------------
export const snippets = sqliteTable('snippets', {
export const snippets = sqliteTable(
'snippets',
{
id: text('id').primaryKey(), // UUID
documentId: text('document_id')
.notNull()
@@ -84,7 +104,12 @@ export const snippets = sqliteTable('snippets', {
breadcrumb: text('breadcrumb'), // e.g. "Installation > Getting Started"
tokenCount: integer('token_count').default(0),
createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
},
(t) => [
index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
index('idx_snippets_repo_type').on(t.repositoryId, t.type)
]
);
// ---------------------------------------------------------------------------
// embedding_profiles
@@ -119,13 +144,18 @@ export const snippetEmbeddings = sqliteTable(
embedding: blob('embedding').notNull(), // Float32Array as binary blob
createdAt: integer('created_at').notNull()
},
(table) => [primaryKey({ columns: [table.snippetId, table.profileId] })]
(table) => [
primaryKey({ columns: [table.snippetId, table.profileId] }),
index('idx_embeddings_profile').on(table.profileId, table.snippetId)
]
);
// ---------------------------------------------------------------------------
// indexing_jobs
// ---------------------------------------------------------------------------
export const indexingJobs = sqliteTable('indexing_jobs', {
export const indexingJobs = sqliteTable(
'indexing_jobs',
{
id: text('id').primaryKey(), // UUID
repositoryId: text('repository_id')
.notNull()
@@ -139,19 +169,40 @@ export const indexingJobs = sqliteTable('indexing_jobs', {
progress: integer('progress').default(0), // 0-100
totalFiles: integer('total_files').default(0),
processedFiles: integer('processed_files').default(0),
stage: text('stage', {
enum: [
'queued',
'differential',
'crawling',
'cloning',
'parsing',
'storing',
'embedding',
'done',
'failed'
]
})
.notNull()
.default('queued'),
stageDetail: text('stage_detail'),
error: text('error'),
startedAt: integer('started_at', { mode: 'timestamp' }),
completedAt: integer('completed_at', { mode: 'timestamp' }),
createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
},
(t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
);
// ---------------------------------------------------------------------------
// repository_configs
// ---------------------------------------------------------------------------
export const repositoryConfigs = sqliteTable('repository_configs', {
export const repositoryConfigs = sqliteTable(
'repository_configs',
{
repositoryId: text('repository_id')
.primaryKey()
.notNull()
.references(() => repositories.id, { onDelete: 'cascade' }),
versionId: text('version_id'),
projectTitle: text('project_title'),
description: text('description'),
folders: text('folders', { mode: 'json' }).$type<string[]>(),
@@ -162,7 +213,16 @@ export const repositoryConfigs = sqliteTable('repository_configs', {
{ tag: string; title: string; commitHash?: string }[]
>(),
updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
},
(table) => [
uniqueIndex('uniq_repo_config_base')
.on(table.repositoryId)
.where(sql`${table.versionId} IS NULL`),
uniqueIndex('uniq_repo_config_version')
.on(table.repositoryId, table.versionId)
.where(sql`${table.versionId} IS NOT NULL`)
]
);
// ---------------------------------------------------------------------------
// settings

View File

@@ -0,0 +1,49 @@
import type Database from 'better-sqlite3';
import * as sqliteVec from 'sqlite-vec';
const loadedConnections = new WeakSet<Database.Database>();
function stableHash(value: string): string {
let hash = 2166136261;
for (let index = 0; index < value.length; index += 1) {
hash ^= value.charCodeAt(index);
hash = Math.imul(hash, 16777619);
}
return (hash >>> 0).toString(16).padStart(8, '0');
}
function sanitizeIdentifierPart(value: string): string {
const sanitized = value
.toLowerCase()
.replace(/[^a-z0-9]+/g, '_')
.replace(/^_+|_+$/g, '');
return sanitized.length > 0 ? sanitized.slice(0, 32) : 'profile';
}
export function sqliteVecTableSuffix(profileId: string): string {
return `${sanitizeIdentifierPart(profileId)}_${stableHash(profileId)}`;
}
export function sqliteVecTableName(profileId: string): string {
return `snippet_embeddings_vec_${sqliteVecTableSuffix(profileId)}`;
}
export function sqliteVecRowidTableName(profileId: string): string {
return `snippet_embeddings_vec_rowids_${sqliteVecTableSuffix(profileId)}`;
}
export function quoteSqliteIdentifier(identifier: string): string {
return `"${identifier.replace(/"/g, '""')}"`;
}
export function loadSqliteVec(db: Database.Database): void {
if (loadedConnections.has(db)) {
return;
}
sqliteVec.load(db);
loadedConnections.add(db);
}
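The naming scheme above can be sketched in isolation: sanitize the profile ID for use in a SQL identifier, then append an FNV-1a hash of the raw ID so that two distinct profiles which sanitize to the same string still get distinct tables. A minimal standalone replica (mirroring, not importing, the helpers above):

```typescript
// Standalone replica of the per-profile table-suffix derivation, for illustration.
function fnv1a32(value: string): string {
  let hash = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < value.length; i += 1) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 16777619); // FNV prime
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

function suffix(profileId: string): string {
  const sanitized = profileId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '');
  return `${sanitized.length > 0 ? sanitized.slice(0, 32) : 'profile'}_${fnv1a32(profileId)}`;
}

const a = suffix('local-default');
const b = suffix('local default'); // sanitizes to the same prefix, but the hash differs
```

Because the hash is taken over the raw ID, `'local-default'` and `'local default'` share the sanitized prefix `local_default` yet resolve to different table names.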

View File

@@ -0,0 +1,2 @@
-- Relational vec_embedding bootstrap removed in iteration 2.
-- Downstream sqlite-vec vec0 tables are created on demand in application code.
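Those on-demand tables use sqlite-vec's `vec0` virtual-table syntax. A hedged sketch of the DDL the application layer might issue — the table names, the dimension count, and the rowid-mapping table's exact columns are illustrative assumptions, not the real statements:

```typescript
// Sketch: build CREATE statements for a per-profile vec0 table and a
// companion rowid-mapping table. Shapes here are assumptions for illustration.
function vecTableDdl(vecTable: string, rowidTable: string, dimensions: number): string[] {
  const quote = (id: string) => `"${id.replace(/"/g, '""')}"`;
  return [
    // sqlite-vec's documented virtual-table form: vec0 with a typed vector column
    `CREATE VIRTUAL TABLE IF NOT EXISTS ${quote(vecTable)} USING vec0(embedding float[${dimensions}])`,
    // vec0 rows are addressed by rowid, so a mapping back to snippet IDs is needed
    `CREATE TABLE IF NOT EXISTS ${quote(rowidTable)} (rowid INTEGER PRIMARY KEY, snippet_id TEXT NOT NULL UNIQUE)`
  ];
}

const [vecDdl, rowidDdl] = vecTableDdl(
  'snippet_embeddings_vec_local_default_deadbeef',
  'snippet_embeddings_vec_rowids_local_default_deadbeef',
  384
);
```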

View File

@@ -12,6 +12,8 @@ import { migrate } from 'drizzle-orm/better-sqlite3/migrator';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import * as schema from '../db/schema.js';
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '../db/sqlite-vec.js';
import { SqliteVecStore } from '../search/sqlite-vec.store.js';
import { NoopEmbeddingProvider, EmbeddingError, type EmbeddingVector } from './provider.js';
import { OpenAIEmbeddingProvider } from './openai.provider.js';
@@ -31,6 +33,7 @@ import { createProviderFromProfile } from './registry.js';
function createTestDb() {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const db = drizzle(client, { schema });
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
@@ -387,10 +390,19 @@ describe('EmbeddingService', () => {
embedding: Buffer;
profile_id: string;
};
expect((row as Record<string, unknown>).vec_embedding).toBeUndefined();
expect(row.model).toBe('test-model');
expect(row.dimensions).toBe(4);
expect(row.profile_id).toBe('local-default');
expect(row.embedding).toBeInstanceOf(Buffer);
const queryEmbedding = service.getEmbedding(snippetId, 'local-default');
const matches = new SqliteVecStore(client).queryNearestNeighbors(queryEmbedding!, {
repositoryId: '/test/embed-repo',
profileId: 'local-default',
limit: 5
});
expect(matches[0]?.snippetId).toBe(snippetId);
});
it('stores embeddings as retrievable Float32Array blobs', async () => {
@@ -408,6 +420,25 @@ describe('EmbeddingService', () => {
expect(embedding![2]).toBeCloseTo(0.2, 5);
});
it('can delegate embedding persistence to an injected writer', async () => {
const snippetId = seedSnippet(db, client);
const provider = makeProvider(4);
const persistEmbeddings = vi.fn().mockResolvedValue(undefined);
const service = new EmbeddingService(client, provider, 'local-default', {
persistEmbeddings
});
await service.embedSnippets([snippetId]);
expect(persistEmbeddings).toHaveBeenCalledTimes(1);
const rows = client
.prepare(
'SELECT COUNT(*) AS cnt FROM snippet_embeddings WHERE snippet_id = ? AND profile_id = ?'
)
.get(snippetId, 'local-default') as { cnt: number };
expect(rows.cnt).toBe(0);
});
it('stores embeddings under the configured profile ID', async () => {
client
.prepare(
@@ -415,16 +446,7 @@ describe('EmbeddingService', () => {
(id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, unixepoch(), unixepoch())`
)
.run(
'openai-custom',
'openai-compatible',
'OpenAI Custom',
1,
0,
'test-model',
4,
'{}'
);
.run('openai-custom', 'openai-compatible', 'OpenAI Custom', 1, 0, 'test-model', 4, '{}');
const snippetId = seedSnippet(db, client);
const provider = makeProvider(4, 'test-model');
@@ -436,6 +458,22 @@ describe('EmbeddingService', () => {
.prepare('SELECT profile_id FROM snippet_embeddings WHERE snippet_id = ?')
.get(snippetId) as { profile_id: string };
expect(row.profile_id).toBe('openai-custom');
const queryEmbedding = service.getEmbedding(snippetId, 'openai-custom');
const store = new SqliteVecStore(client);
const customMatches = store.queryNearestNeighbors(queryEmbedding!, {
repositoryId: '/test/embed-repo',
profileId: 'openai-custom',
limit: 5
});
const defaultMatches = store.queryNearestNeighbors(new Float32Array([1, 0, 0, 0]), {
repositoryId: '/test/embed-repo',
profileId: 'local-default',
limit: 5
});
expect(customMatches[0]?.snippetId).toBe(snippetId);
expect(defaultMatches).toHaveLength(0);
});
it('is idempotent — re-embedding replaces the existing row', async () => {
@@ -450,6 +488,17 @@ describe('EmbeddingService', () => {
.prepare('SELECT COUNT(*) as cnt FROM snippet_embeddings WHERE snippet_id = ?')
.get(snippetId) as { cnt: number };
expect(rows.cnt).toBe(1);
const vecTable = sqliteVecTableName('local-default');
const rowidTable = sqliteVecRowidTableName('local-default');
const vecRows = client.prepare(`SELECT COUNT(*) as cnt FROM "${vecTable}"`).get() as {
cnt: number;
};
const rowidRows = client.prepare(`SELECT COUNT(*) as cnt FROM "${rowidTable}"`).get() as {
cnt: number;
};
expect(vecRows.cnt).toBe(1);
expect(rowidRows.cnt).toBe(1);
});
it('calls onProgress after each batch', async () => {

View File

@@ -5,6 +5,11 @@
import type Database from 'better-sqlite3';
import type { EmbeddingProvider } from './provider.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import {
upsertEmbeddings,
type PersistedEmbedding
} from '$lib/server/pipeline/write-operations.js';
interface SnippetRow {
id: string;
@@ -17,11 +22,18 @@ const BATCH_SIZE = 50;
const TEXT_MAX_CHARS = 2048;
export class EmbeddingService {
private readonly sqliteVecStore: SqliteVecStore;
constructor(
private readonly db: Database.Database,
private readonly provider: EmbeddingProvider,
private readonly profileId: string = 'local-default'
) {}
private readonly profileId: string = 'local-default',
private readonly persistenceDelegate?: {
persistEmbeddings?: (embeddings: PersistedEmbedding[]) => Promise<void>;
}
) {
this.sqliteVecStore = new SqliteVecStore(db);
}
findSnippetIdsMissingEmbeddings(repositoryId: string, versionId: string | null): string[] {
if (versionId) {
@@ -89,31 +101,31 @@ export class EmbeddingService {
[s.title, s.breadcrumb, s.content].filter(Boolean).join('\n').slice(0, TEXT_MAX_CHARS)
);
const insert = this.db.prepare<[string, string, string, number, Buffer]>(`
INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, ?, ?, ?, ?, unixepoch())
`);
for (let i = 0; i < snippets.length; i += BATCH_SIZE) {
const batchSnippets = snippets.slice(i, i + BATCH_SIZE);
const batchTexts = texts.slice(i, i + BATCH_SIZE);
const embeddings = await this.provider.embed(batchTexts);
const insertMany = this.db.transaction(() => {
for (let j = 0; j < batchSnippets.length; j++) {
const snippet = batchSnippets[j];
const embedding = embeddings[j];
insert.run(
snippet.id,
this.profileId,
embedding.model,
embedding.dimensions,
Buffer.from(embedding.values.buffer)
);
}
const persistedEmbeddings: PersistedEmbedding[] = batchSnippets.map((snippet, index) => {
const embedding = embeddings[index];
return {
snippetId: snippet.id,
profileId: this.profileId,
model: embedding.model,
dimensions: embedding.dimensions,
embedding: Buffer.from(
embedding.values.buffer,
embedding.values.byteOffset,
embedding.values.byteLength
)
};
});
insertMany();
if (this.persistenceDelegate?.persistEmbeddings) {
await this.persistenceDelegate.persistEmbeddings(persistedEmbeddings);
} else {
upsertEmbeddings(this.db, persistedEmbeddings);
}
onProgress?.(Math.min(i + BATCH_SIZE, snippets.length), snippets.length);
}
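The switch to the three-argument `Buffer.from(values.buffer, values.byteOffset, values.byteLength)` above matters because a `Float32Array` is often a view into a larger `ArrayBuffer`; the one-argument form spans the entire backing store, not just the view. A small sketch of the difference:

```typescript
// A Float32Array can be a view over part of a larger ArrayBuffer, e.g. one
// embedding inside a batch buffer.
const batch = new Float32Array(8); // imagine two 4-dim embeddings back to back
const second = batch.subarray(4, 8); // view over the last 4 floats

// Buffer.from(arrayBuffer) spans the whole backing store (8 floats = 32 bytes),
// while passing byteOffset/byteLength restricts it to the view (4 floats = 16 bytes).
const wholeBacking = Buffer.from(second.buffer);
const viewOnly = Buffer.from(second.buffer, second.byteOffset, second.byteLength);
```

Persisting `wholeBacking` here would store the neighbouring embedding's bytes and the wrong length; the three-argument form captures exactly the view.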

View File

@@ -15,6 +15,7 @@ import { LibrarySearchResult, SnippetSearchResult } from '$lib/server/models/sea
export interface ContextResponseMetadata {
localSource: boolean;
resultCount: number;
searchModeUsed: string;
repository: {
id: string;
title: string;
@@ -130,7 +131,8 @@ export class ContextResponseMapper {
id: metadata.version.id
})
: null,
resultCount: metadata?.resultCount ?? snippets.length
resultCount: metadata?.resultCount ?? snippets.length,
searchModeUsed: metadata?.searchModeUsed ?? 'keyword'
});
}
}

View File

@@ -1,7 +1,4 @@
import {
EmbeddingProfile,
EmbeddingProfileEntity
} from '$lib/server/models/embedding-profile.js';
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';
function parseConfig(config: Record<string, unknown> | string | null): Record<string, unknown> {
if (!config) {

View File

@@ -10,6 +10,8 @@ export class IndexingJobMapper {
progress: entity.progress,
totalFiles: entity.total_files,
processedFiles: entity.processed_files,
stage: entity.stage,
stageDetail: entity.stage_detail,
error: entity.error,
startedAt: entity.started_at != null ? new Date(entity.started_at * 1000) : null,
completedAt: entity.completed_at != null ? new Date(entity.completed_at * 1000) : null,
@@ -26,6 +28,8 @@ export class IndexingJobMapper {
progress: domain.progress,
totalFiles: domain.totalFiles,
processedFiles: domain.processedFiles,
stage: domain.stage,
stageDetail: domain.stageDetail,
error: domain.error,
startedAt: domain.startedAt,
completedAt: domain.completedAt,

View File

@@ -173,6 +173,7 @@ export class ContextJsonResponseDto {
repository: ContextRepositoryJsonDto | null;
version: ContextVersionJsonDto | null;
resultCount: number;
searchModeUsed: string;
constructor(props: ContextJsonResponseDto) {
this.snippets = props.snippets;
@@ -182,5 +183,6 @@ export class ContextJsonResponseDto {
this.repository = props.repository;
this.version = props.version;
this.resultCount = props.resultCount;
this.searchModeUsed = props.searchModeUsed;
}
}

View File

@@ -6,6 +6,8 @@ export interface IndexingJobEntityProps {
progress: number;
total_files: number;
processed_files: number;
stage: string;
stage_detail: string | null;
error: string | null;
started_at: number | null;
completed_at: number | null;
@@ -20,6 +22,8 @@ export class IndexingJobEntity {
progress: number;
total_files: number;
processed_files: number;
stage: string;
stage_detail: string | null;
error: string | null;
started_at: number | null;
completed_at: number | null;
@@ -33,6 +37,8 @@ export class IndexingJobEntity {
this.progress = props.progress;
this.total_files = props.total_files;
this.processed_files = props.processed_files;
this.stage = props.stage;
this.stage_detail = props.stage_detail;
this.error = props.error;
this.started_at = props.started_at;
this.completed_at = props.completed_at;
@@ -48,6 +54,8 @@ export interface IndexingJobProps {
progress: number;
totalFiles: number;
processedFiles: number;
stage: string;
stageDetail: string | null;
error: string | null;
startedAt: Date | null;
completedAt: Date | null;
@@ -62,6 +70,8 @@ export class IndexingJob {
progress: number;
totalFiles: number;
processedFiles: number;
stage: string;
stageDetail: string | null;
error: string | null;
startedAt: Date | null;
completedAt: Date | null;
@@ -75,6 +85,8 @@ export class IndexingJob {
this.progress = props.progress;
this.totalFiles = props.totalFiles;
this.processedFiles = props.processedFiles;
this.stage = props.stage;
this.stageDetail = props.stageDetail;
this.error = props.error;
this.startedAt = props.startedAt;
this.completedAt = props.completedAt;
@@ -90,6 +102,8 @@ export interface IndexingJobDtoProps {
progress: number;
totalFiles: number;
processedFiles: number;
stage: string;
stageDetail: string | null;
error: string | null;
startedAt: Date | null;
completedAt: Date | null;
@@ -104,6 +118,8 @@ export class IndexingJobDto {
progress: number;
totalFiles: number;
processedFiles: number;
stage: string;
stageDetail: string | null;
error: string | null;
startedAt: Date | null;
completedAt: Date | null;
@@ -117,6 +133,8 @@ export class IndexingJobDto {
this.progress = props.progress;
this.totalFiles = props.totalFiles;
this.processedFiles = props.processedFiles;
this.stage = props.stage;
this.stageDetail = props.stageDetail;
this.error = props.error;
this.startedAt = props.startedAt;
this.completedAt = props.completedAt;

View File

@@ -0,0 +1,450 @@
/**
* Tests for buildDifferentialPlan (TRUEREF-0021).
*
* Uses an in-memory SQLite database with the same migration sequence as the
* production database. GitHub-specific changed-file fetching is exercised via
* the `_fetchGitHubChangedFiles` injection parameter. Local-repo changed-file
* fetching is exercised by mocking `$lib/server/utils/git.js`.
*/
import { describe, it, expect, vi, beforeEach } from 'vitest';
import Database from 'better-sqlite3';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import { buildDifferentialPlan } from './differential-strategy.js';
import type { ChangedFile } from '$lib/server/crawler/types.js';
import type { Repository } from '$lib/server/models/repository.js';
// ---------------------------------------------------------------------------
// Mock node:child_process so local-repo git calls never actually run git.
// ---------------------------------------------------------------------------
vi.mock('$lib/server/utils/git.js', () => ({
getChangedFilesBetweenRefs: vi.fn(() => [] as ChangedFile[])
}));
import { getChangedFilesBetweenRefs } from '$lib/server/utils/git.js';
const mockGetChangedFiles = vi.mocked(getChangedFilesBetweenRefs);
// ---------------------------------------------------------------------------
// In-memory DB factory
// ---------------------------------------------------------------------------
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
for (const migrationFile of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql'
]) {
const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
for (const stmt of sql
.split('--> statement-breakpoint')
.map((s) => s.trim())
.filter(Boolean)) {
client.exec(stmt);
}
}
return client;
}
// ---------------------------------------------------------------------------
// Test fixtures
// ---------------------------------------------------------------------------
const NOW_S = Math.floor(Date.now() / 1000);
function insertRepo(
db: Database.Database,
overrides: Partial<{
id: string;
title: string;
source: 'local' | 'github';
source_url: string;
github_token: string | null;
}> = {}
): string {
const id = overrides.id ?? '/test/repo';
db.prepare(
`INSERT INTO repositories
(id, title, source, source_url, branch, state,
total_snippets, total_tokens, trust_score, benchmark_score,
stars, github_token, last_indexed_at, created_at, updated_at)
VALUES (?, ?, ?, ?, 'main', 'indexed', 0, 0, 0, 0, null, ?, null, ?, ?)`
).run(
id,
overrides.title ?? 'Test Repo',
overrides.source ?? 'local',
overrides.source_url ?? '/tmp/test-repo',
overrides.github_token ?? null,
NOW_S,
NOW_S
);
return id;
}
function insertVersion(
db: Database.Database,
repoId: string,
tag: string,
state: 'pending' | 'indexing' | 'indexed' | 'error' = 'indexed'
): string {
const id = crypto.randomUUID();
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, title, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, null, ?, 0, ?, ?)`
).run(id, repoId, tag, state, state === 'indexed' ? NOW_S : null, NOW_S);
return id;
}
function insertDocument(db: Database.Database, versionId: string, filePath: string): string {
const id = crypto.randomUUID();
db.prepare(
`INSERT INTO documents
(id, repository_id, version_id, file_path, checksum, indexed_at)
VALUES (?, ?, ?, ?, 'cksum', ?)`
)
// Repository ID is not strictly needed here — use a placeholder that matches FK
.run(
id,
db
.prepare<
[string],
{ repository_id: string }
>(`SELECT repository_id FROM repository_versions WHERE id = ?`)
.get(versionId)?.repository_id ?? '/test/repo',
versionId,
filePath,
NOW_S
);
return id;
}
/** Build a minimal Repository domain object. */
function makeRepo(overrides: Partial<Repository> = {}): Repository {
return {
id: '/test/repo',
title: 'Test Repo',
description: null,
source: 'local',
sourceUrl: '/tmp/test-repo',
branch: 'main',
state: 'indexed',
totalSnippets: 0,
totalTokens: 0,
trustScore: 0,
benchmarkScore: 0,
stars: null,
githubToken: null,
lastIndexedAt: null,
createdAt: new Date(),
updatedAt: new Date(),
...overrides
} as Repository;
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
describe('buildDifferentialPlan', () => {
let db: Database.Database;
beforeEach(() => {
db = createTestDb();
mockGetChangedFiles.mockReset();
mockGetChangedFiles.mockReturnValue([]);
});
// -------------------------------------------------------------------------
// Case 1: No versions exist for the repository
// -------------------------------------------------------------------------
it('returns null when no versions exist for the repository', async () => {
insertRepo(db);
const repo = makeRepo();
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).toBeNull();
});
// -------------------------------------------------------------------------
// Case 2: All versions are non-indexed (pending / indexing / error)
// -------------------------------------------------------------------------
it('returns null when all versions are non-indexed', async () => {
insertRepo(db);
const repo = makeRepo();
insertVersion(db, repo.id, 'v1.0.0', 'pending');
insertVersion(db, repo.id, 'v1.1.0', 'indexing');
insertVersion(db, repo.id, 'v1.2.0', 'error');
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).toBeNull();
});
// -------------------------------------------------------------------------
// Case 3: Best ancestor has zero documents
// -------------------------------------------------------------------------
it('returns null when the ancestor version has no documents', async () => {
insertRepo(db);
const repo = makeRepo();
// Insert an indexed ancestor but with no documents
insertVersion(db, repo.id, 'v1.0.0', 'indexed');
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).toBeNull();
});
// -------------------------------------------------------------------------
// Case 4: All files changed — unchangedPaths would be empty
// -------------------------------------------------------------------------
it('returns null when all ancestor files appear in changedPaths', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
insertDocument(db, v1Id, 'src/a.ts');
insertDocument(db, v1Id, 'src/b.ts');
// Both ancestor files appear as modified
mockGetChangedFiles.mockReturnValue([
{ path: 'src/a.ts', status: 'modified' },
{ path: 'src/b.ts', status: 'modified' }
]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).toBeNull();
});
// -------------------------------------------------------------------------
// Case 5: Valid plan for a local repo
// -------------------------------------------------------------------------
it('returns a valid plan partitioned into changedPaths, deletedPaths, unchangedPaths for a local repo', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
insertDocument(db, v1Id, 'src/a.ts');
insertDocument(db, v1Id, 'src/b.ts');
insertDocument(db, v1Id, 'src/c.ts');
// a.ts modified, b.ts deleted, c.ts unchanged
mockGetChangedFiles.mockReturnValue([
{ path: 'src/a.ts', status: 'modified' },
{ path: 'src/b.ts', status: 'removed' }
]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).not.toBeNull();
expect(plan!.changedPaths.has('src/a.ts')).toBe(true);
expect(plan!.deletedPaths.has('src/b.ts')).toBe(true);
expect(plan!.unchangedPaths.has('src/c.ts')).toBe(true);
// Sanity: no overlap between sets
expect(plan!.changedPaths.has('src/b.ts')).toBe(false);
expect(plan!.deletedPaths.has('src/c.ts')).toBe(false);
expect(plan!.unchangedPaths.has('src/a.ts')).toBe(false);
});
// -------------------------------------------------------------------------
// Case 6: Valid plan for a GitHub repo — fetchFn called with correct params
// -------------------------------------------------------------------------
it('calls _fetchGitHubChangedFiles with correct owner/repo/base/head/token for a GitHub repo', async () => {
const repoId = '/facebook/react';
insertRepo(db, {
id: repoId,
source: 'github',
source_url: 'https://github.com/facebook/react',
github_token: 'ghp_test123'
});
const repo = makeRepo({
id: repoId,
source: 'github',
sourceUrl: 'https://github.com/facebook/react',
githubToken: 'ghp_test123'
});
const v1Id = insertVersion(db, repoId, 'v18.0.0', 'indexed');
insertDocument(db, v1Id, 'packages/react/index.js');
insertDocument(db, v1Id, 'packages/react-dom/index.js');
const fetchFn = vi
.fn()
.mockResolvedValue([{ path: 'packages/react/index.js', status: 'modified' as const }]);
const plan = await buildDifferentialPlan({
repo,
targetTag: 'v18.1.0',
db,
_fetchGitHubChangedFiles: fetchFn
});
expect(fetchFn).toHaveBeenCalledOnce();
expect(fetchFn).toHaveBeenCalledWith('facebook', 'react', 'v18.0.0', 'v18.1.0', 'ghp_test123');
expect(plan).not.toBeNull();
expect(plan!.changedPaths.has('packages/react/index.js')).toBe(true);
expect(plan!.unchangedPaths.has('packages/react-dom/index.js')).toBe(true);
});
// -------------------------------------------------------------------------
// Case 7: Fail-safe — returns null when fetchFn throws
// -------------------------------------------------------------------------
it('returns null (fail-safe) when _fetchGitHubChangedFiles throws', async () => {
const repoId = '/facebook/react';
insertRepo(db, {
id: repoId,
source: 'github',
source_url: 'https://github.com/facebook/react'
});
const repo = makeRepo({
id: repoId,
source: 'github',
sourceUrl: 'https://github.com/facebook/react'
});
const v1Id = insertVersion(db, repoId, 'v18.0.0', 'indexed');
insertDocument(db, v1Id, 'README.md');
const fetchFn = vi.fn().mockRejectedValue(new Error('GitHub API rate limit'));
const plan = await buildDifferentialPlan({
repo,
targetTag: 'v18.1.0',
db,
_fetchGitHubChangedFiles: fetchFn
});
expect(plan).toBeNull();
});
// -------------------------------------------------------------------------
// Case 8: Renamed files go into changedPaths (not deletedPaths)
// -------------------------------------------------------------------------
it('includes renamed files in changedPaths', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
insertDocument(db, v1Id, 'src/old-name.ts');
insertDocument(db, v1Id, 'src/unchanged.ts');
mockGetChangedFiles.mockReturnValue([
{
path: 'src/new-name.ts',
status: 'renamed',
previousPath: 'src/old-name.ts'
}
]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).not.toBeNull();
// New path is in changedPaths
expect(plan!.changedPaths.has('src/new-name.ts')).toBe(true);
// Renamed file should NOT be in deletedPaths
expect(plan!.deletedPaths.has('src/new-name.ts')).toBe(false);
// The old path is not reported as 'removed', so it does not land in deletedPaths
});
// -------------------------------------------------------------------------
// Case 9: Renames track only the destination path
// -------------------------------------------------------------------------
it('tracks only the destination path of a renamed file', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
// Ancestor had old-name.ts and keeper.ts
insertDocument(db, v1Id, 'src/old-name.ts');
insertDocument(db, v1Id, 'src/keeper.ts');
// The diff reports old-name.ts renamed to new-name.ts. Only the new path is
// listed as changed; the rename entry carries previousPath, which the
// strategy ignores: it records file.path in changedPaths and only
// status === 'removed' entries in deletedPaths, so the old ancestor path is
// never explicitly deleted. We verify the destination path lands in
// changedPaths and keeper.ts stays in unchangedPaths.
mockGetChangedFiles.mockReturnValue([
{
path: 'src/new-name.ts',
status: 'renamed',
previousPath: 'src/old-name.ts'
}
]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).not.toBeNull();
// New path counted as changed
expect(plan!.changedPaths.has('src/new-name.ts')).toBe(true);
// keeper is unchanged
expect(plan!.unchangedPaths.has('src/keeper.ts')).toBe(true);
});
// -------------------------------------------------------------------------
// Case 10: ancestorVersionId and ancestorTag are correctly set
// -------------------------------------------------------------------------
it('sets ancestorVersionId and ancestorTag correctly', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
insertDocument(db, v1Id, 'README.md');
insertDocument(db, v1Id, 'src/index.ts');
// One file changes so there is something in unchangedPaths
mockGetChangedFiles.mockReturnValue([{ path: 'README.md', status: 'modified' }]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).not.toBeNull();
expect(plan!.ancestorVersionId).toBe(v1Id);
expect(plan!.ancestorTag).toBe('v1.0.0');
});
// -------------------------------------------------------------------------
// Case 11: Selects the closest (highest) indexed ancestor when multiple exist
// -------------------------------------------------------------------------
it('selects the closest indexed ancestor when multiple indexed versions exist', async () => {
insertRepo(db);
const repo = makeRepo();
const v1Id = insertVersion(db, repo.id, 'v1.0.0', 'indexed');
insertDocument(db, v1Id, 'old.ts');
const v2Id = insertVersion(db, repo.id, 'v1.5.0', 'indexed');
insertDocument(db, v2Id, 'newer.ts');
insertDocument(db, v2Id, 'stable.ts');
// Only one file changes from the v1.5.0 ancestor
mockGetChangedFiles.mockReturnValue([{ path: 'newer.ts', status: 'modified' }]);
const plan = await buildDifferentialPlan({ repo, targetTag: 'v2.0.0', db });
expect(plan).not.toBeNull();
// Should use v1.5.0 as ancestor (closest predecessor)
expect(plan!.ancestorTag).toBe('v1.5.0');
expect(plan!.ancestorVersionId).toBe(v2Id);
});
});
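Cases 10 and 11 above exercise ancestor selection: among the already-indexed versions, the strategy picks the highest tag that sorts below the target. The real ordering lives in `$lib/server/utils/tag-order.js`; the sketch below is a simplified stand-in using a plain numeric semver compare (the helper names here are illustrative, not the actual exports).

```typescript
// Hedged sketch of "closest indexed ancestor" selection, assuming plain
// vX.Y.Z tags. Not the real tag-order implementation — just the semantics
// the tests above document.
function semverKey(tag: string): number[] {
  return tag.replace(/^v/, '').split('.').map(Number);
}

function compareTags(a: string, b: string): number {
  const ka = semverKey(a);
  const kb = semverKey(b);
  for (let i = 0; i < Math.max(ka.length, kb.length); i++) {
    const d = (ka[i] ?? 0) - (kb[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

function pickClosestAncestor(target: string, indexedTags: string[]): string | null {
  // Candidates are strictly older than the target; the highest one wins.
  const candidates = indexedTags.filter((t) => compareTags(t, target) < 0);
  if (candidates.length === 0) return null;
  return candidates.sort(compareTags).at(-1)!;
}
```

Under this ordering, `pickClosestAncestor('v2.0.0', ['v1.0.0', 'v1.5.0'])` selects `v1.5.0`, matching Case 11.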

View File

@@ -0,0 +1,120 @@
/**
* Differential indexing strategy coordinator (TRUEREF-0021).
*
* Determines whether differential indexing can be used for a given version tag,
* and if so, builds a plan describing which files to clone from the ancestor
* and which files to crawl fresh.
*/
import type Database from 'better-sqlite3';
import type { Repository } from '$lib/server/models/repository.js';
import type { RepositoryVersion } from '$lib/server/models/repository-version.js';
import { RepositoryVersionMapper } from '$lib/server/mappers/repository-version.mapper.js';
import type { RepositoryVersionEntity } from '$lib/server/models/repository-version.js';
import { findBestAncestorVersion } from '$lib/server/utils/tag-order.js';
import { fetchGitHubChangedFiles } from '$lib/server/crawler/github-compare.js';
import { getChangedFilesBetweenRefs } from '$lib/server/utils/git.js';
import type { ChangedFile } from '$lib/server/crawler/types.js';
export interface DifferentialPlan {
/** Version ID of the closest already-indexed predecessor tag */
ancestorVersionId: string;
/** Ancestor tag name (needed for git diff / GitHub compare calls) */
ancestorTag: string;
/** File paths that changed (added + modified + renamed-destination) */
changedPaths: Set<string>;
/** File paths that were deleted in the target vs ancestor */
deletedPaths: Set<string>;
/** File paths present in ancestor that are unchanged in target — must be cloned */
unchangedPaths: Set<string>;
}
export async function buildDifferentialPlan(params: {
repo: Repository;
targetTag: string;
db: Database.Database;
/** Override for testing only */
_fetchGitHubChangedFiles?: typeof fetchGitHubChangedFiles;
}): Promise<DifferentialPlan | null> {
const { repo, targetTag, db } = params;
const fetchFn = params._fetchGitHubChangedFiles ?? fetchGitHubChangedFiles;
try {
// 1. Load all indexed versions for this repository
const rows = db
.prepare(`SELECT * FROM repository_versions WHERE repository_id = ? AND state = 'indexed'`)
.all(repo.id) as RepositoryVersionEntity[];
const indexedVersions: RepositoryVersion[] = rows.map((row) =>
RepositoryVersionMapper.fromEntity(row)
);
// 2. Find the best ancestor version
const ancestor = findBestAncestorVersion(targetTag, indexedVersions);
if (!ancestor) return null;
// 3. Load ancestor's document file paths
const docRows = db
.prepare(`SELECT DISTINCT file_path FROM documents WHERE version_id = ?`)
.all(ancestor.id) as Array<{ file_path: string }>;
const ancestorFilePaths = new Set(docRows.map((r) => r.file_path));
if (ancestorFilePaths.size === 0) return null;
// 4. Fetch changed files between ancestor and target
let changedFiles: ChangedFile[];
if (repo.source === 'github') {
const url = new URL(repo.sourceUrl);
const parts = url.pathname.split('/').filter(Boolean);
const owner = parts[0];
const repoName = parts[1];
changedFiles = await fetchFn(
owner,
repoName,
ancestor.tag,
targetTag,
repo.githubToken ?? undefined
);
} else {
changedFiles = getChangedFilesBetweenRefs({
repoPath: repo.sourceUrl,
base: ancestor.tag,
head: targetTag
});
}
// 5. Partition changed files into changed and deleted sets
const changedPaths = new Set<string>();
const deletedPaths = new Set<string>();
for (const file of changedFiles) {
if (file.status === 'removed') {
deletedPaths.add(file.path);
} else {
changedPaths.add(file.path);
}
}
// 6. Compute unchanged paths: ancestor paths minus changed minus deleted
const unchangedPaths = new Set<string>();
for (const p of ancestorFilePaths) {
if (!changedPaths.has(p) && !deletedPaths.has(p)) {
unchangedPaths.add(p);
}
}
// 7. Return null when there's nothing to clone (all files changed)
if (unchangedPaths.size === 0) return null;
return {
ancestorVersionId: ancestor.id,
ancestorTag: ancestor.tag,
changedPaths,
deletedPaths,
unchangedPaths
};
} catch {
// Fail-safe: fall back to full crawl on any error
return null;
}
}
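Steps 5 and 6 of `buildDifferentialPlan` are a pure set partition and can be sketched in isolation. The snippet below mirrors that logic with a simplified local `ChangedFile` type (an assumption covering only the fields the strategy actually reads):

```typescript
// Minimal sketch of the changed/deleted/unchanged partition from
// buildDifferentialPlan steps 5-6. Types are simplified stand-ins.
type ChangedFile = { path: string; status: string; previousPath?: string };

function partitionPaths(
  ancestorFilePaths: Set<string>,
  changedFiles: ChangedFile[]
): { changedPaths: Set<string>; deletedPaths: Set<string>; unchangedPaths: Set<string> } {
  const changedPaths = new Set<string>();
  const deletedPaths = new Set<string>();
  for (const file of changedFiles) {
    if (file.status === 'removed') {
      deletedPaths.add(file.path);
    } else {
      // added / modified / renamed destinations all count as changed
      changedPaths.add(file.path);
    }
  }
  // Unchanged = ancestor paths untouched by the diff; these get cloned.
  const unchangedPaths = new Set<string>();
  for (const p of ancestorFilePaths) {
    if (!changedPaths.has(p) && !deletedPaths.has(p)) {
      unchangedPaths.add(p);
    }
  }
  return { changedPaths, deletedPaths, unchangedPaths };
}
```

Note that a rename contributes only its destination path to `changedPaths`; the old source path, if still present in the ancestor set, falls through into `unchangedPaths` — exactly the behaviour the rename test documents.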

View File

@@ -0,0 +1,165 @@
import { workerData, parentPort } from 'node:worker_threads';
import Database from 'better-sqlite3';
import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import { createProviderFromProfile } from '$lib/server/embeddings/registry.js';
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
import {
EmbeddingProfileEntity,
type EmbeddingProfileEntityProps
} from '$lib/server/models/embedding-profile.js';
import type {
EmbedWorkerRequest,
EmbedWorkerResponse,
SerializedEmbedding,
WorkerInitData
} from './worker-types.js';
const { dbPath, embeddingProfileId } = workerData as WorkerInitData;
if (!embeddingProfileId) {
parentPort!.postMessage({
type: 'embed-failed',
jobId: 'init',
error: 'embeddingProfileId is required in workerData'
} satisfies EmbedWorkerResponse);
process.exit(1);
}
const db = new Database(dbPath);
applySqlitePragmas(db);
// Load the embedding profile from DB
const rawProfile = db
.prepare('SELECT * FROM embedding_profiles WHERE id = ?')
.get(embeddingProfileId);
if (!rawProfile) {
db.close();
parentPort!.postMessage({
type: 'embed-failed',
jobId: 'init',
error: `Embedding profile ${embeddingProfileId} not found`
} satisfies EmbedWorkerResponse);
process.exit(1);
}
const profileEntity = new EmbeddingProfileEntity(rawProfile as EmbeddingProfileEntityProps);
const profile = EmbeddingProfileMapper.fromEntity(profileEntity);
let pendingWrite: {
jobId: string;
resolve: () => void;
reject: (error: Error) => void;
} | null = null;
let currentJobId: string | null = null;
function requestWrite(
message: Extract<EmbedWorkerResponse, { type: 'write_embeddings' }>
): Promise<void> {
if (pendingWrite) {
return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
}
return new Promise((resolve, reject) => {
pendingWrite = {
jobId: message.jobId,
resolve: () => {
pendingWrite = null;
resolve();
},
reject: (error: Error) => {
pendingWrite = null;
reject(error);
}
};
parentPort!.postMessage(message);
});
}
// Create provider and embedding service
const provider = createProviderFromProfile(profile);
const embeddingService = new EmbeddingService(db, provider, embeddingProfileId, {
persistEmbeddings: async (embeddings) => {
const serializedEmbeddings: SerializedEmbedding[] = embeddings.map((item) => ({
snippetId: item.snippetId,
profileId: item.profileId,
model: item.model,
dimensions: item.dimensions,
embedding: Uint8Array.from(item.embedding)
}));
await requestWrite({
type: 'write_embeddings',
jobId: currentJobId ?? 'unknown',
embeddings: serializedEmbeddings
});
}
});
// Signal ready after service initialization
parentPort!.postMessage({
type: 'ready'
} satisfies EmbedWorkerResponse);
parentPort!.on('message', async (msg: EmbedWorkerRequest) => {
if (msg.type === 'write_ack') {
if (pendingWrite?.jobId === msg.jobId) {
pendingWrite.resolve();
}
return;
}
if (msg.type === 'write_error') {
if (pendingWrite?.jobId === msg.jobId) {
pendingWrite.reject(new Error(msg.error));
}
return;
}
if (msg.type === 'shutdown') {
db.close();
process.exit(0);
}
if (msg.type === 'embed') {
currentJobId = msg.jobId;
try {
const snippetIds = embeddingService.findSnippetIdsMissingEmbeddings(
msg.repositoryId,
msg.versionId
);
await embeddingService.embedSnippets(snippetIds, (done: number, total: number) => {
parentPort!.postMessage({
type: 'embed-progress',
jobId: msg.jobId,
done,
total
} satisfies EmbedWorkerResponse);
});
parentPort!.postMessage({
type: 'embed-done',
jobId: msg.jobId
} satisfies EmbedWorkerResponse);
} catch (err) {
parentPort!.postMessage({
type: 'embed-failed',
jobId: msg.jobId,
error: err instanceof Error ? err.message : String(err)
} satisfies EmbedWorkerResponse);
} finally {
currentJobId = null;
}
}
});
process.on('uncaughtException', (err) => {
parentPort!.postMessage({
type: 'embed-failed',
jobId: 'uncaught',
error: err instanceof Error ? err.message : String(err)
} satisfies EmbedWorkerResponse);
process.exit(1);
});
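The `requestWrite` helper above implements a single-in-flight ack handshake: the worker posts a write request, then blocks further writes until the parent answers with `write_ack` or `write_error`. The pattern can be sketched independently of `worker_threads` (here `post` stands in for `parentPort.postMessage`; all names are illustrative):

```typescript
// Hedged sketch of the single-in-flight write handshake. One pending
// request at a time; ack resolves it, fail rejects it, and a second
// request while one is in flight is rejected immediately.
type Pending = { jobId: string; resolve: () => void; reject: (e: Error) => void };

function makeHandshake(post: (msg: { jobId: string }) => void) {
  let pending: Pending | null = null;
  return {
    request(jobId: string): Promise<void> {
      if (pending) {
        return Promise.reject(new Error(`write request already in flight for ${pending.jobId}`));
      }
      return new Promise<void>((resolve, reject) => {
        pending = {
          jobId,
          resolve: () => { pending = null; resolve(); },
          reject: (e: Error) => { pending = null; reject(e); }
        };
        post({ jobId });
      });
    },
    ack(jobId: string) { if (pending?.jobId === jobId) pending.resolve(); },
    fail(jobId: string, msg: string) { if (pending?.jobId === jobId) pending.reject(new Error(msg)); }
  };
}
```

Clearing `pending` inside resolve/reject (rather than after the await) keeps the slot free the moment the parent answers, so the next `persistEmbeddings` call can proceed without racing the previous one.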

View File

@@ -13,6 +13,10 @@ import { JobQueue } from './job-queue.js';
import { IndexingPipeline } from './indexing.pipeline.js';
import { recoverStaleJobs } from './startup.js';
import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { loadSqliteVec } from '$lib/server/db/sqlite-vec.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import { sqliteVecRowidTableName, sqliteVecTableName } from '$lib/server/db/sqlite-vec.js';
import * as diffStrategy from './differential-strategy.js';
// ---------------------------------------------------------------------------
// Test DB factory
@@ -21,12 +25,17 @@ import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
for (const migrationFile of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql',
'0005_fix_stage_defaults.sql',
'0006_yielding_centennial.sql'
]) {
const migrationSql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
@@ -75,6 +84,28 @@ function insertRepo(db: Database.Database, overrides: Partial<Record<string, unk
);
}
function insertVersion(
db: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
): string {
const id = crypto.randomUUID();
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, title, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)`
).run(
overrides.id ?? id,
overrides.repository_id ?? '/test/repo',
overrides.tag ?? 'v1.0.0',
overrides.title ?? null,
overrides.state ?? 'pending',
overrides.total_snippets ?? 0,
overrides.indexed_at ?? null,
overrides.created_at ?? now
);
return (overrides.id as string) ?? id;
}
function insertJob(
db: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
@@ -245,6 +276,8 @@ describe('IndexingPipeline', () => {
crawlResult: {
files: Array<{ path: string; content: string; sha: string; language: string }>;
totalFiles: number;
/** Optional pre-parsed config — simulates LocalCrawler returning CrawlResult.config. */
config?: Record<string, unknown>;
} = { files: [], totalFiles: 0 },
embeddingService: EmbeddingService | null = null
) {
@@ -272,8 +305,12 @@ describe('IndexingPipeline', () => {
);
}
function makeJob(repositoryId = '/test/repo', versionId?: string) {
const jobId = insertJob(db, {
repository_id: repositoryId,
version_id: versionId ?? null,
status: 'queued'
});
return db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as {
id: string;
repositoryId?: string;
@@ -429,12 +466,15 @@ describe('IndexingPipeline', () => {
const job1 = makeJob();
await pipeline.run(job1 as never);
const firstSnippetIds = (
db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as { id: string }[]
).map((row) => row.id);
expect(firstSnippetIds.length).toBeGreaterThan(0);
const firstEmbeddingCount = (
db
.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
.get() as {
n: number;
}
).n;
@@ -446,11 +486,15 @@ describe('IndexingPipeline', () => {
const job2 = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(job2Id) as never;
await pipeline.run(job2);
const secondSnippetIds = (
db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as {
id: string;
}[]
).map((row) => row.id);
const secondEmbeddingCount = (
db
.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
.get() as {
n: number;
}
).n;
@@ -508,6 +552,52 @@ describe('IndexingPipeline', () => {
expect(finalChecksum).toBe('sha-v2');
});
it('removes derived vec rows when changed documents are replaced', async () => {
const docId = crypto.randomUUID();
const snippetId = crypto.randomUUID();
const embedding = Float32Array.from([1, 0, 0]);
const vecStore = new SqliteVecStore(db);
db.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
VALUES (?, '/test/repo', NULL, 'README.md', 'stale-doc', ?)`
).run(docId, now);
db.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
VALUES (?, ?, '/test/repo', NULL, 'info', 'stale snippet', ?)`
).run(snippetId, docId, now);
db.prepare(
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
).run(snippetId, Buffer.from(embedding.buffer), now);
vecStore.upsertEmbedding('local-default', snippetId, embedding);
const pipeline = makePipeline({
files: [
{
path: 'README.md',
content: '# Updated\n\nFresh content.',
sha: 'sha-fresh',
language: 'markdown'
}
],
totalFiles: 1
});
const job = makeJob();
await pipeline.run(job as never);
const vecTable = sqliteVecTableName('local-default');
const rowidTable = sqliteVecRowidTableName('local-default');
const vecCount = db.prepare(`SELECT COUNT(*) as n FROM "${vecTable}"`).get() as { n: number };
const rowidCount = db.prepare(`SELECT COUNT(*) as n FROM "${rowidTable}"`).get() as {
n: number;
};
expect(vecCount.n).toBe(0);
expect(rowidCount.n).toBe(0);
});
it('updates job progress as files are processed', async () => {
const files = Array.from({ length: 5 }, (_, i) => ({
path: `file${i}.md`,
@@ -644,4 +734,679 @@ describe('IndexingPipeline', () => {
expect(finalJob.status).toBe('done');
expect(finalJob.progress).toBe(100);
});
it('updates repository_versions state to indexing then indexed when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const files = [
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: 1 });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state, total_snippets, indexed_at FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string; total_snippets: number; indexed_at: number | null };
expect(version.state).toBe('indexed');
expect(version.total_snippets).toBeGreaterThan(0);
expect(version.indexed_at).not.toBeNull();
});
it('clones ancestor embeddings into the derived vec store for differential indexing', async () => {
const ancestorVersionId = insertVersion(db, { tag: 'v1.0.0', state: 'indexed' });
const targetVersionId = insertVersion(db, { tag: 'v1.1.0', state: 'pending' });
const vecStore = new SqliteVecStore(db);
const docId = crypto.randomUUID();
const snippetId = crypto.randomUUID();
const embedding = Float32Array.from([0.2, 0.4, 0.6]);
db.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
VALUES (?, '/test/repo', ?, 'README.md', 'ancestor-doc', ?)`
).run(docId, ancestorVersionId, now);
db.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
VALUES (?, ?, '/test/repo', ?, 'info', 'ancestor snippet', ?)`
).run(snippetId, docId, ancestorVersionId, now);
db.prepare(
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
).run(snippetId, Buffer.from(embedding.buffer), now);
vecStore.upsertEmbedding('local-default', snippetId, embedding);
vi.spyOn(diffStrategy, 'buildDifferentialPlan').mockResolvedValue({
ancestorTag: 'v1.0.0',
ancestorVersionId,
changedPaths: new Set<string>(),
deletedPaths: new Set<string>(),
unchangedPaths: new Set<string>(['README.md'])
});
const pipeline = makePipeline({ files: [], totalFiles: 0 });
const job = makeJob('/test/repo', targetVersionId);
await pipeline.run(job as never);
const targetRows = db
.prepare(
`SELECT se.snippet_id, se.embedding
FROM snippet_embeddings se
INNER JOIN snippets s ON s.id = se.snippet_id
WHERE s.version_id = ?`
)
.all(targetVersionId) as Array<{ snippet_id: string; embedding: Buffer }>;
expect(targetRows).toHaveLength(1);
const matches = vecStore.queryNearestNeighbors(embedding, {
repositoryId: '/test/repo',
versionId: targetVersionId,
profileId: 'local-default',
limit: 5
});
expect(matches[0]?.snippetId).toBe(targetRows[0].snippet_id);
});
it('updates repository_versions state to error when pipeline throws and job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const errorCrawl = vi.fn().mockRejectedValue(new Error('crawl failed'));
const pipeline = new IndexingPipeline(
db,
errorCrawl as never,
{ crawl: errorCrawl } as never,
null
);
const job = makeJob('/test/repo', versionId);
await expect(pipeline.run(job as never)).rejects.toThrow('crawl failed');
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
expect(version.state).toBe('error');
});
it('does not touch repository_versions when job has no versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const pipeline = makePipeline({ files: [], totalFiles: 0 });
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
// State should remain 'pending' — pipeline with no versionId must not touch it
expect(version.state).toBe('pending');
});
it('calls LocalCrawler with ref=v1.2.0 when job has a versionId with tag v1.2.0', async () => {
const versionId = insertVersion(db, { tag: 'v1.2.0', state: 'pending' });
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: 'v1.2.0'
});
});
it('calls LocalCrawler with ref=undefined when job has no versionId (main-branch)', async () => {
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: undefined
});
});
it('excludes files matching excludeFiles patterns from trueref.json', async () => {
const truerefConfig = JSON.stringify({
excludeFiles: ['migration-guide.md', 'docs/legacy*']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
},
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
},
{
path: 'migration-guide.md',
content: '# Migration Guide\n\nThis should be excluded.',
sha: 'sha-migration',
language: 'markdown'
},
{
path: 'docs/legacy-api.md',
content: '# Legacy API\n\nShould be excluded by glob prefix.',
sha: 'sha-legacy',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const docs = db.prepare(`SELECT file_path FROM documents ORDER BY file_path`).all() as {
file_path: string;
}[];
const filePaths = docs.map((d) => d.file_path);
// migration-guide.md and docs/legacy-api.md must be absent.
expect(filePaths).not.toContain('migration-guide.md');
expect(filePaths).not.toContain('docs/legacy-api.md');
// README.md must still be indexed.
expect(filePaths).toContain('README.md');
});
it('persists repo-wide rules from trueref.json to repository_configs after indexing', async () => {
const truerefConfig = JSON.stringify({
rules: ['Always use TypeScript strict mode', 'Prefer async/await over callbacks']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual([
'Always use TypeScript strict mode',
'Prefer async/await over callbacks'
]);
});
it('persists version-specific rules under (repositoryId, versionId) when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v2.0.0', state: 'pending' });
const truerefConfig = JSON.stringify({
rules: ['This is v2. Use the new Builder API.']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
// Repo-wide row (version_id IS NULL) must NOT be written by a version job —
// writing it here would contaminate the NULL entry with version-specific rules
// (Bug 5b regression guard).
const repoRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(repoRow).toBeUndefined();
// Version-specific row must exist with the correct rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['This is v2. Use the new Builder API.']);
});
it('regression(Bug5b): version job does not overwrite the repo-wide NULL rules entry', async () => {
// Arrange: index the main branch first to establish a repo-wide rules entry.
const mainBranchRules = ['Always use TypeScript strict mode.'];
const mainPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: mainBranchRules }),
sha: 'sha-main-config',
language: 'json'
}
],
totalFiles: 1
});
const mainJob = makeJob('/test/repo'); // no versionId → main-branch job
await mainPipeline.run(mainJob as never);
// Confirm the repo-wide entry was written.
const afterMain = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterMain).toBeDefined();
expect(JSON.parse(afterMain!.rules)).toEqual(mainBranchRules);
// Act: index a version with different rules.
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const versionRules = ['v3 only: use the streaming API.'];
const versionPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: versionRules }),
sha: 'sha-v3-config',
language: 'json'
}
],
totalFiles: 1
});
const versionJob = makeJob('/test/repo', versionId);
await versionPipeline.run(versionJob as never);
// Assert: the repo-wide NULL entry must still contain the main-branch rules,
// not the version-specific ones.
const afterVersion = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterVersion).toBeDefined();
expect(JSON.parse(afterVersion!.rules)).toEqual(mainBranchRules);
// And the version-specific row must contain the version rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
expect(JSON.parse(versionRow!.rules)).toEqual(versionRules);
});
it('persists rules from CrawlResult.config even when trueref.json is absent from files (folders allowlist bug)', async () => {
// Regression test for MULTIVERSION-0001:
// When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
// shouldIndexFile() excludes trueref.json itself because it lives at the
// repo root. The LocalCrawler now carries the pre-parsed config in
// CrawlResult.config so the pipeline no longer needs to find the file in
// crawlResult.files[].
const pipeline = makePipeline({
// trueref.json is NOT in files — simulates it being excluded by folders allowlist.
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
// The pre-parsed config is carried here instead (set by LocalCrawler).
config: { rules: ['Use strict TypeScript.', 'Avoid any.'] }
});
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual(['Use strict TypeScript.', 'Avoid any.']);
});
it('persists version-specific rules from CrawlResult.config when trueref.json is excluded by folders allowlist', async () => {
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const pipeline = makePipeline({
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
config: { rules: ['v3: use the streaming API.'] }
});
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['v3: use the streaming API.']);
});
});
// ---------------------------------------------------------------------------
// differential indexing
// ---------------------------------------------------------------------------
describe('differential indexing', () => {
let db: Database.Database;
beforeEach(() => {
db = createTestDb();
insertRepo(db, { source: 'local', source_url: '/tmp/test-repo' });
});
function insertDocument(
localDb: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
): string {
const id = crypto.randomUUID();
localDb
.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
)
.run(
(overrides.id as string) ?? id,
(overrides.repository_id as string) ?? '/test/repo',
(overrides.version_id as string | null) ?? null,
(overrides.file_path as string) ?? 'README.md',
null,
'markdown',
100,
(overrides.checksum as string) ?? 'abc123',
Math.floor(Date.now() / 1000)
);
return (overrides.id as string) ?? id;
}
function insertSnippet(
localDb: Database.Database,
documentId: string,
overrides: Partial<Record<string, unknown>> = {}
): string {
const id = crypto.randomUUID();
localDb
.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
)
.run(
(overrides.id as string) ?? id,
documentId,
(overrides.repository_id as string) ?? '/test/repo',
(overrides.version_id as string | null) ?? null,
'info',
null,
'content',
'markdown',
null,
10,
Math.floor(Date.now() / 1000)
);
return (overrides.id as string) ?? id;
}
type PipelineInternals = {
cloneFromAncestor: (
ancestorVersionId: string,
targetVersionId: string,
repositoryId: string,
unchangedPaths: Set<string>
) => void;
};
it('cloneFromAncestor inserts documents and snippets into the target version', () => {
const ancestorVersionId = insertVersion(db, { tag: 'v1.0.0', state: 'indexed' });
const targetVersionId = insertVersion(db, { tag: 'v1.1.0', state: 'pending' });
const doc1Id = insertDocument(db, {
repository_id: '/test/repo',
version_id: ancestorVersionId,
file_path: 'README.md',
checksum: 'sha-readme'
});
const doc2Id = insertDocument(db, {
repository_id: '/test/repo',
version_id: ancestorVersionId,
file_path: 'src/index.ts',
checksum: 'sha-index'
});
insertSnippet(db, doc1Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
insertSnippet(db, doc2Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
ancestorVersionId,
targetVersionId,
'/test/repo',
new Set(['README.md', 'src/index.ts'])
);
const targetDocs = db
.prepare(`SELECT * FROM documents WHERE version_id = ?`)
.all(targetVersionId) as { id: string; file_path: string }[];
expect(targetDocs).toHaveLength(2);
expect(targetDocs.map((d) => d.file_path).sort()).toEqual(['README.md', 'src/index.ts'].sort());
// New IDs must differ from ancestor doc IDs.
const targetDocIds = targetDocs.map((d) => d.id);
expect(targetDocIds).not.toContain(doc1Id);
expect(targetDocIds).not.toContain(doc2Id);
const targetSnippets = db
.prepare(`SELECT * FROM snippets WHERE version_id = ?`)
.all(targetVersionId) as { id: string }[];
expect(targetSnippets).toHaveLength(2);
});
it('cloneFromAncestor silently skips paths absent from the ancestor', () => {
const ancestorVersionId = insertVersion(db, { tag: 'v1.0.0', state: 'indexed' });
const targetVersionId = insertVersion(db, { tag: 'v1.1.0', state: 'pending' });
insertDocument(db, {
repository_id: '/test/repo',
version_id: ancestorVersionId,
file_path: 'src/main.ts',
checksum: 'sha-main'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
ancestorVersionId,
targetVersionId,
'/test/repo',
new Set(['src/main.ts', 'MISSING.md'])
);
const targetDocs = db
.prepare(`SELECT * FROM documents WHERE version_id = ?`)
.all(targetVersionId) as { id: string; file_path: string }[];
expect(targetDocs).toHaveLength(1);
expect(targetDocs[0].file_path).toBe('src/main.ts');
});
it('falls back to full crawl when no indexed ancestor exists', async () => {
const targetVersionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const files = [
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
},
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-index',
language: 'typescript'
}
];
const mockLocalCrawl = vi.fn().mockResolvedValue({
files,
totalFiles: 2,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(
db,
vi.fn() as never,
{ crawl: mockLocalCrawl } as never,
null
);
const jobId = insertJob(db, {
repository_id: '/test/repo',
version_id: targetVersionId,
status: 'queued'
});
const job = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as never;
await pipeline.run(job);
const updatedJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
status: string;
};
expect(updatedJob.status).toBe('done');
const docs = db
.prepare(`SELECT * FROM documents WHERE version_id = ?`)
.all(targetVersionId) as { id: string }[];
expect(docs.length).toBeGreaterThanOrEqual(2);
});
it('cloned unchanged documents survive the diff/replace stage', async () => {
// 1. Set up ancestor and target versions.
const ancestorVersionId = insertVersion(db, { tag: 'v1.0.0', state: 'indexed' });
const targetVersionId = insertVersion(db, { tag: 'v1.1.0', state: 'pending' });
// 2. Insert ancestor doc + snippet for unchanged.md.
const ancestorDocId = insertDocument(db, {
repository_id: '/test/repo',
version_id: ancestorVersionId,
file_path: 'unchanged.md',
checksum: 'sha-unchanged'
});
insertSnippet(db, ancestorDocId, {
repository_id: '/test/repo',
version_id: ancestorVersionId
});
// 3. Crawl returns ONLY changed.md (unchanged.md is absent — differential only).
const mockLocalCrawl = vi.fn().mockResolvedValue({
files: [
{
path: 'changed.md',
content: '# Changed\n\nThis file was added.',
sha: 'sha-changed',
language: 'markdown'
}
],
totalFiles: 1,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
// 4. Mock buildDifferentialPlan to return a plan with the two paths.
const mockPlan = {
ancestorVersionId,
ancestorTag: 'v1.0.0',
changedPaths: new Set(['changed.md']),
deletedPaths: new Set<string>(),
unchangedPaths: new Set(['unchanged.md'])
};
const spy = vi.spyOn(diffStrategy, 'buildDifferentialPlan').mockResolvedValueOnce(mockPlan);
const pipeline = new IndexingPipeline(
db,
vi.fn() as never,
{ crawl: mockLocalCrawl } as never,
null
);
// 5. Run pipeline for the target version job.
const jobId = insertJob(db, {
repository_id: '/test/repo',
version_id: targetVersionId,
status: 'queued'
});
const job = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as never;
await pipeline.run(job);
spy.mockRestore();
// 6. Assert job completed and both docs exist under the target version.
const finalJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
status: string;
};
expect(finalJob.status).toBe('done');
const targetDocs = db
.prepare(`SELECT file_path FROM documents WHERE version_id = ?`)
.all(targetVersionId) as { file_path: string }[];
const filePaths = targetDocs.map((d) => d.file_path);
// unchanged.md was cloned and must NOT have been deleted by computeDiff.
expect(filePaths).toContain('unchanged.md');
// changed.md was crawled and indexed in this run.
expect(filePaths).toContain('changed.md');
});
});
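The tests above drive the pipeline with a mocked `DifferentialPlan`. As a standalone sketch (names mirror the mocked plan object; `partitionWork` is a hypothetical helper, not part of the pipeline), the plan partitions work like this: unchanged paths are cloned from the ancestor version, changed paths are crawled and re-indexed, and deleted paths are scheduled for removal.

```typescript
// Shape of the plan as mocked in the tests above.
interface DifferentialPlan {
  ancestorVersionId: string;
  ancestorTag: string;
  changedPaths: Set<string>;
  deletedPaths: Set<string>;
  unchangedPaths: Set<string>;
}

// Hypothetical helper illustrating how the plan splits the file set.
function partitionWork(plan: DifferentialPlan, allPaths: string[]) {
  return {
    toClone: allPaths.filter((p) => plan.unchangedPaths.has(p)),
    toCrawl: allPaths.filter((p) => plan.changedPaths.has(p)),
    toDelete: [...plan.deletedPaths]
  };
}
```

For the "cloned unchanged documents" test, `unchanged.md` lands in `toClone` and `changed.md` in `toCrawl`, which is why the crawler mock only ever sees the changed file.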


@@ -15,16 +15,27 @@
import { createHash, randomUUID } from 'node:crypto';
import type Database from 'better-sqlite3';
import type { Document, NewDocument, NewSnippet } from '$lib/types';
import type { Document, NewDocument, NewSnippet, TrueRefConfig, IndexingStage } from '$lib/types';
import type { crawl as GithubCrawlFn } from '$lib/server/crawler/github.crawler.js';
import type { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import type { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { RepositoryMapper } from '$lib/server/mappers/repository.mapper.js';
import { IndexingJob } from '$lib/server/models/indexing-job.js';
import { Repository, RepositoryEntity } from '$lib/server/models/repository.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import { resolveConfig, type ParsedConfig } from '$lib/server/config/config-parser.js';
import { parseFile } from '$lib/server/parser/index.js';
import { computeTrustScore } from '$lib/server/search/trust-score.js';
import { computeDiff } from './diff.js';
import { buildDifferentialPlan, type DifferentialPlan } from './differential-strategy.js';
import {
cloneFromAncestor as cloneFromAncestorInDatabase,
replaceSnippets as replaceSnippetsInDatabase,
updateRepo as updateRepoInDatabase,
updateVersion as updateVersionInDatabase,
type CloneFromAncestorRequest
} from './write-operations.js';
import type { SerializedFields } from './worker-types.js';
// ---------------------------------------------------------------------------
// Progress calculation
@@ -61,18 +72,47 @@ function sha256(content: string): string {
// ---------------------------------------------------------------------------
export class IndexingPipeline {
private readonly sqliteVecStore: SqliteVecStore;
constructor(
private readonly db: Database.Database,
private readonly githubCrawl: typeof GithubCrawlFn,
private readonly localCrawler: LocalCrawler,
private readonly embeddingService: EmbeddingService | null
) {}
private readonly embeddingService: EmbeddingService | null,
private readonly writeDelegate?: {
persistJobUpdates?: boolean;
replaceSnippets?: (
changedDocIds: string[],
newDocuments: NewDocument[],
newSnippets: NewSnippet[]
) => Promise<void>;
cloneFromAncestor?: (request: CloneFromAncestorRequest) => Promise<void>;
updateRepo?: (repositoryId: string, fields: SerializedFields) => Promise<void>;
updateVersion?: (versionId: string, fields: SerializedFields) => Promise<void>;
upsertRepoConfig?: (
repositoryId: string,
versionId: string | null,
rules: string[]
) => Promise<void>;
}
) {
this.sqliteVecStore = new SqliteVecStore(db);
}
// -------------------------------------------------------------------------
// Public — run a job end to end
// -------------------------------------------------------------------------
async run(job: IndexingJob): Promise<void> {
async run(
job: IndexingJob,
onStageChange?: (
stage: IndexingStage,
detail?: string,
progress?: number,
processedFiles?: number,
totalFiles?: number
) => void
): Promise<void> {
// better-sqlite3 raw queries return snake_case keys; Drizzle types use camelCase.
// Accept both so the pipeline works when called from raw SQL contexts.
const raw = job as unknown as Record<string, unknown>;
@@ -82,6 +122,18 @@ export class IndexingPipeline {
// Rebuild a normalised job view for the rest of this method.
const normJob = { ...job, repositoryId, versionId };
// Helper to report stage transitions and invoke optional callback.
const reportStage = (
stage: IndexingStage,
detail?: string,
progress?: number,
processed?: number,
total?: number
) => {
this.updateJob(job.id, { stage, stageDetail: detail ?? null });
onStageChange?.(stage, detail, progress, processed, total);
};
this.updateJob(job.id, { status: 'running', startedAt: Math.floor(Date.now() / 1000) });
try {
@@ -89,19 +141,104 @@ export class IndexingPipeline {
if (!repo) throw new Error(`Repository ${repositoryId} not found`);
// Mark repo as actively indexing.
this.updateRepo(repo.id, { state: 'indexing' });
await this.updateRepo(repo.id, { state: 'indexing' });
if (normJob.versionId) {
await this.updateVersion(normJob.versionId, { state: 'indexing' });
}
const versionTag = normJob.versionId ? this.getVersionTag(normJob.versionId) : undefined;
// ---- Stage 0: Differential strategy (TRUEREF-0021) ----------------------
// When indexing a tagged version, check if we can inherit unchanged files
// from an already-indexed ancestor version instead of crawling everything.
let differentialPlan: DifferentialPlan | null = null;
if (normJob.versionId && versionTag) {
reportStage('differential');
differentialPlan = await buildDifferentialPlan({
repo,
targetTag: versionTag,
db: this.db
}).catch((err) => {
console.warn(
`[IndexingPipeline] Differential plan failed, falling back to full crawl: ${err instanceof Error ? err.message : String(err)}`
);
return null;
});
}
// If a differential plan exists, clone unchanged files from ancestor.
if (differentialPlan && differentialPlan.unchangedPaths.size > 0) {
reportStage('cloning');
await this.cloneFromAncestor({
ancestorVersionId: differentialPlan.ancestorVersionId,
targetVersionId: normJob.versionId!,
repositoryId: repo.id,
unchangedPaths: [...differentialPlan.unchangedPaths]
});
console.info(
`[IndexingPipeline] Differential indexing: cloned ${differentialPlan.unchangedPaths.size} unchanged files from ${differentialPlan.ancestorTag}`
);
}
// ---- Stage 1: Crawl -------------------------------------------------
const crawlResult = await this.crawl(repo);
const totalFiles = crawlResult.totalFiles;
// Pass changedPaths as allowlist so crawl only fetches/returns changed files.
reportStage('crawling');
const crawlAllowedPaths = differentialPlan ? differentialPlan.changedPaths : undefined;
const crawlResult = await this.crawl(repo, versionTag, crawlAllowedPaths);
// Resolve trueref.json / context7.json configuration.
// Prefer the pre-parsed config carried in the CrawlResult (set by
// LocalCrawler so it is available even when a `folders` allowlist
// excludes the repo root and trueref.json never appears in files[]).
// Fall back to locating the file in crawlResult.files for GitHub crawls
// which do not yet populate CrawlResult.config.
let parsedConfig: ReturnType<typeof resolveConfig> | null = null;
if (crawlResult.config) {
// Config was pre-parsed by the crawler — wrap it in a ParsedConfig
// shell so the rest of the pipeline can use it uniformly.
parsedConfig = {
config: crawlResult.config,
source: 'trueref.json',
warnings: []
} satisfies ParsedConfig;
} else {
const configFile = crawlResult.files.find(
(f) => f.path === 'trueref.json' || f.path === 'context7.json'
);
parsedConfig = configFile
? resolveConfig([{ filename: configFile.path, content: configFile.content }])
: null;
}
const excludeFiles: string[] = parsedConfig?.config.excludeFiles ?? [];
// Filter out excluded files before diff computation.
const filteredFiles =
excludeFiles.length > 0
? crawlResult.files.filter(
(f) =>
!excludeFiles.some((pattern) =>
IndexingPipeline.matchesExcludePattern(f.path, pattern)
)
)
: crawlResult.files;
const totalFiles = filteredFiles.length;
this.updateJob(job.id, { totalFiles });
// ---- Stage 2: Parse & diff ------------------------------------------
// Load all existing documents for this repo so computeDiff can
// classify every crawled file and detect deletions.
const existingDocs = this.getExistingDocuments(repo.id, normJob.versionId);
const diff = computeDiff(crawlResult.files, existingDocs);
// Exclude files that were cloned from the ancestor — they are not candidates
// for deletion or re-processing (computeDiff must not see them in existingDocs).
const clonedPaths = differentialPlan?.unchangedPaths ?? new Set<string>();
const existingDocsForDiff =
clonedPaths.size > 0
? existingDocs.filter((d) => !clonedPaths.has(d.filePath))
: existingDocs;
const diff = computeDiff(filteredFiles, existingDocsForDiff);
// Accumulate new documents/snippets; skip unchanged files.
const newDocuments: NewDocument[] = [];
@@ -110,11 +247,11 @@ export class IndexingPipeline {
// Schedule stale documents (modified + deleted) for deletion.
for (const file of diff.modified) {
const existing = existingDocs.find((d) => d.filePath === file.path);
const existing = existingDocsForDiff.find((d) => d.filePath === file.path);
if (existing) changedDocIds.push(existing.id);
}
for (const filePath of diff.deleted) {
const existing = existingDocs.find((d) => d.filePath === filePath);
const existing = existingDocsForDiff.find((d) => d.filePath === filePath);
if (existing) changedDocIds.push(existing.id);
}
@@ -136,7 +273,21 @@ export class IndexingPipeline {
this.updateJob(job.id, { processedFiles, progress: initialProgress });
}
// Yield the event loop and flush progress every N files.
// Lower = more responsive UI; higher = less overhead.
const YIELD_EVERY = 20;
reportStage('parsing', `0 / ${totalFiles} files`);
for (const [i, file] of filesToProcess.entries()) {
// Yield the Node.js event loop periodically so the HTTP server can
// handle incoming requests (navigation, polling) between file parses.
// Without this, the synchronous parse + SQLite work blocks the thread
// entirely and the UI becomes unresponsive during indexing.
if (i > 0 && i % YIELD_EVERY === 0) {
await new Promise<void>((resolve) => setImmediate(resolve));
}
const checksum = file.sha || sha256(file.content);
// Create new document record.
@@ -168,8 +319,11 @@ export class IndexingPipeline {
newDocuments.push(newDoc);
newSnippets.push(...snippets);
// Count ALL files (including skipped unchanged ones) in progress.
// Write progress to the DB only on yield boundaries or the final file.
// Avoids a synchronous SQLite UPDATE on every single iteration.
const totalProcessed = diff.unchanged.length + i + 1;
const isLast = i === filesToProcess.length - 1;
if (isLast || i % YIELD_EVERY === YIELD_EVERY - 1) {
const progress = calculateProgress(
totalProcessed,
totalFiles,
@@ -178,16 +332,26 @@ export class IndexingPipeline {
this.embeddingService !== null
);
this.updateJob(job.id, { processedFiles: totalProcessed, progress });
reportStage(
'parsing',
`${totalProcessed} / ${totalFiles} files`,
progress,
totalProcessed,
totalFiles
);
}
}
// After the loop, processedFiles should reflect the full count.
processedFiles = diff.unchanged.length + filesToProcess.length;
// ---- Stage 3: Atomic replacement ------------------------------------
this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);
reportStage('storing');
await this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);
// ---- Stage 4: Embeddings (if provider is configured) ----------------
if (this.embeddingService) {
reportStage('embedding');
const snippetIds = this.embeddingService.findSnippetIdsMissingEmbeddings(
repo.id,
normJob.versionId
@@ -221,7 +385,7 @@ export class IndexingPipeline {
state: 'indexed'
});
this.updateRepo(repo.id, {
await this.updateRepo(repo.id, {
state: 'indexed',
totalSnippets: stats.totalSnippets,
totalTokens: stats.totalTokens,
@@ -229,6 +393,29 @@ export class IndexingPipeline {
lastIndexedAt: Math.floor(Date.now() / 1000)
});
if (normJob.versionId) {
const versionStats = this.computeVersionStats(normJob.versionId);
await this.updateVersion(normJob.versionId, {
state: 'indexed',
totalSnippets: versionStats.totalSnippets,
indexedAt: Math.floor(Date.now() / 1000)
});
}
// ---- Stage 6: Persist rules from config ----------------------------
if (parsedConfig?.config.rules?.length) {
if (!normJob.versionId) {
// Main-branch job: write the repo-wide entry only.
await this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
} else {
// Version job: write only the version-specific entry.
// Writing to the NULL row here would overwrite repo-wide rules
// with whatever the last-indexed version happened to carry.
await this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
}
}
reportStage('done');
this.updateJob(job.id, {
status: 'done',
progress: 100,
@@ -238,6 +425,7 @@ export class IndexingPipeline {
const message = error instanceof Error ? error.message : String(error);
console.error(`[IndexingPipeline] Job ${job.id} failed: ${message}`);
reportStage('failed');
this.updateJob(job.id, {
status: 'failed',
error: message,
@@ -245,7 +433,10 @@ export class IndexingPipeline {
});
// Restore repo to error state but preserve any existing indexed data.
this.updateRepo(repositoryId, { state: 'error' });
await this.updateRepo(repositoryId, { state: 'error' });
if (normJob.versionId) {
await this.updateVersion(normJob.versionId, { state: 'error' });
}
throw error;
}
@@ -255,9 +446,15 @@ export class IndexingPipeline {
// Private — crawl
// -------------------------------------------------------------------------
private async crawl(repo: Repository): Promise<{
private async crawl(
repo: Repository,
ref?: string,
allowedPaths?: Set<string>
): Promise<{
files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
totalFiles: number;
/** Pre-parsed trueref.json / context7.json, or undefined when absent. */
config?: TrueRefConfig;
}> {
if (repo.source === 'github') {
// Parse owner/repo from the canonical ID: "/owner/repo"
@@ -272,97 +469,93 @@ export class IndexingPipeline {
const result = await this.githubCrawl({
owner,
repo: repoName,
ref: repo.branch ?? undefined,
ref: ref ?? repo.branch ?? undefined,
token: repo.githubToken ?? undefined
});
return { files: result.files, totalFiles: result.totalFiles };
// Apply allowedPaths filter for differential indexing.
const githubFinalFiles =
allowedPaths && allowedPaths.size > 0
? result.files.filter((f) => allowedPaths.has(f.path))
: result.files;
return { files: githubFinalFiles, totalFiles: result.totalFiles };
} else {
// Local filesystem crawl.
const result = await this.localCrawler.crawl({
rootPath: repo.sourceUrl,
ref: repo.branch !== 'main' ? (repo.branch ?? undefined) : undefined
ref: ref ?? (repo.branch !== 'main' ? (repo.branch ?? undefined) : undefined)
});
return { files: result.files, totalFiles: result.totalFiles };
// Apply allowedPaths filter for differential indexing.
const localFinalFiles =
allowedPaths && allowedPaths.size > 0
? result.files.filter((f) => allowedPaths.has(f.path))
: result.files;
return { files: localFinalFiles, totalFiles: result.totalFiles, config: result.config };
}
}
private getVersionTag(versionId: string): string | undefined {
const row = this.db
.prepare<[string], { tag: string }>(`SELECT tag FROM repository_versions WHERE id = ?`)
.get(versionId);
return row?.tag;
}
// -------------------------------------------------------------------------
// Private — differential clone (TRUEREF-0021)
// -------------------------------------------------------------------------
/**
* Clone documents, snippets, and embeddings from an ancestor version into
* the target version for all unchanged file paths.
*
* Runs in a single SQLite transaction for atomicity.
*/
private async cloneFromAncestor(
requestOrAncestorVersionId: CloneFromAncestorRequest | string,
targetVersionId?: string,
repositoryId?: string,
unchangedPaths?: Set<string>
): Promise<void> {
const request: CloneFromAncestorRequest =
typeof requestOrAncestorVersionId === 'string'
? {
ancestorVersionId: requestOrAncestorVersionId,
targetVersionId: targetVersionId!,
repositoryId: repositoryId!,
unchangedPaths: [...(unchangedPaths ?? new Set<string>())]
}
: requestOrAncestorVersionId;
if (request.unchangedPaths.length === 0) {
return;
}
if (this.writeDelegate?.cloneFromAncestor) {
await this.writeDelegate.cloneFromAncestor(request);
return;
}
cloneFromAncestorInDatabase(this.db, request);
}
// -------------------------------------------------------------------------
// Private — atomic snippet replacement
// -------------------------------------------------------------------------
private replaceSnippets(
private async replaceSnippets(
_repositoryId: string,
changedDocIds: string[],
newDocuments: NewDocument[],
newSnippets: NewSnippet[]
): void {
const insertDoc = this.db.prepare(
`INSERT INTO documents
(id, repository_id, version_id, file_path, title, language,
token_count, checksum, indexed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
);
const insertSnippet = this.db.prepare(
`INSERT INTO snippets
(id, document_id, repository_id, version_id, type, title,
content, language, breadcrumb, token_count, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
);
this.db.transaction(() => {
// Delete stale documents (cascade deletes their snippets via FK).
if (changedDocIds.length > 0) {
const placeholders = changedDocIds.map(() => '?').join(',');
this.db
.prepare(`DELETE FROM documents WHERE id IN (${placeholders})`)
.run(...changedDocIds);
): Promise<void> {
if (this.writeDelegate?.replaceSnippets) {
await this.writeDelegate.replaceSnippets(changedDocIds, newDocuments, newSnippets);
return;
}
// Insert new documents.
for (const doc of newDocuments) {
const indexedAtSeconds =
doc.indexedAt instanceof Date
? Math.floor(doc.indexedAt.getTime() / 1000)
: Math.floor(Date.now() / 1000);
insertDoc.run(
doc.id,
doc.repositoryId,
doc.versionId ?? null,
doc.filePath,
doc.title ?? null,
doc.language ?? null,
doc.tokenCount ?? 0,
doc.checksum,
indexedAtSeconds
);
}
// Insert new snippets.
for (const snippet of newSnippets) {
const createdAtSeconds =
snippet.createdAt instanceof Date
? Math.floor(snippet.createdAt.getTime() / 1000)
: Math.floor(Date.now() / 1000);
insertSnippet.run(
snippet.id,
snippet.documentId,
snippet.repositoryId,
snippet.versionId ?? null,
snippet.type,
snippet.title ?? null,
snippet.content,
snippet.language ?? null,
snippet.breadcrumb ?? null,
snippet.tokenCount ?? 0,
createdAtSeconds
);
}
})();
replaceSnippetsInDatabase(this.db, changedDocIds, newDocuments, newSnippets);
}
// -------------------------------------------------------------------------
@@ -384,6 +577,17 @@ export class IndexingPipeline {
};
}
private computeVersionStats(versionId: string): { totalSnippets: number } {
const row = this.db
.prepare<
[string],
{ total_snippets: number }
>(`SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`)
.get(versionId);
return { totalSnippets: row?.total_snippets ?? 0 };
}
// -------------------------------------------------------------------------
// Private — DB helpers
// -------------------------------------------------------------------------
@@ -417,6 +621,10 @@ export class IndexingPipeline {
}
private updateJob(id: string, fields: Record<string, unknown>): void {
if (this.writeDelegate?.persistJobUpdates === false) {
return;
}
const sets = Object.keys(fields)
.map((k) => `${toSnake(k)} = ?`)
.join(', ');
@@ -424,14 +632,82 @@ export class IndexingPipeline {
this.db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
}
private updateRepo(id: string, fields: Record<string, unknown>): void {
private async updateRepo(id: string, fields: SerializedFields): Promise<void> {
if (this.writeDelegate?.updateRepo) {
await this.writeDelegate.updateRepo(id, fields);
return;
}
updateRepoInDatabase(this.db, id, fields);
}
private async updateVersion(id: string, fields: SerializedFields): Promise<void> {
if (this.writeDelegate?.updateVersion) {
await this.writeDelegate.updateVersion(id, fields);
return;
}
updateVersionInDatabase(this.db, id, fields);
}
private async upsertRepoConfig(
repositoryId: string,
versionId: string | null,
rules: string[]
): Promise<void> {
if (this.writeDelegate?.upsertRepoConfig) {
await this.writeDelegate.upsertRepoConfig(repositoryId, versionId, rules);
return;
}
const now = Math.floor(Date.now() / 1000);
const allFields = { ...fields, updatedAt: now };
const sets = Object.keys(allFields)
.map((k) => `${toSnake(k)} = ?`)
.join(', ');
const values = [...Object.values(allFields), id];
this.db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
// Use DELETE + INSERT because ON CONFLICT … DO UPDATE doesn't work reliably
// with partial unique indexes in all SQLite versions.
if (versionId === null) {
this.db
.prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
.run(repositoryId);
} else {
this.db
.prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
.run(repositoryId, versionId);
}
this.db
.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
)
.run(repositoryId, versionId, JSON.stringify(rules), now);
}
// -------------------------------------------------------------------------
// Private — static helpers
// -------------------------------------------------------------------------
/**
* Returns true when `filePath` matches the given exclude `pattern`.
*
* Supported patterns:
* - Plain filename: `migration-guide.md` matches any path ending in `/migration-guide.md`
* or equal to `migration-guide.md`.
* - Glob prefix with wildcard: `docs/migration*` matches paths that start with `docs/migration`.
* - Exact path: `src/legacy/old-api.ts` matches exactly that path.
*/
private static matchesExcludePattern(filePath: string, pattern: string): boolean {
if (pattern.includes('*')) {
// Glob-style: treat everything before the '*' as a required prefix.
const prefix = pattern.slice(0, pattern.indexOf('*'));
return filePath.startsWith(prefix);
}
// No wildcard — treat as plain name or exact path.
if (!pattern.includes('/')) {
// Plain filename: match basename (path ends with /<pattern> or equals pattern).
return filePath === pattern || filePath.endsWith('/' + pattern);
}
// Contains a slash — exact path match.
return filePath === pattern;
}
}
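The exclude-pattern rules documented on `matchesExcludePattern` can be exercised in isolation. The sketch below reproduces the method body as a free function (the name and logic come from the class above; only the standalone packaging is new):

```typescript
// Standalone version of IndexingPipeline.matchesExcludePattern for illustration.
function matchesExcludePattern(filePath: string, pattern: string): boolean {
  if (pattern.includes('*')) {
    // Glob-style: everything before the first '*' is a required prefix.
    const prefix = pattern.slice(0, pattern.indexOf('*'));
    return filePath.startsWith(prefix);
  }
  if (!pattern.includes('/')) {
    // Plain filename: match the basename anywhere in the tree.
    return filePath === pattern || filePath.endsWith('/' + pattern);
  }
  // Contains a slash: exact path match only.
  return filePath === pattern;
}
```

So `docs/migration*` excludes `docs/migration-v2.md`, a bare `migration-guide.md` excludes that filename at any depth, and `src/legacy/old-api.ts` excludes exactly that one path.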


@@ -9,7 +9,7 @@
import type Database from 'better-sqlite3';
import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
import { IndexingJob, IndexingJobEntity } from '$lib/server/models/indexing-job.js';
import type { IndexingPipeline } from './indexing.pipeline.js';
import type { WorkerPool } from './worker-pool.js';
// ---------------------------------------------------------------------------
// SQL projection + row mapper (mirrors repository.service.ts pattern)
@@ -17,17 +17,65 @@ import type { IndexingPipeline } from './indexing.pipeline.js';
const JOB_SELECT = `SELECT * FROM indexing_jobs`;
type JobStatusFilter = IndexingJob['status'] | Array<IndexingJob['status']>;
function escapeLikePattern(value: string): string {
return value.replaceAll('\\', '\\\\').replaceAll('%', '\\%').replaceAll('_', '\\_');
}
function isSpecificRepositoryId(repositoryId: string): boolean {
return repositoryId.split('/').filter(Boolean).length >= 2;
}
function normalizeStatuses(status?: JobStatusFilter): Array<IndexingJob['status']> {
if (!status) {
return [];
}
const statuses = Array.isArray(status) ? status : [status];
return [...new Set(statuses)];
}
function buildJobFilterQuery(options?: { repositoryId?: string; status?: JobStatusFilter }): {
where: string;
params: unknown[];
} {
const conditions: string[] = [];
const params: unknown[] = [];
if (options?.repositoryId) {
if (isSpecificRepositoryId(options.repositoryId)) {
conditions.push('repository_id = ?');
params.push(options.repositoryId);
} else {
conditions.push(`(repository_id = ? OR repository_id LIKE ? ESCAPE '\\')`);
params.push(options.repositoryId, `${escapeLikePattern(options.repositoryId)}/%`);
}
}
const statuses = normalizeStatuses(options?.status);
if (statuses.length > 0) {
conditions.push(`status IN (${statuses.map(() => '?').join(', ')})`);
params.push(...statuses);
}
return {
where: conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '',
params
};
}
export class JobQueue {
private isRunning = false;
private pipeline: IndexingPipeline | null = null;
private workerPool: WorkerPool | null = null;
constructor(private readonly db: Database.Database) {}
/**
* Inject the pipeline dependency (avoids circular construction order).
* Inject the worker pool dependency (replaces direct pipeline invocation).
* When set, enqueue() will delegate to the pool instead of calling processNext().
*/
setPipeline(pipeline: IndexingPipeline): void {
this.pipeline = pipeline;
setWorkerPool(pool: WorkerPool): void {
this.workerPool = pool;
}
/**
@@ -50,7 +98,9 @@ export class JobQueue {
if (activeRaw) {
// Ensure the queue is draining even if enqueue was called concurrently.
if (!this.isRunning) setImmediate(() => this.processNext());
if (!this.workerPool) {
setImmediate(() => this.processNext());
}
return IndexingJobMapper.fromEntity(new IndexingJobEntity(activeRaw));
}
@@ -63,6 +113,8 @@ export class JobQueue {
progress: 0,
totalFiles: 0,
processedFiles: 0,
stage: 'queued',
stageDetail: null,
error: null,
startedAt: null,
completedAt: null,
@@ -73,8 +125,8 @@ export class JobQueue {
.prepare(
`INSERT INTO indexing_jobs
(id, repository_id, version_id, status, progress, total_files,
processed_files, error, started_at, completed_at, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
processed_files, stage, stage_detail, error, started_at, completed_at, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
)
.run(
job.id,
@@ -84,14 +136,18 @@ export class JobQueue {
job.progress,
job.totalFiles,
job.processedFiles,
job.stage,
job.stageDetail,
job.error,
job.startedAt,
job.completedAt,
now
);
// Kick off sequential processing if not already running.
if (!this.isRunning) {
// Delegate to worker pool if available, otherwise fall back to direct processing
if (this.workerPool) {
this.workerPool.enqueue(job.id, repositoryId, versionId ?? null);
} else {
setImmediate(() => this.processNext());
}
@@ -102,15 +158,15 @@ export class JobQueue {
}
/**
* Pick the oldest queued job and run it through the pipeline.
* Called recursively via setImmediate so the event loop stays unblocked.
* Pick the oldest queued job and run it through the pipeline directly.
* This is now a fallback method used only when no WorkerPool is set.
* Called via setImmediate so the event loop stays unblocked.
*/
private async processNext(): Promise<void> {
if (this.isRunning) return;
if (!this.pipeline) {
console.warn('[JobQueue] No pipeline configured — cannot process jobs.');
return;
}
// Fallback path: no worker pool configured, run directly (used by tests and dev mode)
console.warn(
'[JobQueue] Running in fallback mode (no worker pool) — direct pipeline execution.'
);
const rawJob = this.db
.prepare<[], IndexingJobEntity>(
@@ -122,26 +178,9 @@ export class JobQueue {
if (!rawJob) return;
const job = IndexingJobMapper.fromEntity(new IndexingJobEntity(rawJob));
this.isRunning = true;
try {
await this.pipeline.run(job);
} catch (err) {
// Error is logged inside pipeline.run(); no action needed here.
console.error(
`[JobQueue] Job ${job.id} failed: ${err instanceof Error ? err.message : String(err)}`
console.warn(
'[JobQueue] processNext: no pipeline or pool configured — skipping job processing'
);
} finally {
this.isRunning = false;
// Check whether another job was queued while this one ran.
const next = this.db
.prepare<[], { id: string }>(`SELECT id FROM indexing_jobs WHERE status = 'queued' LIMIT 1`)
.get();
if (next) {
setImmediate(() => this.processNext());
}
}
}
/**
@@ -157,23 +196,11 @@ export class JobQueue {
*/
listJobs(options?: {
repositoryId?: string;
status?: IndexingJob['status'];
status?: JobStatusFilter;
limit?: number;
}): IndexingJob[] {
const limit = Math.min(options?.limit ?? 20, 200);
const conditions: string[] = [];
const params: unknown[] = [];
if (options?.repositoryId) {
conditions.push('repository_id = ?');
params.push(options.repositoryId);
}
if (options?.status) {
conditions.push('status = ?');
params.push(options.status);
}
const where = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';
const { where, params } = buildJobFilterQuery(options);
const sql = `${JOB_SELECT} ${where} ORDER BY created_at DESC LIMIT ?`;
params.push(limit);
@@ -184,10 +211,21 @@ export class JobQueue {
/**
* Trigger processing of any queued jobs (e.g. after server restart).
* Safe to call multiple times; a no-op if the queue is already running.
* If a worker pool is configured, delegates to it. Otherwise falls back to direct processing.
* Safe to call multiple times.
*/
drainQueued(): void {
if (!this.isRunning) {
if (this.workerPool) {
// Delegate all queued jobs to the worker pool
const queued = this.db
.prepare<[], IndexingJobEntity>(`${JOB_SELECT} WHERE status = 'queued'`)
.all();
for (const rawJob of queued) {
const job = IndexingJobMapper.fromEntity(new IndexingJobEntity(rawJob));
this.workerPool.enqueue(job.id, job.repositoryId, job.versionId);
}
} else {
// Fallback: direct pipeline processing
setImmediate(() => this.processNext());
}
}
@@ -196,19 +234,7 @@ export class JobQueue {
* Count all jobs matching optional filters.
*/
countJobs(options?: { repositoryId?: string; status?: IndexingJob['status'] }): number {
const conditions: string[] = [];
const params: unknown[] = [];
if (options?.repositoryId) {
conditions.push('repository_id = ?');
params.push(options.repositoryId);
}
if (options?.status) {
conditions.push('status = ?');
params.push(options.status);
}
const where = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';
const { where, params } = buildJobFilterQuery(options);
const sql = `SELECT COUNT(*) as n FROM indexing_jobs ${where}`;
const row = this.db.prepare<unknown[], { n: number }>(sql).get(...params);
return row?.n ?? 0;
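Both hunks above swap the duplicated inline filter construction for a shared `buildJobFilterQuery` helper whose definition sits outside this diff. A sketch of a plausible shape, reconstructed from the removed inline logic (the helper's actual signature may differ):

```typescript
// Hypothetical reconstruction of buildJobFilterQuery, based on the inline
// code this diff removes. Not the committed definition.
interface JobFilter {
  repositoryId?: string;
  status?: string;
}

function buildJobFilterQuery(options?: JobFilter): { where: string; params: unknown[] } {
  const conditions: string[] = [];
  const params: unknown[] = [];
  if (options?.repositoryId) {
    conditions.push('repository_id = ?');
    params.push(options.repositoryId);
  }
  if (options?.status) {
    conditions.push('status = ?');
    params.push(options.status);
  }
  return {
    // Note the space on both sides of AND; omitting it would produce
    // invalid SQL such as "repository_id = ? ANDstatus = ?".
    where: conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '',
    params
  };
}

console.log(buildJobFilterQuery({ status: 'queued' }).where); // WHERE status = ?
```

Centralising the filter in one helper keeps `listJobs` and `countJobs` from drifting apart, which is exactly the bug class the removed duplicate code invited.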

View File

@@ -0,0 +1,197 @@
import { describe, it, expect } from 'vitest';
import { ProgressBroadcaster } from './progress-broadcaster.js';
describe('ProgressBroadcaster', () => {
it('subscribe returns a readable stream', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribe('job-1');
expect(stream).toBeInstanceOf(ReadableStream);
});
it('broadcast sends to subscribed job listeners', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribe('job-1');
const reader = stream.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'progress', { stage: 'parsing', progress: 50 });
const { value } = await reader.read();
expect(value).toBeDefined();
const text = value as string;
expect(text).toContain('event: progress');
expect(text).toContain('id: 1');
expect(text).toContain('"stage":"parsing"');
expect(text).toContain('"progress":50');
reader.cancel();
});
it('broadcast sends to subscribed repository listeners', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribeRepository('/repo/1');
const reader = stream.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'repo-event', { data: 'test' });
const { value } = await reader.read();
expect(value).toBeDefined();
const text = value as string;
expect(text).toContain('event: repo-event');
expect(text).toContain('"data":"test"');
reader.cancel();
});
it('broadcast sends to all subscribers', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribeAll();
const reader = stream.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'global-event', { value: 42 });
const { value } = await reader.read();
expect(value).toBeDefined();
const text = value as string;
expect(text).toContain('event: global-event');
expect(text).toContain('"value":42');
reader.cancel();
});
it('getLastEvent returns cached events', () => {
const broadcaster = new ProgressBroadcaster();
broadcaster.broadcast('job-1', '/repo/1', 'event1', { msg: 'first' });
broadcaster.broadcast('job-1', '/repo/1', 'event2', { msg: 'second' });
const lastEvent = broadcaster.getLastEvent('job-1');
expect(lastEvent).toBeDefined();
expect(lastEvent?.id).toBe(2);
expect(lastEvent?.event).toBe('event2');
expect(lastEvent?.data).toBe('{"msg":"second"}');
});
it('getLastEvent returns null for unknown job', () => {
const broadcaster = new ProgressBroadcaster();
const lastEvent = broadcaster.getLastEvent('unknown-job');
expect(lastEvent).toBeNull();
});
it('cleanup removes subscribers and cache', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribe('job-1');
const reader = stream.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'event', { data: 'test' });
const lastEventBefore = broadcaster.getLastEvent('job-1');
expect(lastEventBefore).toBeDefined();
broadcaster.cleanup('job-1');
const lastEventAfter = broadcaster.getLastEvent('job-1');
expect(lastEventAfter).toBeNull();
reader.cancel();
});
it('increments event IDs per job', () => {
const broadcaster = new ProgressBroadcaster();
broadcaster.broadcast('job-1', '/repo/1', 'event1', { n: 1 });
broadcaster.broadcast('job-1', '/repo/1', 'event2', { n: 2 });
broadcaster.broadcast('job-2', '/repo/2', 'event3', { n: 3 });
expect(broadcaster.getLastEvent('job-1')?.id).toBe(2);
expect(broadcaster.getLastEvent('job-2')?.id).toBe(1);
});
it('sends reconnect event with last event ID on subscribe', async () => {
const broadcaster = new ProgressBroadcaster();
// Publish first event
broadcaster.broadcast('job-1', '/repo/1', 'progress', { value: 10 });
// Subscribe later
const stream = broadcaster.subscribe('job-1');
const reader = stream.getReader();
const { value } = await reader.read();
const text = value as string;
expect(text).toContain('event: reconnect');
expect(text).toContain('"lastEventId":1');
reader.cancel();
});
it('SSE format is correct', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribe('job-1');
const reader = stream.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'test', { msg: 'hello' });
const { value } = await reader.read();
const text = value as string;
// SSE format: id: N\nevent: name\ndata: json\n\n
expect(text).toMatch(/^id: \d+\n/);
expect(text).toMatch(/event: test\n/);
expect(text).toMatch(/data: {[^}]+}\n\n$/);
expect(text.endsWith('\n\n')).toBe(true);
reader.cancel();
});
it('handles multiple concurrent subscribers', async () => {
const broadcaster = new ProgressBroadcaster();
const stream1 = broadcaster.subscribe('job-1');
const stream2 = broadcaster.subscribe('job-1');
const reader1 = stream1.getReader();
const reader2 = stream2.getReader();
broadcaster.broadcast('job-1', '/repo/1', 'event', { data: 'test' });
const { value: value1 } = await reader1.read();
const { value: value2 } = await reader2.read();
expect(value1).toBeDefined();
expect(value2).toBeDefined();
reader1.cancel();
reader2.cancel();
});
it('broadcastWorkerStatus sends worker-status events to global subscribers', async () => {
const broadcaster = new ProgressBroadcaster();
const stream = broadcaster.subscribeAll();
const reader = stream.getReader();
broadcaster.broadcastWorkerStatus({
concurrency: 2,
active: 1,
idle: 1,
workers: [
{ index: 0, state: 'running', jobId: 'job-1', repositoryId: '/repo/1', versionId: null }
]
});
const { value } = await reader.read();
const text = value as string;
expect(text).toContain('event: worker-status');
expect(text).toContain('"active":1');
reader.cancel();
});
});

View File

@@ -0,0 +1,198 @@
export interface SSEEvent {
id: number;
event: string;
data: string;
}
export class ProgressBroadcaster {
private jobSubscribers = new Map<string, Set<ReadableStreamDefaultController<string>>>();
private repoSubscribers = new Map<string, Set<ReadableStreamDefaultController<string>>>();
private allSubscribers = new Set<ReadableStreamDefaultController<string>>();
private lastEventCache = new Map<string, SSEEvent>();
private eventCounters = new Map<string, number>();
private globalEventCounter = 0;
subscribe(jobId: string): ReadableStream<string> {
let self!: ReadableStreamDefaultController<string>;
return new ReadableStream({
start: (controller: ReadableStreamDefaultController<string>) => {
self = controller;
if (!this.jobSubscribers.has(jobId)) {
this.jobSubscribers.set(jobId, new Set());
}
this.jobSubscribers.get(jobId)!.add(controller);
// Send last event on reconnect if available
const lastEvent = this.getLastEvent(jobId);
if (lastEvent) {
controller.enqueue(`event: reconnect\ndata: {"lastEventId":${lastEvent.id}}\n\n`);
}
},
cancel: () => {
// Remove only this stream's controller; other subscribers stay open.
const set = this.jobSubscribers.get(jobId);
if (set) {
set.delete(self);
if (set.size === 0) {
this.jobSubscribers.delete(jobId);
}
}
}
});
}
subscribeRepository(repositoryId: string): ReadableStream<string> {
let self!: ReadableStreamDefaultController<string>;
return new ReadableStream({
start: (controller: ReadableStreamDefaultController<string>) => {
self = controller;
if (!this.repoSubscribers.has(repositoryId)) {
this.repoSubscribers.set(repositoryId, new Set());
}
this.repoSubscribers.get(repositoryId)!.add(controller);
},
cancel: () => {
// Remove only this stream's controller; other subscribers stay open.
const set = this.repoSubscribers.get(repositoryId);
if (set) {
set.delete(self);
if (set.size === 0) {
this.repoSubscribers.delete(repositoryId);
}
}
}
});
}
subscribeAll(): ReadableStream<string> {
let self!: ReadableStreamDefaultController<string>;
return new ReadableStream({
start: (controller: ReadableStreamDefaultController<string>) => {
self = controller;
this.allSubscribers.add(controller);
},
cancel: () => {
// Remove only this stream's controller; other subscribers stay open.
this.allSubscribers.delete(self);
}
});
}
broadcast(jobId: string, repositoryId: string, eventName: string, data: object): void {
// Increment event counter for this job
const counter = (this.eventCounters.get(jobId) ?? 0) + 1;
this.eventCounters.set(jobId, counter);
// Create SSE event
const event: SSEEvent = {
id: counter,
event: eventName,
data: JSON.stringify(data)
};
// Cache the event
this.lastEventCache.set(jobId, event);
// Format as SSE
const sse = this.formatSSE(event);
// Write to job-specific subscribers
const jobSubscribers = this.jobSubscribers.get(jobId);
if (jobSubscribers) {
for (const controller of jobSubscribers) {
try {
controller.enqueue(sse);
} catch {
// Controller might be closed or errored
}
}
}
// Write to repo-specific subscribers
const repoSubscribers = this.repoSubscribers.get(repositoryId);
if (repoSubscribers) {
for (const controller of repoSubscribers) {
try {
controller.enqueue(sse);
} catch {
// Controller might be closed or errored
}
}
}
// Write to all-subscribers
for (const controller of this.allSubscribers) {
try {
controller.enqueue(sse);
} catch {
// Controller might be closed or errored
}
}
}
broadcastWorkerStatus(data: object): void {
this.globalEventCounter += 1;
const event: SSEEvent = {
id: this.globalEventCounter,
event: 'worker-status',
data: JSON.stringify(data)
};
const sse = this.formatSSE(event);
for (const controller of this.allSubscribers) {
try {
controller.enqueue(sse);
} catch {
// Controller might be closed or errored
}
}
}
getLastEvent(jobId: string): SSEEvent | null {
return this.lastEventCache.get(jobId) ?? null;
}
cleanup(jobId: string): void {
// Close and remove job subscribers
const jobSubscribers = this.jobSubscribers.get(jobId);
if (jobSubscribers) {
for (const controller of jobSubscribers) {
try {
controller.close();
} catch {
// Already closed
}
}
jobSubscribers.clear();
this.jobSubscribers.delete(jobId);
}
// Remove cache and counter
this.lastEventCache.delete(jobId);
this.eventCounters.delete(jobId);
}
private formatSSE(event: SSEEvent): string {
return `id: ${event.id}\nevent: ${event.event}\ndata: ${event.data}\n\n`;
}
}
// Singleton instance
let broadcaster: ProgressBroadcaster | null = null;
export function initBroadcaster(): ProgressBroadcaster {
if (!broadcaster) {
broadcaster = new ProgressBroadcaster();
}
return broadcaster;
}
export function getBroadcaster(): ProgressBroadcaster | null {
return broadcaster;
}
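`formatSSE` above emits the standard three-field Server-Sent Events frame: `id`, `event`, `data`, terminated by a blank line. A self-contained sketch of that wire format paired with a minimal illustrative parser (`parseSSE` is not part of the codebase; real browser clients would use `EventSource`, which parses frames itself):

```typescript
interface SSEEvent {
  id: number;
  event: string;
  data: string;
}

// Mirrors ProgressBroadcaster.formatSSE: one "field: value" line per field,
// with an empty line marking the end of the frame.
function formatSSE(event: SSEEvent): string {
  return `id: ${event.id}\nevent: ${event.event}\ndata: ${event.data}\n\n`;
}

// Illustrative inverse for a single frame, used here only to demonstrate
// the format round-trips.
function parseSSE(frame: string): SSEEvent {
  const fields = new Map<string, string>();
  for (const line of frame.trimEnd().split('\n')) {
    const idx = line.indexOf(': ');
    fields.set(line.slice(0, idx), line.slice(idx + 2));
  }
  return {
    id: Number(fields.get('id') ?? 0),
    event: fields.get('event') ?? 'message',
    data: fields.get('data') ?? ''
  };
}

const frame = formatSSE({ id: 3, event: 'job-progress', data: '{"progress":50}' });
console.log(JSON.stringify(parseSSE(frame)));
```

The numeric `id` is what lets a reconnecting client report `Last-Event-ID`, which the broadcaster's `reconnect` event acknowledges with `lastEventId`.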

View File

@@ -15,6 +15,12 @@ import { crawl as githubCrawl } from '$lib/server/crawler/github.crawler.js';
import { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import { IndexingPipeline } from './indexing.pipeline.js';
import { JobQueue } from './job-queue.js';
import { WorkerPool } from './worker-pool.js';
import { initBroadcaster } from './progress-broadcaster.js';
import type { ProgressBroadcaster } from './progress-broadcaster.js';
import path from 'node:path';
import { existsSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
// ---------------------------------------------------------------------------
// Stale-job recovery
@@ -49,6 +55,23 @@ export function recoverStaleJobs(db: Database.Database): void {
let _queue: JobQueue | null = null;
let _pipeline: IndexingPipeline | null = null;
let _pool: WorkerPool | null = null;
let _broadcaster: ProgressBroadcaster | null = null;
function resolveWorkerScript(...segments: string[]): string {
const candidates = [
path.resolve(process.cwd(), ...segments),
path.resolve(path.dirname(fileURLToPath(import.meta.url)), '../../../../', ...segments)
];
for (const candidate of candidates) {
if (existsSync(candidate)) {
return candidate;
}
}
return candidates[0];
}
/**
* Initialise (or return the existing) JobQueue + IndexingPipeline pair.
@@ -59,11 +82,13 @@ let _pipeline: IndexingPipeline | null = null;
*
* @param db - Raw better-sqlite3 Database instance.
* @param embeddingService - Optional embedding service; pass null to disable.
* @param options - Optional configuration for worker pool (concurrency, dbPath).
* @returns An object with `queue` and `pipeline` accessors.
*/
export function initializePipeline(
db: Database.Database,
embeddingService: EmbeddingService | null = null
embeddingService: EmbeddingService | null = null,
options?: { concurrency?: number; dbPath?: string }
): { queue: JobQueue; pipeline: IndexingPipeline } {
if (_queue && _pipeline) {
return { queue: _queue, pipeline: _pipeline };
@@ -76,7 +101,78 @@ export function initializePipeline(
const pipeline = new IndexingPipeline(db, githubCrawl, localCrawler, embeddingService);
const queue = new JobQueue(db);
queue.setPipeline(pipeline);
// If worker pool options are provided, create and wire the pool
if (options?.dbPath) {
_broadcaster = initBroadcaster();
const getRepositoryIdForJob = (jobId: string): string => {
const row = db
.prepare<
[string],
{ repository_id: string }
>(`SELECT repository_id FROM indexing_jobs WHERE id = ?`)
.get(jobId);
return row?.repository_id ?? '';
};
const workerScript = resolveWorkerScript('build', 'workers', 'worker-entry.mjs');
const embedWorkerScript = resolveWorkerScript('build', 'workers', 'embed-worker-entry.mjs');
const writeWorkerScript = resolveWorkerScript('build', 'workers', 'write-worker-entry.mjs');
try {
_pool = new WorkerPool({
concurrency: options.concurrency ?? 2,
workerScript,
embedWorkerScript,
writeWorkerScript,
dbPath: options.dbPath,
onProgress: (jobId, msg) => {
// Broadcast progress event
if (_broadcaster) {
_broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-progress', {
...msg,
status: 'running'
});
}
},
onJobDone: (jobId: string) => {
// Broadcast done event
if (_broadcaster) {
_broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-done', {
jobId,
status: 'done'
});
}
},
onJobFailed: (jobId: string, error: string) => {
// Broadcast failed event
if (_broadcaster) {
_broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-failed', {
jobId,
status: 'failed',
error
});
}
},
onEmbedDone: (jobId: string) => {
console.log('[WorkerPool] Embedding complete for job:', jobId);
},
onEmbedFailed: (jobId: string, error: string) => {
console.error('[WorkerPool] Embedding failed for job:', jobId, error);
},
onWorkerStatus: (status) => {
_broadcaster?.broadcastWorkerStatus(status);
}
});
queue.setWorkerPool(_pool);
} catch (err) {
console.warn(
'[startup] Failed to create WorkerPool (worker scripts may not exist yet):',
err instanceof Error ? err.message : String(err)
);
}
}
_queue = queue;
_pipeline = pipeline;
@@ -87,11 +183,7 @@ export function initializePipeline(
.prepare<[], { id: string }>(`SELECT id FROM indexing_jobs WHERE status = 'queued' LIMIT 1`)
.get();
if (pending) {
// Re-enqueue logic is handled inside JobQueue.processNext; we trigger
// it by asking the queue for any job that is already queued.
// The simplest way is to call enqueue on a repo that has a queued job —
// but since enqueue deduplicates, we just trigger processNext directly.
// We do this via a public helper to avoid exposing private methods.
// Re-enqueue logic is handled inside JobQueue.drainQueued; we trigger it here.
queue.drainQueued();
}
});
@@ -100,23 +192,39 @@ export function initializePipeline(
}
/**
* Return the current JobQueue singleton, or null if not yet initialised.
* Accessor for the JobQueue singleton.
*/
export function getQueue(): JobQueue | null {
return _queue;
}
/**
* Return the current IndexingPipeline singleton, or null if not yet initialised.
* Accessor for the IndexingPipeline singleton.
*/
export function getPipeline(): IndexingPipeline | null {
return _pipeline;
}
/**
* Reset singletons — intended for use in tests only.
* Accessor for the WorkerPool singleton.
*/
export function getPool(): WorkerPool | null {
return _pool;
}
/**
* Accessor for the ProgressBroadcaster singleton.
*/
export function getBroadcaster(): ProgressBroadcaster | null {
return _broadcaster;
}
/**
* Reset singletons (for testing).
*/
export function _resetSingletons(): void {
_queue = null;
_pipeline = null;
_pool = null;
_broadcaster = null;
}

View File

@@ -0,0 +1,239 @@
import { workerData, parentPort } from 'node:worker_threads';
import Database from 'better-sqlite3';
import { IndexingPipeline } from './indexing.pipeline.js';
import { crawl as githubCrawl } from '$lib/server/crawler/github.crawler.js';
import { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
import { IndexingJobEntity, type IndexingJobEntityProps } from '$lib/server/models/indexing-job.js';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import type {
ParseWorkerRequest,
ParseWorkerResponse,
SerializedDocument,
SerializedSnippet,
WorkerInitData
} from './worker-types.js';
import type { IndexingStage } from '$lib/types.js';
const { dbPath } = workerData as WorkerInitData;
const db = new Database(dbPath);
applySqlitePragmas(db);
let pendingWrite: {
jobId: string;
resolve: () => void;
reject: (error: Error) => void;
} | null = null;
function serializeDocument(document: {
id: string;
repositoryId: string;
versionId?: string | null;
filePath: string;
title?: string | null;
language?: string | null;
tokenCount?: number | null;
checksum: string;
indexedAt: Date;
}): SerializedDocument {
return {
id: document.id,
repositoryId: document.repositoryId,
versionId: document.versionId ?? null,
filePath: document.filePath,
title: document.title ?? null,
language: document.language ?? null,
tokenCount: document.tokenCount ?? 0,
checksum: document.checksum,
indexedAt: Math.floor(document.indexedAt.getTime() / 1000)
};
}
function serializeSnippet(snippet: {
id: string;
documentId: string;
repositoryId: string;
versionId?: string | null;
type: 'code' | 'info';
title?: string | null;
content: string;
language?: string | null;
breadcrumb?: string | null;
tokenCount?: number | null;
createdAt: Date;
}): SerializedSnippet {
return {
id: snippet.id,
documentId: snippet.documentId,
repositoryId: snippet.repositoryId,
versionId: snippet.versionId ?? null,
type: snippet.type,
title: snippet.title ?? null,
content: snippet.content,
language: snippet.language ?? null,
breadcrumb: snippet.breadcrumb ?? null,
tokenCount: snippet.tokenCount ?? 0,
createdAt: Math.floor(snippet.createdAt.getTime() / 1000)
};
}
function requestWrite(
message: Extract<
ParseWorkerResponse,
{
type:
| 'write_replace'
| 'write_clone'
| 'write_repo_update'
| 'write_version_update'
| 'write_repo_config';
}
>
): Promise<void> {
if (pendingWrite) {
return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
}
return new Promise((resolve, reject) => {
pendingWrite = {
jobId: message.jobId,
resolve: () => {
pendingWrite = null;
resolve();
},
reject: (error: Error) => {
pendingWrite = null;
reject(error);
}
};
parentPort!.postMessage(message);
});
}
const pipeline = new IndexingPipeline(db, githubCrawl, new LocalCrawler(), null, {
persistJobUpdates: false,
replaceSnippets: async (changedDocIds, newDocuments, newSnippets) => {
await requestWrite({
type: 'write_replace',
jobId: currentJobId ?? 'unknown',
changedDocIds,
documents: newDocuments.map(serializeDocument),
snippets: newSnippets.map(serializeSnippet)
});
},
cloneFromAncestor: async (request) => {
await requestWrite({
type: 'write_clone',
jobId: currentJobId ?? 'unknown',
ancestorVersionId: request.ancestorVersionId,
targetVersionId: request.targetVersionId,
repositoryId: request.repositoryId,
unchangedPaths: request.unchangedPaths
});
},
updateRepo: async (repositoryId, fields) => {
await requestWrite({
type: 'write_repo_update',
jobId: currentJobId ?? 'unknown',
repositoryId,
fields
});
},
updateVersion: async (versionId, fields) => {
await requestWrite({
type: 'write_version_update',
jobId: currentJobId ?? 'unknown',
versionId,
fields
});
},
upsertRepoConfig: async (repositoryId, versionId, rules) => {
await requestWrite({
type: 'write_repo_config',
jobId: currentJobId ?? 'unknown',
repositoryId,
versionId,
rules
});
}
});
let currentJobId: string | null = null;
parentPort!.on('message', async (msg: ParseWorkerRequest) => {
if (msg.type === 'write_ack') {
if (pendingWrite?.jobId === msg.jobId) {
pendingWrite.resolve();
}
return;
}
if (msg.type === 'write_error') {
if (pendingWrite?.jobId === msg.jobId) {
pendingWrite.reject(new Error(msg.error));
}
return;
}
if (msg.type === 'shutdown') {
db.close();
process.exit(0);
}
if (msg.type === 'run') {
currentJobId = msg.jobId;
try {
const rawJob = db.prepare('SELECT * FROM indexing_jobs WHERE id = ?').get(msg.jobId);
if (!rawJob) {
throw new Error(`Job ${msg.jobId} not found`);
}
const job = IndexingJobMapper.fromEntity(
new IndexingJobEntity(rawJob as IndexingJobEntityProps)
);
await pipeline.run(
job,
(
stage: IndexingStage,
detail?: string,
progress?: number,
processedFiles?: number,
totalFiles?: number
) => {
parentPort!.postMessage({
type: 'progress',
jobId: msg.jobId,
stage,
stageDetail: detail,
progress: progress ?? 0,
processedFiles: processedFiles ?? 0,
totalFiles: totalFiles ?? 0
} satisfies ParseWorkerResponse);
}
);
parentPort!.postMessage({
type: 'done',
jobId: msg.jobId
} satisfies ParseWorkerResponse);
} catch (err) {
parentPort!.postMessage({
type: 'failed',
jobId: msg.jobId,
error: err instanceof Error ? err.message : String(err)
} satisfies ParseWorkerResponse);
} finally {
currentJobId = null;
}
}
});
process.on('uncaughtException', (err) => {
if (currentJobId) {
parentPort!.postMessage({
type: 'failed',
jobId: currentJobId,
error: err instanceof Error ? err.message : String(err)
} satisfies ParseWorkerResponse);
}
process.exit(1);
});
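The `requestWrite` helper above serialises database writes by allowing at most one in-flight request per worker, correlating the eventual `write_ack`/`write_error` reply by `jobId`. The same pattern in isolation, with `parentPort` replaced by a stub callback (all names here are illustrative, not from the codebase):

```typescript
// Single-in-flight request/acknowledge correlation, as used between the
// parse worker and the write worker. "request"/"onAck" are hypothetical names.
type Pending = { jobId: string; resolve: () => void; reject: (e: Error) => void };

let pending: Pending | null = null;

function request(post: (msg: { jobId: string }) => void, jobId: string): Promise<void> {
  // Reject a second request while one is outstanding, mirroring requestWrite.
  if (pending) return Promise.reject(new Error(`request in flight for ${pending.jobId}`));
  return new Promise((resolve, reject) => {
    pending = {
      jobId,
      // Clearing `pending` before settling re-opens the slot for the next write.
      resolve: () => { pending = null; resolve(); },
      reject: (e: Error) => { pending = null; reject(e); }
    };
    post({ jobId });
  });
}

// Invoked when the peer replies; ignores acks for jobs we are not waiting on.
function onAck(jobId: string): void {
  if (pending?.jobId === jobId) pending.resolve();
}

const sent: string[] = [];
const p = request((m) => sent.push(m.jobId), 'job-1');
onAck('job-1');
p.then(() => console.log('acknowledged', sent[0]));
```

Because only one promise can be outstanding, the write worker never has to disambiguate interleaved requests from the same parse worker; ordering falls out of the protocol rather than from locks.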

View File

@@ -0,0 +1,417 @@
/**
* Tests for WorkerPool (TRUEREF-0022).
*
* Real node:worker_threads Workers are replaced by FakeWorker (an EventEmitter)
* so no subprocess is ever spawned. We maintain our own registry of created
* FakeWorker instances so we can inspect postMessage calls and emit events.
*/
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { writeFileSync, unlinkSync, existsSync } from 'node:fs';
// ---------------------------------------------------------------------------
// Hoist FakeWorker + registry so vi.mock can reference them.
// ---------------------------------------------------------------------------
const { createdWorkers, FakeWorker } = vi.hoisted(() => {
// eslint-disable-next-line @typescript-eslint/no-require-imports
const { EventEmitter } = require('node:events') as typeof import('node:events');
const createdWorkers: InstanceType<typeof FakeWorkerClass>[] = [];
class FakeWorkerClass extends EventEmitter {
threadId = Math.floor(Math.random() * 100_000);
// Auto-emit 'exit' with code 0 when a shutdown message is received
postMessage = vi.fn((msg: { type: string }) => {
if (msg.type === 'shutdown') {
// Emit exit asynchronously so shutdown() loop can process it
setImmediate(() => {
this.emit('exit', 0);
this.threadId = 0; // signal exited
});
}
});
terminate = vi.fn(() => {
this.threadId = 0;
});
constructor() {
super();
createdWorkers.push(this);
}
}
return { createdWorkers, FakeWorker: FakeWorkerClass };
});
// ---------------------------------------------------------------------------
// Mock node:worker_threads BEFORE importing WorkerPool.
// ---------------------------------------------------------------------------
vi.mock('node:worker_threads', () => {
return { Worker: FakeWorker };
});
import { WorkerPool, type WorkerPoolOptions } from './worker-pool.js';
// ---------------------------------------------------------------------------
// Test helpers
// ---------------------------------------------------------------------------
const FAKE_SCRIPT = '/tmp/fake-worker-pool-test.mjs';
const MISSING_SCRIPT = '/tmp/this-file-does-not-exist-worker-pool.mjs';
function makeOpts(overrides: Partial<WorkerPoolOptions> = {}): WorkerPoolOptions {
return {
concurrency: 2,
workerScript: FAKE_SCRIPT,
embedWorkerScript: MISSING_SCRIPT,
writeWorkerScript: MISSING_SCRIPT,
dbPath: ':memory:',
onProgress: vi.fn(),
onJobDone: vi.fn(),
onJobFailed: vi.fn(),
onEmbedDone: vi.fn(),
onEmbedFailed: vi.fn(),
...overrides
} as unknown as WorkerPoolOptions;
}
// ---------------------------------------------------------------------------
// Setup / teardown
// ---------------------------------------------------------------------------
beforeEach(() => {
// Create the fake script so existsSync returns true
writeFileSync(FAKE_SCRIPT, '// placeholder\n');
// Clear registry and reset all mocks
createdWorkers.length = 0;
vi.clearAllMocks();
});
afterEach(() => {
if (existsSync(FAKE_SCRIPT)) unlinkSync(FAKE_SCRIPT);
});
// ---------------------------------------------------------------------------
// Fallback mode (no real workers)
// ---------------------------------------------------------------------------
describe('WorkerPool fallback mode', () => {
it('enters fallback mode when workerScript does not exist', () => {
const pool = new WorkerPool(makeOpts({ workerScript: MISSING_SCRIPT }));
expect(pool.isFallbackMode).toBe(true);
});
it('does not throw when constructed in fallback mode', () => {
expect(() => new WorkerPool(makeOpts({ workerScript: MISSING_SCRIPT }))).not.toThrow();
});
it('enqueue is a no-op in fallback mode — callbacks are never called', () => {
const opts = makeOpts({ workerScript: MISSING_SCRIPT });
const pool = new WorkerPool(opts);
pool.enqueue('job-1', '/repo/1');
expect(opts.onJobDone).not.toHaveBeenCalled();
expect(opts.onProgress).not.toHaveBeenCalled();
});
it('spawns no workers in fallback mode', () => {
new WorkerPool(makeOpts({ workerScript: MISSING_SCRIPT }));
expect(createdWorkers).toHaveLength(0);
});
});
// ---------------------------------------------------------------------------
// Normal mode
// ---------------------------------------------------------------------------
describe('WorkerPool normal mode', () => {
it('isFallbackMode is false when workerScript exists', () => {
const pool = new WorkerPool(makeOpts({ concurrency: 1 }));
expect(pool.isFallbackMode).toBe(false);
});
it('spawns `concurrency` parse workers on construction', () => {
new WorkerPool(makeOpts({ concurrency: 3 }));
expect(createdWorkers).toHaveLength(3);
});
it('spawns a write worker when writeWorkerScript exists', () => {
new WorkerPool(makeOpts({ concurrency: 2, writeWorkerScript: FAKE_SCRIPT }));
expect(createdWorkers).toHaveLength(3);
});
// -------------------------------------------------------------------------
// enqueue dispatches to an idle worker
// -------------------------------------------------------------------------
it('enqueue sends { type: "run", jobId } to an idle worker', () => {
const pool = new WorkerPool(makeOpts({ concurrency: 1 }));
pool.enqueue('job-42', '/repo/1');
expect(createdWorkers).toHaveLength(1);
expect(createdWorkers[0].postMessage).toHaveBeenCalledWith({ type: 'run', jobId: 'job-42' });
});
// -------------------------------------------------------------------------
// "done" message — onJobDone called, next queued job dispatched
// -------------------------------------------------------------------------
it('calls onJobDone and dispatches the next queued job when a worker emits "done"', () => {
const opts = makeOpts({ concurrency: 1 });
const pool = new WorkerPool(opts);
// Enqueue two jobs — second must wait because concurrency=1
pool.enqueue('job-A', '/repo/1');
pool.enqueue('job-B', '/repo/2');
const worker = createdWorkers[0];
// Simulate job-A completing
worker.emit('message', { type: 'done', jobId: 'job-A' });
expect(opts.onJobDone).toHaveBeenCalledWith('job-A');
// The same worker should now run job-B
expect(worker.postMessage).toHaveBeenCalledWith({ type: 'run', jobId: 'job-B' });
});
// -------------------------------------------------------------------------
// "failed" message — onJobFailed called
// -------------------------------------------------------------------------
it('calls onJobFailed when a worker emits a "failed" message', () => {
const opts = makeOpts({ concurrency: 1 });
const pool = new WorkerPool(opts);
pool.enqueue('job-fail', '/repo/1');
const worker = createdWorkers[0];
worker.emit('message', { type: 'failed', jobId: 'job-fail', error: 'parse error' });
expect(opts.onJobFailed).toHaveBeenCalledWith('job-fail', 'parse error');
});
// -------------------------------------------------------------------------
// Per-repo serialization
// -------------------------------------------------------------------------
it('does not dispatch a second job for the same repo while first is running', () => {
const opts = makeOpts({ concurrency: 2 });
const pool = new WorkerPool(opts);
pool.enqueue('job-1', '/repo/same');
pool.enqueue('job-2', '/repo/same');
// Only job-1 should have been dispatched (run message sent)
const runCalls = createdWorkers.flatMap((w) =>
w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
);
expect(
runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-1')
).toHaveLength(1);
expect(
runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-2')
).toHaveLength(0);
});
it('starts jobs for different repos concurrently', () => {
const opts = makeOpts({ concurrency: 2 });
const pool = new WorkerPool(opts);
pool.enqueue('job-alpha', '/repo/alpha');
pool.enqueue('job-beta', '/repo/beta');
const runCalls = createdWorkers.flatMap((w) =>
w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
);
const dispatchedIds = runCalls.map((c) => (c[0] as unknown as { jobId: string }).jobId);
expect(dispatchedIds).toContain('job-alpha');
expect(dispatchedIds).toContain('job-beta');
});
it('dispatches same-repo jobs concurrently when versionIds differ', () => {
const pool = new WorkerPool(makeOpts({ concurrency: 2 }));
pool.enqueue('job-v1', '/repo/same', 'v1');
pool.enqueue('job-v2', '/repo/same', 'v2');
const runCalls = createdWorkers.flatMap((w) =>
w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
);
const dispatchedIds = runCalls.map((c) => (c[0] as unknown as { jobId: string }).jobId);
expect(dispatchedIds).toContain('job-v1');
expect(dispatchedIds).toContain('job-v2');
});
it('forwards write worker acknowledgements back to the originating parse worker', () => {
new WorkerPool(makeOpts({ concurrency: 1, writeWorkerScript: FAKE_SCRIPT }));
const parseWorker = createdWorkers[0];
const writeWorker = createdWorkers[1];
writeWorker.emit('message', { type: 'ready' });
parseWorker.emit('message', {
type: 'write_replace',
jobId: 'job-write',
changedDocIds: [],
documents: [],
snippets: []
});
writeWorker.emit('message', { type: 'write_ack', jobId: 'job-write' });
expect(writeWorker.postMessage).toHaveBeenCalledWith({
type: 'write_replace',
jobId: 'job-write',
changedDocIds: [],
documents: [],
snippets: []
});
expect(parseWorker.postMessage).toHaveBeenCalledWith({ type: 'write_ack', jobId: 'job-write' });
});
it('forwards write worker acknowledgements back to the embed worker', () => {
new WorkerPool(
makeOpts({
concurrency: 1,
writeWorkerScript: FAKE_SCRIPT,
embedWorkerScript: FAKE_SCRIPT,
embeddingProfileId: 'local-default'
})
);
const parseWorker = createdWorkers[0];
const embedWorker = createdWorkers[1];
const writeWorker = createdWorkers[2];
writeWorker.emit('message', { type: 'ready' });
embedWorker.emit('message', { type: 'ready' });
embedWorker.emit('message', {
type: 'write_embeddings',
jobId: 'job-embed',
embeddings: []
});
writeWorker.emit('message', { type: 'write_ack', jobId: 'job-embed', embeddingCount: 0 });
expect(parseWorker.postMessage).not.toHaveBeenCalledWith({
type: 'write_ack',
jobId: 'job-embed'
});
expect(writeWorker.postMessage).toHaveBeenCalledWith({
type: 'write_embeddings',
jobId: 'job-embed',
embeddings: []
});
expect(embedWorker.postMessage).toHaveBeenCalledWith({
type: 'write_ack',
jobId: 'job-embed',
embeddingCount: 0
});
});
// -------------------------------------------------------------------------
// Worker crash (exit code != 0)
// -------------------------------------------------------------------------
it('calls onJobFailed and spawns a replacement worker when a worker exits with code 1', () => {
const opts = makeOpts({ concurrency: 1 });
const pool = new WorkerPool(opts);
pool.enqueue('job-crash', '/repo/1');
const originalWorker = createdWorkers[0];
// Simulate crash while the job is running
originalWorker.emit('exit', 1);
expect(opts.onJobFailed).toHaveBeenCalledWith('job-crash', expect.stringContaining('1'));
// A replacement worker must have been spawned
expect(createdWorkers.length).toBeGreaterThan(1);
});
it('does NOT call onJobFailed when a worker exits cleanly (code 0)', () => {
const opts = makeOpts({ concurrency: 1 });
new WorkerPool(opts);
// Exit without any running job
const worker = createdWorkers[0];
worker.emit('exit', 0);
expect(opts.onJobFailed).not.toHaveBeenCalled();
});
// -------------------------------------------------------------------------
// setMaxConcurrency — scale up
// -------------------------------------------------------------------------
it('spawns additional workers when setMaxConcurrency is increased', () => {
const pool = new WorkerPool(makeOpts({ concurrency: 1 }));
const before = createdWorkers.length; // 1
pool.setMaxConcurrency(3);
expect(createdWorkers.length).toBe(before + 2);
});
// -------------------------------------------------------------------------
// setMaxConcurrency — scale down
// -------------------------------------------------------------------------
it('sends "shutdown" to idle workers when setMaxConcurrency is decreased', () => {
const opts = makeOpts({ concurrency: 3 });
const pool = new WorkerPool(opts);
pool.setMaxConcurrency(1);
const shutdownWorkers = createdWorkers.filter((w) =>
w.postMessage.mock.calls.some((c) => (c[0] as { type: string })?.type === 'shutdown')
);
// Two workers should have received shutdown
expect(shutdownWorkers.length).toBeGreaterThanOrEqual(2);
});
// -------------------------------------------------------------------------
// shutdown
// -------------------------------------------------------------------------
it('sends "shutdown" to all workers on pool.shutdown()', () => {
const opts = makeOpts({ concurrency: 2 });
const pool = new WorkerPool(opts);
// Don't await — shutdown() is async but the postMessage calls happen synchronously
void pool.shutdown();
for (const worker of createdWorkers) {
const hasShutdown = worker.postMessage.mock.calls.some(
(c) => (c[0] as { type: string })?.type === 'shutdown'
);
expect(hasShutdown).toBe(true);
}
});
// -------------------------------------------------------------------------
// Enqueue after shutdown is a no-op
// -------------------------------------------------------------------------
it('ignores enqueue calls after shutdown is initiated', () => {
const opts = makeOpts({ concurrency: 1 });
const pool = new WorkerPool(opts);
// Don't await — shutdown() sets shuttingDown=true synchronously
void pool.shutdown();
// Reset postMessage mocks to isolate post-shutdown calls
for (const w of createdWorkers) w.postMessage.mockClear();
pool.enqueue('job-late', '/repo/1');
const runCalls = createdWorkers.flatMap((w) =>
w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
);
expect(runCalls).toHaveLength(0);
});
});

View File

@@ -0,0 +1,559 @@
import { Worker } from 'node:worker_threads';
import { existsSync } from 'node:fs';
import type {
ParseWorkerRequest,
ParseWorkerResponse,
EmbedWorkerRequest,
EmbedWorkerResponse,
WorkerInitData,
WriteWorkerRequest,
WriteWorkerResponse
} from './worker-types.js';
type InFlightWriteRequest = Exclude<WriteWorkerRequest, { type: 'shutdown' }>;
export interface WorkerPoolOptions {
concurrency: number;
workerScript: string;
embedWorkerScript: string;
writeWorkerScript?: string;
dbPath: string;
embeddingProfileId?: string;
onProgress: (jobId: string, msg: Extract<ParseWorkerResponse, { type: 'progress' }>) => void;
onJobDone: (jobId: string) => void;
onJobFailed: (jobId: string, error: string) => void;
onEmbedDone: (jobId: string) => void;
onEmbedFailed: (jobId: string, error: string) => void;
onWorkerStatus?: (status: WorkerPoolStatus) => void;
}
export interface WorkerStatusEntry {
index: number;
state: 'idle' | 'running';
jobId: string | null;
repositoryId: string | null;
versionId: string | null;
}
export interface WorkerPoolStatus {
concurrency: number;
active: number;
idle: number;
workers: WorkerStatusEntry[];
}
interface QueuedJob {
jobId: string;
repositoryId: string;
versionId?: string | null;
}
interface RunningJob {
jobId: string;
repositoryId: string;
versionId?: string | null;
}
interface EmbedQueuedJob {
jobId: string;
repositoryId: string;
versionId: string | null;
}
export class WorkerPool {
private workers: Worker[] = [];
private idleWorkers: Worker[] = [];
private embedWorker: Worker | null = null;
private writeWorker: Worker | null = null;
private embedReady = false;
private writeReady = false;
private jobQueue: QueuedJob[] = [];
private runningJobs = new Map<Worker, RunningJob>();
private runningJobKeys = new Set<string>();
private embedQueue: EmbedQueuedJob[] = [];
private pendingWriteWorkers = new Map<string, Worker>();
private options: WorkerPoolOptions;
private fallbackMode = false;
private shuttingDown = false;
constructor(options: WorkerPoolOptions) {
this.options = options;
// Check if worker script exists
if (!existsSync(options.workerScript)) {
console.warn(`Worker script not found at ${options.workerScript}, entering fallback mode`);
this.fallbackMode = true;
return;
}
// Spawn parse workers
for (let i = 0; i < options.concurrency; i++) {
const worker = this.spawnParseWorker();
this.workers.push(worker);
this.idleWorkers.push(worker);
}
// Optionally spawn embed worker
if (options.embeddingProfileId && existsSync(options.embedWorkerScript)) {
this.embedWorker = this.spawnEmbedWorker();
}
if (options.writeWorkerScript && existsSync(options.writeWorkerScript)) {
this.writeWorker = this.spawnWriteWorker(options.writeWorkerScript);
}
this.emitStatusChanged();
}
private spawnParseWorker(): Worker {
const worker = new Worker(this.options.workerScript, {
workerData: {
dbPath: this.options.dbPath
} satisfies WorkerInitData
});
worker.on('message', (msg: ParseWorkerResponse) => this.onWorkerMessage(worker, msg));
worker.on('exit', (code: number) => this.onWorkerExit(worker, code));
return worker;
}
private spawnEmbedWorker(): Worker {
const worker = new Worker(this.options.embedWorkerScript, {
workerData: {
dbPath: this.options.dbPath,
embeddingProfileId: this.options.embeddingProfileId
} satisfies WorkerInitData
});
worker.on('message', (msg: EmbedWorkerResponse) => this.onEmbedWorkerMessage(msg));
return worker;
}
private spawnWriteWorker(writeWorkerScript: string): Worker {
const worker = new Worker(writeWorkerScript, {
workerData: {
dbPath: this.options.dbPath
} satisfies WorkerInitData
});
worker.on('message', (msg: WriteWorkerResponse) => this.onWriteWorkerMessage(msg));
worker.on('exit', () => {
this.writeReady = false;
this.writeWorker = null;
});
return worker;
}
public enqueue(jobId: string, repositoryId: string, versionId?: string | null): void {
if (this.shuttingDown) {
console.warn('WorkerPool is shutting down, ignoring enqueue request');
return;
}
if (this.fallbackMode) {
console.warn(`WorkerPool in fallback mode; job ${jobId} was not dispatched and must be run on the main thread`);
return;
}
this.jobQueue.push({ jobId, repositoryId, versionId });
this.dispatch();
}
private static jobKey(repositoryId: string, versionId?: string | null): string {
return `${repositoryId}:${versionId ?? ''}`;
}
private dispatch(): void {
let statusChanged = false;
while (this.idleWorkers.length > 0 && this.jobQueue.length > 0) {
// Find first job whose (repositoryId, versionId) compound key is not currently running
const jobIdx = this.jobQueue.findIndex(
(j) => !this.runningJobKeys.has(WorkerPool.jobKey(j.repositoryId, j.versionId))
);
if (jobIdx === -1) {
// No eligible job found (all repos have running jobs)
break;
}
const job = this.jobQueue.splice(jobIdx, 1)[0];
const worker = this.idleWorkers.pop()!;
this.runningJobs.set(worker, {
jobId: job.jobId,
repositoryId: job.repositoryId,
versionId: job.versionId
});
this.runningJobKeys.add(WorkerPool.jobKey(job.repositoryId, job.versionId));
statusChanged = true;
const msg: ParseWorkerRequest = { type: 'run', jobId: job.jobId };
worker.postMessage(msg);
}
if (statusChanged) {
this.emitStatusChanged();
}
}
private postWriteRequest(request: InFlightWriteRequest, worker?: Worker): void {
if (!this.writeWorker || !this.writeReady) {
if (worker) {
worker.postMessage({
type: 'write_error',
jobId: request.jobId,
error: 'Write worker is not ready'
} satisfies ParseWorkerRequest);
}
return;
}
if (worker) {
this.pendingWriteWorkers.set(request.jobId, worker);
}
this.writeWorker.postMessage(request);
}
private onWorkerMessage(worker: Worker, msg: ParseWorkerResponse): void {
if (msg.type === 'progress') {
this.postWriteRequest({
type: 'write_job_update',
jobId: msg.jobId,
fields: {
status: 'running',
startedAt: Math.floor(Date.now() / 1000),
stage: msg.stage,
stageDetail: msg.stageDetail ?? null,
progress: msg.progress,
processedFiles: msg.processedFiles,
totalFiles: msg.totalFiles
}
});
this.options.onProgress(msg.jobId, msg);
} else if (
msg.type === 'write_replace' ||
msg.type === 'write_clone' ||
msg.type === 'write_repo_update' ||
msg.type === 'write_version_update' ||
msg.type === 'write_repo_config'
) {
this.postWriteRequest(msg, worker);
} else if (msg.type === 'done') {
const runningJob = this.runningJobs.get(worker);
this.postWriteRequest({
type: 'write_job_update',
jobId: msg.jobId,
fields: {
status: 'done',
stage: 'done',
progress: 100,
completedAt: Math.floor(Date.now() / 1000)
}
});
if (runningJob) {
this.runningJobs.delete(worker);
this.runningJobKeys.delete(
WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
);
}
this.idleWorkers.push(worker);
this.options.onJobDone(msg.jobId);
this.emitStatusChanged();
// If embedding configured, enqueue embed request
if (this.embedWorker && this.options.embeddingProfileId) {
const runningJobData = runningJob || {
jobId: msg.jobId,
repositoryId: '',
versionId: null
};
this.enqueueEmbed(msg.jobId, runningJobData.repositoryId, runningJobData.versionId ?? null);
}
this.dispatch();
} else if (msg.type === 'failed') {
const runningJob = this.runningJobs.get(worker);
this.postWriteRequest({
type: 'write_job_update',
jobId: msg.jobId,
fields: {
status: 'failed',
stage: 'failed',
error: msg.error,
completedAt: Math.floor(Date.now() / 1000)
}
});
if (runningJob) {
this.runningJobs.delete(worker);
this.runningJobKeys.delete(
WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
);
}
this.idleWorkers.push(worker);
this.options.onJobFailed(msg.jobId, msg.error);
this.emitStatusChanged();
this.dispatch();
}
}
private onWorkerExit(worker: Worker, code: number): void {
if (this.shuttingDown) {
return;
}
// Remove from idle if present
const idleIdx = this.idleWorkers.indexOf(worker);
if (idleIdx !== -1) {
this.idleWorkers.splice(idleIdx, 1);
}
// Check if there's a running job
const runningJob = this.runningJobs.get(worker);
if (runningJob && code !== 0) {
this.runningJobs.delete(worker);
this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
this.options.onJobFailed(runningJob.jobId, `Worker crashed with code ${code}`);
} else if (runningJob) {
this.runningJobs.delete(worker);
this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
}
this.emitStatusChanged();
// Remove from workers array
const workerIdx = this.workers.indexOf(worker);
if (workerIdx !== -1) {
this.workers.splice(workerIdx, 1);
}
// Spawn replacement worker if not shutting down and we haven't reached target
if (!this.shuttingDown && this.workers.length < this.options.concurrency) {
const newWorker = this.spawnParseWorker();
this.workers.push(newWorker);
this.idleWorkers.push(newWorker);
this.dispatch();
}
}
private onEmbedWorkerMessage(msg: EmbedWorkerResponse): void {
if (msg.type === 'ready') {
this.embedReady = true;
// Process any queued embed requests
this.processEmbedQueue();
} else if (msg.type === 'write_embeddings') {
const embedWorker = this.embedWorker;
if (!embedWorker) {
return;
}
if (!this.writeWorker || !this.writeReady) {
embedWorker.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: 'Write worker is not ready'
} satisfies EmbedWorkerRequest);
return;
}
this.postWriteRequest(msg, embedWorker);
} else if (msg.type === 'embed-progress') {
// Progress message - could be tracked but not strictly required
} else if (msg.type === 'embed-done') {
this.options.onEmbedDone(msg.jobId);
} else if (msg.type === 'embed-failed') {
this.options.onEmbedFailed(msg.jobId, msg.error);
}
}
private onWriteWorkerMessage(msg: WriteWorkerResponse): void {
if (msg.type === 'ready') {
this.writeReady = true;
return;
}
const worker = this.pendingWriteWorkers.get(msg.jobId);
if (worker) {
this.pendingWriteWorkers.delete(msg.jobId);
worker.postMessage(msg satisfies ParseWorkerRequest);
}
if (msg.type === 'write_error') {
console.error('[WorkerPool] Write worker failed for job:', msg.jobId, msg.error);
}
}
private processEmbedQueue(): void {
if (!this.embedWorker || !this.embedReady) {
return;
}
while (this.embedQueue.length > 0) {
const job = this.embedQueue.shift();
if (job) {
const msg: EmbedWorkerRequest = {
type: 'embed',
jobId: job.jobId,
repositoryId: job.repositoryId,
versionId: job.versionId
};
this.embedWorker.postMessage(msg);
}
}
}
public enqueueEmbed(jobId: string, repositoryId: string, versionId: string | null): void {
if (!this.embedWorker) {
return; // no-op if embedding not configured
}
if (this.embedReady) {
const msg: EmbedWorkerRequest = {
type: 'embed',
jobId,
repositoryId,
versionId
};
this.embedWorker.postMessage(msg);
} else {
this.embedQueue.push({ jobId, repositoryId, versionId });
}
}
public setMaxConcurrency(n: number): void {
this.options.concurrency = n;
const current = this.workers.length;
if (n > current) {
// Spawn additional workers
for (let i = current; i < n; i++) {
const worker = this.spawnParseWorker();
this.workers.push(worker);
this.idleWorkers.push(worker);
}
} else if (n < current) {
// Shut down excess idle workers
const excess = current - n;
for (let i = 0; i < excess; i++) {
if (this.idleWorkers.length > 0) {
const worker = this.idleWorkers.pop()!;
const workerIdx = this.workers.indexOf(worker);
if (workerIdx !== -1) {
this.workers.splice(workerIdx, 1);
}
const msg: ParseWorkerRequest = { type: 'shutdown' };
worker.postMessage(msg);
}
}
}
this.emitStatusChanged();
}
public async shutdown(): Promise<void> {
this.shuttingDown = true;
const msg: ParseWorkerRequest = { type: 'shutdown' };
// Send shutdown to all parse workers
for (const worker of this.workers) {
try {
worker.postMessage(msg);
} catch {
// Worker might already be exited
}
}
// Send shutdown to embed worker if exists
if (this.embedWorker) {
try {
const embedMsg: EmbedWorkerRequest = { type: 'shutdown' };
this.embedWorker.postMessage(embedMsg);
} catch {
// Worker might already be exited
}
}
if (this.writeWorker) {
try {
this.writeWorker.postMessage({ type: 'shutdown' } satisfies WriteWorkerRequest);
} catch {
// Worker might already be exited
}
}
// Wait for workers to exit, with a timeout.
// Worker.threadId becomes -1 once a thread has exited, so compare against -1
// rather than relying on truthiness (a live worker's id is a positive integer,
// and onWorkerExit returns early during shutdown, so this.workers is not pruned).
const timeout = 5000;
const startTime = Date.now();
const checkAllExited = (): boolean => {
return (
this.workers.every((w) => w.threadId === -1) &&
(!this.embedWorker || this.embedWorker.threadId === -1)
);
};
while (!checkAllExited() && Date.now() - startTime < timeout) {
await new Promise((resolve) => setTimeout(resolve, 100));
}
// Force kill any remaining workers
for (const worker of this.workers) {
try {
worker.terminate();
} catch {
// Already terminated
}
}
if (this.embedWorker) {
try {
this.embedWorker.terminate();
} catch {
// Already terminated
}
}
if (this.writeWorker) {
try {
this.writeWorker.terminate();
} catch {
// Already terminated
}
}
this.workers = [];
this.idleWorkers = [];
this.embedWorker = null;
this.writeWorker = null;
this.pendingWriteWorkers.clear();
this.emitStatusChanged();
}
public getStatus(): WorkerPoolStatus {
return {
concurrency: this.options.concurrency,
active: this.runningJobs.size,
idle: this.idleWorkers.length,
workers: this.workers.map((worker, index) => {
const runningJob = this.runningJobs.get(worker);
return {
index,
state: runningJob ? 'running' : 'idle',
jobId: runningJob?.jobId ?? null,
repositoryId: runningJob?.repositoryId ?? null,
versionId: runningJob?.versionId ?? null
};
})
};
}
private emitStatusChanged(): void {
this.options.onWorkerStatus?.(this.getStatus());
}
public get isFallbackMode(): boolean {
return this.fallbackMode;
}
}
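The dispatch loop gates same-repository jobs on a compound `repositoryId:versionId` key, so two versions of one repository can index in parallel while a duplicate (repo, version) pair stays queued until the first finishes. The gating logic in isolation, as a minimal sketch (the `takeEligible` helper is illustrative, not part of the pool's API):

```typescript
type Job = { jobId: string; repositoryId: string; versionId?: string | null };

// Same compound key the pool uses: repo plus version, '' for null version.
const jobKey = (repositoryId: string, versionId?: string | null): string =>
  `${repositoryId}:${versionId ?? ''}`;

// Remove and return the first queued job whose (repo, version) key is not
// already running; mark it running. Returns null when every job is blocked.
function takeEligible(queue: Job[], runningKeys: Set<string>): Job | null {
  const idx = queue.findIndex((j) => !runningKeys.has(jobKey(j.repositoryId, j.versionId)));
  if (idx === -1) return null;
  const [job] = queue.splice(idx, 1);
  runningKeys.add(jobKey(job.repositoryId, job.versionId));
  return job;
}
```

With jobs for (`repo-a`, `v1`), (`repo-a`, `v1`), (`repo-a`, `v2`) queued, the first call takes the `v1` job, the second skips the duplicate `v1` and takes `v2`, and a third returns null until one of the running keys is released.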

View File

@@ -0,0 +1,174 @@
import type { IndexingStage } from '$lib/types.js';
export type ParseWorkerRequest =
| { type: 'run'; jobId: string }
| { type: 'write_ack'; jobId: string }
| { type: 'write_error'; jobId: string; error: string }
| { type: 'shutdown' };
export type ParseWorkerResponse =
| {
type: 'progress';
jobId: string;
stage: IndexingStage;
stageDetail?: string;
progress: number;
processedFiles: number;
totalFiles: number;
}
| { type: 'done'; jobId: string }
| { type: 'failed'; jobId: string; error: string }
| WriteReplaceRequest
| WriteCloneRequest
| WriteRepoUpdateRequest
| WriteVersionUpdateRequest
| WriteRepoConfigRequest;
export type EmbedWorkerRequest =
| { type: 'embed'; jobId: string; repositoryId: string; versionId: string | null }
| {
type: 'write_ack';
jobId: string;
documentCount?: number;
snippetCount?: number;
embeddingCount?: number;
}
| { type: 'write_error'; jobId: string; error: string }
| { type: 'shutdown' };
export type EmbedWorkerResponse =
| { type: 'ready' }
| { type: 'embed-progress'; jobId: string; done: number; total: number }
| { type: 'embed-done'; jobId: string }
| { type: 'embed-failed'; jobId: string; error: string }
| WriteEmbeddingsRequest;
export type WriteWorkerRequest =
| ReplaceWriteRequest
| CloneWriteRequest
| JobUpdateWriteRequest
| RepoUpdateWriteRequest
| VersionUpdateWriteRequest
| RepoConfigWriteRequest
| EmbeddingsWriteRequest
| { type: 'shutdown' };
export type WriteWorkerResponse = { type: 'ready' } | WriteAck | WriteError;
export interface WorkerInitData {
dbPath: string;
embeddingProfileId?: string;
}
// Write worker message types (Phase 6)
export interface SerializedDocument {
id: string;
repositoryId: string;
versionId: string | null;
filePath: string;
title: string | null;
language: string | null;
tokenCount: number;
checksum: string;
indexedAt: number;
}
export interface SerializedSnippet {
id: string;
documentId: string;
repositoryId: string;
versionId: string | null;
type: 'code' | 'info';
title: string | null;
content: string;
language: string | null;
breadcrumb: string | null;
tokenCount: number;
createdAt: number;
}
export interface SerializedEmbedding {
snippetId: string;
profileId: string;
model: string;
dimensions: number;
embedding: Uint8Array;
}
export type SerializedFieldValue = string | number | null;
export type SerializedFields = Record<string, SerializedFieldValue>;
export type ReplaceWriteRequest = {
type: 'write_replace';
jobId: string;
changedDocIds: string[];
documents: SerializedDocument[];
snippets: SerializedSnippet[];
};
export type CloneWriteRequest = {
type: 'write_clone';
jobId: string;
ancestorVersionId: string;
targetVersionId: string;
repositoryId: string;
unchangedPaths: string[];
};
export type WriteReplaceRequest = ReplaceWriteRequest;
export type WriteCloneRequest = CloneWriteRequest;
export type EmbeddingsWriteRequest = {
type: 'write_embeddings';
jobId: string;
embeddings: SerializedEmbedding[];
};
export type RepoUpdateWriteRequest = {
type: 'write_repo_update';
jobId: string;
repositoryId: string;
fields: SerializedFields;
};
export type VersionUpdateWriteRequest = {
type: 'write_version_update';
jobId: string;
versionId: string;
fields: SerializedFields;
};
export type RepoConfigWriteRequest = {
type: 'write_repo_config';
jobId: string;
repositoryId: string;
versionId: string | null;
rules: string[];
};
export type JobUpdateWriteRequest = {
type: 'write_job_update';
jobId: string;
fields: SerializedFields;
};
export type WriteEmbeddingsRequest = EmbeddingsWriteRequest;
export type WriteRepoUpdateRequest = RepoUpdateWriteRequest;
export type WriteVersionUpdateRequest = VersionUpdateWriteRequest;
export type WriteRepoConfigRequest = RepoConfigWriteRequest;
export type WriteAck = {
type: 'write_ack';
jobId: string;
documentCount?: number;
snippetCount?: number;
embeddingCount?: number;
};
export type WriteError = {
type: 'write_error';
jobId: string;
error: string;
};
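All of these request/response unions are discriminated on `type`, which lets a handler narrow each variant and lets the compiler flag any unhandled message kind through an exhaustiveness check. A sketch of that pattern over a trimmed local copy of the write-worker response union (the `summarize` helper is illustrative, not part of the protocol):

```typescript
// Trimmed copy of the union above, for a self-contained example.
type WriteWorkerResponse =
  | { type: 'ready' }
  | { type: 'write_ack'; jobId: string; embeddingCount?: number }
  | { type: 'write_error'; jobId: string; error: string };

// Exhaustive switch: adding a new variant to the union turns the
// `never` assignment in the default branch into a compile error.
function summarize(msg: WriteWorkerResponse): string {
  switch (msg.type) {
    case 'ready':
      return 'write worker ready';
    case 'write_ack':
      return `ack ${msg.jobId}`;
    case 'write_error':
      return `error ${msg.jobId}: ${msg.error}`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```

This is the same narrowing the pool relies on in `onWorkerMessage` and `onWriteWorkerMessage` when it branches on `msg.type`.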

View File

@@ -0,0 +1,343 @@
import { randomUUID } from 'node:crypto';
import type Database from 'better-sqlite3';
import type { NewDocument, NewSnippet } from '$lib/types';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import type {
SerializedDocument,
SerializedEmbedding,
SerializedFields,
SerializedSnippet
} from './worker-types.js';
type DocumentLike = Pick<
NewDocument,
| 'id'
| 'repositoryId'
| 'versionId'
| 'filePath'
| 'title'
| 'language'
| 'tokenCount'
| 'checksum'
> & {
indexedAt: Date | number;
};
type SnippetLike = Pick<
NewSnippet,
| 'id'
| 'documentId'
| 'repositoryId'
| 'versionId'
| 'type'
| 'title'
| 'content'
| 'language'
| 'breadcrumb'
| 'tokenCount'
> & {
createdAt: Date | number;
};
export interface CloneFromAncestorRequest {
ancestorVersionId: string;
targetVersionId: string;
repositoryId: string;
unchangedPaths: string[];
}
export interface PersistedEmbedding {
snippetId: string;
profileId: string;
model: string;
dimensions: number;
embedding: Buffer | Uint8Array;
}
function toEpochSeconds(value: Date | number): number {
return value instanceof Date ? Math.floor(value.getTime() / 1000) : value;
}
function toSnake(key: string): string {
return key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);
}
function replaceSnippetsInternal(
db: Database.Database,
changedDocIds: string[],
newDocuments: DocumentLike[],
newSnippets: SnippetLike[]
): void {
const sqliteVecStore = new SqliteVecStore(db);
const insertDoc = db.prepare(
`INSERT INTO documents
(id, repository_id, version_id, file_path, title, language,
token_count, checksum, indexed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
);
const insertSnippet = db.prepare(
`INSERT INTO snippets
(id, document_id, repository_id, version_id, type, title,
content, language, breadcrumb, token_count, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
);
db.transaction(() => {
sqliteVecStore.deleteEmbeddingsForDocumentIds(changedDocIds);
if (changedDocIds.length > 0) {
const placeholders = changedDocIds.map(() => '?').join(',');
db.prepare(`DELETE FROM documents WHERE id IN (${placeholders})`).run(...changedDocIds);
}
for (const doc of newDocuments) {
insertDoc.run(
doc.id,
doc.repositoryId,
doc.versionId ?? null,
doc.filePath,
doc.title ?? null,
doc.language ?? null,
doc.tokenCount ?? 0,
doc.checksum,
toEpochSeconds(doc.indexedAt)
);
}
for (const snippet of newSnippets) {
insertSnippet.run(
snippet.id,
snippet.documentId,
snippet.repositoryId,
snippet.versionId ?? null,
snippet.type,
snippet.title ?? null,
snippet.content,
snippet.language ?? null,
snippet.breadcrumb ?? null,
snippet.tokenCount ?? 0,
toEpochSeconds(snippet.createdAt)
);
}
})();
}
export function replaceSnippets(
db: Database.Database,
changedDocIds: string[],
newDocuments: NewDocument[],
newSnippets: NewSnippet[]
): void {
replaceSnippetsInternal(db, changedDocIds, newDocuments, newSnippets);
}
export function replaceSerializedSnippets(
db: Database.Database,
changedDocIds: string[],
documents: SerializedDocument[],
snippets: SerializedSnippet[]
): void {
replaceSnippetsInternal(db, changedDocIds, documents, snippets);
}
export function cloneFromAncestor(db: Database.Database, request: CloneFromAncestorRequest): void {
const sqliteVecStore = new SqliteVecStore(db);
const { ancestorVersionId, targetVersionId, repositoryId, unchangedPaths } = request;
db.transaction(() => {
const pathList = [...unchangedPaths];
if (pathList.length === 0) {
return;
}
const placeholders = pathList.map(() => '?').join(',');
const ancestorDocs = db
.prepare(`SELECT * FROM documents WHERE version_id = ? AND file_path IN (${placeholders})`)
.all(ancestorVersionId, ...pathList) as Array<{
id: string;
repository_id: string;
file_path: string;
title: string | null;
language: string | null;
token_count: number;
checksum: string;
indexed_at: number;
}>;
const docIdMap = new Map<string, string>();
const nowEpoch = Math.floor(Date.now() / 1000);
for (const doc of ancestorDocs) {
const newDocId = randomUUID();
docIdMap.set(doc.id, newDocId);
db.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
).run(
newDocId,
repositoryId,
targetVersionId,
doc.file_path,
doc.title,
doc.language,
doc.token_count,
doc.checksum,
nowEpoch
);
}
if (docIdMap.size === 0) return;
const oldDocIds = [...docIdMap.keys()];
const snippetPlaceholders = oldDocIds.map(() => '?').join(',');
const ancestorSnippets = db
.prepare(`SELECT * FROM snippets WHERE document_id IN (${snippetPlaceholders})`)
.all(...oldDocIds) as Array<{
id: string;
document_id: string;
repository_id: string;
version_id: string | null;
type: string;
title: string | null;
content: string;
language: string | null;
breadcrumb: string | null;
token_count: number;
created_at: number;
}>;
const snippetIdMap = new Map<string, string>();
for (const snippet of ancestorSnippets) {
const newSnippetId = randomUUID();
snippetIdMap.set(snippet.id, newSnippetId);
const newDocId = docIdMap.get(snippet.document_id)!;
db.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
).run(
newSnippetId,
newDocId,
repositoryId,
targetVersionId,
snippet.type,
snippet.title,
snippet.content,
snippet.language,
snippet.breadcrumb,
snippet.token_count,
snippet.created_at
);
}
if (snippetIdMap.size === 0) {
return;
}
const oldSnippetIds = [...snippetIdMap.keys()];
const embPlaceholders = oldSnippetIds.map(() => '?').join(',');
const ancestorEmbeddings = db
.prepare(`SELECT * FROM snippet_embeddings WHERE snippet_id IN (${embPlaceholders})`)
.all(...oldSnippetIds) as Array<{
snippet_id: string;
profile_id: string;
model: string;
dimensions: number;
embedding: Buffer;
created_at: number;
}>;
for (const emb of ancestorEmbeddings) {
const newSnippetId = snippetIdMap.get(emb.snippet_id)!;
db.prepare(
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, ?, ?, ?, ?, ?)`
).run(newSnippetId, emb.profile_id, emb.model, emb.dimensions, emb.embedding, emb.created_at);
sqliteVecStore.upsertEmbeddingBuffer(
emb.profile_id,
newSnippetId,
emb.embedding,
emb.dimensions
);
}
})();
}
export function upsertEmbeddings(db: Database.Database, embeddings: PersistedEmbedding[]): void {
if (embeddings.length === 0) {
return;
}
const sqliteVecStore = new SqliteVecStore(db);
const insert = db.prepare<[string, string, string, number, Buffer]>(`
INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, ?, ?, ?, ?, unixepoch())
`);
db.transaction(() => {
for (const item of embeddings) {
const embeddingBuffer = Buffer.isBuffer(item.embedding)
? item.embedding
: Buffer.from(item.embedding);
insert.run(item.snippetId, item.profileId, item.model, item.dimensions, embeddingBuffer);
sqliteVecStore.upsertEmbeddingBuffer(
item.profileId,
item.snippetId,
embeddingBuffer,
item.dimensions
);
}
})();
}
export function upsertSerializedEmbeddings(
db: Database.Database,
embeddings: SerializedEmbedding[]
): void {
upsertEmbeddings(
db,
embeddings.map((item) => ({
snippetId: item.snippetId,
profileId: item.profileId,
model: item.model,
dimensions: item.dimensions,
embedding: item.embedding
}))
);
}
export function updateRepo(
db: Database.Database,
repositoryId: string,
fields: SerializedFields
): void {
const now = Math.floor(Date.now() / 1000);
const allFields = { ...fields, updatedAt: now };
const sets = Object.keys(allFields)
.map((key) => `${toSnake(key)} = ?`)
.join(', ');
const values = [...Object.values(allFields), repositoryId];
db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
}
export function updateJob(db: Database.Database, jobId: string, fields: SerializedFields): void {
const sets = Object.keys(fields)
.map((key) => `${toSnake(key)} = ?`)
.join(', ');
const values = [...Object.values(fields), jobId];
db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
}
export function updateVersion(
db: Database.Database,
versionId: string,
fields: SerializedFields
): void {
const sets = Object.keys(fields)
.map((key) => `${toSnake(key)} = ?`)
.join(', ');
const values = [...Object.values(fields), versionId];
db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
}
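`updateRepo`, `updateJob`, and `updateVersion` all build their SET clause the same way: camelCase field names from the message are converted to snake_case columns via `toSnake` and joined into a parameterized UPDATE. The clause construction in isolation, as a sketch (the `buildSetClause` helper is illustrative; the real functions also bind and run the statement):

```typescript
const toSnake = (key: string): string =>
  key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);

// Build "col_a = ?, col_b = ?" plus the matching bind values.
// Column names come from trusted worker messages, never user input,
// which is why interpolating them into the SQL text is acceptable here;
// the values themselves are always bound as parameters.
function buildSetClause(fields: Record<string, string | number | null>): {
  sets: string;
  values: (string | number | null)[];
} {
  return {
    sets: Object.keys(fields)
      .map((key) => `${toSnake(key)} = ?`)
      .join(', '),
    values: Object.values(fields)
  };
}
```

For `{ stageDetail: 'parse', processedFiles: 3 }` this yields `stage_detail = ?, processed_files = ?` with bind values `['parse', 3]`, matching the columns TRUEREF-0022 added to `indexing_jobs`.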

View File

@@ -0,0 +1,169 @@
import { workerData, parentPort } from 'node:worker_threads';
import Database from 'better-sqlite3';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import { loadSqliteVec } from '$lib/server/db/sqlite-vec.js';
import type { WorkerInitData, WriteWorkerRequest, WriteWorkerResponse } from './worker-types.js';
import {
cloneFromAncestor,
replaceSerializedSnippets,
updateJob,
updateRepo,
updateVersion,
upsertSerializedEmbeddings
} from './write-operations.js';
const { dbPath } = workerData as WorkerInitData;
const db = new Database(dbPath);
applySqlitePragmas(db);
loadSqliteVec(db);
parentPort?.postMessage({ type: 'ready' } satisfies WriteWorkerResponse);
parentPort?.on('message', (msg: WriteWorkerRequest) => {
if (msg.type === 'shutdown') {
db.close();
process.exit(0);
}
if (msg.type === 'write_replace') {
try {
replaceSerializedSnippets(db, msg.changedDocIds, msg.documents, msg.snippets);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId,
documentCount: msg.documents.length,
snippetCount: msg.snippets.length
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_clone') {
try {
cloneFromAncestor(db, {
ancestorVersionId: msg.ancestorVersionId,
targetVersionId: msg.targetVersionId,
repositoryId: msg.repositoryId,
unchangedPaths: msg.unchangedPaths
});
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_embeddings') {
try {
upsertSerializedEmbeddings(db, msg.embeddings);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId,
embeddingCount: msg.embeddings.length
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_job_update') {
try {
updateJob(db, msg.jobId, msg.fields);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_repo_update') {
try {
updateRepo(db, msg.repositoryId, msg.fields);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_version_update') {
try {
updateVersion(db, msg.versionId, msg.fields);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
return;
}
if (msg.type === 'write_repo_config') {
try {
const now = Math.floor(Date.now() / 1000);
if (msg.versionId === null) {
db.prepare(
`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
).run(msg.repositoryId);
} else {
db.prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`).run(
msg.repositoryId,
msg.versionId
);
}
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(msg.repositoryId, msg.versionId, JSON.stringify(msg.rules), now);
parentPort?.postMessage({
type: 'write_ack',
jobId: msg.jobId
} satisfies WriteWorkerResponse);
} catch (error) {
parentPort?.postMessage({
type: 'write_error',
jobId: msg.jobId,
error: error instanceof Error ? error.message : String(error)
} satisfies WriteWorkerResponse);
}
}
});
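Every request type handled above settles with either a `write_ack` or a `write_error` carrying the originating `jobId`. A minimal sketch of the caller-side bookkeeping that such a protocol implies — names (`WorkerReply`, `JobState`, `applyReply`) are illustrative, not taken from this repository:

```typescript
// Hypothetical caller-side bookkeeping for the write-worker protocol above:
// each outstanding jobId is marked done on write_ack, failed on write_error,
// and stale replies for already-settled jobs are ignored.
type WorkerReply =
  | { type: 'write_ack'; jobId: string }
  | { type: 'write_error'; jobId: string; error: string };

type JobState = { status: 'pending' | 'done' | 'failed'; error?: string };

function applyReply(jobs: Map<string, JobState>, reply: WorkerReply): void {
  const job = jobs.get(reply.jobId);
  if (!job || job.status !== 'pending') return; // unknown or stale reply
  if (reply.type === 'write_ack') {
    job.status = 'done';
  } else {
    job.status = 'failed';
    job.error = reply.error;
  }
}
```

In the real pool this mapping would resolve or reject a per-job promise; the synchronous map keeps the sketch self-contained.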


@@ -15,6 +15,8 @@ import { HybridSearchService } from './hybrid.search.service.js';
import { VectorSearch, cosineSimilarity } from './vector.search.js';
import { reciprocalRankFusion } from './rrf.js';
import type { EmbeddingProvider, EmbeddingVector } from '../embeddings/provider.js';
import { loadSqliteVec } from '../db/sqlite-vec.js';
import { SqliteVecStore } from './sqlite-vec.store.js';
// ---------------------------------------------------------------------------
// In-memory DB factory
@@ -23,6 +25,7 @@ import type { EmbeddingProvider, EmbeddingVector } from '../embeddings/provider.
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
@@ -30,7 +33,11 @@ function createTestDb(): Database.Database {
const migrations = [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql'
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql',
'0005_fix_stage_defaults.sql',
'0006_yielding_centennial.sql'
];
for (const migrationFile of migrations) {
const migrationSql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
@@ -121,6 +128,7 @@ function seedEmbedding(
VALUES (?, ?, ?, ?, ?, ?)`
)
.run(snippetId, profileId, model, values.length, Buffer.from(f32.buffer), NOW_S);
new SqliteVecStore(client).upsertEmbedding(profileId, snippetId, f32);
}
// ---------------------------------------------------------------------------
@@ -368,6 +376,53 @@ describe('VectorSearch', () => {
const results = vs.vectorSearch(new Float32Array([-0.5, 0.5]), { repositoryId: repoId });
expect(results[0].score).toBeCloseTo(1.0, 4);
});
it('filters by profileId using per-profile vec tables', () => {
client
.prepare(
`INSERT INTO embedding_profiles (id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
)
.run(
'secondary-profile',
'local-transformers',
'Secondary',
1,
0,
'test-model',
2,
'{}',
NOW_S,
NOW_S
);
const defaultSnippet = seedSnippet(client, {
repositoryId: repoId,
documentId: docId,
content: 'default profile snippet'
});
const secondarySnippet = seedSnippet(client, {
repositoryId: repoId,
documentId: docId,
content: 'secondary profile snippet'
});
seedEmbedding(client, defaultSnippet, [1, 0], 'local-default');
seedEmbedding(client, secondarySnippet, [1, 0], 'secondary-profile');
const vs = new VectorSearch(client);
const defaultResults = vs.vectorSearch(new Float32Array([1, 0]), {
repositoryId: repoId,
profileId: 'local-default'
});
const secondaryResults = vs.vectorSearch(new Float32Array([1, 0]), {
repositoryId: repoId,
profileId: 'secondary-profile'
});
expect(defaultResults.map((result) => result.snippetId)).toEqual([defaultSnippet]);
expect(secondaryResults.map((result) => result.snippetId)).toEqual([secondarySnippet]);
});
});
// ===========================================================================
@@ -395,7 +450,7 @@ describe('HybridSearchService', () => {
seedSnippet(client, { repositoryId: repoId, documentId: docId, content: 'hello world' });
const svc = new HybridSearchService(client, searchService, null);
const results = await svc.search('hello', { repositoryId: repoId });
const { results } = await svc.search('hello', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
expect(results[0].snippet.content).toBe('hello world');
@@ -406,14 +461,14 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('alpha zero', { repositoryId: repoId, alpha: 0 });
const { results } = await svc.search('alpha zero', { repositoryId: repoId, alpha: 0 });
expect(results.length).toBeGreaterThan(0);
});
it('returns empty array when FTS5 query is blank and no provider', async () => {
const svc = new HybridSearchService(client, searchService, null);
const results = await svc.search(' ', { repositoryId: repoId });
const { results } = await svc.search(' ', { repositoryId: repoId });
expect(results).toHaveLength(0);
});
@@ -425,7 +480,7 @@ describe('HybridSearchService', () => {
});
const svc = new HybridSearchService(client, searchService, makeNoopProvider());
const results = await svc.search('noop fallback', { repositoryId: repoId });
const { results } = await svc.search('noop fallback', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
});
@@ -445,7 +500,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0, 0, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('hybrid search', {
const { results } = await svc.search('hybrid search', {
repositoryId: repoId,
alpha: 0.5
});
@@ -464,7 +519,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('deduplicate snippet', {
const { results } = await svc.search('deduplicate snippet', {
repositoryId: repoId,
alpha: 0.5
});
@@ -487,7 +542,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('pagination test', {
const { results } = await svc.search('pagination test', {
repositoryId: repoId,
limit: 3,
alpha: 0.5
@@ -519,7 +574,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('anything', {
const { results } = await svc.search('anything', {
repositoryId: repoId,
alpha: 1
});
@@ -543,7 +598,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('metadata check', {
const { results } = await svc.search('metadata check', {
repositoryId: repoId,
alpha: 0.5
});
@@ -580,7 +635,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('repository keyword', {
const { results } = await svc.search('repository keyword', {
repositoryId: repoId,
alpha: 0.5
});
@@ -607,7 +662,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const codeResults = await svc.search('function example', {
const { results: codeResults } = await svc.search('function example', {
repositoryId: repoId,
type: 'code',
alpha: 0.5
@@ -632,7 +687,7 @@ describe('HybridSearchService', () => {
const svc = new HybridSearchService(client, searchService, provider);
// Should not throw and should return results.
const results = await svc.search('default alpha hybrid', { repositoryId: repoId });
const { results } = await svc.search('default alpha hybrid', { repositoryId: repoId });
expect(Array.isArray(results)).toBe(true);
});
@@ -761,7 +816,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('keyword', {
const { results } = await hybridService.search('keyword', {
repositoryId: repoId,
searchMode: 'keyword'
});
@@ -820,7 +875,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('semantic', {
const { results } = await hybridService.search('semantic', {
repositoryId: repoId,
searchMode: 'semantic',
profileId: 'test-profile'
@@ -848,7 +903,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const results = await hybridService.search('test query', {
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -867,7 +922,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search(' ', {
const { results } = await hybridService.search(' ', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -885,7 +940,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, noopProvider);
const results = await hybridService.search('test query', {
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -951,7 +1006,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query with heavy punctuation that preprocesses to nothing.
const results = await hybridService.search('!!!@@@###', {
const { results } = await hybridService.search('!!!@@@###', {
repositoryId: repoId,
searchMode: 'auto',
profileId: 'test-profile'
@@ -978,7 +1033,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('hello', {
const { results } = await hybridService.search('hello', {
repositoryId: repoId,
searchMode: 'auto'
});
@@ -1038,7 +1093,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query that won't match through FTS after punctuation normalization.
const results = await hybridService.search('%%%vector%%%', {
const { results } = await hybridService.search('%%%vector%%%', {
repositoryId: repoId,
searchMode: 'hybrid',
alpha: 0.5,
@@ -1064,7 +1119,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const results = await hybridService.search('!!!@@@###$$$', {
const { results } = await hybridService.search('!!!@@@###$$$', {
repositoryId: repoId
});


@@ -101,9 +101,12 @@ export class HybridSearchService {
*
* @param query - Raw search string (preprocessing handled by SearchService).
* @param options - Search parameters including repositoryId and alpha blend.
* @returns Ranked array of SnippetSearchResult, deduplicated by snippet ID.
* @returns Object with ranked results array and the search mode actually used.
*/
async search(query: string, options: HybridSearchOptions): Promise<SnippetSearchResult[]> {
async search(
query: string,
options: HybridSearchOptions
): Promise<{ results: SnippetSearchResult[]; searchModeUsed: string }> {
const limit = options.limit ?? 20;
const mode = options.searchMode ?? 'auto';
@@ -127,12 +130,12 @@ export class HybridSearchService {
// Semantic mode: skip FTS entirely and use vector search only.
if (mode === 'semantic') {
if (!this.embeddingProvider || !query.trim()) {
return [];
return { results: [], searchModeUsed: 'semantic' };
}
const embeddings = await this.embeddingProvider.embed([query]);
if (embeddings.length === 0) {
return [];
return { results: [], searchModeUsed: 'semantic' };
}
const queryEmbedding = embeddings[0].values;
@@ -144,7 +147,15 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(
topIds,
options.repositoryId,
options.versionId,
options.type
),
searchModeUsed: 'semantic'
};
}
// FTS5 mode (keyword) or hybrid/auto modes: try FTS first.
@@ -157,7 +168,7 @@ export class HybridSearchService {
// Degenerate cases: no provider or pure FTS5 mode.
if (!this.embeddingProvider || alpha === 0) {
return ftsResults.slice(0, limit);
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
// For auto/hybrid modes: if FTS yielded results, use them; otherwise try vector.
@@ -168,14 +179,14 @@ export class HybridSearchService {
// No FTS results: try vector search as a fallback in auto/hybrid modes.
if (!query.trim()) {
// Query is empty; no point embedding it.
return [];
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const embeddings = await this.embeddingProvider.embed([query]);
// If provider fails (Noop returns empty array), we're done.
if (embeddings.length === 0) {
return [];
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const queryEmbedding = embeddings[0].values;
@@ -187,7 +198,15 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(
topIds,
options.repositoryId,
options.versionId,
options.type
),
searchModeUsed: 'keyword_fallback'
};
}
// FTS has results: use RRF to blend with vector search (if alpha < 1).
@@ -195,7 +214,7 @@ export class HybridSearchService {
// Provider may be a Noop (returns empty array) — fall back to FTS gracefully.
if (embeddings.length === 0) {
return ftsResults.slice(0, limit);
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
const queryEmbedding = embeddings[0].values;
@@ -210,7 +229,15 @@ export class HybridSearchService {
// Pure vector mode: skip RRF and return vector results directly.
if (alpha === 1) {
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(
topIds,
options.repositoryId,
options.versionId,
options.type
),
searchModeUsed: 'semantic'
};
}
// Build ranked lists for RRF. Score field is unused by RRF — only
@@ -221,7 +248,15 @@ export class HybridSearchService {
const fused = reciprocalRankFusion(ftsRanked, vecRanked);
const topIds = fused.slice(0, limit).map((r) => r.id);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(
topIds,
options.repositoryId,
options.versionId,
options.type
),
searchModeUsed: 'hybrid'
};
}
// -------------------------------------------------------------------------
@@ -238,13 +273,19 @@ export class HybridSearchService {
private fetchSnippetsByIds(
ids: string[],
repositoryId: string,
versionId?: string,
type?: 'code' | 'info'
): SnippetSearchResult[] {
if (ids.length === 0) return [];
const placeholders = ids.map(() => '?').join(', ');
const params: unknown[] = [...ids, repositoryId];
let versionClause = '';
let typeClause = '';
if (versionId !== undefined) {
versionClause = ' AND s.version_id = ?';
params.push(versionId);
}
if (type !== undefined) {
typeClause = ' AND s.type = ?';
params.push(type);
@@ -261,7 +302,7 @@ export class HybridSearchService {
FROM snippets s
JOIN repositories r ON r.id = s.repository_id
WHERE s.id IN (${placeholders})
AND s.repository_id = ?${typeClause}`
AND s.repository_id = ?${versionClause}${typeClause}`
)
.all(...params) as RawSnippetById[];
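The hybrid branch above blends the FTS and vector ranked lists with `reciprocalRankFusion` before slicing to `limit`. A self-contained sketch of RRF in its usual formulation — each list contributes `1 / (k + rank)` per id with the conventional `k = 60`; the project's actual `rrf.js` may differ in details:

```typescript
// Reciprocal rank fusion sketch: an id's fused score is the sum of
// 1 / (k + rank) over every ranked list it appears in (rank is 1-based).
// Only rank positions matter, which is why the caller's score fields
// can be ignored, as noted in the diff above.
function reciprocalRankFusionSketch(
  lists: { id: string }[][],
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const rank = index + 1;
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An id present in both lists ('b' below) outranks ids present in only one, which is the deduplicating, consensus-favoring behavior the hybrid tests rely on.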


@@ -0,0 +1,390 @@
import type Database from 'better-sqlite3';
import {
loadSqliteVec,
quoteSqliteIdentifier,
sqliteVecRowidTableName,
sqliteVecTableName
} from '$lib/server/db/sqlite-vec.js';
export interface SqliteVecQueryOptions {
repositoryId: string;
versionId?: string;
profileId?: string;
limit?: number;
}
export interface SqliteVecQueryResult {
snippetId: string;
score: number;
distance: number;
}
interface ProfileDimensionsRow {
dimensions: number;
}
interface StoredDimensionsRow {
count: number;
min_dimensions: number | null;
max_dimensions: number | null;
}
interface SnippetRowidRow {
rowid: number;
}
interface RawKnnRow {
snippet_id: string;
distance: number;
}
interface CanonicalEmbeddingRow {
snippet_id: string;
embedding: Buffer;
}
interface StoredEmbeddingRef {
profile_id: string;
snippet_id: string;
}
interface ProfileStoreTables {
vectorTableName: string;
rowidTableName: string;
quotedVectorTableName: string;
quotedRowidTableName: string;
dimensions: number;
}
function toEmbeddingBuffer(values: Float32Array): Buffer {
return Buffer.from(values.buffer, values.byteOffset, values.byteLength);
}
function distanceToScore(distance: number): number {
return 1 / (1 + distance);
}
export class SqliteVecStore {
constructor(private readonly db: Database.Database) {}
ensureProfileStore(profileId: string, preferredDimensions?: number): number {
const tables = this.getProfileStoreTables(profileId, preferredDimensions);
this.db.exec(`
CREATE TABLE IF NOT EXISTS ${tables.quotedRowidTableName} (
rowid INTEGER PRIMARY KEY,
snippet_id TEXT NOT NULL UNIQUE REFERENCES snippets(id) ON DELETE CASCADE
);
`);
this.db.exec(`
CREATE VIRTUAL TABLE IF NOT EXISTS ${tables.quotedVectorTableName}
USING vec0(embedding float[${tables.dimensions}]);
`);
return tables.dimensions;
}
upsertEmbedding(profileId: string, snippetId: string, embedding: Float32Array): void {
const tables = this.getProfileStoreTables(profileId, embedding.length);
this.ensureProfileStore(profileId, tables.dimensions);
const existingRow = this.db
.prepare<
[string],
SnippetRowidRow
>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
.get(snippetId);
const embeddingBuffer = toEmbeddingBuffer(embedding);
if (existingRow) {
this.db
.prepare<
[Buffer, number]
>(`UPDATE ${tables.quotedVectorTableName} SET embedding = ? WHERE rowid = ?`)
.run(embeddingBuffer, existingRow.rowid);
return;
}
const insertResult = this.db
.prepare<[Buffer]>(`INSERT INTO ${tables.quotedVectorTableName} (embedding) VALUES (?)`)
.run(embeddingBuffer);
this.db
.prepare<
[number, string]
>(`INSERT INTO ${tables.quotedRowidTableName} (rowid, snippet_id) VALUES (?, ?)`)
.run(Number(insertResult.lastInsertRowid), snippetId);
}
upsertEmbeddingBuffer(
profileId: string,
snippetId: string,
embedding: Buffer,
dimensions?: number
): void {
const vector = new Float32Array(
embedding.buffer,
embedding.byteOffset,
dimensions ?? Math.floor(embedding.byteLength / Float32Array.BYTES_PER_ELEMENT)
);
this.upsertEmbedding(profileId, snippetId, vector);
}
deleteEmbedding(profileId: string, snippetId: string): void {
const tables = this.getProfileStoreTables(profileId);
this.ensureProfileStore(profileId);
const existingRow = this.db
.prepare<
[string],
SnippetRowidRow
>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
.get(snippetId);
if (!existingRow) {
return;
}
this.db
.prepare<[number]>(`DELETE FROM ${tables.quotedVectorTableName} WHERE rowid = ?`)
.run(existingRow.rowid);
this.db
.prepare<[string]>(`DELETE FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
.run(snippetId);
}
deleteEmbeddingsForDocumentIds(documentIds: string[]): void {
if (documentIds.length === 0) {
return;
}
const placeholders = documentIds.map(() => '?').join(', ');
const rows = this.db
.prepare<unknown[], StoredEmbeddingRef>(
`SELECT DISTINCT se.profile_id, se.snippet_id
FROM snippet_embeddings se
INNER JOIN snippets s ON s.id = se.snippet_id
WHERE s.document_id IN (${placeholders})`
)
.all(...documentIds);
this.deleteEmbeddingRefs(rows);
}
deleteEmbeddingsForRepository(repositoryId: string): void {
const rows = this.db
.prepare<[string], StoredEmbeddingRef>(
`SELECT DISTINCT se.profile_id, se.snippet_id
FROM snippet_embeddings se
INNER JOIN snippets s ON s.id = se.snippet_id
WHERE s.repository_id = ?`
)
.all(repositoryId);
this.deleteEmbeddingRefs(rows);
}
deleteEmbeddingsForVersion(repositoryId: string, versionId: string): void {
const rows = this.db
.prepare<[string, string], StoredEmbeddingRef>(
`SELECT DISTINCT se.profile_id, se.snippet_id
FROM snippet_embeddings se
INNER JOIN snippets s ON s.id = se.snippet_id
WHERE s.repository_id = ? AND s.version_id = ?`
)
.all(repositoryId, versionId);
this.deleteEmbeddingRefs(rows);
}
queryNearestNeighbors(
queryEmbedding: Float32Array,
options: SqliteVecQueryOptions
): SqliteVecQueryResult[] {
const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;
if (limit <= 0) {
return [];
}
const tables = this.getProfileStoreTables(profileId, queryEmbedding.length);
this.ensureProfileStore(profileId, tables.dimensions);
const totalRows = this.synchronizeProfileStore(profileId, tables);
if (totalRows === 0) {
return [];
}
let sql = `
SELECT rowids.snippet_id, vec.distance
FROM ${tables.quotedVectorTableName} vec
JOIN ${tables.quotedRowidTableName} rowids ON rowids.rowid = vec.rowid
JOIN snippets s ON s.id = rowids.snippet_id
WHERE vec.embedding MATCH ?
AND vec.k = ?
AND s.repository_id = ?
`;
const params: unknown[] = [toEmbeddingBuffer(queryEmbedding), totalRows, repositoryId];
if (versionId !== undefined) {
sql += ' AND s.version_id = ?';
params.push(versionId);
}
sql += ' ORDER BY vec.distance ASC LIMIT ?';
params.push(limit);
const rows = this.db.prepare<unknown[], RawKnnRow>(sql).all(...params);
return rows.map((row) => ({
snippetId: row.snippet_id,
score: distanceToScore(row.distance),
distance: row.distance
}));
}
private synchronizeProfileStore(profileId: string, tables: ProfileStoreTables): number {
this.db
.prepare<[string, number]>(
`DELETE FROM ${tables.quotedRowidTableName}
WHERE rowid IN (
SELECT rowids.rowid
FROM ${tables.quotedRowidTableName} rowids
LEFT JOIN snippet_embeddings se
ON se.snippet_id = rowids.snippet_id
AND se.profile_id = ?
AND se.dimensions = ?
LEFT JOIN ${tables.quotedVectorTableName} vec ON vec.rowid = rowids.rowid
WHERE se.snippet_id IS NULL OR vec.rowid IS NULL
)`
)
.run(profileId, tables.dimensions);
this.db
.prepare(
`DELETE FROM ${tables.quotedVectorTableName}
WHERE rowid NOT IN (SELECT rowid FROM ${tables.quotedRowidTableName})`
)
.run();
const missingRows = this.db
.prepare<[string, number], CanonicalEmbeddingRow>(
`SELECT se.snippet_id, se.embedding
FROM snippet_embeddings se
LEFT JOIN ${tables.quotedRowidTableName} rowids ON rowids.snippet_id = se.snippet_id
WHERE se.profile_id = ?
AND se.dimensions = ?
AND rowids.snippet_id IS NULL`
)
.all(profileId, tables.dimensions);
if (missingRows.length > 0) {
const backfill = this.db.transaction((rows: CanonicalEmbeddingRow[]) => {
for (const row of rows) {
this.upsertEmbedding(
profileId,
row.snippet_id,
new Float32Array(row.embedding.buffer, row.embedding.byteOffset, tables.dimensions)
);
}
});
backfill(missingRows);
}
return (
this.db
.prepare<[], { count: number }>(
`SELECT COUNT(*) AS count
FROM ${tables.quotedVectorTableName} vec
JOIN ${tables.quotedRowidTableName} rowids ON rowids.rowid = vec.rowid`
)
.get()?.count ?? 0
);
}
private deleteEmbeddingRefs(rows: StoredEmbeddingRef[]): void {
if (rows.length === 0) {
return;
}
const removeRows = this.db.transaction((refs: StoredEmbeddingRef[]) => {
for (const ref of refs) {
this.deleteEmbedding(ref.profile_id, ref.snippet_id);
}
});
removeRows(rows);
}
private getProfileStoreTables(
profileId: string,
preferredDimensions?: number
): ProfileStoreTables {
loadSqliteVec(this.db);
const dimensionsRow = this.db
.prepare<
[string],
ProfileDimensionsRow
>('SELECT dimensions FROM embedding_profiles WHERE id = ?')
.get(profileId);
if (!dimensionsRow) {
throw new Error(`Embedding profile not found: ${profileId}`);
}
const storedDimensions = this.db
.prepare<[string], StoredDimensionsRow>(
`SELECT
COUNT(*) AS count,
MIN(dimensions) AS min_dimensions,
MAX(dimensions) AS max_dimensions
FROM snippet_embeddings
WHERE profile_id = ?`
)
.get(profileId);
const effectiveDimensions = this.resolveDimensions(
profileId,
dimensionsRow.dimensions,
storedDimensions,
preferredDimensions
);
const vectorTableName = sqliteVecTableName(profileId);
const rowidTableName = sqliteVecRowidTableName(profileId);
return {
vectorTableName,
rowidTableName,
quotedVectorTableName: quoteSqliteIdentifier(vectorTableName),
quotedRowidTableName: quoteSqliteIdentifier(rowidTableName),
dimensions: effectiveDimensions
};
}
private resolveDimensions(
profileId: string,
profileDimensions: number,
storedDimensions: StoredDimensionsRow | undefined,
preferredDimensions?: number
): number {
if (storedDimensions && storedDimensions.count > 0) {
if (storedDimensions.min_dimensions !== storedDimensions.max_dimensions) {
throw new Error(`Stored embedding dimensions are inconsistent for profile ${profileId}`);
}
const canonicalDimensions = storedDimensions.min_dimensions;
if (canonicalDimensions === null) {
throw new Error(`Stored embedding dimensions are missing for profile ${profileId}`);
}
if (preferredDimensions !== undefined && preferredDimensions !== canonicalDimensions) {
throw new Error(
`Embedding dimension mismatch for profile ${profileId}: expected ${canonicalDimensions}, received ${preferredDimensions}`
);
}
return canonicalDimensions;
}
return preferredDimensions ?? profileDimensions;
}
}
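Two conversions in the store above are easy to get wrong: embeddings cross the SQLite boundary as raw `Float32Array` bytes (respecting `byteOffset`, since the array may be a view into a larger buffer), and KNN distances are mapped to bounded scores via `1 / (1 + distance)`. A self-contained sketch of both, mirroring `toEmbeddingBuffer` and `distanceToScore` above:

```typescript
// Raw-bytes round trip between Float32Array and Buffer. Passing byteOffset
// and byteLength explicitly avoids copying the wrong slice when the array
// views a shared ArrayBuffer.
function toBuffer(values: Float32Array): Buffer {
  return Buffer.from(values.buffer, values.byteOffset, values.byteLength);
}

function fromBuffer(buf: Buffer): Float32Array {
  return new Float32Array(
    buf.buffer,
    buf.byteOffset,
    buf.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
}

// Smaller distance => higher score; result is bounded in (0, 1], with
// distance 0 mapping to a perfect score of 1.
function distanceToScore(distance: number): number {
  return 1 / (1 + distance);
}
```

Note that reconstructing a `Float32Array` view requires a 4-byte-aligned `byteOffset`; Buffers sliced from Node's internal pool are not guaranteed to be aligned, which is one reason to keep the explicit offset arithmetic rather than assuming offset 0.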


@@ -1,16 +1,12 @@
/**
* Vector similarity search over stored snippet embeddings.
*
* SQLite does not natively support vector operations, so cosine similarity is
* computed in JavaScript after loading candidate embeddings from the
* snippet_embeddings table.
*
* Performance note: For repositories with > 50k snippets, pre-filtering by
* FTS5 candidates before computing cosine similarity is recommended. For v1,
* in-memory computation is acceptable.
* Uses sqlite-vec vec0 KNN queries (`embedding MATCH ?` with `k`) for ANN
* search instead of in-memory cosine similarity computation over all embeddings.
*/
import type Database from 'better-sqlite3';
import { SqliteVecStore } from './sqlite-vec.store.js';
// ---------------------------------------------------------------------------
// Types
@@ -28,12 +24,6 @@ export interface VectorSearchOptions {
limit?: number;
}
/** Raw DB row from snippet_embeddings joined with snippets. */
interface RawEmbeddingRow {
snippet_id: string;
embedding: Buffer;
}
// ---------------------------------------------------------------------------
// Math helpers
// ---------------------------------------------------------------------------
@@ -69,46 +59,26 @@ export function cosineSimilarity(a: Float32Array, b: Float32Array): number {
// ---------------------------------------------------------------------------
export class VectorSearch {
constructor(private readonly db: Database.Database) {}
private readonly sqliteVecStore: SqliteVecStore;
constructor(private readonly db: Database.Database) {
this.sqliteVecStore = new SqliteVecStore(db);
}
/**
* Search stored embeddings by cosine similarity to the query embedding.
*
* Uses in-memory cosine similarity computation. The vec_embedding column
* stores raw Float32 bytes for forward compatibility with vector-capable
* libSQL builds; scoring is performed in JS using the same bytes.
*
* @param queryEmbedding - The embedded representation of the search query.
* @param options - Search options including repositoryId, optional versionId, profileId, and limit.
* @returns Results sorted by descending cosine similarity score.
*/
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;
let sql = `
SELECT se.snippet_id, se.embedding
FROM snippet_embeddings se
JOIN snippets s ON s.id = se.snippet_id
WHERE s.repository_id = ?
AND se.profile_id = ?
`;
const params: unknown[] = [repositoryId, profileId];
if (versionId) {
sql += ' AND s.version_id = ?';
params.push(versionId);
}
const rows = this.db.prepare<unknown[], RawEmbeddingRow>(sql).all(...params);
const scored: VectorSearchResult[] = rows.map((row) => {
const embedding = new Float32Array(
row.embedding.buffer,
row.embedding.byteOffset,
row.embedding.byteLength / 4
);
return {
snippetId: row.snippet_id,
score: cosineSimilarity(queryEmbedding, embedding)
};
});
return scored.sort((a, b) => b.score - a.score).slice(0, limit);
return this.sqliteVecStore
.queryNearestNeighbors(queryEmbedding, options)
.map((result) => ({ snippetId: result.snippetId, score: result.score }));
}
}
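`cosineSimilarity` stays exported even though `vectorSearch` no longer calls it (the test file above still imports it). A self-contained sketch of the usual definition — dot product over the product of magnitudes — with the common convention of returning 0 for a zero vector; the actual export may handle edge cases differently:

```typescript
// Cosine similarity sketch: dot(a, b) / (|a| * |b|). Returns 0 instead of
// NaN when either vector has zero magnitude. Assumed convention, not a
// verbatim copy of the exported implementation.
function cosineSimilaritySketch(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
  return magnitude === 0 ? 0 : dot / magnitude;
}
```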


@@ -1,6 +1,9 @@
import type Database from 'better-sqlite3';
import type { EmbeddingSettingsUpdateDto } from '$lib/dtos/embedding-settings.js';
import { createProviderFromProfile, getDefaultLocalProfile } from '$lib/server/embeddings/registry.js';
import {
createProviderFromProfile,
getDefaultLocalProfile
} from '$lib/server/embeddings/registry.js';
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';
import { EmbeddingSettings } from '$lib/server/models/embedding-settings.js';
@@ -94,7 +97,10 @@ export class EmbeddingSettingsService {
private getCreatedAt(id: string, fallback: number): number {
return (
this.db
.prepare<[string], { created_at: number }>('SELECT created_at FROM embedding_profiles WHERE id = ?')
.prepare<
[string],
{ created_at: number }
>('SELECT created_at FROM embedding_profiles WHERE id = ?')
.get(id)?.created_at ?? fallback
);
}


@@ -11,6 +11,12 @@ import Database from 'better-sqlite3';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import { RepositoryService } from './repository.service';
import {
loadSqliteVec,
sqliteVecRowidTableName,
sqliteVecTableName
} from '$lib/server/db/sqlite-vec.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import {
AlreadyExistsError,
InvalidInputError,
@@ -25,13 +31,18 @@ import {
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
for (const migration of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql'
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql',
'0005_fix_stage_defaults.sql',
'0006_yielding_centennial.sql'
]) {
const statements = readFileSync(join(migrationsFolder, migration), 'utf-8')
.split('--> statement-breakpoint')
@@ -329,6 +340,41 @@ describe('RepositoryService.remove()', () => {
it('throws NotFoundError when the repository does not exist', () => {
expect(() => service.remove('/not/found')).toThrow(NotFoundError);
});
it('removes derived vec rows before the repository cascade deletes snippets', () => {
const docId = crypto.randomUUID();
const snippetId = crypto.randomUUID();
const embedding = Float32Array.from([1, 0, 0]);
const vecStore = new SqliteVecStore((service as unknown as { db: Database.Database }).db);
const db = (service as unknown as { db: Database.Database }).db;
const now = Math.floor(Date.now() / 1000);
db.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
VALUES (?, '/facebook/react', NULL, 'README.md', 'repo-doc', ?)`
).run(docId, now);
db.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
VALUES (?, ?, '/facebook/react', NULL, 'info', 'repo snippet', ?)`
).run(snippetId, docId, now);
db.prepare(
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
).run(snippetId, Buffer.from(embedding.buffer), now);
vecStore.upsertEmbedding('local-default', snippetId, embedding);
service.remove('/facebook/react');
const vecTable = sqliteVecTableName('local-default');
const rowidTable = sqliteVecRowidTableName('local-default');
const vecCount = db.prepare(`SELECT COUNT(*) as n FROM "${vecTable}"`).get() as { n: number };
const rowidCount = db.prepare(`SELECT COUNT(*) as n FROM "${rowidTable}"`).get() as {
n: number;
};
expect(vecCount.n).toBe(0);
expect(rowidCount.n).toBe(0);
});
});
// ---------------------------------------------------------------------------
@@ -423,7 +469,11 @@ describe('RepositoryService.getIndexSummary()', () => {
beforeEach(() => {
client = createTestDb();
service = makeService(client);
service.add({
source: 'github',
sourceUrl: 'https://github.com/facebook/react',
branch: 'main'
});
});
it('returns embedding counts and indexed version labels', () => {


@@ -8,6 +8,7 @@ import { RepositoryMapper } from '$lib/server/mappers/repository.mapper.js';
import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
import { Repository, RepositoryEntity } from '$lib/server/models/repository.js';
import { IndexingJob, IndexingJobEntity } from '$lib/server/models/indexing-job.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import { resolveGitHubId, resolveLocalId } from '$lib/server/utils/id-resolver';
import {
AlreadyExistsError,
@@ -230,7 +231,11 @@ export class RepositoryService {
const existing = this.get(id);
if (!existing) throw new NotFoundError(`Repository ${id} not found`);
const sqliteVecStore = new SqliteVecStore(this.db);
this.db.transaction(() => {
sqliteVecStore.deleteEmbeddingsForRepository(id);
this.db.prepare(`DELETE FROM repositories WHERE id = ?`).run(id);
})();
}
/**
@@ -342,6 +347,8 @@ export class RepositoryService {
progress: 0,
totalFiles: 0,
processedFiles: 0,
stage: 'queued',
stageDetail: null,
error: null,
startedAt: null,
completedAt: null,
@@ -355,6 +362,8 @@ export class RepositoryService {
progress: job.progress,
total_files: job.totalFiles,
processed_files: job.processedFiles,
stage: 'queued',
stage_detail: null,
error: job.error,
started_at: null,
completed_at: null,


@@ -10,6 +10,12 @@ import { describe, it, expect } from 'vitest';
import Database from 'better-sqlite3';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import {
loadSqliteVec,
sqliteVecRowidTableName,
sqliteVecTableName
} from '$lib/server/db/sqlite-vec.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import { VersionService } from './version.service';
import { RepositoryService } from './repository.service';
import { AlreadyExistsError, NotFoundError } from '$lib/server/utils/validation';
@@ -21,31 +27,27 @@ import { AlreadyExistsError, NotFoundError } from '$lib/server/utils/validation'
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
loadSqliteVec(client);
const migrationsFolder = join(import.meta.dirname, '../db/migrations');
// Apply all migration files in order
for (const migration of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql',
'0005_fix_stage_defaults.sql',
'0006_yielding_centennial.sql'
]) {
const statements = readFileSync(join(migrationsFolder, migration), 'utf-8')
.split('--> statement-breakpoint')
.map((statement) => statement.trim())
.filter(Boolean);
for (const statement of statements) {
client.exec(statement);
}
}
return client;
@@ -198,6 +200,50 @@ describe('VersionService.remove()', () => {
const doc = client.prepare(`SELECT id FROM documents WHERE id = ?`).get(docId);
expect(doc).toBeUndefined();
});
it('removes derived vec rows before deleting the version', () => {
const { client, versionService } = setup();
const version = versionService.add('/facebook/react', 'v18.3.0');
const docId = crypto.randomUUID();
const snippetId = crypto.randomUUID();
const embedding = Float32Array.from([0.5, 0.25, 0.125]);
const now = Math.floor(Date.now() / 1000);
const vecStore = new SqliteVecStore(client);
client
.prepare(
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
VALUES (?, '/facebook/react', ?, 'README.md', 'version-doc', ?)`
)
.run(docId, version.id, now);
client
.prepare(
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
VALUES (?, ?, '/facebook/react', ?, 'info', 'version snippet', ?)`
)
.run(snippetId, docId, version.id, now);
client
.prepare(
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
)
.run(snippetId, Buffer.from(embedding.buffer), now);
vecStore.upsertEmbedding('local-default', snippetId, embedding);
versionService.remove('/facebook/react', 'v18.3.0');
const vecTable = sqliteVecTableName('local-default');
const rowidTable = sqliteVecRowidTableName('local-default');
const vecCount = client.prepare(`SELECT COUNT(*) as n FROM "${vecTable}"`).get() as {
n: number;
};
const rowidCount = client.prepare(`SELECT COUNT(*) as n FROM "${rowidTable}"`).get() as {
n: number;
};
expect(vecCount.n).toBe(0);
expect(rowidCount.n).toBe(0);
});
});
// ---------------------------------------------------------------------------


@@ -11,6 +11,7 @@ import {
RepositoryVersion,
RepositoryVersionEntity
} from '$lib/server/models/repository-version.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import { AlreadyExistsError, NotFoundError } from '$lib/server/utils/validation';
import { resolveTagToCommit, discoverVersionTags } from '$lib/server/utils/git.js';
@@ -99,9 +100,13 @@ export class VersionService {
throw new NotFoundError(`Version ${tag} not found for repository ${repositoryId}`);
}
const sqliteVecStore = new SqliteVecStore(this.db);
this.db.transaction(() => {
sqliteVecStore.deleteEmbeddingsForVersion(repositoryId, version.id);
this.db
.prepare(`DELETE FROM repository_versions WHERE repository_id = ? AND tag = ?`)
.run(repositoryId, tag);
})();
}
/**


@@ -0,0 +1,155 @@
/**
* Tests for getChangedFilesBetweenRefs (TRUEREF-0021).
*
* Uses vi.mock to intercept execFileSync so no real git process is spawned.
*/
import { describe, it, expect, vi, beforeEach } from 'vitest';
// ---------------------------------------------------------------------------
// Mock node:child_process before importing the module under test.
// ---------------------------------------------------------------------------
vi.mock('node:child_process', () => ({
execSync: vi.fn(),
execFileSync: vi.fn()
}));
import { execFileSync } from 'node:child_process';
import { getChangedFilesBetweenRefs } from '$lib/server/utils/git.js';
const mockExecFileSync = vi.mocked(execFileSync);
const BASE_OPTS = { repoPath: '/tmp/fake-repo', base: 'v1.0.0', head: 'v2.0.0' };
beforeEach(() => {
mockExecFileSync.mockReset();
});
// ---------------------------------------------------------------------------
// Status code parsing
// ---------------------------------------------------------------------------
describe('getChangedFilesBetweenRefs', () => {
it("parses an 'A' line as status 'added'", () => {
mockExecFileSync.mockReturnValue('A\tsrc/new-file.ts');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(1);
expect(result[0]).toEqual({ path: 'src/new-file.ts', status: 'added' });
});
it("parses an 'M' line as status 'modified'", () => {
mockExecFileSync.mockReturnValue('M\tsrc/existing.ts');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(1);
expect(result[0]).toEqual({ path: 'src/existing.ts', status: 'modified' });
});
it("parses a 'D' line as status 'removed'", () => {
mockExecFileSync.mockReturnValue('D\tsrc/deleted.ts');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(1);
expect(result[0]).toEqual({ path: 'src/deleted.ts', status: 'removed' });
});
it("parses an 'R85' line as status 'renamed' with previousPath", () => {
mockExecFileSync.mockReturnValue('R85\tsrc/old-name.ts\tsrc/new-name.ts');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(1);
expect(result[0]).toEqual({
path: 'src/new-name.ts',
status: 'renamed',
previousPath: 'src/old-name.ts'
});
});
it('returns an empty array for empty output', () => {
mockExecFileSync.mockReturnValue('');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(0);
expect(result).toEqual([]);
});
it('parses multiple lines correctly', () => {
mockExecFileSync.mockReturnValue(
['A\tadded.ts', 'M\tmodified.ts', 'D\tdeleted.ts', 'R100\told.ts\tnew.ts'].join('\n')
);
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(4);
expect(result[0]).toMatchObject({ path: 'added.ts', status: 'added' });
expect(result[1]).toMatchObject({ path: 'modified.ts', status: 'modified' });
expect(result[2]).toMatchObject({ path: 'deleted.ts', status: 'removed' });
expect(result[3]).toMatchObject({ path: 'new.ts', status: 'renamed', previousPath: 'old.ts' });
});
// -------------------------------------------------------------------------
// Error handling
// -------------------------------------------------------------------------
it('throws a descriptive error when execFileSync throws', () => {
mockExecFileSync.mockImplementation(() => {
throw new Error('fatal: not a git repository');
});
expect(() => getChangedFilesBetweenRefs(BASE_OPTS)).toThrowError(
/Failed to get changed files between 'v1\.0\.0' and 'v2\.0\.0' in \/tmp\/fake-repo/
);
});
// -------------------------------------------------------------------------
// Shell-injection safety: first arg must be 'git', flags as array elements
// -------------------------------------------------------------------------
it('calls execFileSync with "git" as the executable (no shell)', () => {
mockExecFileSync.mockReturnValue('');
getChangedFilesBetweenRefs(BASE_OPTS);
const [executable, args] = mockExecFileSync.mock.calls[0] as [string, string[]];
expect(executable).toBe('git');
// Each flag must be a separate element — no shell concatenation
expect(Array.isArray(args)).toBe(true);
expect(args).toContain('diff');
expect(args).toContain('--name-status');
// Base and head are separate args, not joined with a shell metacharacter
expect(args).toContain('v1.0.0');
expect(args).toContain('v2.0.0');
});
it('passes the repoPath via -C flag as a separate array element', () => {
mockExecFileSync.mockReturnValue('');
getChangedFilesBetweenRefs(BASE_OPTS);
const [, args] = mockExecFileSync.mock.calls[0] as [string, string[]];
const cIdx = args.indexOf('-C');
expect(cIdx).not.toBe(-1);
expect(args[cIdx + 1]).toBe('/tmp/fake-repo');
});
// -------------------------------------------------------------------------
// Unknown status codes are silently skipped
// -------------------------------------------------------------------------
it('silently skips lines with unknown status codes', () => {
// 'X' is not a known status
mockExecFileSync.mockReturnValue('X\tunknown.ts\nM\tknown.ts');
const result = getChangedFilesBetweenRefs(BASE_OPTS);
expect(result).toHaveLength(1);
expect(result[0]).toMatchObject({ path: 'known.ts', status: 'modified' });
});
});


@@ -7,9 +7,10 @@
* - File extraction via `git archive` to temp directories
*/
import { execSync } from 'node:child_process';
import { execSync, execFileSync } from 'node:child_process';
import { mkdirSync, rmSync } from 'node:fs';
import { join } from 'node:path';
import type { ChangedFile } from '../crawler/types.js';
export interface ResolveTagOptions {
repoPath: string;
@@ -158,3 +159,55 @@ export function cleanupTempExtraction(extractPath: string): void {
);
}
}
export interface LocalChangedFileOptions {
repoPath: string;
base: string;
head: string;
}
/**
* Get the list of files that differ between two git refs (tags, branches, commits).
*
* Uses `git diff --name-status` which produces tab-separated lines in formats:
* M\tpath
* A\tpath
* D\tpath
* R85\told-path\tnew-path
*
* @returns Array of ChangedFile objects
* @throws Error when git command fails
*/
export function getChangedFilesBetweenRefs(options: LocalChangedFileOptions): ChangedFile[] {
const { repoPath, base, head } = options;
try {
const output = execFileSync('git', ['-C', repoPath, 'diff', '--name-status', base, head], {
encoding: 'utf-8',
stdio: ['ignore', 'pipe', 'pipe']
}).trim();
if (!output) return [];
const results: ChangedFile[] = [];
for (const line of output.split('\n')) {
if (!line) continue;
const parts = line.split('\t');
const statusCode = parts[0];
if (statusCode === 'A') {
results.push({ path: parts[1], status: 'added' });
} else if (statusCode === 'M') {
results.push({ path: parts[1], status: 'modified' });
} else if (statusCode === 'D') {
results.push({ path: parts[1], status: 'removed' });
} else if (statusCode.startsWith('R')) {
results.push({ path: parts[2], status: 'renamed', previousPath: parts[1] });
}
}
return results;
} catch (error) {
throw new Error(
`Failed to get changed files between '${base}' and '${head}' in ${repoPath}: ${error instanceof Error ? error.message : String(error)}`
);
}
}
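For reference, the `--name-status` parsing above can be exercised in isolation. The standalone `parseNameStatus` helper below is an illustrative sketch that mirrors the function's tab-splitting logic (it is not an export of this module and omits the `execFileSync` call):

```typescript
type Status = 'added' | 'modified' | 'removed' | 'renamed';
interface Changed {
  path: string;
  status: Status;
  previousPath?: string;
}

// One tab-separated line of `git diff --name-status` output per entry:
// A\tpath, M\tpath, D\tpath, or R<score>\told-path\tnew-path.
function parseNameStatus(output: string): Changed[] {
  const results: Changed[] = [];
  for (const line of output.split('\n')) {
    if (!line) continue;
    const parts = line.split('\t');
    const code = parts[0];
    if (code === 'A') results.push({ path: parts[1], status: 'added' });
    else if (code === 'M') results.push({ path: parts[1], status: 'modified' });
    else if (code === 'D') results.push({ path: parts[1], status: 'removed' });
    else if (code.startsWith('R'))
      results.push({ path: parts[2], status: 'renamed', previousPath: parts[1] });
    // Unknown status codes (e.g. 'X') are silently skipped, matching the tests.
  }
  return results;
}

console.log(JSON.stringify(parseNameStatus('A\tadded.ts\nR100\told.ts\tnew.ts')));
```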


@@ -0,0 +1,118 @@
/**
* Unit tests for tag-order utilities (TRUEREF-0021).
*/
import { describe, it, expect } from 'vitest';
import { findBestAncestorVersion } from './tag-order.js';
import { RepositoryVersion } from '$lib/server/models/repository-version.js';
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function makeVersion(
tag: string,
state: RepositoryVersion['state'] = 'indexed'
): RepositoryVersion {
return new RepositoryVersion({
id: `/facebook/react/${tag}`,
repositoryId: '/facebook/react',
tag,
title: null,
commitHash: null,
state,
totalSnippets: 0,
indexedAt: new Date(),
createdAt: new Date()
});
}
// ---------------------------------------------------------------------------
// findBestAncestorVersion
// ---------------------------------------------------------------------------
describe('findBestAncestorVersion', () => {
it('returns null when candidates array is empty', () => {
expect(findBestAncestorVersion('v2.1.0', [])).toBeNull();
});
it('returns null when no candidates have state === indexed', () => {
const candidates = [
makeVersion('v1.0.0', 'pending'),
makeVersion('v1.1.0', 'indexing'),
makeVersion('v2.0.0', 'error')
];
expect(findBestAncestorVersion('v2.1.0', candidates)).toBeNull();
});
it('returns the nearest semver predecessor from a list', () => {
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.1.0'), makeVersion('v2.0.0')];
const result = findBestAncestorVersion('v2.1.0', candidates);
expect(result?.tag).toBe('v2.0.0');
});
it('handles v-prefix stripping correctly', () => {
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.5.0'), makeVersion('v2.0.0')];
const result = findBestAncestorVersion('v2.0.1', candidates);
expect(result?.tag).toBe('v2.0.0');
});
it('returns null when all candidates are after current tag', () => {
const candidates = [makeVersion('v2.0.0')];
expect(findBestAncestorVersion('v1.0.0', candidates)).toBeNull();
});
it('returns null when all candidates equal the current tag', () => {
const candidates = [makeVersion('v1.0.0'), makeVersion('v2.0.0')];
expect(findBestAncestorVersion('v1.0.0', candidates)).toBeNull();
});
it('handles tag lists without semver format using lexicographic fallback', () => {
const candidates = [
makeVersion('release-alpha'),
makeVersion('release-beta'),
makeVersion('release-gamma')
];
const result = findBestAncestorVersion('release-zeta', candidates);
expect(result).not.toBeNull();
// Lexicographic: all are "less than" release-zeta, so the max is release-gamma
expect(result?.tag).toBe('release-gamma');
});
it('returns single candidate that is older than current tag', () => {
const candidates = [makeVersion('v1.0.0')];
const result = findBestAncestorVersion('v2.0.0', candidates);
expect(result?.tag).toBe('v1.0.0');
});
it('ignores non-indexed versions even when they are valid predecessors', () => {
const candidates = [
makeVersion('v1.0.0', 'indexed'),
makeVersion('v1.5.0', 'pending'),
makeVersion('v1.8.0', 'error')
];
const result = findBestAncestorVersion('v2.0.0', candidates);
expect(result?.tag).toBe('v1.0.0');
});
it('correctly handles pre-release versions (pre-release < release)', () => {
const candidates = [
makeVersion('v2.0.0-alpha'),
makeVersion('v2.0.0-beta'),
makeVersion('v1.9.0')
];
// v2.0.0 is the target; pre-releases sort before the release: v2.0.0-alpha < v2.0.0
const result = findBestAncestorVersion('v2.0.0', candidates);
expect(result?.tag).toBe('v2.0.0-beta');
});
it('selects closest minor version as predecessor', () => {
const candidates = [
makeVersion('v1.0.0'),
makeVersion('v1.1.0'),
makeVersion('v1.2.0'),
makeVersion('v1.3.0')
];
const result = findBestAncestorVersion('v1.4.0', candidates);
expect(result?.tag).toBe('v1.3.0');
});
});


@@ -0,0 +1,88 @@
/**
* Tag ordering and ancestor selection for differential indexing (TRUEREF-0021).
*/
import type { RepositoryVersion } from '$lib/server/models/repository-version.js';
interface ParsedVersion {
major: number;
minor: number;
patch: number;
prerelease: string[];
}
function parseVersion(tag: string): ParsedVersion | null {
const stripped = tag.startsWith('v') ? tag.slice(1) : tag;
const dashIndex = stripped.indexOf('-');
const versionPart = dashIndex === -1 ? stripped : stripped.slice(0, dashIndex);
const prereleaseStr = dashIndex === -1 ? '' : stripped.slice(dashIndex + 1);
const segments = versionPart.split('.');
if (segments.length < 1 || segments.some((s) => !/^\d+$/.test(s))) return null;
const [majorStr, minorStr = '0', patchStr = '0'] = segments;
const major = Number(majorStr);
const minor = Number(minorStr);
const patch = Number(patchStr);
const prerelease = prereleaseStr ? prereleaseStr.split('.') : [];
return { major, minor, patch, prerelease };
}
/**
* Compare two version tags. Returns negative if a < b, positive if a > b, 0 if equal.
*/
function compareTagVersions(tagA: string, tagB: string): number {
const a = parseVersion(tagA);
const b = parseVersion(tagB);
if (!a || !b) {
// Fall back to lexicographic comparison when semver parsing fails
return tagA.localeCompare(tagB);
}
if (a.major !== b.major) return a.major - b.major;
if (a.minor !== b.minor) return a.minor - b.minor;
if (a.patch !== b.patch) return a.patch - b.patch;
// Pre-release versions have lower precedence than the release version
if (a.prerelease.length === 0 && b.prerelease.length > 0) return 1;
if (a.prerelease.length > 0 && b.prerelease.length === 0) return -1;
// Compare pre-release segments lexicographically
const len = Math.max(a.prerelease.length, b.prerelease.length);
for (let i = 0; i < len; i++) {
const pa = a.prerelease[i] ?? '';
const pb = b.prerelease[i] ?? '';
if (pa !== pb) return pa.localeCompare(pb);
}
return 0;
}
/**
* Find the best ancestor version for differential indexing.
*
* Selects the most-recent indexed version whose tag sorts before `currentTag`
* using semver comparison. Falls back to lexicographic comparison when semver
* parsing fails. Falls back to creation timestamp order as last resort.
*
* @param currentTag The tag being indexed
* @param candidates All versioned snapshots for this repository
* @returns The best indexed ancestor, or null if none qualifies
*/
export function findBestAncestorVersion(
currentTag: string,
candidates: RepositoryVersion[]
): RepositoryVersion | null {
const indexed = candidates.filter((v) => v.state === 'indexed');
const predecessors = indexed.filter((v) => compareTagVersions(v.tag, currentTag) < 0);
if (predecessors.length === 0) return null;
// Return the one with the highest version (closest predecessor)
return predecessors.reduce((best, candidate) =>
compareTagVersions(candidate.tag, best.tag) > 0 ? candidate : best
);
}
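The precedence rules above (numeric major/minor/patch first, pre-release identifiers compared segment-wise, release sorting after any of its pre-releases) can be sketched independently. The `cmp` helper below is an illustrative re-implementation of that ordering, not the module's internal `compareTagVersions`:

```typescript
// Illustrative semver-style comparator: negative if a < b, positive if a > b.
function cmp(tagA: string, tagB: string): number {
  const parse = (tag: string) => {
    const s = tag.startsWith('v') ? tag.slice(1) : tag;
    const dash = s.indexOf('-');
    const core = dash === -1 ? s : s.slice(0, dash);
    const pre = dash === -1 ? '' : s.slice(dash + 1);
    const nums = core.split('.').map(Number);
    return {
      nums: [nums[0] ?? 0, nums[1] ?? 0, nums[2] ?? 0],
      pre: pre ? pre.split('.') : []
    };
  };
  const a = parse(tagA);
  const b = parse(tagB);
  for (let i = 0; i < 3; i++) {
    if (a.nums[i] !== b.nums[i]) return a.nums[i] - b.nums[i];
  }
  // A release (no pre-release segments) sorts after any of its pre-releases.
  if (a.pre.length === 0 && b.pre.length > 0) return 1;
  if (a.pre.length > 0 && b.pre.length === 0) return -1;
  for (let i = 0; i < Math.max(a.pre.length, b.pre.length); i++) {
    const pa = a.pre[i] ?? '';
    const pb = b.pre[i] ?? '';
    if (pa !== pb) return pa.localeCompare(pb);
  }
  return 0;
}

console.log(cmp('v2.0.0-alpha', 'v2.0.0') < 0); // pre-release precedes release
```

This sketch assumes numeric-only core segments; non-semver tags would fall through to `NaN` arithmetic here, whereas the module above falls back to lexicographic comparison.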


@@ -31,6 +31,16 @@ export type RepositorySource = 'github' | 'local';
export type RepositoryState = 'pending' | 'indexing' | 'indexed' | 'error';
export type SnippetType = 'code' | 'info';
export type JobStatus = 'queued' | 'running' | 'done' | 'failed';
export type IndexingStage =
| 'queued'
| 'differential'
| 'crawling'
| 'cloning'
| 'parsing'
| 'storing'
| 'embedding'
| 'done'
| 'failed';
export type VersionState = 'pending' | 'indexing' | 'indexed' | 'error';
export type EmbeddingProviderKind = 'local-transformers' | 'openai-compatible';


@@ -38,6 +38,9 @@
<a href={resolveRoute('/search')} class="text-sm text-gray-600 hover:text-gray-900">
Search
</a>
<a href={resolveRoute('/admin/jobs')} class="text-sm text-gray-600 hover:text-gray-900">
Admin
</a>
<a href={resolveRoute('/settings')} class="text-sm text-gray-600 hover:text-gray-900">
Settings
</a>


@@ -1,5 +1,10 @@
<script lang="ts">
import { onMount } from 'svelte';
import { SvelteURLSearchParams } from 'svelte/reactivity';
import JobSkeleton from '$lib/components/admin/JobSkeleton.svelte';
import JobStatusBadge from '$lib/components/admin/JobStatusBadge.svelte';
import Toast from '$lib/components/admin/Toast.svelte';
import WorkerStatusPanel from '$lib/components/admin/WorkerStatusPanel.svelte';
import type { IndexingJobDto } from '$lib/server/models/indexing-job.js';
interface JobResponse {
@@ -7,123 +12,230 @@
total: number;
}
interface ToastItem {
id: string;
message: string;
type: 'success' | 'error' | 'info';
}
type FilterStatus = 'queued' | 'running' | 'done' | 'failed';
type JobAction = 'pause' | 'resume' | 'cancel';
const filterStatuses: FilterStatus[] = ['queued', 'running', 'done', 'failed'];
const stageLabels: Record<string, string> = {
queued: 'Queued',
differential: 'Diff',
crawling: 'Crawling',
cloning: 'Cloning',
parsing: 'Parsing',
storing: 'Storing',
embedding: 'Embedding',
done: 'Done',
failed: 'Failed'
};
let jobs = $state<IndexingJobDto[]>([]);
let total = $state(0);
let loading = $state(true);
let refreshing = $state(false);
let error = $state<string | null>(null);
let repositoryInput = $state('');
let selectedStatuses = $state<FilterStatus[]>([]);
let appliedRepositoryFilter = $state('');
let appliedStatuses = $state<FilterStatus[]>([]);
let pendingCancelJobId = $state<string | null>(null);
let rowActions = $state<Record<string, JobAction | undefined>>({});
let toasts = $state<ToastItem[]>([]);
let refreshTimer: ReturnType<typeof setTimeout> | null = null;
function buildJobsUrl(): string {
const params = new SvelteURLSearchParams({ limit: '50' });
if (appliedRepositoryFilter) {
params.set('repositoryId', appliedRepositoryFilter);
}
if (appliedStatuses.length > 0) {
params.set('status', appliedStatuses.join(','));
}
return `/api/v1/jobs?${params.toString()}`;
}
function pushToast(message: string, type: ToastItem['type'] = 'success') {
toasts = [...toasts, { id: crypto.randomUUID(), message, type }];
}
function clearRowAction(jobId: string) {
const next = { ...rowActions };
delete next[jobId];
rowActions = next;
}
function setRowAction(jobId: string, action: JobAction) {
rowActions = { ...rowActions, [jobId]: action };
}
function scheduleRefresh(delayMs = 500) {
if (refreshTimer) {
clearTimeout(refreshTimer);
}
refreshTimer = setTimeout(() => {
void fetchJobs({ background: true });
}, delayMs);
}
function hasAppliedFilters(): boolean {
return appliedRepositoryFilter.length > 0 || appliedStatuses.length > 0;
}
function sameStatuses(left: FilterStatus[], right: FilterStatus[]): boolean {
return left.length === right.length && left.every((status, index) => status === right[index]);
}
function filtersDirty(): boolean {
return (
repositoryInput.trim() !== appliedRepositoryFilter ||
!sameStatuses(selectedStatuses, appliedStatuses)
);
}
function isSpecificRepositoryId(repositoryId: string): boolean {
return repositoryId.split('/').filter(Boolean).length >= 2;
}
function matchesAppliedFilters(job: IndexingJobDto): boolean {
if (appliedRepositoryFilter) {
const repositoryFilter = appliedRepositoryFilter;
const repositoryMatches = isSpecificRepositoryId(repositoryFilter)
? job.repositoryId === repositoryFilter
: job.repositoryId === repositoryFilter ||
job.repositoryId.startsWith(`${repositoryFilter}/`);
if (!repositoryMatches) {
return false;
}
}
if (appliedStatuses.length === 0) {
return true;
}
return appliedStatuses.includes(job.status as FilterStatus);
}
function syncCancelState(nextJobs: IndexingJobDto[]) {
if (!pendingCancelJobId) {
return;
}
const pendingJob = nextJobs.find((job) => job.id === pendingCancelJobId);
if (!pendingJob || !canCancel(pendingJob.status)) {
pendingCancelJobId = null;
}
}
async function fetchJobs(options: { background?: boolean } = {}) {
const background = options.background ?? false;
if (background) {
refreshing = true;
} else {
loading = true;
}
try {
const response = await fetch(buildJobsUrl());
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
const data: JobResponse = await response.json();
jobs = data.jobs;
total = data.total;
error = null;
syncCancelState(data.jobs);
} catch (err) {
error = err instanceof Error ? err.message : 'Failed to fetch jobs';
console.error('Failed to fetch jobs:', err);
} finally {
loading = false;
refreshing = false;
}
}
async function runJobAction(job: IndexingJobDto, action: JobAction) {
setRowAction(job.id, action);
try {
const response = await fetch(`/api/v1/jobs/${job.id}/${action}`, { method: 'POST' });
const payload = await response.json().catch(() => ({ message: 'Unknown error' }));
if (!response.ok) {
throw new Error(payload.message || `HTTP ${response.status}`);
}
const updatedJob = payload.job as IndexingJobDto | undefined;
if (updatedJob) {
if (matchesAppliedFilters(updatedJob)) {
jobs = jobs.map((currentJob) =>
currentJob.id === updatedJob.id ? updatedJob : currentJob
);
} else {
jobs = jobs.filter((currentJob) => currentJob.id !== updatedJob.id);
}
}
pendingCancelJobId = null;
pushToast(`Job ${action}d successfully`);
} catch (err) {
const message = err instanceof Error ? err.message : `Failed to ${action} job`;
pushToast(message, 'error');
console.error(`Failed to ${action} job:`, err);
} finally {
clearRowAction(job.id);
scheduleRefresh();
}
}
function toggleStatusFilter(status: FilterStatus) {
selectedStatuses = selectedStatuses.includes(status)
? selectedStatuses.filter((candidate) => candidate !== status)
: [...selectedStatuses, status].sort(
(left, right) => filterStatuses.indexOf(left) - filterStatuses.indexOf(right)
);
}
function applyFilters(event?: SubmitEvent) {
event?.preventDefault();
appliedRepositoryFilter = repositoryInput.trim();
appliedStatuses = [...selectedStatuses];
pendingCancelJobId = null;
void fetchJobs();
}
function resetFilters() {
repositoryInput = '';
selectedStatuses = [];
appliedRepositoryFilter = '';
appliedStatuses = [];
pendingCancelJobId = null;
void fetchJobs();
}
function requestCancel(jobId: string) {
pendingCancelJobId = pendingCancelJobId === jobId ? null : jobId;
}
function formatDate(date: Date | string | null): string {
if (!date) {
return '—';
}
return new Date(date).toLocaleString();
}
// Determine which actions are available for a job
function canPause(status: IndexingJobDto['status']): boolean {
return status === 'queued' || status === 'running';
}
@@ -133,8 +245,67 @@
}
function canCancel(status: IndexingJobDto['status']): boolean {
return status !== 'done' && status !== 'failed' && status !== 'cancelled';
}
function isRowBusy(jobId: string): boolean {
return Boolean(rowActions[jobId]);
}
function getStageLabel(stage: string | undefined): string {
return stage ? (stageLabels[stage] ?? stage) : '—';
}
onMount(() => {
void fetchJobs();
const es = new EventSource('/api/v1/jobs/stream');
let fallbackInterval: ReturnType<typeof setInterval> | null = null;
const refreshJobs = () => {
void fetchJobs({ background: true });
};
es.addEventListener('job-progress', (event) => {
const data = JSON.parse(event.data) as Partial<IndexingJobDto> & { jobId?: string };
if (!data.jobId) {
return;
}
jobs = jobs.map((job) =>
job.id === data.jobId
? {
...job,
progress: data.progress ?? job.progress,
stage: data.stage ?? job.stage,
stageDetail: data.stageDetail ?? job.stageDetail,
processedFiles: data.processedFiles ?? job.processedFiles,
totalFiles: data.totalFiles ?? job.totalFiles,
status: data.status ?? job.status
}
: job
);
});
es.addEventListener('job-done', refreshJobs);
es.addEventListener('job-failed', refreshJobs);
es.onerror = () => {
es.close();
if (!fallbackInterval) {
fallbackInterval = setInterval(refreshJobs, 3000);
}
};
return () => {
es.close();
if (fallbackInterval) {
clearInterval(fallbackInterval);
}
if (refreshTimer) {
clearTimeout(refreshTimer);
}
};
});
</script>
<svelte:head>
@@ -147,23 +318,100 @@
<p class="mt-2 text-gray-600">Monitor and control indexing jobs</p>
</div>
<WorkerStatusPanel />
<form
class="mb-6 rounded-lg border border-gray-200 bg-white p-4 shadow-sm"
onsubmit={applyFilters}
>
<div class="flex flex-col gap-4 lg:flex-row lg:items-end lg:justify-between">
<div class="flex-1">
<label class="mb-2 block text-sm font-medium text-gray-700" for="repository-filter">
Repository filter
</label>
<input
id="repository-filter"
type="text"
bind:value={repositoryInput}
placeholder="/owner or /owner/repo"
class="w-full rounded-md border border-gray-300 px-3 py-2 text-sm text-gray-900 shadow-sm focus:border-blue-500 focus:ring-2 focus:ring-blue-200 focus:outline-none"
/>
<p class="mt-2 text-xs text-gray-500">
Use an owner prefix like <code>/facebook</code> or a full repository ID like
<code>/facebook/react</code>.
</p>
</div>
<div class="lg:min-w-72">
<span class="mb-2 block text-sm font-medium text-gray-700">Statuses</span>
<div class="flex flex-wrap gap-2">
{#each filterStatuses as status (status)}
<button
type="button"
onclick={() => toggleStatusFilter(status)}
class="rounded-full border px-3 py-1 text-xs font-semibold uppercase transition {selectedStatuses.includes(
status
)
? 'border-blue-600 bg-blue-50 text-blue-700'
: 'border-gray-300 text-gray-600 hover:border-gray-400 hover:text-gray-900'}"
>
{status}
</button>
{/each}
</div>
</div>
<div class="flex gap-2">
<button
type="submit"
disabled={!filtersDirty()}
class="rounded bg-blue-600 px-4 py-2 text-sm font-semibold text-white hover:bg-blue-700 disabled:cursor-not-allowed disabled:opacity-50"
>
Apply filters
</button>
<button
type="button"
onclick={resetFilters}
class="rounded border border-gray-300 px-4 py-2 text-sm font-semibold text-gray-700 hover:border-gray-400 hover:text-gray-900"
>
Reset
</button>
</div>
</div>
</form>
<div
class="mb-4 flex flex-col gap-2 text-sm text-gray-600 md:flex-row md:items-center md:justify-between"
>
<p>
Showing <span class="font-semibold text-gray-900">{jobs.length}</span> of
<span class="font-semibold text-gray-900">{total}</span> jobs
</p>
{#if hasAppliedFilters()}
<p class="text-xs text-gray-500">
Active filters:
{appliedRepositoryFilter || 'all repositories'}
{#if appliedStatuses.length > 0}
· {appliedStatuses.join(', ')}
{:else}
· all statuses
{/if}
</p>
{/if}
</div>
{#if error}
<div class="mb-4 rounded-md border border-red-200 bg-red-50 px-4 py-3 text-sm text-red-800">
{error}
</div>
{/if}
{#if !loading && jobs.length === 0}
<div class="rounded-md bg-gray-50 p-8 text-center">
<p class="text-gray-600">
{hasAppliedFilters()
? 'No jobs match the current filters.'
: 'No jobs found. Jobs will appear here when repositories are indexed.'}
</p>
</div>
{:else}
@@ -181,6 +429,11 @@
>
Status
</th>
<th
class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
>
Stage
</th>
<th
class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
>
@@ -199,6 +452,9 @@
</tr>
</thead>
<tbody class="divide-y divide-gray-200 bg-white">
{#if loading && jobs.length === 0}
<JobSkeleton rows={6} />
{:else}
{#each jobs as job (job.id)}
<tr class="hover:bg-gray-50">
<td class="px-6 py-4 text-sm font-medium whitespace-nowrap text-gray-900">
@@ -206,23 +462,36 @@
{#if job.versionId}
<span class="ml-1 text-xs text-gray-500">@{job.versionId}</span>
{/if}
<div class="mt-1 text-xs text-gray-400">{job.id}</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<JobStatusBadge status={job.status} spinning={job.status === 'running'} />
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="flex items-center gap-2">
<span>{getStageLabel(job.stage)}</span>
{#if job.stageDetail}
<span class="text-xs text-gray-400">{job.stageDetail}</span>
{/if}
</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="space-y-2">
<div class="flex items-center gap-2">
<span class="w-12 text-right text-xs font-semibold text-gray-600"
>{job.progress}%</span
>
<div class="h-2 w-32 rounded-full bg-gray-200">
<div
class="h-2 rounded-full bg-blue-600 transition-all"
style="width: {job.progress}%"
></div>
</div>
</div>
{#if job.totalFiles > 0}
<div class="text-xs text-gray-400">
{job.processedFiles}/{job.totalFiles} files processed
</div>
{/if}
</div>
</td>
@@ -231,29 +500,50 @@
</td>
<td class="px-6 py-4 text-right text-sm font-medium whitespace-nowrap">
<div class="flex justify-end gap-2">
{#if pendingCancelJobId === job.id}
<button
type="button"
onclick={() => void runJobAction(job, 'cancel')}
disabled={isRowBusy(job.id)}
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'cancel' ? 'Cancelling...' : 'Confirm cancel'}
</button>
<button
type="button"
onclick={() => requestCancel(job.id)}
disabled={isRowBusy(job.id)}
class="rounded border border-gray-300 px-3 py-1 text-xs font-semibold text-gray-700 hover:border-gray-400 hover:text-gray-900 disabled:cursor-not-allowed disabled:opacity-50"
>
Keep job
</button>
{:else}
{#if canPause(job.status)}
<button
type="button"
onclick={() => void runJobAction(job, 'pause')}
disabled={isRowBusy(job.id)}
class="rounded bg-yellow-600 px-3 py-1 text-xs font-semibold text-white hover:bg-yellow-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'pause' ? 'Pausing...' : 'Pause'}
</button>
{/if}
{#if canResume(job.status)}
<button
type="button"
onclick={() => void runJobAction(job, 'resume')}
disabled={isRowBusy(job.id)}
class="rounded bg-green-600 px-3 py-1 text-xs font-semibold text-white hover:bg-green-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'resume' ? 'Resuming...' : 'Resume'}
</button>
{/if}
{#if canCancel(job.status)}
<button
type="button"
onclick={() => requestCancel(job.id)}
disabled={isRowBusy(job.id)}
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
>
Cancel
</button>
@@ -261,16 +551,20 @@
{#if !canPause(job.status) && !canResume(job.status) && !canCancel(job.status)}
<span class="text-xs text-gray-400"></span>
{/if}
{/if}
</div>
</td>
</tr>
{/each}
{/if}
</tbody>
</table>
</div>
{#if refreshing}
<div class="mt-4 text-center text-sm text-gray-500">Refreshing...</div>
{/if}
{/if}
</div>
<Toast bind:toasts />

View File

@@ -55,6 +55,8 @@ function createTestDb(): Database.Database {
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
const migration1 = readFileSync(join(migrationsFolder, '0001_quick_nighthawk.sql'), 'utf-8');
const migration2 = readFileSync(join(migrationsFolder, '0002_silky_stellaris.sql'), 'utf-8');
const migration3 = readFileSync(join(migrationsFolder, '0003_multiversion_config.sql'), 'utf-8');
const migration4 = readFileSync(join(migrationsFolder, '0004_complete_sentry.sql'), 'utf-8');
// Apply first migration
const statements0 = migration0
@@ -85,6 +87,24 @@ function createTestDb(): Database.Database {
client.exec(statement);
}
const statements3 = migration3
.split('--> statement-breakpoint')
.map((statement) => statement.trim())
.filter(Boolean);
for (const statement of statements3) {
client.exec(statement);
}
const statements4 = migration4
.split('--> statement-breakpoint')
.map((statement) => statement.trim())
.filter(Boolean);
for (const statement of statements4) {
client.exec(statement);
}
client.exec(readFileSync(ftsFile, 'utf-8'));
return client;
@@ -197,15 +217,6 @@ function seedEmbedding(client: Database.Database, snippetId: string, values: num
.run(snippetId, values.length, Buffer.from(Float32Array.from(values).buffer), NOW_S);
}
describe('API contract integration', () => {
beforeEach(() => {
db = createTestDb();
@@ -436,7 +447,11 @@ describe('API contract integration', () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v18.3.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Insert version-specific rules (versioned queries no longer inherit the NULL row).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Prefer hooks over classes']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
@@ -486,4 +501,198 @@ describe('API contract integration', () => {
isLocal: false
});
});
it('GET /api/v1/context returns only version-specific rules for versioned queries (no NULL row contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v2.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Insert repo-wide rules (version_id IS NULL) — these must NOT appear in versioned queries.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule']), NOW_S);
// Insert version-specific rules.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Version-specific rule']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
versionId,
content: 'some versioned content'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v2.0.0`)}&query=${encodeURIComponent('versioned content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Only the version-specific rule should appear — NULL row must not contaminate.
expect(body.rules).toEqual(['Version-specific rule']);
});
it('GET /api/v1/context returns only repo-wide rules when no version is requested', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
// Insert repo-wide rules (version_id IS NULL).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule only']), NOW_S);
seedSnippet(db, { documentId, repositoryId, content: 'some content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('some content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.rules).toEqual(['Repo-wide rule only']);
});
it('GET /api/v1/context versioned query returns only the version-specific rules row', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v3.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
const sharedRule = 'Use TypeScript strict mode';
// Insert repo-wide NULL row — must NOT bleed into versioned query results.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify([sharedRule]), NOW_S);
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify([sharedRule, 'Version-only rule']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'dedup test content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v3.0.0`)}&query=${encodeURIComponent('dedup test')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Returns only the version-specific row as stored — no NULL row merge.
expect(body.rules).toEqual([sharedRule, 'Version-only rule']);
});
it('GET /api/v1/context versioned query returns empty rules when only NULL row exists (no NULL contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v1.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Only a repo-wide NULL row exists — no version-specific config.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['HEAD rules that must not contaminate v1']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'v1 content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v1.0.0`)}&query=${encodeURIComponent('v1 content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// No version-specific config row → empty rules. NULL row must not bleed in.
expect(body.rules).toEqual([]);
});
it('GET /api/v1/context returns 404 with VERSION_NOT_FOUND when version does not exist', async () => {
const repositoryId = seedRepo(db);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v99.0.0`)}&query=${encodeURIComponent('foo')}`
)
} as never);
expect(response.status).toBe(404);
const body = await response.json();
expect(body.code).toBe('VERSION_NOT_FOUND');
});
it('GET /api/v1/context resolves a version by full commit SHA', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'a'.repeat(40);
// Insert version with a commit_hash
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v2.0.0`, repositoryId, 'v2.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${fullSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v2.0.0');
});
it('GET /api/v1/context resolves a version by short SHA prefix (8 chars)', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0';
const shortSha = fullSha.slice(0, 8);
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v3.0.0`, repositoryId, 'v3.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${shortSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v3.0.0');
});
it('GET /api/v1/context includes searchModeUsed in JSON response', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
seedSnippet(db, {
documentId,
repositoryId,
content: 'search mode used test snippet'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('search mode used')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.searchModeUsed).toBeDefined();
expect(['keyword', 'semantic', 'hybrid', 'keyword_fallback']).toContain(body.searchModeUsed);
});
});

View File

@@ -36,9 +36,10 @@ function getServices(db: ReturnType<typeof getClient>) {
// Load the active embedding profile from the database
const profileRow = db
.prepare<
[],
EmbeddingProfileEntityProps
>('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
.get();
const profile = profileRow
@@ -54,24 +55,42 @@ interface RawRepoConfig {
rules: string | null;
}
function parseRulesJson(raw: string | null | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(raw);
return Array.isArray(parsed) ? (parsed as string[]) : [];
} catch {
return [];
}
}
function getRules(
db: ReturnType<typeof getClient>,
repositoryId: string,
versionId?: string
): string[] {
if (!versionId) {
// Unversioned query: return repo-wide (HEAD) rules only.
const row = db
.prepare<
[string],
RawRepoConfig
>(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
.get(repositoryId);
return parseRulesJson(row?.rules);
}
// Versioned query: return only version-specific rules (no NULL row merge).
const row = db
.prepare<
[string, string],
RawRepoConfig
>(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
.get(repositoryId, versionId);
return parseRulesJson(row?.rules);
}
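The split above reduces to a single selection rule: exactly one config row is consulted per query, and the NULL (repo-wide) row never merges into versioned results. A standalone sketch of that rule, with rows modeled as plain objects and all names illustrative rather than the codebase's actual types:

```typescript
interface ConfigRow {
  repositoryId: string;
  versionId: string | null; // null = repo-wide (HEAD) rules
  rules: string[];
}

// Pick exactly one config row: the version-specific row for versioned
// queries, the NULL row otherwise. No merging between the two.
function selectRules(rows: ConfigRow[], repositoryId: string, versionId?: string): string[] {
  const wanted = versionId ?? null;
  const row = rows.find((r) => r.repositoryId === repositoryId && r.versionId === wanted);
  return row?.rules ?? [];
}
```

This is why a versioned query against a repository that only has a NULL row yields empty rules, which is the behavior the integration tests below pin down.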
interface RawRepoState {
state: 'pending' | 'indexing' | 'indexed' | 'error';
id: string;
@@ -198,6 +217,7 @@ export const GET: RequestHandler = async ({ url }) => {
let versionId: string | undefined;
let resolvedVersion: RawVersionRow | undefined;
if (parsed.version) {
// Try exact tag match first.
resolvedVersion = db
.prepare<
[string, string],
@@ -205,12 +225,30 @@ export const GET: RequestHandler = async ({ url }) => {
>(`SELECT id, tag FROM repository_versions WHERE repository_id = ? AND tag = ?`)
.get(parsed.repositoryId, parsed.version);
// Fall back to commit hash prefix match (min 7 chars).
if (!resolvedVersion && parsed.version.length >= 7) {
resolvedVersion = db
.prepare<[string, string], RawVersionRow>(
`SELECT id, tag FROM repository_versions
WHERE repository_id = ? AND commit_hash LIKE ?`
)
.get(parsed.repositoryId, `${parsed.version}%`);
}
if (!resolvedVersion) {
return new Response(
JSON.stringify({
error: `Version ${parsed.version} not found for library ${parsed.repositoryId}`,
code: 'VERSION_NOT_FOUND'
}),
{ status: 404, headers: { 'Content-Type': 'application/json', ...CORS_HEADERS } }
);
}
versionId = resolvedVersion.id;
}
// Execute hybrid search (falls back to FTS5 when no embedding provider is set).
const { results: searchResults, searchModeUsed } = await hybridService.search(query, {
repositoryId: parsed.repositoryId,
versionId,
limit: 50, // fetch more than needed; token budget will trim
@@ -242,6 +280,7 @@ export const GET: RequestHandler = async ({ url }) => {
const metadata: ContextResponseMetadata = {
localSource: repo.source === 'local',
resultCount: selectedResults.length,
searchModeUsed,
repository: {
id: repo.id,
title: repo.title,
@@ -260,8 +299,8 @@ export const GET: RequestHandler = async ({ url }) => {
snippetVersions
};
// Load rules from repository_configs (version-specific when a version is requested, repo-wide otherwise).
const rules = getRules(db, parsed.repositoryId, versionId);
if (responseType === 'txt') {
const text = formatContextTxt(selectedResults, rules, metadata);
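The version resolution order above (exact tag match first, then a commit-SHA prefix match of at least 7 characters, otherwise a 404) can be sketched in isolation; the row shape and helper name here are illustrative, not the endpoint's actual types:

```typescript
interface VersionRow {
  id: string;
  tag: string;
  commitHash: string;
}

// Resolve a requested version: exact tag match first, then a commit-SHA
// prefix match (minimum 7 characters), mirroring the endpoint's order.
function resolveVersion(rows: VersionRow[], requested: string): VersionRow | undefined {
  const byTag = rows.find((r) => r.tag === requested);
  if (byTag) return byTag;
  if (requested.length >= 7) {
    return rows.find((r) => r.commitHash.startsWith(requested));
  }
  return undefined;
}
```

A six-character SHA prefix deliberately fails to resolve, matching the endpoint's minimum-length guard.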

View File

@@ -15,16 +15,46 @@ import { JobQueue } from '$lib/server/pipeline/job-queue.js';
import { handleServiceError } from '$lib/server/utils/validation.js';
import type { IndexingJob } from '$lib/types';
const VALID_JOB_STATUSES: ReadonlySet<IndexingJob['status']> = new Set([
'queued',
'running',
'done',
'failed'
]);
function parseStatusFilter(
searchValue: string | null
): IndexingJob['status'] | Array<IndexingJob['status']> | undefined {
if (!searchValue) {
return undefined;
}
const statuses = [
...new Set(
searchValue
.split(',')
.map((value) => value.trim())
.filter((value): value is IndexingJob['status'] =>
VALID_JOB_STATUSES.has(value as IndexingJob['status'])
)
)
];
if (statuses.length === 0) {
return undefined;
}
return statuses.length === 1 ? statuses[0] : statuses;
}
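A self-contained sketch of the same normalization, with the status union inlined for illustration: values are split on commas, trimmed, filtered against the valid set, deduplicated, and a single survivor collapses to a bare status:

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed';

const VALID: ReadonlySet<JobStatus> = new Set(['queued', 'running', 'done', 'failed']);

// Split, trim, drop unknown values, dedupe; collapse a single-element
// result to a bare status so callers can pass it straight to the queue.
function parseStatusFilter(searchValue: string | null): JobStatus | JobStatus[] | undefined {
  if (!searchValue) return undefined;
  const statuses = [
    ...new Set(
      searchValue
        .split(',')
        .map((value) => value.trim())
        .filter((value): value is JobStatus => VALID.has(value as JobStatus))
    )
  ];
  if (statuses.length === 0) return undefined;
  return statuses.length === 1 ? statuses[0] : statuses;
}
```

For example, `'queued, running,queued'` yields `['queued', 'running']`, while an all-invalid value falls back to `undefined` (no filter applied).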
export const GET: RequestHandler = ({ url }) => {
try {
const db = getClient();
const queue = new JobQueue(db);
const repositoryId = url.searchParams.get('repositoryId')?.trim() || undefined;
const status = parseStatusFilter(url.searchParams.get('status'));
const limit = Math.min(parseInt(url.searchParams.get('limit') ?? '20', 10) || 20, 1000);
const jobs = queue.listJobs({ repositoryId, status, limit });
const total = queue.countJobs({ repositoryId, status });

View File

@@ -0,0 +1,132 @@
/**
* GET /api/v1/jobs/:id/stream — stream real-time job progress via SSE.
*
* Headers:
* Last-Event-ID (optional) — triggers replay of last cached event
*/
import type { RequestHandler } from './$types';
import { getClient } from '$lib/server/db/client.js';
import { JobQueue } from '$lib/server/pipeline/job-queue.js';
import { getBroadcaster } from '$lib/server/pipeline/progress-broadcaster.js';
import { handleServiceError } from '$lib/server/utils/validation.js';
export const GET: RequestHandler = ({ params, request }) => {
try {
const db = getClient();
const queue = new JobQueue(db);
const jobId = params.id;
// Get the job from the queue
const job = queue.getJob(jobId);
if (!job) {
return new Response('Not found', { status: 404 });
}
// Get broadcaster
const broadcaster = getBroadcaster();
if (!broadcaster) {
return new Response('Service unavailable', { status: 503 });
}
// Create a new readable stream for SSE
const stream = new ReadableStream<string>({
async start(controller) {
try {
// Send initial job state as first event
const initialData = {
jobId,
stage: job.stage,
stageDetail: job.stageDetail,
progress: job.progress,
processedFiles: job.processedFiles,
totalFiles: job.totalFiles,
status: job.status,
error: job.error
};
controller.enqueue(`event: job-progress\ndata: ${JSON.stringify(initialData)}\n\n`);
// Check for Last-Event-ID header for reconnect
const lastEventId = request.headers.get('Last-Event-ID');
if (lastEventId) {
const lastEvent = broadcaster.getLastEvent(jobId);
if (lastEvent && lastEvent.id >= parseInt(lastEventId, 10)) {
controller.enqueue(
`id: ${lastEvent.id}\nevent: ${lastEvent.event}\ndata: ${lastEvent.data}\n\n`
);
}
}
// Check if job is already done or failed - close immediately after first event
if (job.status === 'done' || job.status === 'failed') {
if (job.status === 'done') {
controller.enqueue(`event: job-done\ndata: ${JSON.stringify({ jobId })}\n\n`);
} else {
controller.enqueue(
`event: job-failed\ndata: ${JSON.stringify({ jobId, error: job.error })}\n\n`
);
}
controller.close();
return;
}
// Subscribe to broadcaster for live events
const eventStream = broadcaster.subscribe(jobId);
const reader = eventStream.getReader();
// Pipe broadcaster events to the response
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
controller.enqueue(value);
// Check if the incoming event indicates job completion
if (value.includes('event: job-done') || value.includes('event: job-failed')) {
controller.close();
break;
}
}
} finally {
reader.releaseLock();
try {
controller.close();
} catch {
// Stream may already be closed after a terminal event.
}
}
} catch (err) {
console.error('SSE stream error:', err);
try {
controller.close();
} catch {
// Stream may already be closed.
}
}
}
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
Connection: 'keep-alive',
'X-Accel-Buffering': 'no',
'Access-Control-Allow-Origin': '*'
}
});
} catch (err) {
return handleServiceError(err);
}
};
export const OPTIONS: RequestHandler = () => {
return new Response(null, {
status: 204,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'GET, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type, Authorization, Last-Event-ID'
}
});
};
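The handler writes frames in the standard SSE wire format: optional `id:`, `event:`, and `data:` fields terminated by a blank line. A minimal parser sketch for consuming the stream in tests without a browser `EventSource`; it is illustrative and not part of the endpoint:

```typescript
interface SseFrame {
  event?: string;
  id?: number;
  data?: string;
}

// Split a raw SSE chunk into frames on the blank-line delimiter and
// pick apart the `id:` / `event:` / `data:` fields of each frame.
function parseSseChunk(chunk: string): SseFrame[] {
  return chunk
    .split('\n\n')
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      const frame: SseFrame = {};
      for (const line of block.split('\n')) {
        if (line.startsWith('event: ')) frame.event = line.slice(7);
        else if (line.startsWith('data: ')) frame.data = line.slice(6);
        else if (line.startsWith('id: ')) frame.id = Number(line.slice(4));
      }
      return frame;
    });
}
```

Feeding it the `job-progress` and `job-done` frames the endpoint enqueues yields one object per frame, with `data` still JSON-encoded.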

View File

@@ -0,0 +1,52 @@
/**
* GET /api/v1/jobs/stream — stream real-time job progress for all jobs or a specific repository via SSE.
*
* Query parameters:
* repositoryId (optional) — filter to jobs for this repository
*/
import type { RequestHandler } from './$types';
import { getBroadcaster } from '$lib/server/pipeline/progress-broadcaster.js';
import { handleServiceError } from '$lib/server/utils/validation.js';
export const GET: RequestHandler = ({ url }) => {
try {
const broadcaster = getBroadcaster();
if (!broadcaster) {
return new Response('Service unavailable', { status: 503 });
}
const repositoryId = url.searchParams.get('repositoryId');
// Get the appropriate stream based on parameters
let stream;
if (repositoryId) {
stream = broadcaster.subscribeRepository(repositoryId);
} else {
stream = broadcaster.subscribeAll();
}
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
Connection: 'keep-alive',
'X-Accel-Buffering': 'no',
'Access-Control-Allow-Origin': '*'
}
});
} catch (err) {
return handleServiceError(err);
}
};
export const OPTIONS: RequestHandler = () => {
return new Response(null, {
status: 204,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'GET, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type, Authorization'
}
});
};

View File

@@ -124,8 +124,10 @@ describe('POST /api/v1/libs/:id/index', () => {
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
versionService.add('/facebook/react', 'v17.0.0', 'React v17.0.0');
const enqueue = vi
.fn()
.mockImplementation((repositoryId: string, versionId?: string) =>
makeEnqueueJob(repositoryId, versionId)
);
mockQueue = { enqueue };
@@ -158,8 +160,10 @@ describe('POST /api/v1/libs/:id/index', () => {
repoService.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react' });
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
const enqueue = vi
.fn()
.mockImplementation((repositoryId: string, versionId?: string) =>
makeEnqueueJob(repositoryId, versionId)
);
mockQueue = { enqueue };

View File

@@ -49,7 +49,10 @@ function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
const migrationsFolder = join(
import.meta.dirname,
'../../../../../../../lib/server/db/migrations'
);
const ftsFile = join(import.meta.dirname, '../../../../../../../lib/server/db/fts.sql');
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');

View File

@@ -0,0 +1,100 @@
import { json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';
import { getClient } from '$lib/server/db/client.js';
import { getPool } from '$lib/server/pipeline/startup.js';
import os from 'node:os';
/**
* GET /api/v1/settings/indexing — retrieve indexing concurrency setting
* PUT /api/v1/settings/indexing — update indexing concurrency setting
* OPTIONS /api/v1/settings/indexing — CORS preflight
*/
// ---------------------------------------------------------------------------
// GET — Return current indexing concurrency
// ---------------------------------------------------------------------------
export const GET: RequestHandler = () => {
try {
const db = getClient();
const row = db
.prepare<
[],
{ value: string }
>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
.get();
let concurrency = 2;
if (row && row.value) {
try {
const parsed = JSON.parse(row.value);
if (typeof parsed === 'object' && parsed !== null && typeof parsed.value === 'number') {
concurrency = parsed.value;
} else if (typeof parsed === 'number') {
concurrency = parsed;
}
} catch {
concurrency = 2;
}
}
return json({ concurrency });
} catch (err) {
console.error('GET /api/v1/settings/indexing error:', err);
return json({ error: 'Failed to read indexing settings' }, { status: 500 });
}
};
// ---------------------------------------------------------------------------
// PUT — Update indexing concurrency
// ---------------------------------------------------------------------------
export const PUT: RequestHandler = async ({ request }) => {
try {
const body = await request.json();
// Validate and clamp concurrency
const maxConcurrency = Math.max(os.cpus().length - 1, 1);
const concurrency = Math.max(
1,
Math.min(parseInt(String(body.concurrency ?? 2), 10), maxConcurrency)
);
if (isNaN(concurrency)) {
return json({ error: 'Concurrency must be a valid integer' }, { status: 400 });
}
const db = getClient();
// Write to settings table
db.prepare(
"INSERT OR REPLACE INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, unixepoch())"
).run(JSON.stringify({ value: concurrency }));
// Update worker pool if available
getPool()?.setMaxConcurrency(concurrency);
return json({ concurrency });
} catch (err) {
console.error('PUT /api/v1/settings/indexing error:', err);
return json(
{ error: err instanceof Error ? err.message : 'Failed to update indexing settings' },
{ status: 500 }
);
}
};
// ---------------------------------------------------------------------------
// OPTIONS — CORS preflight
// ---------------------------------------------------------------------------
export const OPTIONS: RequestHandler = () => {
return new Response(null, {
status: 200,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'GET, PUT, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type'
}
});
};
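The clamp-then-validate order in the PUT handler above is subtle: `Math.min(NaN, max)` and `Math.max(1, NaN)` both return `NaN`, so the `isNaN` check still fires even though it runs after clamping. A minimal sketch of that logic as a standalone helper (`clampConcurrency` is a hypothetical name, not part of the endpoint):

```typescript
// Sketch of the PUT handler's clamping flow, extracted for illustration.
// NaN survives both Math.min and Math.max, so checking isNaN after the
// clamp still rejects non-numeric input; null here stands for the 400 path.
function clampConcurrency(raw: unknown, cpuCount: number): number | null {
  const maxConcurrency = Math.max(cpuCount - 1, 1);
  const value = Math.max(1, Math.min(parseInt(String(raw ?? 2), 10), maxConcurrency));
  return Number.isNaN(value) ? null : value;
}
```

On an 8-CPU host this yields a valid range of 1..7: `0` clamps up to `1`, `99999` clamps down to `7`, a missing value defaults to `2`, and a non-numeric string maps to the rejection path.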

View File

@@ -0,0 +1,668 @@
/**
* Integration tests for SSE streaming endpoints and the indexing settings API
* (TRUEREF-0022).
*
* Uses the same mock / in-memory DB pattern as api-contract.integration.test.ts.
*/
import { beforeEach, describe, expect, it, vi } from 'vitest';
import Database from 'better-sqlite3';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import type { ProgressBroadcaster as BroadcasterType } from '$lib/server/pipeline/progress-broadcaster.js';
// ---------------------------------------------------------------------------
// Module-level mocks (must be hoisted to the top of the file)
// ---------------------------------------------------------------------------
let db: Database.Database;
// Closed over by the vi.mock factory below.
let mockBroadcaster: BroadcasterType | null = null;
let mockPool: { getStatus: () => object; setMaxConcurrency?: (value: number) => void } | null =
null;
vi.mock('$lib/server/db/client', () => ({
getClient: () => db
}));
vi.mock('$lib/server/db/client.js', () => ({
getClient: () => db
}));
vi.mock('$lib/server/pipeline/startup', () => ({
getQueue: () => null,
getPool: () => mockPool
}));
vi.mock('$lib/server/pipeline/startup.js', () => ({
getQueue: () => null,
getPool: () => mockPool
}));
vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
const original =
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
return {
...original,
getBroadcaster: () => mockBroadcaster
};
});
vi.mock('$lib/server/pipeline/progress-broadcaster.js', async (importOriginal) => {
const original =
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
return {
...original,
getBroadcaster: () => mockBroadcaster
};
});
// ---------------------------------------------------------------------------
// Imports (after mocks are registered)
// ---------------------------------------------------------------------------
import { ProgressBroadcaster } from '$lib/server/pipeline/progress-broadcaster.js';
import { GET as getJobsList } from './jobs/+server.js';
import { GET as getJobStream } from './jobs/[id]/stream/+server.js';
import { GET as getJobsStream } from './jobs/stream/+server.js';
import {
GET as getIndexingSettings,
PUT as putIndexingSettings
} from './settings/indexing/+server.js';
import { GET as getWorkers } from './workers/+server.js';
// ---------------------------------------------------------------------------
// DB factory
// ---------------------------------------------------------------------------
function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');
const migrationsFolder = join(import.meta.dirname, '../../../lib/server/db/migrations');
for (const migrationFile of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql',
'0004_complete_sentry.sql',
'0005_fix_stage_defaults.sql'
]) {
const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
for (const stmt of sql
.split('--> statement-breakpoint')
.map((s) => s.trim())
.filter(Boolean)) {
client.exec(stmt);
}
}
return client;
}
// ---------------------------------------------------------------------------
// Fixtures
// ---------------------------------------------------------------------------
const NOW_S = Math.floor(Date.now() / 1000);
function seedRepo(client: Database.Database, id = '/test/repo'): string {
client
.prepare(
`INSERT INTO repositories
(id, title, source, source_url, state, created_at, updated_at)
VALUES (?, 'Test Repo', 'local', '/tmp/repo', 'indexed', ?, ?)`
)
.run(id, NOW_S, NOW_S);
return id;
}
function seedJob(
client: Database.Database,
overrides: Partial<{
id: string;
repository_id: string;
status: string;
stage: string;
progress: number;
total_files: number;
processed_files: number;
error: string | null;
}> = {}
): string {
const id = overrides.id ?? crypto.randomUUID();
client
.prepare(
`INSERT INTO indexing_jobs
(id, repository_id, version_id, status, progress, total_files, processed_files,
stage, stage_detail, error, started_at, completed_at, created_at)
VALUES (?, ?, null, ?, ?, ?, ?, ?, null, ?, null, null, ?)`
)
.run(
id,
overrides.repository_id ?? '/test/repo',
overrides.status ?? 'queued',
overrides.progress ?? 0,
overrides.total_files ?? 0,
overrides.processed_files ?? 0,
overrides.stage ?? 'queued',
overrides.error ?? null,
NOW_S
);
return id;
}
/** Build a minimal SvelteKit-compatible RequestEvent for SSE handlers. */
function makeEvent<T = Parameters<typeof getJobStream>[0]>(opts: {
params?: Record<string, string>;
url?: string;
headers?: Record<string, string>;
body?: unknown;
}): T {
const url = new URL(opts.url ?? 'http://localhost/api/v1/jobs/test/stream');
const headers = new Headers(opts.headers ?? {});
return {
params: opts.params ?? {},
url,
request: new Request(url.toString(), {
method: opts.body ? 'PUT' : 'GET',
headers,
body: opts.body ? JSON.stringify(opts.body) : undefined
}),
route: { id: null },
locals: {},
platform: undefined,
cookies: {} as never,
fetch: fetch,
getClientAddress: () => '127.0.0.1',
setHeaders: vi.fn(),
isDataRequest: false,
isSubRequest: false,
depends: vi.fn(),
untrack: vi.fn()
} as unknown as T;
}
// ---------------------------------------------------------------------------
// Helper: read first chunk from a response body
// ---------------------------------------------------------------------------
async function readFirstChunk(response: Response): Promise<string> {
const reader = response.body?.getReader();
if (!reader) throw new Error('Response has no body');
const { value } = await reader.read();
reader.releaseLock();
// Stream enqueues strings directly — no TextDecoder needed
return String(value ?? '');
}
// ---------------------------------------------------------------------------
// Test group 1: GET /api/v1/jobs/:id/stream
// ---------------------------------------------------------------------------
describe('GET /api/v1/jobs/:id/stream', () => {
beforeEach(() => {
db = createTestDb();
mockBroadcaster = new ProgressBroadcaster();
});
it('returns 404 when the job does not exist', async () => {
seedRepo(db);
const response = await getJobStream(makeEvent({ params: { id: 'non-existent-job-id' } }));
expect(response.status).toBe(404);
});
it('returns 503 when broadcaster is not initialized', async () => {
mockBroadcaster = null;
seedRepo(db);
const jobId = seedJob(db);
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
expect(response.status).toBe(503);
});
it('returns 200 with Content-Type: text/event-stream', async () => {
seedRepo(db);
const jobId = seedJob(db);
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
expect(response.status).toBe(200);
expect(response.headers.get('Content-Type')).toContain('text/event-stream');
});
it('first chunk contains the initial job state as a data event', async () => {
seedRepo(db);
const jobId = seedJob(db, { status: 'running', progress: 42 });
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
const text = await readFirstChunk(response);
expect(text).toContain('data:');
// The initial event carries jobId and status
expect(text).toContain(jobId);
expect(text).toContain('running');
});
it('closes the stream immediately when job status is "done"', async () => {
seedRepo(db);
const jobId = seedJob(db, { status: 'done' });
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
expect(response.status).toBe(200);
// Read both chunks until done
const reader = response.body!.getReader();
let fullText = '';
let iterations = 0;
while (iterations < 10) {
const { done, value } = await reader.read();
if (done) break;
fullText += String(value ?? '');
iterations++;
}
// Stream should close without blocking (done=true was reached)
expect(fullText).toContain(jobId);
});
it('closes the stream immediately when job status is "failed"', async () => {
seedRepo(db);
const jobId = seedJob(db, { status: 'failed', error: 'something went wrong' });
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
expect(response.status).toBe(200);
const reader = response.body!.getReader();
let fullText = '';
let iterations = 0;
while (iterations < 10) {
const { done, value } = await reader.read();
if (done) break;
fullText += String(value ?? '');
iterations++;
}
expect(fullText).toContain('failed');
});
it('replays last cached event when Last-Event-ID header is provided', async () => {
seedRepo(db);
const jobId = seedJob(db, { status: 'running' });
// Pre-seed a cached event in the broadcaster
mockBroadcaster!.broadcast(jobId, '/test/repo', 'progress', { stage: 'parsing', progress: 50 });
const response = await getJobStream(
makeEvent({
params: { id: jobId },
headers: { 'Last-Event-ID': '1' }
})
);
expect(response.status).toBe(200);
// Consume enough to get both initial state and replay
const reader = response.body!.getReader();
let fullText = '';
// Read two chunks
for (let i = 0; i < 2; i++) {
const { done, value } = await reader.read();
if (done) break;
fullText += String(value ?? '');
}
reader.releaseLock();
// The replay event should include the cached event data
expect(fullText).toContain('progress');
});
it('closes after receiving the broadcaster job-done event', async () => {
seedRepo(db);
const jobId = seedJob(db, { status: 'running', stage: 'parsing', progress: 10 });
const response = await getJobStream(makeEvent({ params: { id: jobId } }));
const reader = response.body!.getReader();
const initialChunk = await reader.read();
expect(String(initialChunk.value ?? '')).toContain('event: job-progress');
mockBroadcaster!.broadcast(jobId, '/test/repo', 'job-done', { jobId, status: 'done' });
const completionChunk = await reader.read();
expect(String(completionChunk.value ?? '')).toContain('event: job-done');
const closed = await reader.read();
expect(closed.done).toBe(true);
});
});
// ---------------------------------------------------------------------------
// Test group 2: GET /api/v1/jobs/stream
// ---------------------------------------------------------------------------
describe('GET /api/v1/jobs/stream', () => {
beforeEach(() => {
db = createTestDb();
mockBroadcaster = new ProgressBroadcaster();
});
it('returns 200 with Content-Type: text/event-stream', async () => {
const response = await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream' })
);
expect(response.status).toBe(200);
expect(response.headers.get('Content-Type')).toContain('text/event-stream');
});
it('returns 503 when broadcaster is not initialized', async () => {
mockBroadcaster = null;
const response = await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream' })
);
expect(response.status).toBe(503);
});
it('uses subscribeRepository when ?repositoryId= is provided', async () => {
const subscribeSpy = vi.spyOn(mockBroadcaster!, 'subscribeRepository');
await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({
url: 'http://localhost/api/v1/jobs/stream?repositoryId=/test/repo'
})
);
expect(subscribeSpy).toHaveBeenCalledWith('/test/repo');
});
it('uses subscribeAll when no repositoryId query param is present', async () => {
const subscribeSpy = vi.spyOn(mockBroadcaster!, 'subscribeAll');
await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream' })
);
expect(subscribeSpy).toHaveBeenCalled();
});
it('broadcasts to stream subscribers for the correct repository', async () => {
seedRepo(db, '/repo/alpha');
const response = await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({
url: 'http://localhost/api/v1/jobs/stream?repositoryId=/repo/alpha'
})
);
// Broadcast an event for this repository
mockBroadcaster!.broadcast('job-123', '/repo/alpha', 'progress', { stage: 'parsing' });
const reader = response.body!.getReader();
const { value } = await reader.read();
const text = String(value ?? '');
reader.releaseLock();
expect(text).toContain('progress');
});
});
// ---------------------------------------------------------------------------
// Test group 3: GET /api/v1/jobs
// ---------------------------------------------------------------------------
describe('GET /api/v1/jobs', () => {
beforeEach(() => {
db = createTestDb();
});
it('supports repository prefix and comma-separated status filters', async () => {
seedRepo(db, '/facebook/react');
seedRepo(db, '/facebook/react-native');
seedRepo(db, '/vitejs/vite');
seedJob(db, { repository_id: '/facebook/react', status: 'queued' });
seedJob(db, { repository_id: '/facebook/react-native', status: 'running' });
seedJob(db, { repository_id: '/facebook/react', status: 'done' });
seedJob(db, { repository_id: '/vitejs/vite', status: 'queued' });
const response = await getJobsList(
makeEvent<Parameters<typeof getJobsList>[0]>({
url: 'http://localhost/api/v1/jobs?repositoryId=%2Ffacebook&status=queued,%20running'
})
);
const body = await response.json();
expect(response.status).toBe(200);
expect(body.total).toBe(2);
expect(body.jobs).toHaveLength(2);
expect(body.jobs.map((job: { repositoryId: string }) => job.repositoryId).sort()).toEqual([
'/facebook/react',
'/facebook/react-native'
]);
expect(body.jobs.map((job: { status: string }) => job.status).sort()).toEqual([
'queued',
'running'
]);
});
it('keeps exact-match behavior for specific repository IDs', async () => {
seedRepo(db, '/facebook/react');
seedRepo(db, '/facebook/react-native');
seedJob(db, { repository_id: '/facebook/react', status: 'queued' });
seedJob(db, { repository_id: '/facebook/react-native', status: 'queued' });
const response = await getJobsList(
makeEvent<Parameters<typeof getJobsList>[0]>({
url: 'http://localhost/api/v1/jobs?repositoryId=%2Ffacebook%2Freact&status=queued'
})
);
const body = await response.json();
expect(response.status).toBe(200);
expect(body.total).toBe(1);
expect(body.jobs).toHaveLength(1);
expect(body.jobs[0].repositoryId).toBe('/facebook/react');
});
});
// ---------------------------------------------------------------------------
// Test group 4: GET /api/v1/workers
// ---------------------------------------------------------------------------
describe('GET /api/v1/workers', () => {
beforeEach(() => {
mockPool = null;
});
it('returns 503 when the worker pool is not initialized', async () => {
const response = await getWorkers(makeEvent<Parameters<typeof getWorkers>[0]>({}));
expect(response.status).toBe(503);
});
it('returns the current worker status snapshot', async () => {
mockPool = {
getStatus: () => ({
concurrency: 2,
active: 1,
idle: 1,
workers: [
{
index: 0,
state: 'running',
jobId: 'job-1',
repositoryId: '/test/repo',
versionId: null
},
{
index: 1,
state: 'idle',
jobId: null,
repositoryId: null,
versionId: null
}
]
})
};
const response = await getWorkers(makeEvent<Parameters<typeof getWorkers>[0]>({}));
const body = await response.json();
expect(response.status).toBe(200);
expect(body.active).toBe(1);
expect(body.workers[0].jobId).toBe('job-1');
});
});
// ---------------------------------------------------------------------------
// Test group 5: GET /api/v1/settings/indexing
// ---------------------------------------------------------------------------
describe('GET /api/v1/settings/indexing', () => {
beforeEach(() => {
db = createTestDb();
mockPool = {
getStatus: () => ({ concurrency: 2, active: 0, idle: 2, workers: [] }),
setMaxConcurrency: vi.fn()
};
});
it('returns { concurrency: 2 } when no setting exists in DB', async () => {
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();
expect(response.status).toBe(200);
expect(body).toEqual({ concurrency: 2 });
});
it('returns the stored concurrency when a setting exists', async () => {
db.prepare(
"INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
).run(JSON.stringify(4), NOW_S);
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();
expect(body.concurrency).toBe(4);
});
it('parses JSON-wrapped value correctly: {"value": 5}', async () => {
db.prepare(
"INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
).run(JSON.stringify({ value: 5 }), NOW_S);
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();
expect(body.concurrency).toBe(5);
});
});
// ---------------------------------------------------------------------------
// Test group 6: PUT /api/v1/settings/indexing
// ---------------------------------------------------------------------------
describe('PUT /api/v1/settings/indexing', () => {
beforeEach(() => {
db = createTestDb();
mockPool = {
getStatus: () => ({ concurrency: 2, active: 0, idle: 2, workers: [] }),
setMaxConcurrency: vi.fn()
};
});
function makePutEvent(body: unknown) {
const url = new URL('http://localhost/api/v1/settings/indexing');
return {
params: {},
url,
request: new Request(url.toString(), {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body)
}),
route: { id: null },
locals: {},
platform: undefined,
cookies: {} as never,
fetch: fetch,
getClientAddress: () => '127.0.0.1',
setHeaders: vi.fn(),
isDataRequest: false,
isSubRequest: false,
depends: vi.fn(),
untrack: vi.fn()
} as unknown as Parameters<typeof putIndexingSettings>[0];
}
it('returns 200 with { concurrency } for a valid integer input', async () => {
const response = await putIndexingSettings(makePutEvent({ concurrency: 3 }));
const body = await response.json();
expect(response.status).toBe(200);
expect(body.concurrency).toBe(3);
});
it('persists the new concurrency to the settings table', async () => {
await putIndexingSettings(makePutEvent({ concurrency: 3 }));
const row = db
.prepare<
[],
{ value: string }
>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
.get();
expect(row).toBeDefined();
const parsed = JSON.parse(row!.value);
expect(parsed.value).toBe(3);
});
it('clamps to minimum of 1', async () => {
const response = await putIndexingSettings(makePutEvent({ concurrency: 0 }));
const body = await response.json();
expect(body.concurrency).toBeGreaterThanOrEqual(1);
});
it('clamps to maximum of max(cpus-1, 1)', async () => {
// Pass an absurdly large value; it must be clamped
const response = await putIndexingSettings(makePutEvent({ concurrency: 99999 }));
const body = await response.json();
const os = await import('node:os');
const expectedMax = Math.max(os.cpus().length - 1, 1);
expect(body.concurrency).toBeLessThanOrEqual(expectedMax);
});
it('returns 400 for NaN concurrency (non-numeric string)', async () => {
// parseInt('not-a-number', 10) is NaN; NaN propagates through the
// Math.max/Math.min clamp (both return NaN), so the isNaN check in the
// handler still fires and the endpoint returns 400.
const response = await putIndexingSettings(makePutEvent({ concurrency: 'not-a-number' }));
expect(response.status).toBe(400);
});
it('uses concurrency=2 as default when body.concurrency is missing', async () => {
const response = await putIndexingSettings(makePutEvent({}));
const body = await response.json();
// Default is 2 per the code: `body.concurrency ?? 2`
expect(body.concurrency).toBe(2);
});
});
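The SSE tests above assert against raw chunk text (`event: job-progress`, `data: {...}`). A minimal frame parser makes the expected wire shape explicit; this is a sketch of the `text/event-stream` field format for illustration, not the production client:

```typescript
// Parse a single SSE chunk of the form emitted by the stream endpoints:
//   id: <n>\nevent: <name>\ndata: <json>\n\n
// Only the fields the tests assert on are handled here.
function parseSseFrame(chunk: string): { id?: string; event?: string; data?: string } {
  const frame: { id?: string; event?: string; data?: string } = {};
  for (const line of chunk.split('\n')) {
    if (line.startsWith('id: ')) frame.id = line.slice(4);
    else if (line.startsWith('event: ')) frame.event = line.slice(7);
    else if (line.startsWith('data: ')) frame.data = line.slice(6);
  }
  return frame;
}
```

With a helper like this, the `job-done` test could assert `frame.event === 'job-done'` and `JSON.parse(frame.data!)` instead of substring matching on the whole chunk.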

View File

@@ -0,0 +1,16 @@
import type { RequestHandler } from './$types';
import { getPool } from '$lib/server/pipeline/startup.js';
import { handleServiceError } from '$lib/server/utils/validation.js';
export const GET: RequestHandler = () => {
try {
const pool = getPool();
if (!pool) {
return new Response('Service unavailable', { status: 503 });
}
return Response.json(pool.getStatus());
} catch (error) {
return handleServiceError(error);
}
};
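The shape returned by `pool.getStatus()` is not shown in this diff; the fixtures in test group 4 suggest the following structure. These interfaces are inferred from the mocks and may differ from the real `WorkerPool` types:

```typescript
// Status payload shape inferred from the GET /api/v1/workers test fixtures;
// the actual WorkerPool.getStatus() return type may differ.
interface WorkerSlot {
  index: number;
  state: 'idle' | 'running';
  jobId: string | null;
  repositoryId: string | null;
  versionId: string | null;
}

interface PoolStatus {
  concurrency: number;
  active: number;
  idle: number;
  workers: WorkerSlot[];
}

// Tiny helper a dashboard might use to render the snapshot.
function summarizePool(status: PoolStatus): string {
  return `${status.active}/${status.concurrency} workers busy`;
}
```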

View File

@@ -2,6 +2,7 @@
import { goto } from '$app/navigation';
import { resolve as resolveRoute } from '$app/paths';
import { onMount } from 'svelte';
import { SvelteSet } from 'svelte/reactivity';
import type { PageData } from './$types';
import type { Repository, IndexingJob } from '$lib/types';
import ConfirmDialog from '$lib/components/ConfirmDialog.svelte';
@@ -38,8 +39,11 @@
indexedAt: string | null;
createdAt: string;
}
type VersionStateFilter = VersionDto['state'] | 'all';
let versions = $state<VersionDto[]>([]);
let versionsLoading = $state(false);
let activeVersionFilter = $state<VersionStateFilter>('all');
let bulkReprocessBusy = $state(false);
// Add version form
let addVersionTag = $state('');
@@ -48,10 +52,16 @@
// Discover tags state
let discoverBusy = $state(false);
let discoveredTags = $state<Array<{ tag: string; commitHash: string }>>([]);
let selectedDiscoveredTags = $state<Set<string>>(new Set());
const selectedDiscoveredTags = new SvelteSet<string>();
let showDiscoverPanel = $state(false);
let registerBusy = $state(false);
// Active version indexing jobs: tag -> jobId
let activeVersionJobs = $state<Record<string, string | undefined>>({});
// Job progress data fed by the shared SSE stream (replaces per-version <IndexingProgress> polling).
let versionJobProgress = $state<Record<string, IndexingJob>>({});
// Remove confirm
let removeTag = $state<string | null>(null);
@@ -69,6 +79,40 @@
error: 'Error'
};
const versionFilterOptions: Array<{ value: VersionStateFilter; label: string }> = [
{ value: 'all', label: 'All' },
{ value: 'pending', label: stateLabels.pending },
{ value: 'indexing', label: stateLabels.indexing },
{ value: 'indexed', label: stateLabels.indexed },
{ value: 'error', label: stateLabels.error }
];
const stageLabels: Record<string, string> = {
queued: 'Queued',
differential: 'Diff',
crawling: 'Crawling',
cloning: 'Cloning',
parsing: 'Parsing',
storing: 'Storing',
embedding: 'Embedding',
done: 'Done',
failed: 'Failed'
};
const filteredVersions = $derived(
activeVersionFilter === 'all'
? versions
: versions.filter((version) => version.state === activeVersionFilter)
);
const actionableErroredTags = $derived(
versions
.filter((version) => version.state === 'error' && !activeVersionJobs[version.tag])
.map((version) => version.tag)
);
const activeVersionFilterLabel = $derived(
versionFilterOptions.find((option) => option.value === activeVersionFilter)?.label ?? 'All'
);
async function refreshRepo() {
try {
const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}`);
@@ -99,6 +143,79 @@
loadVersions();
});
// Single shared poller replaced with EventSource SSE stream
$effect(() => {
if (!repo.id) return;
let stopped = false;
const es = new EventSource(`/api/v1/jobs/stream?repositoryId=${encodeURIComponent(repo.id)}`);
es.addEventListener('job-progress', (event) => {
if (stopped) return;
try {
const data = JSON.parse(event.data) as IndexingJob;
versionJobProgress = { ...versionJobProgress, [data.id]: data };
} catch {
// ignore parse errors
}
});
es.addEventListener('job-done', (event) => {
if (stopped) return;
try {
const data = JSON.parse(event.data) as IndexingJob;
const next = { ...versionJobProgress };
delete next[data.id];
versionJobProgress = next;
void loadVersions();
void refreshRepo();
} catch {
// ignore parse errors
}
});
es.addEventListener('job-failed', (event) => {
if (stopped) return;
try {
const data = JSON.parse(event.data) as IndexingJob;
const next = { ...versionJobProgress };
delete next[data.id];
versionJobProgress = next;
void loadVersions();
void refreshRepo();
} catch {
// ignore parse errors
}
});
es.onerror = () => {
if (stopped) return;
es.close();
// Fall back to a single fetch for resilience
(async () => {
try {
const res = await fetch(
`/api/v1/jobs?repositoryId=${encodeURIComponent(repo.id)}&limit=1000`
);
if (!res.ok || stopped) return;
const d = await res.json();
const map: Record<string, IndexingJob> = {};
for (const job of (d.jobs ?? []) as IndexingJob[]) {
map[job.id] = job;
}
if (!stopped) versionJobProgress = map;
} catch {
// ignore errors
}
})();
};
return () => {
stopped = true;
es.close();
};
});
async function handleReindex() {
errorMessage = null;
successMessage = null;
@@ -115,6 +232,16 @@
activeJobId = d.job.id;
}
const versionCount = d.versionJobs?.length ?? 0;
if (versionCount > 0) {
let next = { ...activeVersionJobs };
for (const vj of d.versionJobs) {
const matched = versions.find((v) => v.id === vj.versionId);
if (matched) {
next = { ...next, [matched.tag]: vj.id };
}
}
activeVersionJobs = next;
}
successMessage =
versionCount > 0
? `Re-indexing started. Also queued ${versionCount} version job${versionCount === 1 ? '' : 's'}.`
@@ -157,6 +284,10 @@
const d = await res.json();
throw new Error(d.error ?? 'Failed to add version');
}
const d = await res.json();
if (d.job?.id) {
activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
}
addVersionTag = '';
await loadVersions();
} catch (e) {
@@ -169,6 +300,16 @@
async function handleIndexVersion(tag: string) {
errorMessage = null;
try {
const jobId = await queueVersionIndex(tag);
if (jobId) {
activeVersionJobs = { ...activeVersionJobs, [tag]: jobId };
}
} catch (e) {
errorMessage = (e as Error).message;
}
}
async function queueVersionIndex(tag: string): Promise<string | null> {
const res = await fetch(
`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/${encodeURIComponent(tag)}/index`,
{ method: 'POST' }
@@ -177,9 +318,37 @@
const d = await res.json();
throw new Error(d.error ?? 'Failed to queue version indexing');
}
const d = await res.json();
return d.job?.id ?? null;
}
async function handleBulkReprocessErroredVersions() {
if (actionableErroredTags.length === 0) return;
bulkReprocessBusy = true;
errorMessage = null;
successMessage = null;
try {
const tags = [...actionableErroredTags];
const BATCH_SIZE = 5;
let next = { ...activeVersionJobs };
for (let i = 0; i < tags.length; i += BATCH_SIZE) {
const batch = tags.slice(i, i + BATCH_SIZE);
const jobIds = await Promise.all(batch.map((versionTag) => queueVersionIndex(versionTag)));
for (let j = 0; j < batch.length; j++) {
if (jobIds[j]) {
next = { ...next, [batch[j]]: jobIds[j] ?? undefined };
}
}
activeVersionJobs = next;
}
successMessage = `Queued ${tags.length} errored tag${tags.length === 1 ? '' : 's'} for reprocessing.`;
await loadVersions();
} catch (e) {
errorMessage = (e as Error).message;
} finally {
bulkReprocessBusy = false;
}
}
@@ -207,10 +376,9 @@
discoverBusy = true;
errorMessage = null;
try {
const res = await fetch(
`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`,
{ method: 'POST' }
);
const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`, {
method: 'POST'
});
if (!res.ok) {
const d = await res.json();
throw new Error(d.error ?? 'Failed to discover tags');
@@ -220,7 +388,10 @@
discoveredTags = (d.tags ?? []).filter(
(t: { tag: string; commitHash: string }) => !registeredTags.has(t.tag)
);
selectedDiscoveredTags = new Set(discoveredTags.map((t) => t.tag));
selectedDiscoveredTags.clear();
for (const discoveredTag of discoveredTags) {
selectedDiscoveredTags.add(discoveredTag.tag);
}
showDiscoverPanel = true;
} catch (e) {
errorMessage = (e as Error).message;
@@ -230,13 +401,11 @@
}
function toggleDiscoveredTag(tag: string) {
const next = new Set(selectedDiscoveredTags);
if (next.has(tag)) {
next.delete(tag);
if (selectedDiscoveredTags.has(tag)) {
selectedDiscoveredTags.delete(tag);
} else {
next.add(tag);
selectedDiscoveredTags.add(tag);
}
selectedDiscoveredTags = next;
}
async function handleRegisterSelected() {
@@ -244,8 +413,14 @@
registerBusy = true;
errorMessage = null;
try {
await Promise.all(
[...selectedDiscoveredTags].map((tag) =>
const tags = [...selectedDiscoveredTags];
const BATCH_SIZE = 5;
let next = { ...activeVersionJobs };
for (let i = 0; i < tags.length; i += BATCH_SIZE) {
const batch = tags.slice(i, i + BATCH_SIZE);
const responses = await Promise.all(
batch.map((tag) =>
fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
@@ -253,9 +428,19 @@
})
)
);
const results = await Promise.all(responses.map((r) => (r.ok ? r.json() : null)));
for (let j = 0; j < batch.length; j++) {
const result = results[j];
if (result?.job?.id) {
next = { ...next, [batch[j]]: result.job.id };
}
}
}
activeVersionJobs = next;
showDiscoverPanel = false;
discoveredTags = [];
selectedDiscoveredTags = new Set();
selectedDiscoveredTags.clear();
await loadVersions();
} catch (e) {
errorMessage = (e as Error).message;
@@ -346,7 +531,13 @@
{#if activeJobId}
<div class="mt-4 rounded-xl border border-blue-100 bg-blue-50 p-4">
<p class="mb-2 text-sm font-medium text-blue-700">Indexing in progress</p>
<IndexingProgress jobId={activeJobId} />
<IndexingProgress
jobId={activeJobId}
oncomplete={() => {
activeJobId = null;
refreshRepo();
}}
/>
</div>
{:else if repo.state === 'error'}
<div class="mt-4 rounded-xl border border-red-100 bg-red-50 p-4">
@@ -367,9 +558,36 @@
<!-- Versions -->
<div class="mt-6 rounded-xl border border-gray-200 bg-white p-5">
<div class="mb-4 flex flex-wrap items-center justify-between gap-3">
<div class="mb-4 flex flex-col gap-3">
<div class="flex flex-wrap items-center justify-between gap-3">
<div class="flex flex-wrap items-center gap-3">
<h2 class="text-sm font-semibold text-gray-700">Versions</h2>
<div class="flex flex-wrap items-center gap-1 rounded-lg bg-gray-100 p-1">
{#each versionFilterOptions as option (option.value)}
<button
type="button"
onclick={() => (activeVersionFilter = option.value)}
class="rounded-md px-2.5 py-1 text-xs font-medium transition-colors {activeVersionFilter ===
option.value
? 'bg-white text-gray-900 shadow-sm'
: 'text-gray-500 hover:text-gray-700'}"
>
{option.label}
</button>
{/each}
</div>
</div>
<div class="flex flex-wrap items-center gap-2">
<button
type="button"
onclick={handleBulkReprocessErroredVersions}
disabled={bulkReprocessBusy || actionableErroredTags.length === 0}
class="rounded-lg border border-red-200 px-3 py-1.5 text-sm font-medium text-red-600 hover:bg-red-50 disabled:cursor-not-allowed disabled:opacity-50"
>
{bulkReprocessBusy
? 'Reprocessing...'
: `Reprocess errored${actionableErroredTags.length > 0 ? ` (${actionableErroredTags.length})` : ''}`}
</button>
<!-- Add version inline form -->
<form
onsubmit={(e) => {
@@ -404,6 +622,7 @@
{/if}
</div>
</div>
</div>
<!-- Discover panel -->
{#if showDiscoverPanel}
@@ -418,7 +637,7 @@
onclick={() => {
showDiscoverPanel = false;
discoveredTags = [];
selectedDiscoveredTags = new Set();
selectedDiscoveredTags.clear();
}}
class="text-xs text-blue-600 hover:underline"
>
@@ -436,7 +655,9 @@
class="rounded border-gray-300"
/>
<span class="font-mono text-gray-800">{discovered.tag}</span>
<span class="font-mono text-xs text-gray-400">{discovered.commitHash.slice(0, 8)}</span>
<span class="font-mono text-xs text-gray-400"
>{discovered.commitHash.slice(0, 8)}</span
>
</label>
{/each}
</div>
@@ -445,9 +666,7 @@
disabled={registerBusy || selectedDiscoveredTags.size === 0}
class="rounded-lg bg-blue-600 px-3 py-1.5 text-sm font-medium text-white hover:bg-blue-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{registerBusy
? 'Registering...'
: `Register ${selectedDiscoveredTags.size} selected`}
{registerBusy ? 'Registering...' : `Register ${selectedDiscoveredTags.size} selected`}
</button>
{/if}
</div>
@@ -458,10 +677,17 @@
<p class="text-sm text-gray-400">Loading versions...</p>
{:else if versions.length === 0}
<p class="text-sm text-gray-400">No versions registered. Add a tag above to get started.</p>
{:else if filteredVersions.length === 0}
<div class="rounded-lg border border-dashed border-gray-200 bg-gray-50 px-4 py-5">
<p class="text-sm text-gray-500">
No versions match the {activeVersionFilterLabel.toLowerCase()} filter.
</p>
</div>
{:else}
<div class="divide-y divide-gray-100">
{#each versions as version (version.id)}
<div class="flex items-center justify-between py-2.5">
{#each filteredVersions as version (version.id)}
<div class="py-2.5">
<div class="flex items-center justify-between">
<div class="flex items-center gap-3">
<span class="font-mono text-sm font-medium text-gray-900">{version.tag}</span>
<span
@@ -474,10 +700,12 @@
<div class="flex items-center gap-2">
<button
onclick={() => handleIndexVersion(version.tag)}
disabled={version.state === 'indexing'}
disabled={version.state === 'indexing' || !!activeVersionJobs[version.tag]}
class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
>
{version.state === 'indexing' ? 'Indexing...' : 'Index'}
{version.state === 'indexing' || !!activeVersionJobs[version.tag]
? 'Indexing...'
: 'Index'}
</button>
<button
onclick={() => (removeTag = version.tag)}
@@ -487,6 +715,50 @@
</button>
</div>
</div>
{#if version.totalSnippets > 0 || version.commitHash || version.indexedAt}
{@const metaParts = (
[
version.totalSnippets > 0
? { text: `${version.totalSnippets} snippets`, mono: false }
: null,
version.commitHash ? { text: version.commitHash.slice(0, 8), mono: true } : null,
version.indexedAt ? { text: formatDate(version.indexedAt), mono: false } : null
] as Array<{ text: string; mono: boolean } | null>
).filter((p): p is { text: string; mono: boolean } => p !== null)}
<div class="mt-1 flex items-center gap-1.5">
{#each metaParts as part, i (i)}
{#if i > 0}
<span class="text-xs text-gray-300">·</span>
{/if}
<span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span
>
{/each}
</div>
{/if}
{#if activeVersionJobs[version.tag]}
{@const job = versionJobProgress[activeVersionJobs[version.tag]!]}
<div class="mt-2">
<div class="flex justify-between text-xs text-gray-500">
<span>
{#if job?.stageDetail}{job.stageDetail}{:else}{(
job?.processedFiles ?? 0
).toLocaleString()} / {(job?.totalFiles ?? 0).toLocaleString()} files{/if}
{#if job?.stage}{' - ' + (stageLabels[job.stage] ?? job.stage)}{/if}
</span>
<span>{job?.progress ?? 0}%</span>
</div>
<div class="mt-1 h-1.5 w-full rounded-full bg-gray-200">
<div
class="h-1.5 rounded-full bg-blue-600 transition-all duration-300"
style="width: {job?.progress ?? 0}%"
></div>
</div>
{#if job?.status === 'failed'}
<p class="mt-1 text-xs text-red-600">{job.error ?? 'Indexing failed.'}</p>
{/if}
</div>
{/if}
</div>
{/each}
</div>
{/if}

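The version metadata row built inline in the template above (snippet count, short commit hash, indexed date, joined by `·` separators) can be sketched as a small pure helper. This is an illustrative sketch, not code from the PR; the `VersionMeta` shape and the `formatDate` parameter are assumptions standing in for the app's real DTOs.

```typescript
// Hypothetical shape mirroring the fields the template reads; not the app's real DTO.
interface VersionMeta {
  totalSnippets: number;
  commitHash?: string;
  indexedAt?: string; // ISO timestamp
}

// Collect the displayable metadata parts, skipping absent fields,
// in the same order as the template: snippets, short hash, date.
function metaParts(v: VersionMeta, formatDate: (iso: string) => string): string[] {
  const parts: string[] = [];
  if (v.totalSnippets > 0) parts.push(`${v.totalSnippets} snippets`);
  if (v.commitHash) parts.push(v.commitHash.slice(0, 8));
  if (v.indexedAt) parts.push(formatDate(v.indexedAt));
  return parts;
}
```

Joining the result with `' · '` reproduces the dot separators the markup renders between parts.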

@@ -5,7 +5,9 @@ import { EmbeddingSettingsDtoMapper } from '$lib/server/mappers/embedding-settin
import { EmbeddingSettingsService } from '$lib/server/services/embedding-settings.service.js';
export const load: PageServerLoad = async () => {
const service = new EmbeddingSettingsService(getClient());
const db = getClient();
const service = new EmbeddingSettingsService(db);
const settings = EmbeddingSettingsDtoMapper.toDto(service.getSettings());
let localProviderAvailable = false;
@@ -15,8 +17,28 @@ export const load: PageServerLoad = async () => {
localProviderAvailable = false;
}
// Read indexing concurrency setting
let indexingConcurrency = 2;
const concurrencyRow = db
.prepare<[], { value: string }>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
.get();
if (concurrencyRow && concurrencyRow.value) {
try {
const parsed = JSON.parse(concurrencyRow.value);
if (typeof parsed === 'object' && parsed !== null && typeof parsed.value === 'number') {
indexingConcurrency = parsed.value;
} else if (typeof parsed === 'number') {
indexingConcurrency = parsed;
}
} catch {
indexingConcurrency = 2;
}
}
return {
settings,
localProviderAvailable
localProviderAvailable,
indexingConcurrency
};
};
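The load function above tolerates two stored JSON shapes for `indexing.concurrency` (a bare number, or an object with a numeric `value` field) and falls back to 2 on anything malformed. A minimal sketch of that parsing as a standalone helper — hypothetical, extracted here only to make the two accepted shapes explicit:

```typescript
// Parse a stored 'indexing.concurrency' settings value.
// Accepts either a bare number ("3") or a wrapped object ('{"value": 3}');
// anything else, including malformed JSON, yields the fallback.
function parseConcurrency(raw: string | undefined, fallback = 2): number {
  if (!raw) return fallback;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === 'number') return parsed;
    if (
      typeof parsed === 'object' &&
      parsed !== null &&
      typeof (parsed as { value?: unknown }).value === 'number'
    ) {
      return (parsed as { value: number }).value;
    }
  } catch {
    // fall through to the fallback on malformed JSON
  }
  return fallback;
}
```

Swallowing the parse error and returning the default keeps the admin page loading even if the settings row was hand-edited into an invalid state.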


@@ -66,12 +66,19 @@
let saveError = $state<string | null>(null);
let saveStatusTimer: ReturnType<typeof setTimeout> | null = null;
let concurrencyInput = $derived(data.indexingConcurrency);
let concurrencySaving = $state(false);
let concurrencySaveStatus = $state<'idle' | 'ok' | 'error'>('idle');
let concurrencySaveError = $state<string | null>(null);
let concurrencySaveStatusTimer: ReturnType<typeof setTimeout> | null = null;
const currentSettings = $derived(settingsOverride ?? data.settings);
const activeProfile = $derived(currentSettings.activeProfile);
const activeConfigEntries = $derived(activeProfile?.configEntries ?? []);
onDestroy(() => {
if (saveStatusTimer) clearTimeout(saveStatusTimer);
if (concurrencySaveStatusTimer) clearTimeout(concurrencySaveStatusTimer);
});
// ---------------------------------------------------------------------------
@@ -159,8 +166,42 @@
void save();
}
async function saveConcurrency() {
concurrencySaving = true;
concurrencySaveStatus = 'idle';
concurrencySaveError = null;
try {
const res = await fetch('/api/v1/settings/indexing', {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ concurrency: concurrencyInput })
});
if (res.ok) {
const updated = await res.json();
concurrencyInput = updated.concurrency;
concurrencySaveStatus = 'ok';
if (concurrencySaveStatusTimer) clearTimeout(concurrencySaveStatusTimer);
concurrencySaveStatusTimer = setTimeout(() => {
concurrencySaveStatus = 'idle';
concurrencySaveStatusTimer = null;
}, 3000);
} else {
const data = await res.json();
concurrencySaveStatus = 'error';
concurrencySaveError = data.error ?? 'Save failed';
}
} catch (e) {
concurrencySaveStatus = 'error';
concurrencySaveError = (e as Error).message;
} finally {
concurrencySaving = false;
}
}
function getOpenAiProfile(settings: EmbeddingSettingsDto): EmbeddingProfileDto | null {
return settings.profiles.find((profile) => profile.providerKind === 'openai-compatible') ?? null;
return (
settings.profiles.find((profile) => profile.providerKind === 'openai-compatible') ?? null
);
}
function resolveProvider(profile: EmbeddingProfileDto | null): 'none' | 'openai' | 'local' {
@@ -171,7 +212,8 @@
}
function resolveBaseUrl(settings: EmbeddingSettingsDto): string {
const profile = settings.activeProfile?.providerKind === 'openai-compatible'
const profile =
settings.activeProfile?.providerKind === 'openai-compatible'
? settings.activeProfile
: getOpenAiProfile(settings);
return typeof profile?.config.baseUrl === 'string'
@@ -180,16 +222,18 @@
}
function resolveModel(settings: EmbeddingSettingsDto): string {
const profile = settings.activeProfile?.providerKind === 'openai-compatible'
const profile =
settings.activeProfile?.providerKind === 'openai-compatible'
? settings.activeProfile
: getOpenAiProfile(settings);
return typeof profile?.config.model === 'string'
? profile.config.model
: profile?.model ?? 'text-embedding-3-small';
: (profile?.model ?? 'text-embedding-3-small');
}
function resolveDimensions(settings: EmbeddingSettingsDto): number | undefined {
const profile = settings.activeProfile?.providerKind === 'openai-compatible'
const profile =
settings.activeProfile?.providerKind === 'openai-compatible'
? settings.activeProfile
: getOpenAiProfile(settings);
return profile?.dimensions ?? 1536;
@@ -257,7 +301,7 @@
<dt class="font-medium text-gray-500">Provider</dt>
<dd class="font-semibold text-gray-900">{activeProfile.providerKind}</dd>
<dt class="font-medium text-gray-500">Model</dt>
<dd class="break-all font-semibold text-gray-900">{activeProfile.model}</dd>
<dd class="font-semibold break-all text-gray-900">{activeProfile.model}</dd>
<dt class="font-medium text-gray-500">Dimensions</dt>
<dd class="font-semibold text-gray-900">{activeProfile.dimensions}</dd>
</div>
@@ -275,16 +319,20 @@
<div class="rounded-lg border border-gray-200 bg-gray-50 p-4">
<p class="text-sm font-medium text-gray-800">Provider configuration</p>
<p class="mb-3 mt-1 text-sm text-gray-500">
<p class="mt-1 mb-3 text-sm text-gray-500">
These are the provider-specific settings currently saved for the active profile.
</p>
{#if activeConfigEntries.length > 0}
<ul class="space-y-2 text-sm">
{#each activeConfigEntries as entry (entry.key)}
<li class="flex items-start justify-between gap-4 border-b border-gray-200 pb-2 last:border-b-0 last:pb-0">
<li
class="flex items-start justify-between gap-4 border-b border-gray-200 pb-2 last:border-b-0 last:pb-0"
>
<span class="font-medium text-gray-600">{entry.key}</span>
<span class={entry.redacted ? 'text-gray-500' : 'text-gray-800'}>{entry.value}</span>
<span class={entry.redacted ? 'text-gray-500' : 'text-gray-800'}
>{entry.value}</span
>
</li>
{/each}
</ul>
@@ -293,9 +341,9 @@
No provider-specific configuration is stored for this profile.
</p>
<p class="mt-2 text-sm text-gray-500">
For <span class="font-medium text-gray-700">OpenAI-compatible</span> profiles, edit the
settings in the <span class="font-medium text-gray-700">Embedding Provider</span> form
below. The built-in <span class="font-medium text-gray-700">Local Model</span> profile
For <span class="font-medium text-gray-700">OpenAI-compatible</span> profiles, edit
the settings in the <span class="font-medium text-gray-700">Embedding Provider</span>
form below. The built-in <span class="font-medium text-gray-700">Local Model</span> profile
does not currently expose extra configurable fields.
</p>
{/if}
@@ -303,14 +351,17 @@
</div>
{:else}
<div class="rounded-lg border border-amber-200 bg-amber-50 p-4 text-sm text-amber-800">
Embeddings are currently disabled. Keyword search remains available, but no embedding profile is active.
Embeddings are currently disabled. Keyword search remains available, but no embedding
profile is active.
</div>
{/if}
</div>
<div class="rounded-xl border border-gray-200 bg-white p-6">
<h2 class="mb-1 text-base font-semibold text-gray-900">Profile Inventory</h2>
<p class="mb-4 text-sm text-gray-500">Profiles stored in the database and available for activation.</p>
<p class="mb-4 text-sm text-gray-500">
Profiles stored in the database and available for activation.
</p>
<div class="grid grid-cols-2 gap-3">
<StatBadge label="Profiles" value={String(currentSettings.profiles.length)} />
<StatBadge label="Active" value={activeProfile ? '1' : '0'} />
@@ -324,7 +375,9 @@
<p class="text-gray-500">{profile.id}</p>
</div>
{#if profile.id === currentSettings.activeProfileId}
<span class="rounded-full bg-blue-50 px-2 py-0.5 text-xs font-medium text-blue-700">Active</span>
<span class="rounded-full bg-blue-50 px-2 py-0.5 text-xs font-medium text-blue-700"
>Active</span
>
{/if}
</div>
</div>
@@ -357,11 +410,7 @@
: 'border border-gray-200 text-gray-700 hover:bg-gray-50'
].join(' ')}
>
{p === 'none'
? 'None (FTS5 only)'
: p === 'openai'
? 'OpenAI-compatible'
: 'Local Model'}
{p === 'none' ? 'None (FTS5 only)' : p === 'openai' ? 'OpenAI-compatible' : 'Local Model'}
</button>
{/each}
</div>
@@ -482,6 +531,45 @@
</div>
{/if}
<!-- Indexing section -->
<div class="space-y-3 rounded-lg border border-gray-200 bg-white p-4">
<div>
<label for="concurrency" class="block text-sm font-medium text-gray-700">
Concurrent Workers
</label>
<p class="mt-0.5 text-xs text-gray-500">
Number of parallel indexing workers. Range: 1 to 8.
</p>
</div>
<div class="flex items-center gap-3">
<input
id="concurrency"
type="number"
min="1"
max="8"
inputmode="numeric"
bind:value={concurrencyInput}
disabled={concurrencySaving}
class="w-20 rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none disabled:opacity-50"
/>
<button
type="button"
onclick={saveConcurrency}
disabled={concurrencySaving}
class="rounded-lg bg-blue-600 px-3 py-2 text-sm text-white hover:bg-blue-700 disabled:opacity-50"
>
{concurrencySaving ? 'Saving…' : 'Save'}
</button>
{#if concurrencySaveStatus === 'ok'}
<span class="text-sm text-green-600">✓ Saved</span>
{:else if concurrencySaveStatus === 'error'}
<span class="text-sm text-red-600">{concurrencySaveError}</span>
{/if}
</div>
</div>
<!-- Save feedback banners -->
{#if saveStatus === 'ok'}
<div