TRUEREF-0023 rewrite indexing pipeline - parallel reads - serialized writes
20	README.md
@@ -214,16 +214,16 @@ For GitHub repositories, TrueRef fetches the file from the default branch root.

### Fields

| Field              | Type     | Required | Description                                                                                      |
| ------------------ | -------- | -------- | ------------------------------------------------------------------------------------------------ |
| `$schema`          | string   | No       | URL to the live JSON Schema for editor validation                                                |
| `projectTitle`     | string   | No       | Display name override (max 100 chars)                                                            |
| `description`      | string   | No       | Library description used for search ranking (10–500 chars)                                       |
| `folders`          | string[] | No       | Path prefixes or regex strings to **include** (max 50 items). If absent, all folders are included |
| `excludeFolders`   | string[] | No       | Path prefixes or regex strings to **exclude** after the `folders` allowlist (max 50 items)       |
| `excludeFiles`     | string[] | No       | Exact filenames to skip — no path, no glob (max 100 items)                                       |
| `rules`            | string[] | No       | Best-practice rules prepended to every `query-docs` response (max 20 rules, 5–500 chars each)    |
| `previousVersions` | object[] | No       | Version tags to register when the repository is indexed (max 50 entries)                         |

`previousVersions` entries each require a `tag` (e.g. `"v1.2.3"`) and a `title` (e.g. `"Version 1.2.3"`).

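As a sketch, a configuration file using these fields might look like the following (all values, including the schema URL, are illustrative):

```json
{
  "$schema": "https://example.com/trueref.schema.json",
  "projectTitle": "My Library",
  "description": "Utilities for parsing and validating widget manifests.",
  "folders": ["docs/", "guides/"],
  "excludeFolders": ["docs/archive/"],
  "excludeFiles": ["CHANGELOG.md"],
  "rules": ["Prefer the async API for all new integrations."],
  "previousVersions": [{ "tag": "v1.2.3", "title": "Version 1.2.3" }]
}
```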
@@ -335,3 +335,47 @@ Add subsequent research below this section.
- Risks / follow-ups:
  - Iteration 2 task decomposition must treat the current dirty code files from iterations 0 and 1 as the validation baseline; otherwise the executor will keep rediscovering pre-existing worktree drift instead of new task deltas.
  - The sqlite-vec bootstrap helper and the relational cleanup should be planned as one acceptance unit before any downstream vec0, worker-status, or admin-page tasks, because that is the smallest unit that removes the known broken intermediate state.

### 2026-04-01T00:00:00.000Z — TRUEREF-0023 iteration 3 navbar follow-up planning research

- Task: Plan the accepted follow-up request to add an admin route to the main navbar.
- Files inspected:
  - `prompts/TRUEREF-0023/progress.yaml`
  - `prompts/TRUEREF-0023/iteration_2/review_report.yaml`
  - `prompts/TRUEREF-0023/prompt.yaml`
  - `package.json`
  - `src/routes/+layout.svelte`
  - `src/routes/admin/jobs/+page.svelte`
- Findings:
  - The accepted iteration-2 workspace is green: `review_report.yaml` records a passing build, passing tests, and no workspace diagnostics, so this request is a narrow additive follow-up rather than a rework of the sqlite-vec/admin jobs implementation.
  - The main navbar is defined entirely in `src/routes/+layout.svelte` and already uses base-aware SvelteKit navigation via `resolve as resolveRoute` from `$app/paths` for the existing `Repositories`, `Search`, and `Settings` links.
  - The existing admin surface already lives at `src/routes/admin/jobs/+page.svelte`, which sets the page title to `Job Queue - TrueRef Admin`; adding a navbar entry can therefore target `/admin/jobs` directly without introducing new routes, loaders, or components.
  - Repository findings from the earlier lint planning work already confirm the codebase expectation to avoid root-relative internal navigation in SvelteKit pages and components, so the new navbar link should follow the existing `resolveRoute('/...')` anchor pattern.
  - No dedicated test file currently covers the shared navbar. The appropriate validation for this follow-up remains repository-level `npm run build` and `npm test` after the single layout edit.
- Risks / follow-ups:
  - The follow-up navigation request should stay isolated to the shared layout so it does not reopen the accepted sqlite-vec implementation surface.
  - Build and test validation remain the appropriate regression checks because no dedicated navbar test currently exists.

### 2026-04-01T12:05:23.000Z — TRUEREF-0023 iteration 5 tabs filter and bulk reprocess planning research

- Task: Plan the follow-up repo-detail UI change to filter version rows in the tabs/tags view and add a bulk action that reprocesses all errored tags without adding a new backend endpoint.
- Files inspected:
  - `prompts/TRUEREF-0023/progress.yaml`
  - `prompts/TRUEREF-0023/prompt.yaml`
  - `prompts/TRUEREF-0023/iteration_2/plan.md`
  - `prompts/TRUEREF-0023/iteration_2/tasks.yaml`
  - `src/routes/repos/[id]/+page.svelte`
  - `src/routes/api/v1/libs/[id]/versions/[tag]/index/+server.ts`
  - `src/routes/api/v1/api-contract.integration.test.ts`
  - `package.json`
- Findings:
  - The relevant UI surface is entirely in `src/routes/repos/[id]/+page.svelte`; the page already loads `versions`, renders per-version state badges, and exposes per-tag `Index` and `Remove` buttons.
  - Version states are concretely `pending`, `indexing`, `indexed`, and `error`, and the page already centralizes their labels and color classes in `stateLabels` and `stateColors`.
  - Existing per-tag reprocessing is implemented by `handleIndexVersion(tag)`, which POSTs to `/api/v1/libs/:id/versions/:tag/index`; the corresponding backend route exists and returns a queued job DTO with status `202`.
  - No bulk reprocess endpoint exists, so the lowest-risk implementation is a UI-only bulk action that iterates the existing per-tag route.
  - The page already contains a bounded batching pattern in `handleRegisterSelected()` with `BATCH_SIZE = 5`, which provides a concrete local precedent for bulk tag operations without inventing a new concurrency model.
  - There is no existing page-component or browser test targeting `src/routes/repos/[id]/+page.svelte`; nearby automated coverage is API-contract focused, so this iteration should rely on `npm run build` and `npm test` regression validation unless a developer discovers an existing Svelte page harness during implementation.
  - Context7 lookup for Svelte and SvelteKit could not be completed in this environment because the configured API key is invalid; planning therefore relied on installed versions from `package.json` (`svelte` `^5.51.0`, `@sveltejs/kit` `^2.50.2`) and the live page patterns already present in the repository.
- Risks / follow-ups:
  - Bulk reprocessing must avoid queuing duplicate jobs for tags already shown as `indexing` or already tracked in `activeVersionJobs`.
  - Filter state should be implemented as local UI state only and must not disturb the existing `onMount(loadVersions)` fetch path or the SSE job-progress flow.
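The bounded-batch approach noted in the findings can be sketched as follows; `reprocessErrored` and `postIndex` are hypothetical names, with only `BATCH_SIZE = 5` taken from the page's existing precedent:

```typescript
// Hypothetical sketch of the UI-side bulk action: iterate errored tags through
// the existing per-tag endpoint in bounded batches, mirroring BATCH_SIZE = 5.
// `postIndex` stands in for the real fetch to the per-tag index route.
const BATCH_SIZE = 5;

async function reprocessErrored(
  tags: string[],
  postIndex: (tag: string) => Promise<void>
): Promise<void> {
  for (let i = 0; i < tags.length; i += BATCH_SIZE) {
    const batch = tags.slice(i, i + BATCH_SIZE);
    // Each batch runs in parallel; batches themselves run sequentially.
    await Promise.all(batch.map(postIndex));
  }
}
```

In the real page, the caller would first exclude tags whose state is not `error` or that already appear in `activeVersionJobs`.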
@@ -47,8 +47,8 @@ Executed in `IndexingPipeline.run()` before the crawl, when the job has a `versi

   containing shell metacharacters).

3. **Path partitioning**: The changed-file list is split into `changedPaths` (added + modified
   + renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
   `ancestorFilePaths − changedPaths − deletedPaths`.

4. **Guard**: Returns `null` when no indexed ancestor exists, when the ancestor has no indexed
   documents, or when all files changed (nothing to clone).
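The set arithmetic in the partitioning step can be sketched as follows (helper name and signature are illustrative, not the pipeline's actual code):

```typescript
// Illustrative sketch of step 3: derive unchangedPaths by set difference.
// unchangedPaths = ancestorFilePaths − changedPaths − deletedPaths
function partitionPaths(
  ancestorFilePaths: string[],
  changedPaths: Set<string>,
  deletedPaths: Set<string>
): string[] {
  return ancestorFilePaths.filter(
    (p) => !changedPaths.has(p) && !deletedPaths.has(p)
  );
}
```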
@@ -74,18 +74,18 @@ matching files are returned. This minimises GitHub API requests and local I/O.

## API Surface Changes

| Symbol                                 | Location                            | Change                                        |
| -------------------------------------- | ----------------------------------- | --------------------------------------------- |
| `buildDifferentialPlan`                | `pipeline/differential-strategy.ts` | **New** — async function                      |
| `DifferentialPlan`                     | `pipeline/differential-strategy.ts` | **New** — interface                           |
| `findBestAncestorVersion`              | `utils/tag-order.ts`                | **New** — pure function                       |
| `fetchGitHubChangedFiles`              | `crawler/github-compare.ts`         | **New** — async function                      |
| `getChangedFilesBetweenRefs`           | `utils/git.ts`                      | **New** — sync function (uses `execFileSync`) |
| `ChangedFile`                          | `crawler/types.ts`                  | **New** — interface                           |
| `CrawlOptions.allowedPaths`            | `crawler/types.ts`                  | **New** — optional field                      |
| `IndexingPipeline.crawl()`             | `pipeline/indexing.pipeline.ts`     | **Modified** — added `allowedPaths` param     |
| `IndexingPipeline.cloneFromAncestor()` | `pipeline/indexing.pipeline.ts`     | **New** — private method                      |
| `IndexingPipeline.run()`               | `pipeline/indexing.pipeline.ts`     | **Modified** — Stage 0 added                  |

---
@@ -88,6 +88,7 @@ The UI currently polls `GET /api/v1/jobs?repositoryId=...` every 2 seconds. This

#### Worker Thread lifecycle

Each worker is a long-lived `node:worker_threads` `Worker` instance that:

1. Opens its own `better-sqlite3` connection to the same database file.
2. Listens for `{ type: 'run', jobId }` messages from the main thread.
3. Runs `IndexingPipeline.run(job)`, emitting `postMessage` progress events at each stage boundary and every N files.
@@ -100,18 +101,18 @@ Manages a pool of `concurrency` workers.

```typescript
interface WorkerPoolOptions {
  concurrency: number;  // default: Math.max(1, os.cpus().length - 1), capped at 4
  workerScript: string; // absolute path to the compiled worker entry
}

class WorkerPool {
  private workers: Worker[];
  private idle: Worker[];

  enqueue(jobId: string): void;
  private dispatch(worker: Worker, jobId: string): void;
  private onWorkerMessage(msg: WorkerMessage): void;
  private onWorkerExit(worker: Worker, code: number): void;
}
```
@@ -120,12 +121,14 @@ Workers are kept alive across jobs. If a worker crashes (non-zero exit), the poo

#### Parallelism and write contention

With WAL mode enabled (already the case), SQLite supports:

- **One concurrent writer** (the transaction lock)
- **Many concurrent readers**

The `replaceSnippets` transaction for different repositories never contends — they write different rows. The `cloneFromAncestor` operation writes to the same tables but different `version_id` values, so WAL checkpoint logic keeps them non-overlapping at the page level.

Two jobs on the **same repository** (e.g. `/my-lib/v1.0.0` and `/my-lib/v2.0.0`) can run in parallel because:

- Differential indexing (TRUEREF-0021) ensures `v2.0.0` reads from `v1.0.0`'s already-committed rows.
- The write transactions for each version touch disjoint `version_id` partitions.
@@ -134,6 +137,7 @@ If write contention still occurs under parallel load, `busy_timeout = 5000` (alr

#### Concurrency limit per repository

To prevent a user from queuing 500 tags and overwhelming the worker pool, the pool enforces:

- **Max 1 running job per repository** for the default branch (re-index).
- **Max `concurrency` total running jobs** across all repositories.
- Version jobs for the same repository are serialised within the pool (the queue picks the oldest queued version job for a given repo only when no other version job for that repo is running).
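These rules can be sketched as a dispatch-eligibility check (all names are hypothetical, and the per-repo rules are collapsed into "at most one running job per repository"):

```typescript
// Illustrative sketch of the pool's dispatch rules; not the actual WorkerPool code.
interface QueuedJob {
  jobId: string;
  repositoryId: string;
  versionId: string | null; // null = default-branch re-index
}

function pickNextJob(
  queue: QueuedJob[],       // oldest first
  running: QueuedJob[],
  concurrency: number
): QueuedJob | null {
  if (running.length >= concurrency) return null; // global cap
  for (const job of queue) {
    // Skip repositories that already have a running job (serialised per repo).
    const busy = running.some((r) => r.repositoryId === job.repositoryId);
    if (!busy) return job;
  }
  return null;
}
```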
@@ -148,15 +152,15 @@ Replace the opaque integer progress with a structured stage model:

```typescript
type IndexingStage =
  | 'queued'
  | 'differential' // computing ancestor diff
  | 'crawling'     // fetching files from GitHub or local FS
  | 'cloning'      // cloning unchanged files from ancestor (differential only)
  | 'parsing'      // parsing files into snippets
  | 'storing'      // writing documents + snippets to DB
  | 'embedding'    // generating vector embeddings
  | 'done'
  | 'failed';
```
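One hypothetical way to derive the retained 0–100 `progress` integer from this stage model; the stage weights below are invented for illustration and are not part of the spec:

```typescript
// Illustrative stage-to-progress mapping; the base values are invented.
const STAGE_BASE: Record<string, number> = {
  queued: 0, differential: 5, crawling: 10, cloning: 30,
  parsing: 40, storing: 70, embedding: 85, done: 100, failed: 100,
};

function overallProgress(stage: string, processedFiles: number, totalFiles: number): number {
  const base = STAGE_BASE[stage] ?? 0;
  if (stage !== 'parsing' || totalFiles === 0) return base;
  // Only the per-file parse loop interpolates, toward the storing base.
  const next = STAGE_BASE.storing;
  return base + Math.floor(((next - base) * processedFiles) / totalFiles);
}
```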
### Extended Job Schema
@@ -172,22 +176,24 @@ The `progress` column (0–100) is retained for backward compatibility and overa

```typescript
interface ProgressMessage {
  type: 'progress';
  jobId: string;
  stage: IndexingStage;
  stageDetail?: string; // human-readable detail for the current stage
  progress: number;     // 0–100 overall
  processedFiles: number;
  totalFiles: number;
}
```

Workers emit this message:

- On every stage transition (crawl start, parse start, store start, embed start).
- Every `PROGRESS_EMIT_EVERY = 10` files during the parse loop.
- On job completion or failure.

The main thread receives these messages and does two things:

1. Writes the update to `indexing_jobs` in SQLite (batched — one write per message, not per file).
2. Pushes the payload to any open SSE channels for that jobId.
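The per-file emission cadence can be sketched as follows (the helper is hypothetical; only `PROGRESS_EMIT_EVERY = 10` comes from the text):

```typescript
// Illustrative gate for progress emission inside the parse loop.
const PROGRESS_EMIT_EVERY = 10;

function shouldEmitProgress(processedFiles: number, totalFiles: number): boolean {
  // Emit on every 10th file, and always on the final file of the loop.
  return processedFiles % PROGRESS_EMIT_EVERY === 0 || processedFiles === totalFiles;
}
```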
@@ -198,6 +204,7 @@ The main thread receives these messages and does two things:

### `GET /api/v1/jobs/:id/stream`

Opens an SSE connection for a specific job. The server:

1. Sends the current job state as the first event immediately (no initial lag).
2. Pushes `ProgressMessage` events as the worker emits them.
3. Sends a final `event: done` or `event: failed` event, then closes the connection.
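The frames follow the standard SSE wire format (an `event:` line, a `data:` line, and a blank-line terminator); a minimal encoder sketch with a helper name of our own:

```typescript
// Encode one SSE frame for the events described above.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

For example, the final frame for a successful job could be produced with `sseFrame('done', { jobId })`.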
@@ -281,7 +288,7 @@ Expose via the settings table (key `indexing.concurrency`):

```typescript
interface IndexingSettings {
  concurrency: number; // 1–max(cpus-1, 1); default 2
}
```
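A sketch of how a stored setting value could be normalized against the documented bounds (the helper itself is hypothetical):

```typescript
// Clamp a raw settings value to the documented range 1..max(cpus - 1, 1),
// falling back to the default of 2 for non-integer input.
function normalizeConcurrency(raw: unknown, cpuCount: number): number {
  const max = Math.max(cpuCount - 1, 1);
  const n = typeof raw === 'number' && Number.isInteger(raw) ? raw : 2;
  return Math.min(Math.max(n, 1), max);
}
```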
@@ -362,13 +369,13 @@ The embedding stage must **not** run inside the same Worker Thread as the crawl/

### Why a dedicated embedding worker

| Concern            | Per-parse-worker model                                 | Dedicated embedding worker                                                       |
| ------------------ | ------------------------------------------------------ | -------------------------------------------------------------------------------- |
| Memory             | N × ~100 MB (model weights + WASM heap) per worker     | 1 × ~100 MB regardless of concurrency                                            |
| Model warm-up      | Paid once per worker spawn; cold starts slow           | Paid once at server startup                                                      |
| Batch size         | Each worker batches only its own job's snippets        | All in-flight jobs queue to one worker → larger batches → higher WASM throughput |
| Provider migration | Must update every worker                               | Update one file                                                                  |
| API rate limiting  | N parallel streams to the same API → N×rate-limit hits | One serial stream, naturally throttled                                           |

With `Xenova/all-MiniLM-L6-v2`, the WASM model and weight files occupy ~90–120 MB of heap. Running three parse workers with embedded model loading costs ~300–360 MB of resident memory that can never be freed while the server is alive. A dedicated worker keeps that cost fixed at one instance.
@@ -415,6 +422,7 @@ Instead, the existing `findSnippetIdsMissingEmbeddings` query is the handshake:

5. Main thread routes this to the SSE broadcaster → UI updates the embedding progress slice.

This means:

- The embedding worker reads snippet text from the DB itself (no IPC serialisation of content).
- The model is loaded once, stays warm, and processes batches from all repositories in FIFO order.
- Parse workers are never blocked waiting for embeddings — they complete their job stages and exit immediately.
@@ -424,15 +432,15 @@ This means:

```typescript
// Main → Embedding worker
type EmbedRequest =
  | { type: 'embed'; jobId: string; repositoryId: string; versionId: string | null }
  | { type: 'shutdown' };

// Embedding worker → Main
type EmbedResponse =
  | { type: 'embed-progress'; jobId: string; done: number; total: number }
  | { type: 'embed-done'; jobId: string }
  | { type: 'embed-failed'; jobId: string; error: string }
  | { type: 'ready' }; // emitted once after model warm-up completes
```
The `ready` message allows the server startup sequence to defer routing any embed requests until the model is loaded, preventing a race on first run.
955	docs/features/TRUEREF-0023.md	Normal file
@@ -0,0 +1,955 @@

# TRUEREF-0023 — libSQL Migration, Native Vector Search, Parallel Tag Indexing, and Performance Hardening

**Priority:** P1
**Status:** Draft
**Depends On:** TRUEREF-0001, TRUEREF-0022
**Blocks:** —

---

## Overview

TrueRef currently uses `better-sqlite3` for all database access. This creates three compounding performance problems:

1. **Vector search does not scale.** `VectorSearch.vectorSearch()` loads the entire `snippet_embeddings` table for a repository into Node.js memory and computes cosine similarity in a JavaScript loop. A repository with 100k snippets at 1536 OpenAI dimensions allocates ~600 MB per query and ties up the worker thread for seconds before returning results.
2. **Missing composite indexes cause table scans on every query.** The schema defines FK columns used in every search and embedding filter, but declares zero composite or covering indexes on them. Every call to `searchSnippets`, `findSnippetIdsMissingEmbeddings`, and `cloneFromAncestor` performs full or near-full table scans.
3. **SQLite connection is under-configured.** Critical pragmas (`synchronous`, `cache_size`, `mmap_size`, `temp_store`) are absent, leaving significant I/O throughput on the table.

The solution is to replace `better-sqlite3` with `@libsql/better-sqlite3` — an embeddable, drop-in synchronous replacement that is a superset of the better-sqlite3 API and exposes libSQL's native vector index (`libsql_vector_idx`). Because the API is identical, no service layer or ORM code changes are needed beyond import statements and the vector search implementation.

Three additional structural improvements are delivered in the same feature:

4. **Per-repo job serialization is too coarse.** `WorkerPool` prevents any two jobs sharing the same `repositoryId` from running in parallel. This means indexing 200 tags of a single library is fully sequential — one tag at a time — even though different tags write to entirely disjoint row sets. The constraint should track `(repositoryId, versionId)` pairs instead.
5. **Write lock contention under parallel indexing.** When multiple parse workers flush parsed snippets simultaneously, they all compete for the SQLite write lock, spending most of their time in `busy_timeout` back-off. A single dedicated write worker eliminates this: parse workers become pure CPU workers (crawl → parse → send batches over `postMessage`) and the write worker is the sole DB writer.
6. **Admin UI is unusable under load.** The job queue page has no status or repository filters, no worker status panel, no skeleton loading, uses blocking `alert()` / `confirm()` dialogs, and `IndexingProgress` still polls every 2 seconds instead of consuming the existing SSE stream.

---

## Goals

1. Replace `better-sqlite3` with `@libsql/better-sqlite3` with minimal code churn — import paths only.
2. Add a libSQL vector index on `snippet_embeddings` so that KNN queries execute inside SQLite instead of in a JavaScript loop.
3. Add the six composite and covering indexes required by the hot query paths.
4. Tune the SQLite pragma configuration for I/O performance.
5. Eliminate the leading cause of OOM risk during semantic search.
6. Keep a single embedded database file — no external server, no network.
7. Allow multiple tags of the same repository to index in parallel (unrelated version rows, no write conflict).
8. Eliminate write-lock contention between parallel parse workers by introducing a single dedicated write worker.
9. Rebuild the admin jobs page with full filtering (status, repository, free-text), a live worker status panel, skeleton loading on initial fetch, per-action inline spinners, non-blocking toast notifications, and SSE-driven real-time updates throughout.

---

## Non-Goals

- Migrating to the async `@libsql/client` package (HTTP/embedded-replica mode).
- Changing the Drizzle ORM adapter (`drizzle-orm/better-sqlite3` stays unchanged).
- Changing `drizzle.config.ts` dialect (`sqlite` is still correct for embedded libSQL).
- Adding hybrid/approximate indexing beyond the default HNSW strategy provided by `libsql_vector_idx`.
- Parallelizing embedding batches across providers (separate feature).
- Horizontally scaling across processes.
- Allowing more than one job for the exact same `(repositoryId, versionId)` pair to run concurrently (still serialized — duplicate detection in `JobQueue` is unchanged).
- A full admin authentication system (out of scope).
- Mobile-responsive redesign of the entire admin section (out of scope).

---

## Problem Detail

### 1. Vector Search — Full Table Scan in JavaScript

**File:** `src/lib/server/search/vector.search.ts`

```typescript
// Current: no LIMIT, loads ALL embeddings for repo into memory
const rows = this.db.prepare<unknown[], RawEmbeddingRow>(sql).all(...params);

const scored: VectorSearchResult[] = rows.map((row) => {
  const embedding = new Float32Array(
    row.embedding.buffer,
    row.embedding.byteOffset,
    row.embedding.byteLength / 4
  );
  return { snippetId: row.snippet_id, score: cosineSimilarity(queryEmbedding, embedding) };
});

return scored.sort((a, b) => b.score - a.score).slice(0, limit);
```

For a repo with N snippets and D dimensions, this allocates `N × D × 4` bytes per query. At N=100k and D=1536, that is ~600 MB allocated synchronously. The result is sorted entirely in JS before the top-k is returned. With a native vector index, SQLite returns only the top-k rows.

### 2. Missing Composite Indexes

The `snippets`, `documents`, and `snippet_embeddings` tables are queried with multi-column WHERE predicates in every hot path, but no composite indexes exist:

| Table                | Filter columns                | Used in                                        |
| -------------------- | ----------------------------- | ---------------------------------------------- |
| `snippets`           | `(repository_id, version_id)` | All search, diff, clone                        |
| `snippets`           | `(repository_id, type)`       | Type-filtered queries                          |
| `documents`          | `(repository_id, version_id)` | Diff strategy, clone                           |
| `snippet_embeddings` | `(profile_id, snippet_id)`    | `findSnippetIdsMissingEmbeddings` LEFT JOIN    |
| `repositories`       | `(state)`                     | `searchRepositories` WHERE `state = 'indexed'` |
| `indexing_jobs`      | `(repository_id, status)`     | Job status lookups                             |

Without these indexes, SQLite performs a B-tree scan of the primary key and filters rows in memory. On a 500k-row `snippets` table this is the dominant cost of every search.
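The table above translates directly into index DDL; a sketch with illustrative index names:

```sql
-- The six indexes implied by the table above (names are illustrative):
CREATE INDEX idx_snippets_repo_version    ON snippets (repository_id, version_id);
CREATE INDEX idx_snippets_repo_type       ON snippets (repository_id, type);
CREATE INDEX idx_documents_repo_version   ON documents (repository_id, version_id);
CREATE INDEX idx_embeddings_profile_snip  ON snippet_embeddings (profile_id, snippet_id);
CREATE INDEX idx_repositories_state       ON repositories (state);
CREATE INDEX idx_jobs_repo_status         ON indexing_jobs (repository_id, status);
```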

### 4. Admin UI — Current Problems

**File:** `src/routes/admin/jobs/+page.svelte`, `src/lib/components/IndexingProgress.svelte`

| Problem                                                        | Location                                  | Impact                                                       |
| -------------------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ |
| `IndexingProgress` polls every 2 s via `setInterval` + `fetch` | `IndexingProgress.svelte`                 | Constant HTTP traffic; progress lags by up to 2 s            |
| No status or repository filter controls                        | `admin/jobs/+page.svelte`                 | With 200 tag jobs, finding a specific one requires scrolling |
| No worker status panel                                         | — (no endpoint exists)                    | Operator cannot see which workers are busy or idle           |
| `alert()` for errors, `confirm()` for cancel                   | `admin/jobs/+page.svelte` — `showToast()` | Blocks the entire browser tab; unusable under parallel jobs  |
| `actionInProgress` is a single string, not per-job             | `admin/jobs/+page.svelte`                 | Pausing job A disables buttons on all other jobs             |
| No skeleton loading — blank + spinner on first load            | `admin/jobs/+page.svelte`                 | Layout shift; no structural preview while data loads         |
| Hard-coded `limit=50` query, no pagination                     | `admin/jobs/+page.svelte:fetchJobs()`     | Page truncates silently for large queues                     |

---

### 3. Under-configured SQLite Connection

**File:** `src/lib/server/db/client.ts` and `src/lib/server/db/index.ts`

Current pragmas:

```typescript
client.pragma('journal_mode = WAL');
client.pragma('foreign_keys = ON');
client.pragma('busy_timeout = 5000');
```

Missing:

- `synchronous = NORMAL` — halves fsync overhead vs the default FULL; safe with WAL
- `cache_size = -65536` — 64 MB page cache; default is 2 MB
- `temp_store = MEMORY` — temp tables and sort spills stay in RAM
- `mmap_size = 268435456` — 256 MB memory-mapped read path; bypasses system call overhead for reads
- `wal_autocheckpoint = 1000` — more frequent checkpoints prevent WAL growth
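Taken together, the target connection configuration would look like the following sketch (values exactly as listed above, expressed as PRAGMA statements):

```sql
-- Current plus missing pragmas, combined:
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -65536;       -- 64 MB page cache
PRAGMA temp_store = MEMORY;
PRAGMA mmap_size = 268435456;     -- 256 MB
PRAGMA wal_autocheckpoint = 1000;
```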
---

## Architecture

### Drop-In Replacement: `@libsql/better-sqlite3`

`@libsql/better-sqlite3` is published by Turso and implemented as a Node.js native addon wrapping the libSQL embedded engine. The exported class is API-compatible with `better-sqlite3`:

```typescript
// before
import Database from 'better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);

// after — identical code
import Database from '@libsql/better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);
```

All of the following continue to work unchanged:

- `drizzle-orm/better-sqlite3` adapter and `migrate` helper
- `drizzle-kit` with `dialect: 'sqlite'`
- Prepared statements, transactions, WAL pragmas, foreign keys
- Worker thread per-thread connections (`worker-entry.ts`, `embed-worker-entry.ts`)
- All `type Database from 'better-sqlite3'` type imports (replaced in lock-step)

### Vector Index Design
|
||||
|
||||
libSQL provides `libsql_vector_idx()` — a virtual index type stored in a shadow table alongside the main table. Once indexed, KNN queries use a SQL `vector_top_k()` function:
|
||||
|
||||
```sql
|
||||
-- KNN: return top-k snippet IDs closest to the query vector
|
||||
SELECT snippet_id
|
||||
FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?)
|
||||
```
|
||||
|
||||
`vector_from_float32(blob)` accepts the same raw little-endian Float32 bytes currently stored in the `embedding` blob column. **No data migration is needed** — the existing blob column can be re-indexed with `libsql_vector_idx` pointing at the bytes-stored column.

The index strategy:

1. Add a generated `vec_embedding` column of type `F32_BLOB(dimensions)` to `snippet_embeddings`, populated from the existing `embedding` blob by a migration backfill.
2. Create the vector index: `CREATE INDEX idx_snippet_embeddings_vec ON snippet_embeddings(libsql_vector_idx(vec_embedding))`.
3. Rewrite `VectorSearch.vectorSearch()` to use `vector_top_k()` with a two-step join instead of the in-memory loop.
4. Update `EmbeddingService.embedSnippets()` to write `vec_embedding` on insert.

Dimensions are profile-specific. Because the index is per-column, a separate index is needed per embedding dimensionality. For v1, a single index covering the default profile's dimensions is sufficient; multi-profile KNN can be handled with a `WHERE profile_id = ?` post-filter on the `vector_top_k` results.

### Updated Vector Search Query

```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
  const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;

  // Encode query vector as raw bytes (same format as stored blobs)
  const queryBytes = Buffer.from(queryEmbedding.buffer);

  // Use libSQL vector_top_k for ANN — returns ordered (rowid, distance) pairs
  let sql = `
    SELECT se.snippet_id,
           vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS distance
    FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
    JOIN snippet_embeddings se ON se.rowid = knn.id
    JOIN snippets s ON s.id = se.snippet_id
    WHERE s.repository_id = ?
      AND se.profile_id = ?
  `;
  const params: unknown[] = [queryBytes, queryBytes, limit * 4, repositoryId, profileId];

  if (versionId) {
    sql += ' AND s.version_id = ?';
    params.push(versionId);
  }

  sql += ' ORDER BY distance ASC LIMIT ?';
  params.push(limit);

  return this.db
    .prepare<unknown[], { snippet_id: string; distance: number }>(sql)
    .all(...params)
    .map((row) => ({ snippetId: row.snippet_id, score: 1 - row.distance }));
}
```

`vector_distance_cos` returns a cosine distance (0 = identical), so `1 - distance` gives a similarity score (1 = identical, 0 = orthogonal) matching the existing `VectorSearchResult.score` contract.

---

## Implementation Plan

### Phase 1 — Package Swap (no logic changes)

**Files touched:** `package.json`, all `.ts` files that import `better-sqlite3`

1. In `package.json`:
   - Remove `"better-sqlite3": "^12.6.2"` from `dependencies`
   - Add `"@libsql/better-sqlite3": "^0.4.0"` to `dependencies`
   - Remove `"@types/better-sqlite3": "^7.6.13"` from `devDependencies`
   - `@libsql/better-sqlite3` ships its own TypeScript declarations

2. Replace all `better-sqlite3` import statements across the production and test files listed below:

| Old import                                                      | New import                                           |
| --------------------------------------------------------------- | ---------------------------------------------------- |
| `import Database from 'better-sqlite3'`                          | `import Database from '@libsql/better-sqlite3'`      |
| `import type Database from 'better-sqlite3'`                     | `import type Database from '@libsql/better-sqlite3'` |
| `import { drizzle } from 'drizzle-orm/better-sqlite3'`           | unchanged                                            |
| `import { migrate } from 'drizzle-orm/better-sqlite3/migrator'`  | unchanged                                            |

Affected production files:
- `src/lib/server/db/index.ts`
- `src/lib/server/db/client.ts`
- `src/lib/server/embeddings/embedding.service.ts`
- `src/lib/server/pipeline/indexing.pipeline.ts`
- `src/lib/server/pipeline/job-queue.ts`
- `src/lib/server/pipeline/startup.ts`
- `src/lib/server/pipeline/worker-entry.ts`
- `src/lib/server/pipeline/embed-worker-entry.ts`
- `src/lib/server/pipeline/differential-strategy.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/search/search.service.ts`
- `src/lib/server/services/repository.service.ts`
- `src/lib/server/services/version.service.ts`
- `src/lib/server/services/embedding-settings.service.ts`

Affected test files (same mechanical replacement):
- `src/routes/api/v1/api-contract.integration.test.ts`
- `src/routes/api/v1/sse-and-settings.integration.test.ts`
- `src/routes/settings/page.server.test.ts`
- `src/lib/server/db/schema.test.ts`
- `src/lib/server/embeddings/embedding.service.test.ts`
- `src/lib/server/pipeline/indexing.pipeline.test.ts`
- `src/lib/server/pipeline/differential-strategy.test.ts`
- `src/lib/server/search/search.service.test.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/services/repository.service.test.ts`
- `src/lib/server/services/version.service.test.ts`
- `src/routes/api/v1/settings/embedding/server.test.ts`
- `src/routes/api/v1/libs/[id]/index/server.test.ts`
- `src/routes/api/v1/libs/[id]/versions/discover/server.test.ts`

3. Run all tests — they should pass with zero logic changes: `npm test`

### Phase 2 — Pragma Hardening

**Files touched:** `src/lib/server/db/client.ts`, `src/lib/server/db/index.ts`

Add the following pragmas to both connection factories (raw client and `initializeDatabase()`):

```typescript
client.pragma('synchronous = NORMAL');
client.pragma('cache_size = -65536'); // 64 MB
client.pragma('temp_store = MEMORY');
client.pragma('mmap_size = 268435456'); // 256 MB
client.pragma('wal_autocheckpoint = 1000');
```

Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) open their own connections — apply the same pragmas there.
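
To keep the three connection sites from drifting, the pragma list could live in one shared helper. This is a sketch under the assumption that every connection object exposes the `better-sqlite3`-style `pragma()` method; `PERF_PRAGMAS` and `applyPerfPragmas` are illustrative names, not existing code:

```typescript
// Sketch: one source of truth for the Phase 2 pragmas, applied by the
// main thread factories and both worker entries.
export const PERF_PRAGMAS: readonly string[] = [
  'synchronous = NORMAL',
  'cache_size = -65536',    // 64 MB page cache
  'temp_store = MEMORY',
  'mmap_size = 268435456',  // 256 MB mmap window
  'wal_autocheckpoint = 1000'
];

// Structural type so the helper works with any better-sqlite3-style client
export function applyPerfPragmas(client: { pragma(stmt: string): unknown }): number {
  for (const p of PERF_PRAGMAS) client.pragma(p);
  return PERF_PRAGMAS.length; // number of pragmas applied
}
```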

### Phase 3 — Composite Indexes (Drizzle migration)

**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL file

Add indexes in `schema.ts` using Drizzle's `index()` helper:

```typescript
// snippets table
export const snippets = sqliteTable(
  'snippets',
  {
    /* unchanged */
  },
  (t) => [
    index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
    index('idx_snippets_repo_type').on(t.repositoryId, t.type)
  ]
);

// documents table
export const documents = sqliteTable(
  'documents',
  {
    /* unchanged */
  },
  (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
);

// snippet_embeddings table
export const snippetEmbeddings = sqliteTable(
  'snippet_embeddings',
  {
    /* unchanged */
  },
  (table) => [
    primaryKey({ columns: [table.snippetId, table.profileId] }), // unchanged
    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
  ]
);

// repositories table
export const repositories = sqliteTable(
  'repositories',
  {
    /* unchanged */
  },
  (t) => [index('idx_repositories_state').on(t.state)]
);

// indexing_jobs table
export const indexingJobs = sqliteTable(
  'indexing_jobs',
  {
    /* unchanged */
  },
  (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
);
```

Generate and apply migration: `npm run db:generate && npm run db:migrate`

### Phase 4 — Vector Column and Index (Drizzle migration)

**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL, `src/lib/server/search/vector.search.ts`, `src/lib/server/embeddings/embedding.service.ts`

#### 4a. Schema: add `vec_embedding` column

Add `vec_embedding` to `snippet_embeddings`. Drizzle does not have an `F32_BLOB` column type helper; define one with `customType`:

```typescript
import { sql } from 'drizzle-orm';
import { customType } from 'drizzle-orm/sqlite-core';

const f32Blob = (name: string, dimensions: number) =>
  customType<{ data: Buffer }>({
    dataType() {
      return `F32_BLOB(${dimensions})`;
    }
  })(name);

export const snippetEmbeddings = sqliteTable(
  'snippet_embeddings',
  {
    snippetId: text('snippet_id')
      .notNull()
      .references(() => snippets.id, { onDelete: 'cascade' }),
    profileId: text('profile_id')
      .notNull()
      .references(() => embeddingProfiles.id, { onDelete: 'cascade' }),
    model: text('model').notNull(),
    dimensions: integer('dimensions').notNull(),
    embedding: blob('embedding').notNull(), // existing blob — kept for backward compat
    vecEmbedding: f32Blob('vec_embedding', 1536), // libSQL vector column (nullable during migration fill)
    createdAt: integer('created_at').notNull()
  },
  (table) => [
    primaryKey({ columns: [table.snippetId, table.profileId] }),
    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
  ]
);
```

Because dimensionality is fixed per model, `F32_BLOB(1536)` covers OpenAI `text-embedding-3-small` (and `text-embedding-3-large` when requested at 1536 dimensions). A follow-up can parameterize this per profile.

#### 4b. Migration SQL: populate `vec_embedding` from existing `embedding` blob and create the vector index

The vector index cannot be expressed in Drizzle's schema DSL, so it is applied in the FTS-style custom SQL file (`src/lib/server/db/fts.sql` or an equivalent `vectors.sql`):

```sql
-- Backfill vec_embedding from existing raw blob data
UPDATE snippet_embeddings
SET vec_embedding = vector_from_float32(embedding)
WHERE vec_embedding IS NULL AND embedding IS NOT NULL;

-- Create the ANN vector index (libSQL extension syntax)
CREATE INDEX IF NOT EXISTS idx_snippet_embeddings_vec
ON snippet_embeddings(libsql_vector_idx(vec_embedding, 'metric=cosine', 'compress_neighbors=float8', 'max_neighbors=20'));
```

Add a call to this SQL in `initializeDatabase()` alongside the existing `fts.sql` execution:

```typescript
const vectorSql = readFileSync(join(__dirname, 'vectors.sql'), 'utf-8');
client.exec(vectorSql);
```

#### 4c. Update `EmbeddingService.embedSnippets()`

When inserting a new embedding, write both the blob and the vec column:

```typescript
const insert = this.db.prepare<[string, string, string, number, Buffer, Buffer]>(`
  INSERT OR REPLACE INTO snippet_embeddings
    (snippet_id, profile_id, model, dimensions, embedding, vec_embedding, created_at)
  VALUES (?, ?, ?, ?, ?, vector_from_float32(?), unixepoch())
`);

// inside the transaction:
insert.run(
  snippet.id,
  this.profileId,
  embedding.model,
  embedding.dimensions,
  embeddingBuffer,
  embeddingBuffer // same bytes — vector_from_float32() interprets them
);
```

#### 4d. Rewrite `VectorSearch.vectorSearch()`

Replace the full-scan JS loop with `vector_top_k()`:

```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
  const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;

  const queryBytes = Buffer.from(queryEmbedding.buffer);
  const candidatePool = limit * 4; // over-fetch for post-filter

  let sql = `
    SELECT se.snippet_id,
           vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS distance
    FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
    JOIN snippet_embeddings se ON se.rowid = knn.id
    JOIN snippets s ON s.id = se.snippet_id
    WHERE s.repository_id = ?
      AND se.profile_id = ?
  `;
  const params: unknown[] = [queryBytes, queryBytes, candidatePool, repositoryId, profileId];

  if (versionId) {
    sql += ' AND s.version_id = ?';
    params.push(versionId);
  }

  sql += ' ORDER BY distance ASC LIMIT ?';
  params.push(limit);

  return this.db
    .prepare<unknown[], { snippet_id: string; distance: number }>(sql)
    .all(...params)
    .map((row) => ({ snippetId: row.snippet_id, score: 1 - row.distance }));
}
```

The `score` contract is preserved (1 = identical, 0 = orthogonal). The `cosineSimilarity` helper function is no longer called at runtime but can be kept for unit tests.

### Phase 5 — Per-Job Serialization Key Fix

**Files touched:** `src/lib/server/pipeline/worker-pool.ts`

The current serialization guard uses a bare `repositoryId`:

```typescript
// current
private runningRepoIds = new Set<string>();
// blocks any job whose repositoryId is already in the set
const jobIdx = this.jobQueue.findIndex((j) => !this.runningRepoIds.has(j.repositoryId));
```

Different tags of the same repository write to completely disjoint rows (`version_id`-partitioned documents, snippets, and embeddings). The only genuine conflict is two jobs for the same `(repositoryId, versionId)` pair, which `JobQueue.enqueue()` already prevents via the `status IN ('queued', 'running')` deduplication check.

Change the guard to key on the compound pair:

```typescript
// replace Set<string> with Set<string> keyed on compound pair
private runningJobKeys = new Set<string>();

private jobKey(repositoryId: string, versionId?: string | null): string {
  return `${repositoryId}|${versionId ?? ''}`;
}
```

Update all four sites that read/write `runningRepoIds`:

| Location                             | Old                                                   | New                                                                                      |
| ------------------------------------ | ----------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `dispatch()` find                    | `!this.runningRepoIds.has(j.repositoryId)`            | `!this.runningJobKeys.has(this.jobKey(j.repositoryId, j.versionId))`                     |
| `dispatch()` add                     | `this.runningRepoIds.add(job.repositoryId)`           | `this.runningJobKeys.add(this.jobKey(job.repositoryId, job.versionId))`                  |
| `onWorkerMessage` done/failed delete | `this.runningRepoIds.delete(runningJob.repositoryId)` | `this.runningJobKeys.delete(this.jobKey(runningJob.repositoryId, runningJob.versionId))` |
| `onWorkerExit` delete                | same                                                  | same                                                                                     |

The `QueuedJob` and `RunningJob` interfaces already carry `versionId` — no type changes needed.

The only case still serialized is two concurrent `versionId = null` jobs (default-branch re-index) for the same repository: both map to the same stable key `"repositoryId|"`, so they are correctly deduplicated.

---

### Phase 6 — Dedicated Write Worker (Single-Writer Pattern)

**Files touched:** `src/lib/server/pipeline/worker-types.ts`, `src/lib/server/pipeline/write-worker-entry.ts` (new), `src/lib/server/pipeline/worker-entry.ts`, `src/lib/server/pipeline/worker-pool.ts`

#### Motivation

With Phase 5 in place, N tags of the same library can index in parallel. Each parse worker currently opens its own DB connection and holds the write lock while storing parsed snippets. Under N concurrent writers, each worker spends the majority of its wall-clock time waiting in `busy_timeout` back-off. The fix is the single-writer pattern: one dedicated write worker owns the only writable DB connection; parse workers become stateless CPU workers that send write batches over `postMessage`.

```
Parse Worker 1 ──┐   WriteRequest (docs[], snippets[])   ┌── WriteAck
Parse Worker 2 ──┼─────────────────────────────────────► Write Worker (sole DB writer)
Parse Worker N ──┘                                       └── single better-sqlite3 connection
```

#### New message types (`worker-types.ts`)

```typescript
export interface WriteRequest {
  type: 'write';
  jobId: string;
  documents: SerializedDocument[];
  snippets: SerializedSnippet[];
}

export interface WriteAck {
  type: 'write_ack';
  jobId: string;
  documentCount: number;
  snippetCount: number;
}

export interface WriteError {
  type: 'write_error';
  jobId: string;
  error: string;
}

// SerializedDocument / SerializedSnippet mirror the DB column shapes
// (plain objects, safe to transfer via structured clone)
```

#### Write worker (`write-worker-entry.ts`)

The write worker:

- Opens its own `Database` connection (WAL mode, all pragmas from Phase 2)
- Listens for `WriteRequest` messages
- Wraps each batch in a single transaction
- Posts `WriteAck` or `WriteError` back to the parent, which forwards the ack to the originating parse worker by `jobId`

```typescript
import Database from '@libsql/better-sqlite3';
import { workerData, parentPort } from 'node:worker_threads';
import type { WriteRequest, WriteAck, WriteError, WorkerInitData } from './worker-types.js';

const db = new Database((workerData as WorkerInitData).dbPath);
db.pragma('journal_mode = WAL');
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('foreign_keys = ON');

const insertDoc = db.prepare(`INSERT OR REPLACE INTO documents (...) VALUES (...)`);
const insertSnippet = db.prepare(`INSERT OR REPLACE INTO snippets (...) VALUES (...)`);

const writeBatch = db.transaction((req: WriteRequest) => {
  for (const doc of req.documents) insertDoc.run(doc);
  for (const snip of req.snippets) insertSnippet.run(snip);
});

parentPort!.on('message', (req: WriteRequest) => {
  try {
    writeBatch(req);
    const ack: WriteAck = {
      type: 'write_ack',
      jobId: req.jobId,
      documentCount: req.documents.length,
      snippetCount: req.snippets.length
    };
    parentPort!.postMessage(ack);
  } catch (err) {
    const fail: WriteError = { type: 'write_error', jobId: req.jobId, error: String(err) };
    parentPort!.postMessage(fail);
  }
});
```

#### Parse worker changes (`worker-entry.ts`)

Parse workers lose their DB connection. `IndexingPipeline` receives a `sendWrite` callback instead of a `db` instance. After parsing each file batch, the worker calls `sendWrite({ type: 'write', jobId, documents, snippets })` and awaits the `WriteAck` before continuing. This keeps back-pressure: a slow write worker naturally throttles the parse workers without additional semaphores.
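
The ack-awaiting callback could be wired roughly as below. This is a sketch, not the final implementation: `createSendWrite` and the trimmed message shapes are illustrative; only the `jobId` / `write_ack` / `write_error` names follow the message types defined above.

```typescript
// Sketch of the parse-worker side: sendWrite() posts a batch and returns
// a promise that settles when the matching ack (routed back by jobId)
// arrives. Awaiting it before the next batch is the back-pressure.
type WriteReply =
  | { type: 'write_ack'; jobId: string }
  | { type: 'write_error'; jobId: string; error: string };

export function createSendWrite(post: (req: { jobId: string }) => void) {
  const pending = new Map<string, { resolve: () => void; reject: (e: Error) => void }>();

  // Called by the pipeline after each parsed batch
  function sendWrite(req: { jobId: string }): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      pending.set(req.jobId, { resolve, reject });
      post(req);
    });
  }

  // Hooked into the worker's 'message' handler for write replies
  function onReply(msg: WriteReply): void {
    const waiter = pending.get(msg.jobId);
    if (!waiter) return;
    pending.delete(msg.jobId);
    if (msg.type === 'write_ack') waiter.resolve();
    else waiter.reject(new Error(msg.error));
  }

  return { sendWrite, onReply };
}
```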

#### WorkerPool changes

- Spawn one write worker at startup (always, regardless of embedding config)
- Route incoming `write_ack` / `write_error` messages to the correct waiting parse worker via a `Map<jobId, resolve>` promise registry
- The write worker is separate from the embed worker — embed writes (`snippet_embeddings`) can still go through the write worker by adding an `EmbedWriteRequest` message type, or remain in the embed worker, since embedding runs after parsing completes (no lock contention with active parse jobs)
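
The pool-side routing can be sketched as a tiny registry. `AckRouter` is an illustrative name — the real bookkeeping would live inside `WorkerPool` itself:

```typescript
// Sketch: remembers which parse worker issued each jobId's batch so a
// write_ack / write_error from the write worker can be forwarded back.
export class AckRouter<W> {
  private originByJob = new Map<string, W>();

  // Called when a parse worker's WriteRequest is forwarded to the write worker
  register(jobId: string, origin: W): void {
    this.originByJob.set(jobId, origin);
  }

  // Called on write_ack / write_error; returns the origin (or null if unknown)
  // and consumes the entry so stale jobIds cannot be routed twice.
  route(jobId: string): W | null {
    const origin = this.originByJob.get(jobId) ?? null;
    this.originByJob.delete(jobId);
    return origin;
  }
}
```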

#### Conflict analysis with Phase 5

Phases 5 and 6 compose cleanly:

- Phase 5 allows multiple `(repo, versionId)` jobs to run concurrently
- Phase 6 ensures all those concurrent jobs share a single write path — contention is eliminated by design
- The write worker is stateless with respect to job identity; it just executes batches in arrival order within a FIFO message queue (Node.js `postMessage` is ordered)
- The embed worker remains a separate worker thread (it runs after parse completes, so it never overlaps with active parse writes for the same job)

---

### Phase 7 — Admin UI Overhaul

**Files touched:**

- `src/routes/admin/jobs/+page.svelte` — rebuilt
- `src/routes/api/v1/workers/+server.ts` — new endpoint
- `src/lib/components/admin/JobStatusBadge.svelte` — extend with spinner variant
- `src/lib/components/admin/JobSkeleton.svelte` — new
- `src/lib/components/admin/WorkerStatusPanel.svelte` — new
- `src/lib/components/admin/Toast.svelte` — new
- `src/lib/components/IndexingProgress.svelte` — switch to SSE

#### 7a. New API endpoint: `GET /api/v1/workers`

The `WorkerPool` singleton tracks running jobs in `runningJobs: Map<Worker, RunningJob>` and idle workers in `idleWorkers: Worker[]`. Expose this state as a lightweight REST snapshot:

```typescript
// GET /api/v1/workers
// Response shape:
interface WorkersResponse {
  concurrency: number;     // configured max workers
  active: number;          // workers with a running job
  idle: number;            // workers waiting for work
  workers: WorkerStatus[]; // one entry per spawned parse worker
}

interface WorkerStatus {
  index: number;              // worker slot (0-based)
  state: 'idle' | 'running';  // current state
  jobId: string | null;       // null when idle
  repositoryId: string | null;
  versionId: string | null;
}
```

The route handler calls `getPool().getStatus()` — add a `getStatus(): WorkersResponse` method to `WorkerPool` that reads `runningJobs` and `idleWorkers` without any DB call. This is read-only and runs on the main thread.

The SSE stream at `/api/v1/jobs/stream` should emit a new `worker-status` event type whenever a worker transitions idle ↔ running (on `dispatch()` and job completion). This allows the worker panel to update in real time without polling the REST endpoint.
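
On the wire, such an event is just an `event:` line, a `data:` line, and a blank-line terminator. A minimal sketch with an illustrative helper name (the payload mirrors `WorkerStatus` from 7a):

```typescript
// Sketch: serialize one worker-status transition in SSE wire format.
export function workerStatusFrame(payload: {
  index: number;
  state: 'idle' | 'running';
  jobId: string | null;
}): string {
  return `event: worker-status\ndata: ${JSON.stringify(payload)}\n\n`;
}
```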

#### 7b. `GET /api/v1/jobs` — add `repositoryId` free-text and multi-status filter

The existing endpoint already accepts `repositoryId` (exact match) and `status` (single value). Extend:

- `repositoryId` to also support prefix match (e.g. `?repositoryId=/facebook` returns all `/facebook/*` repos)
- `status` to accept comma-separated values: `?status=queued,running`
- `page` and `pageSize` query params (default `pageSize=50`, max 200) in addition to `limit` for backwards compat

Return `{ jobs, total, page, pageSize }` with `total` always reflecting the unfiltered-by-page count.
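
The extended parameter parsing, with the clamping implied by the defaults above, can be sketched as follows (`parseJobsQuery` is an illustrative name, not the actual route handler):

```typescript
// Sketch: parse ?repositoryId=&status=a,b&page=&pageSize= with the
// documented defaults (pageSize 50, hard max 200; unknown statuses dropped).
const JOB_STATUSES = ['queued', 'running', 'paused', 'cancelled', 'done', 'failed'] as const;

export function parseJobsQuery(params: URLSearchParams) {
  const statuses = (params.get('status') ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => (JOB_STATUSES as readonly string[]).includes(s));
  const page = Math.max(1, Number(params.get('page')) || 1);
  const pageSize = Math.min(200, Math.max(1, Number(params.get('pageSize')) || 50));
  return {
    repositoryId: params.get('repositoryId') ?? undefined, // exact or prefix match
    statuses: statuses.length ? statuses : undefined,      // undefined = all statuses
    page,
    pageSize
  };
}
```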

#### 7c. New component: `JobSkeleton.svelte`

A set of skeleton rows matching the job table structure. Shown during the initial fetch before any data arrives. Uses Tailwind `animate-pulse`:

```svelte
<!-- renders N skeleton rows -->
<script lang="ts">
  let { rows = 5 }: { rows?: number } = $props();
</script>

{#each Array(rows) as _, i (i)}
  <tr>
    <td class="px-6 py-4">
      <div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>
      <div class="mt-1 h-3 w-24 animate-pulse rounded bg-gray-100"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-5 w-16 animate-pulse rounded-full bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-4 w-20 animate-pulse rounded bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-2 w-32 animate-pulse rounded-full bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-4 w-28 animate-pulse rounded bg-gray-200"></div>
    </td>
    <td class="px-6 py-4 text-right">
      <div class="ml-auto h-7 w-20 animate-pulse rounded bg-gray-200"></div>
    </td>
  </tr>
{/each}
```

#### 7d. New component: `Toast.svelte`

Replaces all `alert()` / `console.log()` calls in the jobs page. Renders a fixed-position stack in the bottom-right corner. Each toast auto-dismisses after 4 seconds and can be manually closed:

```svelte
<!-- Usage: bind a toasts array and call push({ message, type }) -->
<script lang="ts">
  export interface ToastItem {
    id: string;
    message: string;
    type: 'success' | 'error' | 'info';
  }

  let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();

  function dismiss(id: string) {
    toasts = toasts.filter((t) => t.id !== id);
  }
</script>

<div class="fixed right-4 bottom-4 z-50 flex flex-col gap-2">
  {#each toasts as toast (toast.id)}
    <!-- color by type, close button, auto-dismiss via onmount timer -->
  {/each}
</div>
```

The jobs page replaces `showToast()` with pushing onto the bound `toasts` array. The `confirm()` for cancel is replaced with an inline confirmation state per job (`pendingCancelId`) that shows "Confirm cancel?" / "Yes" / "No" buttons inside the row.

#### 7e. New component: `WorkerStatusPanel.svelte`

A compact panel displayed above the job table showing the worker pool health. Subscribes to the `worker-status` SSE events and falls back to polling `GET /api/v1/workers` every 5 s on SSE error:

```
┌─────────────────────────────────────────────────────────┐
│ Workers  [2 / 4 active]                ████░░░░  50%    │
│ Worker 0 ● running  /facebook/react / v18.3.0           │
│ Worker 1 ● running  /facebook/react / v17.0.2           │
│ Worker 2 ○ idle                                         │
│ Worker 3 ○ idle                                         │
└─────────────────────────────────────────────────────────┘
```

Each worker row shows: slot index, status dot (animated green pulse for running), repository ID, version tag, and a link to the job row in the table below.

#### 7f. Filter bar on the jobs page

Add a filter strip between the page header and the table:

```
[ Repository: _______________ ]  [ Status: ▾ all ]  [ 🔍 Apply ]  [ ↺ Reset ]
```

- **Repository field**: free-text input, matches `repositoryId` prefix (e.g. `/facebook` shows all `/facebook/*`)
- **Status dropdown**: multi-select checkboxes for `queued`, `running`, `paused`, `cancelled`, `done`, `failed`; default = all
- Filters are applied client-side against the loaded `jobs` array for instant feedback, and also re-fetched from the API on Apply to get the correct total count
- Filter state is mirrored to URL search params (`?repo=...&status=...`) so the view is bookmarkable and survives refresh
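
The URL mirroring can be sketched as a pure encode/decode pair (illustrative names; the page would call these when filters change and again on load):

```typescript
// Sketch: serialize filter state to ?repo=...&status=... and back, so the
// jobs view is bookmarkable and survives refresh.
export interface JobFilters {
  repo: string;       // repositoryId prefix, '' = no filter
  statuses: string[]; // empty = all statuses
}

export function filtersToParams(f: JobFilters): URLSearchParams {
  const params = new URLSearchParams();
  if (f.repo) params.set('repo', f.repo);
  if (f.statuses.length) params.set('status', f.statuses.join(','));
  return params;
}

export function filtersFromParams(params: URLSearchParams): JobFilters {
  return {
    repo: params.get('repo') ?? '',
    statuses: (params.get('status') ?? '').split(',').filter(Boolean)
  };
}
```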

#### 7g. Per-job action spinner and disabled state

Replace the single `actionInProgress: string | null` with a `Map<string, 'pausing' | 'resuming' | 'cancelling'>`:

```typescript
let actionInProgress = $state(new Map<string, 'pausing' | 'resuming' | 'cancelling'>());
```

Each action button shows an inline spinner (small `animate-spin` circle) and is disabled only for that row. Other rows remain fully interactive during the action. On completion the entry is deleted from the map.

#### 7h. `IndexingProgress.svelte` — switch from polling to SSE

The component currently uses `setInterval + fetch` at 2 s. Replace with the per-job SSE stream already available at `/api/v1/jobs/{id}/stream`:

```typescript
// replace the $effect body
$effect(() => {
  job = null;
  const es = new EventSource(`/api/v1/jobs/${jobId}/stream`);

  es.addEventListener('job-progress', (event) => {
    const data = JSON.parse(event.data);
    job = { ...job, ...data };
  });

  es.addEventListener('job-done', () => {
    void fetch(`/api/v1/jobs/${jobId}`)
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
        oncomplete?.();
      });
    es.close();
  });

  es.addEventListener('job-failed', (event) => {
    const data = JSON.parse(event.data);
    job = { ...job, status: 'failed', error: data.error };
    oncomplete?.();
    es.close();
  });

  es.onerror = () => {
    // on SSE failure fall back to a single fetch to get current state
    es.close();
    void fetch(`/api/v1/jobs/${jobId}`)
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
      });
  };

  return () => es.close();
});
```

This reduces network traffic from 1 request/2 s to zero requests during active indexing — updates arrive as server-push events.

#### 7i. Pagination on the jobs page

Replace the hard-coded `?limit=50` fetch with paginated requests:

```typescript
let currentPage = $state(1);
const PAGE_SIZE = 50;

async function fetchJobs() {
  const params = new URLSearchParams({
    page: String(currentPage),
    pageSize: String(PAGE_SIZE),
    ...(filterRepo ? { repositoryId: filterRepo } : {}),
    ...(filterStatuses.length ? { status: filterStatuses.join(',') } : {})
  });
  const data = await fetch(`/api/v1/jobs?${params}`).then((r) => r.json());
  jobs = data.jobs;
  total = data.total;
}
```

Render a simple `« Prev  Page N of M  Next »` control below the table, hidden when `total <= PAGE_SIZE`.
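
The pager math the control needs is small enough to state exactly (illustrative helpers):

```typescript
// Sketch: page count for the « Prev / Next » control, and the
// visibility rule (hidden when everything fits on one page).
export function pageCount(total: number, pageSize: number): number {
  return Math.max(1, Math.ceil(total / pageSize));
}

export function showPager(total: number, pageSize: number): boolean {
  return total > pageSize;
}
```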

---

## Acceptance Criteria

- [ ] `npm install` with `@libsql/better-sqlite3` succeeds; `better-sqlite3` is absent from `node_modules`
- [ ] All existing unit and integration tests pass after the Phase 1 import swap
- [ ] `npm run db:migrate` applies the composite index migration cleanly against an existing database
- [ ] `npm run db:migrate` applies the vector column migration cleanly; `SELECT vec_embedding FROM snippet_embeddings LIMIT 1` returns a non-NULL value for any previously embedded snippet
- [ ] `GET /api/v1/context?libraryId=...&query=...` with a semantic-mode or hybrid-mode request returns results in ≤ 200 ms on a repository with 50k+ snippets (vs. the previous multi-second response)
- [ ] A memory profile taken during a `/context` request shows no allocation spike proportional to repository size
- [ ] `EXPLAIN QUERY PLAN` on the `snippets` search query uses `idx_snippets_repo_version` (e.g. `SEARCH snippets USING INDEX idx_snippets_repo_version`) instead of a full `SCAN snippets`
- [ ] Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) start and complete an indexing job successfully after the package swap
- [ ] `drizzle-kit studio` connects to and browses the migrated database
- [ ] Re-indexing a repository after the migration correctly populates `vec_embedding` on all new snippets
- [ ] `cosineSimilarity` unit tests still pass (the function is kept)
- [ ] Starting two indexing jobs for different tags of the same repository simultaneously results in both jobs reaching the `running` state concurrently (not one waiting for the other)
- [ ] Starting two indexing jobs for the **same** `(repositoryId, versionId)` pair returns the existing job (deduplication unchanged)
- [ ] With 4 parse workers and 4 concurrent tag jobs, zero `SQLITE_BUSY` errors appear in the logs
- [ ] The write worker is present in the process list during active indexing (the `worker_threads` inspector shows `write-worker-entry`)
- [ ] A `WriteError` from the write worker marks the originating job as `failed`, with the error message propagated to the SSE stream
- [ ] `GET /api/v1/workers` returns a `WorkersResponse` JSON object with correct `active`, `idle`, and `workers[]` fields while jobs are in flight
- [ ] The `worker-status` SSE event is emitted by `/api/v1/jobs/stream` whenever a worker transitions state
- [ ] The admin jobs page shows skeleton rows (not a blank screen) during the initial `fetchJobs()` call
- [ ] No `alert()` or `confirm()` calls exist in `admin/jobs/+page.svelte` after this change; all notifications go through `Toast.svelte`
- [ ] Pausing job A while job B is also in progress does not disable job B's action buttons
- [ ] The status filter multi-select correctly restricts the visible job list; the URL updates to reflect the filter state
- [ ] The repository prefix filter `?repositoryId=/facebook` returns all jobs whose `repositoryId` starts with `/facebook`
- [ ] Paginating past page 1 fetches the next batch from the API, not from the client-side array
- [ ] `IndexingProgress.svelte` has no `setInterval` call; it uses `EventSource` for progress updates
- [ ] The `WorkerStatusPanel` shows the correct number of running workers live during a multi-tag indexing run
- [ ] Refreshing the jobs page with `?repo=/facebook/react&status=running` pre-populates the filters and fetches with those params
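The query-plan criterion above can be scripted: run `EXPLAIN QUERY PLAN` through the driver and inspect the `detail` column of the returned rows. The helper below is a sketch; the row shape matches SQLite's output, but exact plan wording can vary between SQLite versions:

```typescript
// Shape of rows returned by `db.prepare('EXPLAIN QUERY PLAN ...').all()`.
interface PlanRow {
  detail: string;
}

// True when the plan uses the composite index and never falls back to a
// full table scan of `snippets`.
function usesCompositeIndex(rows: PlanRow[]): boolean {
  const usesIndex = rows.some((r) => r.detail.includes('USING INDEX idx_snippets_repo_version'));
  const fullScan = rows.some((r) => r.detail.trim() === 'SCAN snippets');
  return usesIndex && !fullScan;
}
```

In CI this can run right after migrations, failing the build if the index is ever dropped or the query stops matching it.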

---

## Migration Safety

### Backward Compatibility

The `embedding` blob column is kept. The `vec_embedding` column is nullable during the backfill window and is populated in two steps:

1. The `UPDATE` in `vectors.sql` fills all existing rows on startup
2. New embeddings populate it at insert time

If `vec_embedding IS NULL` for a row (e.g., a row inserted before the migration runs), the vector index silently omits that row from results. The fallback in `HybridSearchService` to FTS-only mode still applies when no embeddings exist, so degraded-but-correct behavior is preserved.
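A quick way to watch the backfill window close is to count populated rows. The SQL below follows the column names used in this plan; the helper itself is an illustrative sketch:

```typescript
// Monitoring query for the backfill window; table/column names follow the
// schema described in this plan.
const coverageSql = `
  SELECT
    COUNT(*) AS total,
    SUM(CASE WHEN vec_embedding IS NOT NULL THEN 1 ELSE 0 END) AS populated
  FROM snippet_embeddings
`;

// Fraction of embedding rows already carrying a vector (1 = backfill done).
function coverage(row: { total: number; populated: number }): number {
  return row.total === 0 ? 1 : row.populated / row.total;
}
```

Running `coverageSql` once after startup and once after the first re-index gives a concrete before/after signal that step 1 and step 2 both fired.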

### Rollback

Rollback before Phase 4 (vector column): remove `@libsql/better-sqlite3`, restore `better-sqlite3`, and restore the imports. No schema changes have been made at that point.

Rollback after Phase 4: the schema now has the `vec_embedding` column. Drop the column with a migration reversal and restore the imports. The `embedding` blob is intact throughout, so there is no data loss.
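For the post-Phase-4 reversal, the migration can be as small as two statements. These are an assumed sketch (SQLite supports `ALTER TABLE … DROP COLUMN` from 3.35), not the actual migration file:

```typescript
// Hypothetical reversal migration for Phase 4; both statements would run
// inside the same migration.
const rollbackStatements: string[] = [
  // Requires SQLite >= 3.35 for DROP COLUMN support.
  'ALTER TABLE snippet_embeddings DROP COLUMN vec_embedding',
  // The vector index shadow table is a regular table and can be dropped.
  'DROP TABLE IF EXISTS idx_snippet_embeddings_vec_shadow'
];
```

Because `embedding` is untouched, re-applying Phase 4 later simply re-runs the backfill `UPDATE`.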

### SQLite File Compatibility

libSQL embedded mode reads and writes standard SQLite 3 files. The WAL file, page size, and encoding are unchanged. An existing production database opened with `@libsql/better-sqlite3` is fully readable and writable. The vector index is stored in a shadow table, `idx_snippet_embeddings_vec_shadow`, which better-sqlite3 would ignore if rolled back (it is a regular table with a special name).

---

## Dependencies

| Package | Action | Reason |
| ------------------------ | ----------------------------- | ----------------------------------------------- |
| `better-sqlite3` | Remove from `dependencies` | Replaced |
| `@types/better-sqlite3` | Remove from `devDependencies` | `@libsql/better-sqlite3` ships its own types |
| `@libsql/better-sqlite3` | Add to `dependencies` | Drop-in libSQL node addon |
| `drizzle-orm` | No change | The `better-sqlite3` adapter works unchanged |
| `drizzle-kit` | No change | `dialect: 'sqlite'` is correct for embedded libSQL |

No new runtime dependencies beyond the package replacement.

---

## Testing Strategy

### Unit Tests

- `src/lib/server/search/vector.search.ts`: add a test asserting KNN results are correct for a seeded 3-vector table; verify memory use is not proportional to table size (mock `db.prepare` to assert no unbounded `.all()` is called)
- `src/lib/server/embeddings/embedding.service.ts`: existing tests cover insert round-trips; verify the `vec_embedding` column is non-NULL after `embedSnippets()`
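The "no unbounded `.all()`" assertion can be approximated with a recording stub in place of the real `db.prepare` spy. Everything below is an illustrative stand-in for the Vitest mock, not the real driver:

```typescript
// Minimal recording stub standing in for vi.spyOn(db, 'prepare').
type Row = Record<string, unknown>;

function makeRecordingDb() {
  const statements: string[] = [];
  return {
    statements,
    prepare(sql: string) {
      statements.push(sql);
      return { all: (..._params: unknown[]): Row[] => [] };
    }
  };
}

// The assertion the test would make: every SELECT the search path prepares
// must carry a LIMIT, so result size never scales with the table.
function assertBounded(statements: string[]): void {
  for (const sql of statements) {
    if (/^\s*select/i.test(sql) && !/limit\s+(\d+|\?)/i.test(sql)) {
      throw new Error(`Unbounded SELECT detected: ${sql}`);
    }
  }
}
```

The search code under test would receive `makeRecordingDb()` in place of the real handle, and the test would call `assertBounded(db.statements)` afterwards.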

### Integration Tests

- `api-contract.integration.test.ts`: existing tests already use `new Database(':memory:')` — these continue to work with `@libsql/better-sqlite3` because the in-memory path is identical
- Add one test to `api-contract.integration.test.ts`: seed a repository plus multiple embeddings, call `/api/v1/context` in semantic mode, and assert non-empty results with a response time under 500 ms on the in-memory DB

### UI Tests

- `src/routes/admin/jobs/+page.svelte`: add Vitest browser tests (Playwright) verifying:
  - Skeleton rows appear before the first fetch resolves (mock `fetch` to delay 200 ms)
  - The status filter restricts displayed rows; the URL param updates
  - Pausing job A leaves job B's buttons enabled
  - A toast appears and auto-dismisses on successful pause
  - The cancel flow shows an inline confirmation, not `window.confirm`
- `src/lib/components/IndexingProgress.svelte`: unit test that no `setInterval` is created; verify `EventSource` is opened with the correct URL
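A sketch of how that last unit test can observe both behaviors with plain stubs. In the real suite these would be `vi.stubGlobal` spies, and the stream URL is an assumption based on the `/api/v1/jobs/stream` endpoint named earlier:

```typescript
// Recording stubs standing in for the browser globals the component touches.
let intervalCalls = 0;
const openedUrls: string[] = [];

const stubbedSetInterval = (_fn: () => void, _ms: number): number => {
  intervalCalls += 1;
  return 0;
};
void stubbedSetInterval; // installed via vi.stubGlobal in the real test

class StubEventSource {
  constructor(url: string) {
    openedUrls.push(url);
  }
  addEventListener(_type: string, _handler: (event: unknown) => void): void {}
  close(): void {}
}

// What mounting the component is expected to do (an assumption): open one
// EventSource against the jobs stream and never touch setInterval.
function simulateMount(jobId: string): StubEventSource {
  return new StubEventSource(`/api/v1/jobs/stream?jobId=${jobId}`);
}
```

The test then asserts `intervalCalls === 0` and that `openedUrls[0]` targets the stream endpoint.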

### Performance Regression Gate

Add a benchmark script `scripts/bench-vector-search.mjs` that:

1. Creates an in-memory libSQL database
2. Seeds 10,000 snippet embeddings (random `Float32Array`, 1536 dims)
3. Runs 100 `vectorSearch()` calls
4. Asserts p99 < 50 ms

This gates the CI check for Phase 4 on both correctness and speed.
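The benchmark's timing and aggregation steps can be sketched as below. `bruteForceTopK` is a stand-in for the real `vectorSearch()` (which would go through the sqlite-vec index), and the sizes are scaled down so the sketch runs quickly:

```typescript
// Stand-in for vectorSearch(): brute-force cosine top-k over seeded vectors.
function bruteForceTopK(vectors: Float32Array[], query: Float32Array, k: number): number[] {
  const scores = vectors.map((v, idx) => {
    let dot = 0, nv = 0, nq = 0;
    for (let i = 0; i < v.length; i++) {
      dot += v[i] * query[i];
      nv += v[i] * v[i];
      nq += query[i] * query[i];
    }
    return { idx, score: dot / (Math.sqrt(nv) * Math.sqrt(nq) || 1) };
  });
  return scores.sort((a, b) => b.score - a.score).slice(0, k).map((s) => s.idx);
}

// p99 latency over a list of per-call timings in milliseconds.
function p99(samplesMs: number[]): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.99))];
}

// Scaled-down run: 1,000 vectors x 64 dims instead of 10,000 x 1536.
const dims = 64;
const vectors = Array.from({ length: 1000 }, () => {
  const v = new Float32Array(dims);
  for (let i = 0; i < dims; i++) v[i] = Math.random();
  return v;
});

const timings: number[] = [];
for (let run = 0; run < 100; run++) {
  const query = vectors[run % vectors.length];
  const start = performance.now();
  bruteForceTopK(vectors, query, 10);
  timings.push(performance.now() - start);
}
console.log(`p99: ${p99(timings).toFixed(2)} ms`);
```

In the real script, step 3 would time `vectorSearch()` against the seeded libSQL database and `process.exit(1)` when `p99(timings) >= 50`.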

@@ -8,7 +8,7 @@ const entries = [
];

try {
  const existing = entries.filter(e => existsSync(e));
  const existing = entries.filter((e) => existsSync(e));
  if (existing.length === 0) {
    console.log('[build-workers] No worker entry files found yet, skipping.');
    process.exit(0);
@@ -23,7 +23,7 @@ try {
    outdir: 'build/workers',
    outExtension: { '.js': '.mjs' },
    alias: {
      '$lib': './src/lib',
      $lib: './src/lib',
      '$lib/server': './src/lib/server'
    },
    external: ['better-sqlite3', '@xenova/transformers'],

@@ -33,9 +33,10 @@ try {
try {
  const db = getClient();
  const activeProfileRow = db
    .prepare<[], EmbeddingProfileEntityProps>(
      'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
    )
    .prepare<
      [],
      EmbeddingProfileEntityProps
    >('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
    .get();

let embeddingService: EmbeddingService | null = null;

@@ -55,9 +56,10 @@ try {
let concurrency = 2; // default
if (dbPath) {
  const concurrencyRow = db
    .prepare<[], { value: string }>(
      "SELECT value FROM settings WHERE key = 'indexing.concurrency' LIMIT 1"
    )
    .prepare<
      [],
      { value: string }
    >("SELECT value FROM settings WHERE key = 'indexing.concurrency' LIMIT 1")
    .get();
  if (concurrencyRow) {
    try {

@@ -16,21 +16,29 @@

  es.addEventListener('job-done', () => {
    void fetch(`/api/v1/jobs/${jobId}`)
      .then(r => r.json())
      .then(d => { job = d.job; oncomplete?.(); });
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
        oncomplete?.();
      });
    es.close();
  });

  es.addEventListener('job-failed', (event) => {
    const data = JSON.parse(event.data);
    if (job) job = { ...job, status: 'failed', error: data.error ?? 'Unknown error' } as IndexingJob;
    if (job)
      job = { ...job, status: 'failed', error: data.error ?? 'Unknown error' } as IndexingJob;
    oncomplete?.();
    es.close();
  });

  es.onerror = () => {
    es.close();
    void fetch(`/api/v1/jobs/${jobId}`).then(r => r.json()).then(d => { job = d.job; });
    void fetch(`/api/v1/jobs/${jobId}`)
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
      });
  };

  return () => es.close();

@@ -1,8 +1,9 @@
<script lang="ts">
  let { rows = 5 }: { rows?: number } = $props();
  const rowIndexes = $derived(Array.from({ length: rows }, (_, index) => index));
</script>

{#each Array(rows) as _, i (i)}
{#each rowIndexes as i (i)}
  <tr>
    <td class="px-6 py-4">
      <div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>

@@ -1,5 +1,6 @@
<script lang="ts">
  import { onDestroy } from 'svelte';
  import { SvelteMap } from 'svelte/reactivity';

  export interface ToastItem {
    id: string;
@@ -8,7 +9,7 @@
  }

  let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();
  const timers = new Map<string, ReturnType<typeof setTimeout>>();
  const timers = new SvelteMap<string, ReturnType<typeof setTimeout>>();

  $effect(() => {
    for (const toast of toasts) {
@@ -70,8 +71,7 @@
        class="ml-2 text-xs opacity-70 hover:opacity-100"
      >
        x
      </button
      >
      </button>
    </div>
  {/each}
</div>

@@ -10,9 +10,7 @@ import { GitHubApiError } from './github-tags.js';
// ---------------------------------------------------------------------------

function mockFetch(status: number, body: unknown): void {
  vi.spyOn(global, 'fetch').mockResolvedValueOnce(
    new Response(JSON.stringify(body), { status })
  );
  vi.spyOn(global, 'fetch').mockResolvedValueOnce(new Response(JSON.stringify(body), { status }));
}

beforeEach(() => {
@@ -105,9 +103,9 @@ describe('fetchGitHubChangedFiles', () => {

  it('throws GitHubApiError on 422 unprocessable entity', async () => {
    mockFetch(422, { message: 'Unprocessable Entity' });
    await expect(
      fetchGitHubChangedFiles('owner', 'repo', 'bad-ref', 'v1.1.0')
    ).rejects.toThrow(GitHubApiError);
    await expect(fetchGitHubChangedFiles('owner', 'repo', 'bad-ref', 'v1.1.0')).rejects.toThrow(
      GitHubApiError
    );
  });

  it('returns empty array when files property is missing', async () => {
@@ -141,9 +139,11 @@ describe('fetchGitHubChangedFiles', () => {
  });

  it('sends Authorization header when token is provided', async () => {
    const fetchSpy = vi.spyOn(global, 'fetch').mockResolvedValueOnce(
      new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
    );
    const fetchSpy = vi
      .spyOn(global, 'fetch')
      .mockResolvedValueOnce(
        new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
      );
    await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0', 'my-token');
    const callArgs = fetchSpy.mock.calls[0];
    const headers = (callArgs[1] as RequestInit).headers as Record<string, string>;
@@ -151,9 +151,11 @@ describe('fetchGitHubChangedFiles', () => {
  });

  it('does not send Authorization header when no token provided', async () => {
    const fetchSpy = vi.spyOn(global, 'fetch').mockResolvedValueOnce(
      new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
    );
    const fetchSpy = vi
      .spyOn(global, 'fetch')
      .mockResolvedValueOnce(
        new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
      );
    await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
    const callArgs = fetchSpy.mock.calls[0];
    const headers = (callArgs[1] as RequestInit).headers as Record<string, string>;

@@ -4,6 +4,7 @@
 */
import Database from 'better-sqlite3';
import { env } from '$env/dynamic/private';
import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';

let _client: Database.Database | null = null;
@@ -12,14 +13,7 @@ export function getClient(): Database.Database {
  if (!_client) {
    if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');
    _client = new Database(env.DATABASE_URL);
    _client.pragma('journal_mode = WAL');
    _client.pragma('foreign_keys = ON');
    _client.pragma('busy_timeout = 5000');
    _client.pragma('synchronous = NORMAL');
    _client.pragma('cache_size = -65536');
    _client.pragma('temp_store = MEMORY');
    _client.pragma('mmap_size = 268435456');
    _client.pragma('wal_autocheckpoint = 1000');
    applySqlitePragmas(_client);
    loadSqliteVec(_client);
  }
  return _client;

src/lib/server/db/connection.ts (new file, 14 lines)

@@ -0,0 +1,14 @@
import type Database from 'better-sqlite3';

export const SQLITE_BUSY_TIMEOUT_MS = 30000;

export function applySqlitePragmas(db: Database.Database): void {
  db.pragma('journal_mode = WAL');
  db.pragma('foreign_keys = ON');
  db.pragma(`busy_timeout = ${SQLITE_BUSY_TIMEOUT_MS}`);
  db.pragma('synchronous = NORMAL');
  db.pragma('cache_size = -65536');
  db.pragma('temp_store = MEMORY');
  db.pragma('mmap_size = 268435456');
  db.pragma('wal_autocheckpoint = 1000');
}

@@ -5,6 +5,7 @@ import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { join, dirname } from 'node:path';
import * as schema from './schema';
import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';
import { env } from '$env/dynamic/private';

@@ -12,19 +13,7 @@ if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');

const client = new Database(env.DATABASE_URL);

// Enable WAL mode for better concurrent read performance.
client.pragma('journal_mode = WAL');
// Enforce foreign key constraints.
client.pragma('foreign_keys = ON');
// Wait up to 5 s when the DB is locked instead of failing immediately.
// Prevents SQLITE_BUSY errors when the indexing pipeline holds the write lock
// and an HTTP request arrives simultaneously.
client.pragma('busy_timeout = 5000');
client.pragma('synchronous = NORMAL');
client.pragma('cache_size = -65536');
client.pragma('temp_store = MEMORY');
client.pragma('mmap_size = 268435456');
client.pragma('wal_autocheckpoint = 1000');
applySqlitePragmas(client);
loadSqliteVec(client);

export const db = drizzle(client, { schema });

File diff suppressed because it is too large
@@ -1,55 +1,55 @@
{
  "version": "7",
  "dialect": "sqlite",
  "entries": [
    {
      "idx": 0,
      "version": "6",
      "when": 1774196053634,
      "tag": "0000_large_master_chief",
      "breakpoints": true
    },
    {
      "idx": 1,
      "version": "6",
      "when": 1774448049161,
      "tag": "0001_quick_nighthawk",
      "breakpoints": true
    },
    {
      "idx": 2,
      "version": "6",
      "when": 1774461897742,
      "tag": "0002_silky_stellaris",
      "breakpoints": true
    },
    {
      "idx": 3,
      "version": "6",
      "when": 1743155877000,
      "tag": "0003_multiversion_config",
      "breakpoints": true
    },
    {
      "idx": 4,
      "version": "6",
      "when": 1774880275833,
      "tag": "0004_complete_sentry",
      "breakpoints": true
    },
    {
      "idx": 5,
      "version": "6",
      "when": 1774890536284,
      "tag": "0005_fix_stage_defaults",
      "breakpoints": true
    },
    {
      "idx": 6,
      "version": "6",
      "when": 1775038799913,
      "tag": "0006_yielding_centennial",
      "breakpoints": true
    }
  ]
}

@@ -349,14 +349,14 @@ describe('snippet_embeddings table', () => {
  });

  it('keeps the relational schema free of vec_embedding and retains the profile index', () => {
    const columns = client
      .prepare("PRAGMA table_info('snippet_embeddings')")
      .all() as Array<{ name: string }>;
    const columns = client.prepare("PRAGMA table_info('snippet_embeddings')").all() as Array<{
      name: string;
    }>;
    expect(columns.map((column) => column.name)).not.toContain('vec_embedding');

    const indexes = client
      .prepare("PRAGMA index_list('snippet_embeddings')")
      .all() as Array<{ name: string }>;
    const indexes = client.prepare("PRAGMA index_list('snippet_embeddings')").all() as Array<{
      name: string;
    }>;
    expect(indexes.map((index) => index.name)).toContain('idx_embeddings_profile');
  });

@@ -13,29 +13,33 @@ import {
// ---------------------------------------------------------------------------
// repositories
// ---------------------------------------------------------------------------
export const repositories = sqliteTable('repositories', {
  id: text('id').primaryKey(), // e.g. "/facebook/react" or "/local/my-sdk"
  title: text('title').notNull(),
  description: text('description'),
  source: text('source', { enum: ['github', 'local'] }).notNull(),
  sourceUrl: text('source_url').notNull(), // GitHub URL or absolute local path
  branch: text('branch').default('main'),
  state: text('state', {
    enum: ['pending', 'indexing', 'indexed', 'error']
  })
    .notNull()
    .default('pending'),
  totalSnippets: integer('total_snippets').default(0),
  totalTokens: integer('total_tokens').default(0),
  trustScore: real('trust_score').default(0), // 0.0–10.0
  benchmarkScore: real('benchmark_score').default(0), // 0.0–100.0; reserved for future quality metrics
  stars: integer('stars'),
  // TODO: encrypt at rest in production; stored as plaintext for v1
  githubToken: text('github_token'),
  lastIndexedAt: integer('last_indexed_at', { mode: 'timestamp' }),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
  updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
}, (t) => [index('idx_repositories_state').on(t.state)]);
export const repositories = sqliteTable(
  'repositories',
  {
    id: text('id').primaryKey(), // e.g. "/facebook/react" or "/local/my-sdk"
    title: text('title').notNull(),
    description: text('description'),
    source: text('source', { enum: ['github', 'local'] }).notNull(),
    sourceUrl: text('source_url').notNull(), // GitHub URL or absolute local path
    branch: text('branch').default('main'),
    state: text('state', {
      enum: ['pending', 'indexing', 'indexed', 'error']
    })
      .notNull()
      .default('pending'),
    totalSnippets: integer('total_snippets').default(0),
    totalTokens: integer('total_tokens').default(0),
    trustScore: real('trust_score').default(0), // 0.0–10.0
    benchmarkScore: real('benchmark_score').default(0), // 0.0–100.0; reserved for future quality metrics
    stars: integer('stars'),
    // TODO: encrypt at rest in production; stored as plaintext for v1
    githubToken: text('github_token'),
    lastIndexedAt: integer('last_indexed_at', { mode: 'timestamp' }),
    createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
    updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
  },
  (t) => [index('idx_repositories_state').on(t.state)]
);

// ---------------------------------------------------------------------------
// repository_versions
@@ -61,43 +65,51 @@ export const repositoryVersions = sqliteTable('repository_versions', {
// ---------------------------------------------------------------------------
// documents
// ---------------------------------------------------------------------------
export const documents = sqliteTable('documents', {
  id: text('id').primaryKey(), // UUID
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
  filePath: text('file_path').notNull(), // relative path within repo
  title: text('title'),
  language: text('language'), // e.g. "typescript", "markdown"
  tokenCount: integer('token_count').default(0),
  checksum: text('checksum').notNull(), // SHA-256 of file content
  indexedAt: integer('indexed_at', { mode: 'timestamp' }).notNull()
}, (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]);
export const documents = sqliteTable(
  'documents',
  {
    id: text('id').primaryKey(), // UUID
    repositoryId: text('repository_id')
      .notNull()
      .references(() => repositories.id, { onDelete: 'cascade' }),
    versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
    filePath: text('file_path').notNull(), // relative path within repo
    title: text('title'),
    language: text('language'), // e.g. "typescript", "markdown"
    tokenCount: integer('token_count').default(0),
    checksum: text('checksum').notNull(), // SHA-256 of file content
    indexedAt: integer('indexed_at', { mode: 'timestamp' }).notNull()
  },
  (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
);

// ---------------------------------------------------------------------------
// snippets
// ---------------------------------------------------------------------------
export const snippets = sqliteTable('snippets', {
  id: text('id').primaryKey(), // UUID
  documentId: text('document_id')
    .notNull()
    .references(() => documents.id, { onDelete: 'cascade' }),
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
  type: text('type', { enum: ['code', 'info'] }).notNull(),
  title: text('title'),
  content: text('content').notNull(), // searchable text / code
  language: text('language'),
  breadcrumb: text('breadcrumb'), // e.g. "Installation > Getting Started"
  tokenCount: integer('token_count').default(0),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
}, (t) => [
  index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
  index('idx_snippets_repo_type').on(t.repositoryId, t.type),
]);
export const snippets = sqliteTable(
  'snippets',
  {
    id: text('id').primaryKey(), // UUID
    documentId: text('document_id')
      .notNull()
      .references(() => documents.id, { onDelete: 'cascade' }),
    repositoryId: text('repository_id')
      .notNull()
      .references(() => repositories.id, { onDelete: 'cascade' }),
    versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
    type: text('type', { enum: ['code', 'info'] }).notNull(),
    title: text('title'),
    content: text('content').notNull(), // searchable text / code
    language: text('language'),
    breadcrumb: text('breadcrumb'), // e.g. "Installation > Getting Started"
    tokenCount: integer('token_count').default(0),
    createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
  },
  (t) => [
    index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
    index('idx_snippets_repo_type').on(t.repositoryId, t.type)
  ]
);

// ---------------------------------------------------------------------------
// embedding_profiles
@@ -134,34 +146,52 @@ export const snippetEmbeddings = sqliteTable(
  },
  (table) => [
    primaryKey({ columns: [table.snippetId, table.profileId] }),
    index('idx_embeddings_profile').on(table.profileId, table.snippetId),
    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
  ]
);

// ---------------------------------------------------------------------------
// indexing_jobs
// ---------------------------------------------------------------------------
export const indexingJobs = sqliteTable('indexing_jobs', {
  id: text('id').primaryKey(), // UUID
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id'),
  status: text('status', {
    enum: ['queued', 'running', 'paused', 'cancelled', 'done', 'failed']
  })
    .notNull()
    .default('queued'),
  progress: integer('progress').default(0), // 0–100
  totalFiles: integer('total_files').default(0),
  processedFiles: integer('processed_files').default(0),
  stage: text('stage', { enum: ['queued', 'differential', 'crawling', 'cloning', 'parsing', 'storing', 'embedding', 'done', 'failed'] }).notNull().default('queued'),
  stageDetail: text('stage_detail'),
  error: text('error'),
  startedAt: integer('started_at', { mode: 'timestamp' }),
  completedAt: integer('completed_at', { mode: 'timestamp' }),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
}, (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]);
export const indexingJobs = sqliteTable(
  'indexing_jobs',
  {
    id: text('id').primaryKey(), // UUID
    repositoryId: text('repository_id')
      .notNull()
      .references(() => repositories.id, { onDelete: 'cascade' }),
    versionId: text('version_id'),
    status: text('status', {
      enum: ['queued', 'running', 'paused', 'cancelled', 'done', 'failed']
    })
      .notNull()
      .default('queued'),
    progress: integer('progress').default(0), // 0–100
    totalFiles: integer('total_files').default(0),
    processedFiles: integer('processed_files').default(0),
    stage: text('stage', {
      enum: [
        'queued',
        'differential',
        'crawling',
        'cloning',
        'parsing',
        'storing',
        'embedding',
        'done',
        'failed'
      ]
    })
      .notNull()
      .default('queued'),
    stageDetail: text('stage_detail'),
    error: text('error'),
    startedAt: integer('started_at', { mode: 'timestamp' }),
    completedAt: integer('completed_at', { mode: 'timestamp' }),
    createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
  },
  (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
);

// ---------------------------------------------------------------------------
// repository_configs

@@ -12,11 +12,7 @@ import { migrate } from 'drizzle-orm/better-sqlite3/migrator';
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import * as schema from '../db/schema.js';
import {
  loadSqliteVec,
  sqliteVecRowidTableName,
  sqliteVecTableName
} from '../db/sqlite-vec.js';
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '../db/sqlite-vec.js';
import { SqliteVecStore } from '../search/sqlite-vec.store.js';

import { NoopEmbeddingProvider, EmbeddingError, type EmbeddingVector } from './provider.js';
@@ -424,6 +420,25 @@ describe('EmbeddingService', () => {
    expect(embedding![2]).toBeCloseTo(0.2, 5);
  });

  it('can delegate embedding persistence to an injected writer', async () => {
    const snippetId = seedSnippet(db, client);
    const provider = makeProvider(4);
    const persistEmbeddings = vi.fn().mockResolvedValue(undefined);
    const service = new EmbeddingService(client, provider, 'local-default', {
      persistEmbeddings
    });

    await service.embedSnippets([snippetId]);

    expect(persistEmbeddings).toHaveBeenCalledTimes(1);
    const rows = client
      .prepare(
        'SELECT COUNT(*) AS cnt FROM snippet_embeddings WHERE snippet_id = ? AND profile_id = ?'
      )
      .get(snippetId, 'local-default') as { cnt: number };
    expect(rows.cnt).toBe(0);
  });

  it('stores embeddings under the configured profile ID', async () => {
    client
      .prepare(
@@ -431,16 +446,7 @@ describe('EmbeddingService', () => {
        (id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, unixepoch(), unixepoch())`
      )
      .run(
        'openai-custom',
        'openai-compatible',
        'OpenAI Custom',
        1,
        0,
        'test-model',
        4,
        '{}'
      );
      .run('openai-custom', 'openai-compatible', 'OpenAI Custom', 1, 0, 'test-model', 4, '{}');

    const snippetId = seedSnippet(db, client);
    const provider = makeProvider(4, 'test-model');

@@ -6,6 +6,10 @@
import type Database from 'better-sqlite3';
import type { EmbeddingProvider } from './provider.js';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import {
  upsertEmbeddings,
  type PersistedEmbedding
} from '$lib/server/pipeline/write-operations.js';

interface SnippetRow {
  id: string;
@@ -23,7 +27,10 @@ export class EmbeddingService {
  constructor(
    private readonly db: Database.Database,
    private readonly provider: EmbeddingProvider,
    private readonly profileId: string = 'local-default'
    private readonly profileId: string = 'local-default',
    private readonly persistenceDelegate?: {
      persistEmbeddings?: (embeddings: PersistedEmbedding[]) => Promise<void>;
    }
  ) {
    this.sqliteVecStore = new SqliteVecStore(db);
  }
@@ -94,37 +101,31 @@ export class EmbeddingService {
|
||||
[s.title, s.breadcrumb, s.content].filter(Boolean).join('\n').slice(0, TEXT_MAX_CHARS)
|
||||
);
|
||||
|
||||
const insert = this.db.prepare<[string, string, string, number, Buffer]>(`
|
||||
INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
|
||||
VALUES (?, ?, ?, ?, ?, unixepoch())
|
||||
`);
|
||||
|
||||
for (let i = 0; i < snippets.length; i += BATCH_SIZE) {
|
||||
const batchSnippets = snippets.slice(i, i + BATCH_SIZE);
|
||||
const batchTexts = texts.slice(i, i + BATCH_SIZE);
|
||||
|
||||
const embeddings = await this.provider.embed(batchTexts);
|
||||
|
||||
const insertMany = this.db.transaction(() => {
|
||||
for (let j = 0; j < batchSnippets.length; j++) {
|
||||
const snippet = batchSnippets[j];
|
||||
const embedding = embeddings[j];
|
||||
|
||||
insert.run(
|
||||
snippet.id,
|
||||
this.profileId,
|
||||
embedding.model,
|
||||
embedding.dimensions,
|
||||
Buffer.from(
|
||||
embedding.values.buffer,
|
||||
embedding.values.byteOffset,
|
||||
embedding.values.byteLength
|
||||
)
|
||||
);
|
||||
this.sqliteVecStore.upsertEmbedding(this.profileId, snippet.id, embedding.values);
|
||||
}
|
||||
const persistedEmbeddings: PersistedEmbedding[] = batchSnippets.map((snippet, index) => {
|
||||
const embedding = embeddings[index];
|
||||
return {
|
||||
snippetId: snippet.id,
|
||||
profileId: this.profileId,
|
||||
model: embedding.model,
|
||||
dimensions: embedding.dimensions,
|
||||
embedding: Buffer.from(
|
||||
embedding.values.buffer,
|
||||
embedding.values.byteOffset,
|
||||
embedding.values.byteLength
|
||||
)
|
||||
};
|
||||
});
|
||||
insertMany();
|
||||
|
||||
if (this.persistenceDelegate?.persistEmbeddings) {
|
||||
await this.persistenceDelegate.persistEmbeddings(persistedEmbeddings);
|
||||
} else {
|
||||
upsertEmbeddings(this.db, persistedEmbeddings);
|
||||
}
|
||||
|
||||
onProgress?.(Math.min(i + BATCH_SIZE, snippets.length), snippets.length);
|
||||
}
|
||||
|
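The delegate handoff in `embedSnippets` above can be sketched in isolation. This is a hypothetical, simplified model (names like `MiniEmbeddingService` and the in-memory `localStore` are invented for illustration; the real service writes to SQLite and sqlite-vec): when a persistence delegate is injected, the service hands finished batches to it instead of writing locally, which is what lets the worker route all writes through the parent thread.

```typescript
// Hypothetical sketch of the "delegate or write locally" decision.
interface PersistedEmbedding {
  snippetId: string;
  profileId: string;
  model: string;
  dimensions: number;
}

class MiniEmbeddingService {
  constructor(
    // Stands in for the SQLite tables the real service writes.
    private readonly localStore: PersistedEmbedding[],
    // Mirrors the optional persistenceDelegate added in this commit.
    private readonly persistenceDelegate?: {
      persistEmbeddings?: (embeddings: PersistedEmbedding[]) => Promise<void>;
    }
  ) {}

  async persistBatch(batch: PersistedEmbedding[]): Promise<void> {
    if (this.persistenceDelegate?.persistEmbeddings) {
      // Worker mode: the parent thread owns every write.
      await this.persistenceDelegate.persistEmbeddings(batch);
    } else {
      // Standalone mode: write directly.
      this.localStore.push(...batch);
    }
  }
}
```

With no delegate the batch lands in the local store; with a delegate it is forwarded untouched, so the write path is chosen once at construction time rather than per batch.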
@@ -1,7 +1,4 @@
import {
  EmbeddingProfile,
  EmbeddingProfileEntity
} from '$lib/server/models/embedding-profile.js';
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';

function parseConfig(config: Record<string, unknown> | string | null): Record<string, unknown> {
  if (!config) {

@@ -44,7 +44,10 @@ function createTestDb(): Database.Database {
    '0004_complete_sentry.sql'
  ]) {
    const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
    for (const stmt of sql.split('--> statement-breakpoint').map((s) => s.trim()).filter(Boolean)) {
    for (const stmt of sql
      .split('--> statement-breakpoint')
      .map((s) => s.trim())
      .filter(Boolean)) {
      client.exec(stmt);
    }
  }
@@ -113,9 +116,10 @@ function insertDocument(db: Database.Database, versionId: string, filePath: stri
    .run(
      id,
      db
        .prepare<[string], { repository_id: string }>(
          `SELECT repository_id FROM repository_versions WHERE id = ?`
        )
        .prepare<
          [string],
          { repository_id: string }
        >(`SELECT repository_id FROM repository_versions WHERE id = ?`)
        .get(versionId)?.repository_id ?? '/test/repo',
      versionId,
      filePath,
@@ -280,9 +284,9 @@ describe('buildDifferentialPlan', () => {
    insertDocument(db, v1Id, 'packages/react/index.js');
    insertDocument(db, v1Id, 'packages/react-dom/index.js');

    const fetchFn = vi.fn().mockResolvedValue([
      { path: 'packages/react/index.js', status: 'modified' as const }
    ]);
    const fetchFn = vi
      .fn()
      .mockResolvedValue([{ path: 'packages/react/index.js', status: 'modified' as const }]);

    const plan = await buildDifferentialPlan({
      repo,
@@ -292,13 +296,7 @@ describe('buildDifferentialPlan', () => {
    });

    expect(fetchFn).toHaveBeenCalledOnce();
    expect(fetchFn).toHaveBeenCalledWith(
      'facebook',
      'react',
      'v18.0.0',
      'v18.1.0',
      'ghp_test123'
    );
    expect(fetchFn).toHaveBeenCalledWith('facebook', 'react', 'v18.0.0', 'v18.1.0', 'ghp_test123');

    expect(plan).not.toBeNull();
    expect(plan!.changedPaths.has('packages/react/index.js')).toBe(true);

@@ -41,9 +41,7 @@ export async function buildDifferentialPlan(params: {
  try {
    // 1. Load all indexed versions for this repository
    const rows = db
      .prepare(
        `SELECT * FROM repository_versions WHERE repository_id = ? AND state = 'indexed'`
      )
      .prepare(`SELECT * FROM repository_versions WHERE repository_id = ? AND state = 'indexed'`)
      .all(repo.id) as RepositoryVersionEntity[];

    const indexedVersions: RepositoryVersion[] = rows.map((row) =>

@@ -1,10 +1,19 @@
import { workerData, parentPort } from 'node:worker_threads';
import Database from 'better-sqlite3';
import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import { createProviderFromProfile } from '$lib/server/embeddings/registry.js';
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
import { EmbeddingProfileEntity, type EmbeddingProfileEntityProps } from '$lib/server/models/embedding-profile.js';
import type { EmbedWorkerRequest, EmbedWorkerResponse, WorkerInitData } from './worker-types.js';
import {
  EmbeddingProfileEntity,
  type EmbeddingProfileEntityProps
} from '$lib/server/models/embedding-profile.js';
import type {
  EmbedWorkerRequest,
  EmbedWorkerResponse,
  SerializedEmbedding,
  WorkerInitData
} from './worker-types.js';

const { dbPath, embeddingProfileId } = workerData as WorkerInitData;

@@ -18,17 +27,12 @@ if (!embeddingProfileId) {
}

const db = new Database(dbPath);
db.pragma('journal_mode = WAL');
db.pragma('foreign_keys = ON');
db.pragma('busy_timeout = 5000');
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('temp_store = MEMORY');
db.pragma('mmap_size = 268435456');
db.pragma('wal_autocheckpoint = 1000');
applySqlitePragmas(db);

// Load the embedding profile from DB
const rawProfile = db.prepare('SELECT * FROM embedding_profiles WHERE id = ?').get(embeddingProfileId);
const rawProfile = db
  .prepare('SELECT * FROM embedding_profiles WHERE id = ?')
  .get(embeddingProfileId);

if (!rawProfile) {
  db.close();
@@ -43,9 +47,55 @@ if (!rawProfile) {
const profileEntity = new EmbeddingProfileEntity(rawProfile as EmbeddingProfileEntityProps);
const profile = EmbeddingProfileMapper.fromEntity(profileEntity);

let pendingWrite: {
  jobId: string;
  resolve: () => void;
  reject: (error: Error) => void;
} | null = null;
let currentJobId: string | null = null;

function requestWrite(
  message: Extract<EmbedWorkerResponse, { type: 'write_embeddings' }>
): Promise<void> {
  if (pendingWrite) {
    return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
  }

  return new Promise((resolve, reject) => {
    pendingWrite = {
      jobId: message.jobId,
      resolve: () => {
        pendingWrite = null;
        resolve();
      },
      reject: (error: Error) => {
        pendingWrite = null;
        reject(error);
      }
    };
    parentPort!.postMessage(message);
  });
}

// Create provider and embedding service
const provider = createProviderFromProfile(profile);
const embeddingService = new EmbeddingService(db, provider, embeddingProfileId);
const embeddingService = new EmbeddingService(db, provider, embeddingProfileId, {
  persistEmbeddings: async (embeddings) => {
    const serializedEmbeddings: SerializedEmbedding[] = embeddings.map((item) => ({
      snippetId: item.snippetId,
      profileId: item.profileId,
      model: item.model,
      dimensions: item.dimensions,
      embedding: Uint8Array.from(item.embedding)
    }));

    await requestWrite({
      type: 'write_embeddings',
      jobId: currentJobId ?? 'unknown',
      embeddings: serializedEmbeddings
    });
  }
});

// Signal ready after service initialization
parentPort!.postMessage({
@@ -53,12 +103,27 @@ parentPort!.postMessage({
} satisfies EmbedWorkerResponse);

parentPort!.on('message', async (msg: EmbedWorkerRequest) => {
  if (msg.type === 'write_ack') {
    if (pendingWrite?.jobId === msg.jobId) {
      pendingWrite.resolve();
    }
    return;
  }

  if (msg.type === 'write_error') {
    if (pendingWrite?.jobId === msg.jobId) {
      pendingWrite.reject(new Error(msg.error));
    }
    return;
  }

  if (msg.type === 'shutdown') {
    db.close();
    process.exit(0);
  }

  if (msg.type === 'embed') {
    currentJobId = msg.jobId;
    try {
      const snippetIds = embeddingService.findSnippetIdsMissingEmbeddings(
        msg.repositoryId,
@@ -84,6 +149,8 @@ parentPort!.on('message', async (msg: EmbedWorkerRequest) => {
        jobId: msg.jobId,
        error: err instanceof Error ? err.message : String(err)
      } satisfies EmbedWorkerResponse);
    } finally {
      currentJobId = null;
    }
  }
});

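The worker's `requestWrite` / `write_ack` / `write_error` handshake above is the core of the "serialized writes" part of this commit: at most one write request may be in flight, and the returned promise settles only when the parent thread answers. A self-contained sketch of that protocol (hypothetical names; `post` stands in for `parentPort.postMessage` and `onMessage` for the worker's message handler):

```typescript
// Hypothetical model of the single-in-flight write handshake.
type WriteAck = { jobId: string; ok: boolean; error?: string };

class WriteChannel {
  private pending: { jobId: string; resolve: () => void; reject: (e: Error) => void } | null =
    null;

  // `post` stands in for parentPort.postMessage(...) in the real worker.
  constructor(private readonly post: (jobId: string) => void) {}

  requestWrite(jobId: string): Promise<void> {
    if (this.pending) {
      // Serialization guarantee: never two writes in flight at once.
      return Promise.reject(new Error(`write request already in flight for ${this.pending.jobId}`));
    }
    return new Promise((resolve, reject) => {
      this.pending = {
        jobId,
        resolve: () => {
          this.pending = null;
          resolve();
        },
        reject: (e) => {
          this.pending = null;
          reject(e);
        }
      };
      this.post(jobId);
    });
  }

  // Stands in for handling 'write_ack' / 'write_error' from the parent.
  onMessage(msg: WriteAck): void {
    const p = this.pending;
    if (!p || p.jobId !== msg.jobId) return;
    if (msg.ok) p.resolve();
    else p.reject(new Error(msg.error ?? 'write failed'));
  }
}
```

Because the worker awaits each `requestWrite` before embedding the next batch, the parent thread becomes the single writer while the worker keeps doing parallel reads and provider calls.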
@@ -466,12 +466,15 @@ describe('IndexingPipeline', () => {
    const job1 = makeJob();
    await pipeline.run(job1 as never);

    const firstSnippetIds = (db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as { id: string }[])
      .map((row) => row.id);
    const firstSnippetIds = (
      db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as { id: string }[]
    ).map((row) => row.id);
    expect(firstSnippetIds.length).toBeGreaterThan(0);

    const firstEmbeddingCount = (
      db.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`).get() as {
      db
        .prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
        .get() as {
        n: number;
      }
    ).n;
@@ -483,11 +486,15 @@ describe('IndexingPipeline', () => {
    const job2 = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(job2Id) as never;
    await pipeline.run(job2);

    const secondSnippetIds = (db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as {
      id: string;
    }[]).map((row) => row.id);
    const secondSnippetIds = (
      db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as {
        id: string;
      }[]
    ).map((row) => row.id);
    const secondEmbeddingCount = (
      db.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`).get() as {
      db
        .prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
        .get() as {
        n: number;
      }
    ).n;
@@ -918,9 +925,9 @@ describe('IndexingPipeline', () => {

    await pipeline.run(job as never);

    const docs = db
      .prepare(`SELECT file_path FROM documents ORDER BY file_path`)
      .all() as { file_path: string }[];
    const docs = db.prepare(`SELECT file_path FROM documents ORDER BY file_path`).all() as {
      file_path: string;
    }[];
    const filePaths = docs.map((d) => d.file_path);

    // migration-guide.md and docs/legacy-api.md must be absent.
@@ -956,7 +963,10 @@ describe('IndexingPipeline', () => {

    expect(row).toBeDefined();
    const rules = JSON.parse(row!.rules);
    expect(rules).toEqual(['Always use TypeScript strict mode', 'Prefer async/await over callbacks']);
    expect(rules).toEqual([
      'Always use TypeScript strict mode',
      'Prefer async/await over callbacks'
    ]);
  });

  it('persists version-specific rules under (repositoryId, versionId) when job has versionId', async () => {
@@ -1219,12 +1229,7 @@ describe('differential indexing', () => {
    insertSnippet(db, doc1Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
    insertSnippet(db, doc2Id, { repository_id: '/test/repo', version_id: ancestorVersionId });

    const pipeline = new IndexingPipeline(
      db,
      vi.fn() as never,
      { crawl: vi.fn() } as never,
      null
    );
    const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
    (pipeline as unknown as PipelineInternals).cloneFromAncestor(
      ancestorVersionId,
      targetVersionId,
@@ -1236,9 +1241,7 @@ describe('differential indexing', () => {
      .prepare(`SELECT * FROM documents WHERE version_id = ?`)
      .all(targetVersionId) as { id: string; file_path: string }[];
    expect(targetDocs).toHaveLength(2);
    expect(targetDocs.map((d) => d.file_path).sort()).toEqual(
      ['README.md', 'src/index.ts'].sort()
    );
    expect(targetDocs.map((d) => d.file_path).sort()).toEqual(['README.md', 'src/index.ts'].sort());
    // New IDs must differ from ancestor doc IDs.
    const targetDocIds = targetDocs.map((d) => d.id);
    expect(targetDocIds).not.toContain(doc1Id);
@@ -1261,12 +1264,7 @@ describe('differential indexing', () => {
      checksum: 'sha-main'
    });

    const pipeline = new IndexingPipeline(
      db,
      vi.fn() as never,
      { crawl: vi.fn() } as never,
      null
    );
    const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
    (pipeline as unknown as PipelineInternals).cloneFromAncestor(
      ancestorVersionId,
      targetVersionId,
@@ -1323,9 +1321,9 @@ describe('differential indexing', () => {

    await pipeline.run(job);

    const updatedJob = db
      .prepare(`SELECT status FROM indexing_jobs WHERE id = ?`)
      .get(jobId) as { status: string };
    const updatedJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
      status: string;
    };
    expect(updatedJob.status).toBe('done');

    const docs = db
@@ -1375,9 +1373,7 @@ describe('differential indexing', () => {
      deletedPaths: new Set<string>(),
      unchangedPaths: new Set(['unchanged.md'])
    };
    const spy = vi
      .spyOn(diffStrategy, 'buildDifferentialPlan')
      .mockResolvedValueOnce(mockPlan);
    const spy = vi.spyOn(diffStrategy, 'buildDifferentialPlan').mockResolvedValueOnce(mockPlan);

    const pipeline = new IndexingPipeline(
      db,
@@ -1398,9 +1394,9 @@ describe('differential indexing', () => {
    spy.mockRestore();

    // 6. Assert job completed and both docs exist under the target version.
    const finalJob = db
      .prepare(`SELECT status FROM indexing_jobs WHERE id = ?`)
      .get(jobId) as { status: string };
    const finalJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
      status: string;
    };
    expect(finalJob.status).toBe('done');

    const targetDocs = db

@@ -28,6 +28,14 @@ import { parseFile } from '$lib/server/parser/index.js';
import { computeTrustScore } from '$lib/server/search/trust-score.js';
import { computeDiff } from './diff.js';
import { buildDifferentialPlan, type DifferentialPlan } from './differential-strategy.js';
import {
  cloneFromAncestor as cloneFromAncestorInDatabase,
  replaceSnippets as replaceSnippetsInDatabase,
  updateRepo as updateRepoInDatabase,
  updateVersion as updateVersionInDatabase,
  type CloneFromAncestorRequest
} from './write-operations.js';
import type { SerializedFields } from './worker-types.js';

// ---------------------------------------------------------------------------
// Progress calculation
@@ -70,7 +78,23 @@ export class IndexingPipeline {
    private readonly db: Database.Database,
    private readonly githubCrawl: typeof GithubCrawlFn,
    private readonly localCrawler: LocalCrawler,
    private readonly embeddingService: EmbeddingService | null
    private readonly embeddingService: EmbeddingService | null,
    private readonly writeDelegate?: {
      persistJobUpdates?: boolean;
      replaceSnippets?: (
        changedDocIds: string[],
        newDocuments: NewDocument[],
        newSnippets: NewSnippet[]
      ) => Promise<void>;
      cloneFromAncestor?: (request: CloneFromAncestorRequest) => Promise<void>;
      updateRepo?: (repositoryId: string, fields: SerializedFields) => Promise<void>;
      updateVersion?: (versionId: string, fields: SerializedFields) => Promise<void>;
      upsertRepoConfig?: (
        repositoryId: string,
        versionId: string | null,
        rules: string[]
      ) => Promise<void>;
    }
  ) {
    this.sqliteVecStore = new SqliteVecStore(db);
  }
@@ -117,14 +141,12 @@ export class IndexingPipeline {
    if (!repo) throw new Error(`Repository ${repositoryId} not found`);

    // Mark repo as actively indexing.
    this.updateRepo(repo.id, { state: 'indexing' });
    await this.updateRepo(repo.id, { state: 'indexing' });
    if (normJob.versionId) {
      this.updateVersion(normJob.versionId, { state: 'indexing' });
      await this.updateVersion(normJob.versionId, { state: 'indexing' });
    }

    const versionTag = normJob.versionId
      ? this.getVersionTag(normJob.versionId)
      : undefined;
    const versionTag = normJob.versionId ? this.getVersionTag(normJob.versionId) : undefined;

    // ---- Stage 0: Differential strategy (TRUEREF-0021) ----------------------
    // When indexing a tagged version, check if we can inherit unchanged files
@@ -147,12 +169,12 @@ export class IndexingPipeline {
    // If a differential plan exists, clone unchanged files from ancestor.
    if (differentialPlan && differentialPlan.unchangedPaths.size > 0) {
      reportStage('cloning');
      this.cloneFromAncestor(
        differentialPlan.ancestorVersionId,
        normJob.versionId!,
        repo.id,
        differentialPlan.unchangedPaths
      );
      await this.cloneFromAncestor({
        ancestorVersionId: differentialPlan.ancestorVersionId,
        targetVersionId: normJob.versionId!,
        repositoryId: repo.id,
        unchangedPaths: [...differentialPlan.unchangedPaths]
      });
      console.info(
        `[IndexingPipeline] Differential indexing: cloned ${differentialPlan.unchangedPaths.size} unchanged files from ${differentialPlan.ancestorTag}`
      );
@@ -174,7 +196,11 @@ export class IndexingPipeline {
    if (crawlResult.config) {
      // Config was pre-parsed by the crawler — wrap it in a ParsedConfig
      // shell so the rest of the pipeline can use it uniformly.
      parsedConfig = { config: crawlResult.config, source: 'trueref.json', warnings: [] } satisfies ParsedConfig;
      parsedConfig = {
        config: crawlResult.config,
        source: 'trueref.json',
        warnings: []
      } satisfies ParsedConfig;
    } else {
      const configFile = crawlResult.files.find(
        (f) => f.path === 'trueref.json' || f.path === 'context7.json'
@@ -189,7 +215,10 @@ export class IndexingPipeline {
    const filteredFiles =
      excludeFiles.length > 0
        ? crawlResult.files.filter(
            (f) => !excludeFiles.some((pattern) => IndexingPipeline.matchesExcludePattern(f.path, pattern))
            (f) =>
              !excludeFiles.some((pattern) =>
                IndexingPipeline.matchesExcludePattern(f.path, pattern)
              )
          )
        : crawlResult.files;

@@ -303,7 +332,13 @@ export class IndexingPipeline {
      this.embeddingService !== null
    );
    this.updateJob(job.id, { processedFiles: totalProcessed, progress });
    reportStage('parsing', `${totalProcessed} / ${totalFiles} files`, progress, totalProcessed, totalFiles);
    reportStage(
      'parsing',
      `${totalProcessed} / ${totalFiles} files`,
      progress,
      totalProcessed,
      totalFiles
    );
  }
}

@@ -312,7 +347,7 @@ export class IndexingPipeline {

    // ---- Stage 3: Atomic replacement ------------------------------------
    reportStage('storing');
    this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);
    await this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);

    // ---- Stage 4: Embeddings (if provider is configured) ----------------
    if (this.embeddingService) {
@@ -325,7 +360,7 @@ export class IndexingPipeline {
      if (snippetIds.length === 0) {
        // No missing embeddings for the active profile; parsing progress is final.
      } else {
        const embeddingsTotal = snippetIds.length;
        const embeddingsTotal = snippetIds.length;

        await this.embeddingService.embedSnippets(snippetIds, (done) => {
          const progress = calculateProgress(
@@ -350,7 +385,7 @@ export class IndexingPipeline {
      state: 'indexed'
    });

    this.updateRepo(repo.id, {
    await this.updateRepo(repo.id, {
      state: 'indexed',
      totalSnippets: stats.totalSnippets,
      totalTokens: stats.totalTokens,
@@ -360,7 +395,7 @@ export class IndexingPipeline {

    if (normJob.versionId) {
      const versionStats = this.computeVersionStats(normJob.versionId);
      this.updateVersion(normJob.versionId, {
      await this.updateVersion(normJob.versionId, {
        state: 'indexed',
        totalSnippets: versionStats.totalSnippets,
        indexedAt: Math.floor(Date.now() / 1000)
@@ -371,12 +406,12 @@ export class IndexingPipeline {
    if (parsedConfig?.config.rules?.length) {
      if (!normJob.versionId) {
        // Main-branch job: write the repo-wide entry only.
        this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
        await this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
      } else {
        // Version job: write only the version-specific entry.
        // Writing to the NULL row here would overwrite repo-wide rules
        // with whatever the last-indexed version happened to carry.
        this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
        await this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
      }
    }

@@ -398,9 +433,9 @@ export class IndexingPipeline {
    });

    // Restore repo to error state but preserve any existing indexed data.
    this.updateRepo(repositoryId, { state: 'error' });
    await this.updateRepo(repositoryId, { state: 'error' });
    if (normJob.versionId) {
      this.updateVersion(normJob.versionId, { state: 'error' });
      await this.updateVersion(normJob.versionId, { state: 'error' });
    }

    throw error;
@@ -411,7 +446,11 @@ export class IndexingPipeline {
  // Private — crawl
  // -------------------------------------------------------------------------

  private async crawl(repo: Repository, ref?: string, allowedPaths?: Set<string>): Promise<{
  private async crawl(
    repo: Repository,
    ref?: string,
    allowedPaths?: Set<string>
  ): Promise<{
    files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
    totalFiles: number;
    /** Pre-parsed trueref.json / context7.json, or undefined when absent. */
@@ -473,219 +512,50 @@ export class IndexingPipeline {
   *
   * Runs in a single SQLite transaction for atomicity.
   */
  private cloneFromAncestor(
    ancestorVersionId: string,
    targetVersionId: string,
    repositoryId: string,
    unchangedPaths: Set<string>
  ): void {
    this.db.transaction(() => {
      const pathList = [...unchangedPaths];
      const placeholders = pathList.map(() => '?').join(',');
      const ancestorDocs = this.db
        .prepare(
          `SELECT * FROM documents WHERE version_id = ? AND file_path IN (${placeholders})`
        )
        .all(ancestorVersionId, ...pathList) as Array<{
        id: string;
        repository_id: string;
        file_path: string;
        title: string | null;
        language: string | null;
        token_count: number;
        checksum: string;
        indexed_at: number;
      }>;
  private async cloneFromAncestor(
    requestOrAncestorVersionId: CloneFromAncestorRequest | string,
    targetVersionId?: string,
    repositoryId?: string,
    unchangedPaths?: Set<string>
  ): Promise<void> {
    const request: CloneFromAncestorRequest =
      typeof requestOrAncestorVersionId === 'string'
        ? {
            ancestorVersionId: requestOrAncestorVersionId,
            targetVersionId: targetVersionId!,
            repositoryId: repositoryId!,
            unchangedPaths: [...(unchangedPaths ?? new Set<string>())]
          }
        : requestOrAncestorVersionId;

      const docIdMap = new Map<string, string>();
      const nowEpoch = Math.floor(Date.now() / 1000);
    if (request.unchangedPaths.length === 0) {
      return;
    }

      for (const doc of ancestorDocs) {
        const newDocId = randomUUID();
        docIdMap.set(doc.id, newDocId);
        this.db
          .prepare(
            `INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
          )
          .run(
            newDocId,
            repositoryId,
            targetVersionId,
            doc.file_path,
            doc.title,
            doc.language,
            doc.token_count,
            doc.checksum,
            nowEpoch
          );
      }
    if (this.writeDelegate?.cloneFromAncestor) {
      await this.writeDelegate.cloneFromAncestor(request);
      return;
    }

      if (docIdMap.size === 0) return;

      const oldDocIds = [...docIdMap.keys()];
      const snippetPlaceholders = oldDocIds.map(() => '?').join(',');
      const ancestorSnippets = this.db
        .prepare(
          `SELECT * FROM snippets WHERE document_id IN (${snippetPlaceholders})`
        )
        .all(...oldDocIds) as Array<{
        id: string;
        document_id: string;
        repository_id: string;
        version_id: string | null;
        type: string;
        title: string | null;
        content: string;
        language: string | null;
        breadcrumb: string | null;
        token_count: number;
        created_at: number;
      }>;

      const snippetIdMap = new Map<string, string>();
      for (const snippet of ancestorSnippets) {
        const newSnippetId = randomUUID();
        snippetIdMap.set(snippet.id, newSnippetId);
        const newDocId = docIdMap.get(snippet.document_id)!;
        this.db
          .prepare(
            `INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
          )
          .run(
            newSnippetId,
            newDocId,
            repositoryId,
            targetVersionId,
            snippet.type,
            snippet.title,
            snippet.content,
            snippet.language,
            snippet.breadcrumb,
            snippet.token_count,
            snippet.created_at
          );
      }

      if (snippetIdMap.size > 0) {
        const oldSnippetIds = [...snippetIdMap.keys()];
        const embPlaceholders = oldSnippetIds.map(() => '?').join(',');
        const ancestorEmbeddings = this.db
          .prepare(
            `SELECT * FROM snippet_embeddings WHERE snippet_id IN (${embPlaceholders})`
          )
          .all(...oldSnippetIds) as Array<{
          snippet_id: string;
          profile_id: string;
          model: string;
          dimensions: number;
          embedding: Buffer;
          created_at: number;
        }>;
        for (const emb of ancestorEmbeddings) {
          const newSnippetId = snippetIdMap.get(emb.snippet_id)!;
          this.db
            .prepare(
              `INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
               VALUES (?, ?, ?, ?, ?, ?)`
            )
            .run(
              newSnippetId,
              emb.profile_id,
              emb.model,
              emb.dimensions,
              emb.embedding,
              emb.created_at
            );
          this.sqliteVecStore.upsertEmbeddingBuffer(
            emb.profile_id,
            newSnippetId,
            emb.embedding,
            emb.dimensions
          );
        }
      }
    })();
    cloneFromAncestorInDatabase(this.db, request);
  }

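The rewritten `cloneFromAncestor` above keeps the old positional call style (the tests still call it that way) while preferring a single request object. That normalization step can be shown standalone; this is a hypothetical, self-contained version of just the argument handling, with the database work omitted:

```typescript
// Hypothetical standalone version of the overload normalization used by
// the new cloneFromAncestor: accept either (ancestorVersionId, targetVersionId,
// repositoryId, unchangedPaths) positionally, or one request object.
interface CloneFromAncestorRequest {
  ancestorVersionId: string;
  targetVersionId: string;
  repositoryId: string;
  unchangedPaths: string[];
}

function normalizeCloneRequest(
  requestOrAncestorVersionId: CloneFromAncestorRequest | string,
  targetVersionId?: string,
  repositoryId?: string,
  unchangedPaths?: Set<string>
): CloneFromAncestorRequest {
  return typeof requestOrAncestorVersionId === 'string'
    ? {
        ancestorVersionId: requestOrAncestorVersionId,
        targetVersionId: targetVersionId!,
        repositoryId: repositoryId!,
        // Sets are not structured-clone-friendly across the worker boundary,
        // so the request form carries a plain array.
        unchangedPaths: [...(unchangedPaths ?? new Set<string>())]
      }
    : requestOrAncestorVersionId;
}
```

Converging on the request object first means the delegate branch and the in-process `cloneFromAncestorInDatabase` branch both consume one shape.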
||||
// -------------------------------------------------------------------------
|
||||
// Private — atomic snippet replacement
|
||||
// -------------------------------------------------------------------------
|
||||
|
||||
private replaceSnippets(
|
||||
private async replaceSnippets(
|
||||
_repositoryId: string,
|
||||
changedDocIds: string[],
|
||||
newDocuments: NewDocument[],
|
||||
    newSnippets: NewSnippet[]
  ): void {
    const insertDoc = this.db.prepare(
      `INSERT INTO documents
         (id, repository_id, version_id, file_path, title, language,
          token_count, checksum, indexed_at)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
    );
  ): Promise<void> {
    if (this.writeDelegate?.replaceSnippets) {
      await this.writeDelegate.replaceSnippets(changedDocIds, newDocuments, newSnippets);
      return;
    }

    const insertSnippet = this.db.prepare(
      `INSERT INTO snippets
         (id, document_id, repository_id, version_id, type, title,
          content, language, breadcrumb, token_count, created_at)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
    );

    this.db.transaction(() => {
      this.sqliteVecStore.deleteEmbeddingsForDocumentIds(changedDocIds);

      // Delete stale documents (cascade deletes their snippets via FK).
      if (changedDocIds.length > 0) {
        const placeholders = changedDocIds.map(() => '?').join(',');
        this.db
          .prepare(`DELETE FROM documents WHERE id IN (${placeholders})`)
          .run(...changedDocIds);
      }

      // Insert new documents.
      for (const doc of newDocuments) {
        const indexedAtSeconds =
          doc.indexedAt instanceof Date
            ? Math.floor(doc.indexedAt.getTime() / 1000)
            : Math.floor(Date.now() / 1000);

        insertDoc.run(
          doc.id,
          doc.repositoryId,
          doc.versionId ?? null,
          doc.filePath,
          doc.title ?? null,
          doc.language ?? null,
          doc.tokenCount ?? 0,
          doc.checksum,
          indexedAtSeconds
        );
      }

      // Insert new snippets.
      for (const snippet of newSnippets) {
        const createdAtSeconds =
          snippet.createdAt instanceof Date
            ? Math.floor(snippet.createdAt.getTime() / 1000)
            : Math.floor(Date.now() / 1000);

        insertSnippet.run(
          snippet.id,
          snippet.documentId,
          snippet.repositoryId,
          snippet.versionId ?? null,
          snippet.type,
          snippet.title ?? null,
          snippet.content,
          snippet.language ?? null,
          snippet.breadcrumb ?? null,
          snippet.tokenCount ?? 0,
          createdAtSeconds
        );
      }
    })();
    replaceSnippetsInDatabase(this.db, changedDocIds, newDocuments, newSnippets);
  }

  // -------------------------------------------------------------------------
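The new `replaceSnippets` above routes the write through an optional delegate (the serialized write worker) and only falls back to the local database path when none is configured. A minimal sketch of that delegate-or-fallback pattern, with illustrative names (`WriteDelegate`, `SnippetStore`, `localCalls` are not from the codebase):

```typescript
// Hypothetical sketch of the delegate-or-fallback write routing used above.
type WriteDelegate = {
  replaceSnippets?: (ids: string[]) => Promise<void>;
};

class SnippetStore {
  public localCalls = 0;
  constructor(private delegate: WriteDelegate | null) {}

  async replaceSnippets(ids: string[]): Promise<void> {
    // Prefer the delegate (e.g. a dedicated write worker) when configured…
    if (this.delegate?.replaceSnippets) {
      await this.delegate.replaceSnippets(ids);
      return;
    }
    // …otherwise fall back to writing locally (here just counted).
    this.localCalls++;
  }
}
```

Keeping the fallback path means tests and single-process dev mode can run without spinning up a worker pool.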
@@ -709,9 +579,10 @@ export class IndexingPipeline {

  private computeVersionStats(versionId: string): { totalSnippets: number } {
    const row = this.db
      .prepare<[string], { total_snippets: number }>(
        `SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`
      )
      .prepare<
        [string],
        { total_snippets: number }
      >(`SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`)
      .get(versionId);

    return { totalSnippets: row?.total_snippets ?? 0 };
@@ -750,6 +621,10 @@ export class IndexingPipeline {
  }

  private updateJob(id: string, fields: Record<string, unknown>): void {
    if (this.writeDelegate?.persistJobUpdates === false) {
      return;
    }

    const sets = Object.keys(fields)
      .map((k) => `${toSnake(k)} = ?`)
      .join(', ');
@@ -757,43 +632,44 @@ export class IndexingPipeline {
    this.db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
  }

  private updateRepo(id: string, fields: Record<string, unknown>): void {
    const now = Math.floor(Date.now() / 1000);
    const allFields = { ...fields, updatedAt: now };
    const sets = Object.keys(allFields)
      .map((k) => `${toSnake(k)} = ?`)
      .join(', ');
    const values = [...Object.values(allFields), id];
    this.db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
  private async updateRepo(id: string, fields: SerializedFields): Promise<void> {
    if (this.writeDelegate?.updateRepo) {
      await this.writeDelegate.updateRepo(id, fields);
      return;
    }

    updateRepoInDatabase(this.db, id, fields);
  }

  private updateVersion(id: string, fields: Record<string, unknown>): void {
    const sets = Object.keys(fields)
      .map((k) => `${toSnake(k)} = ?`)
      .join(', ');
    const values = [...Object.values(fields), id];
    this.db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
  private async updateVersion(id: string, fields: SerializedFields): Promise<void> {
    if (this.writeDelegate?.updateVersion) {
      await this.writeDelegate.updateVersion(id, fields);
      return;
    }

    updateVersionInDatabase(this.db, id, fields);
  }

  private upsertRepoConfig(
  private async upsertRepoConfig(
    repositoryId: string,
    versionId: string | null,
    rules: string[]
  ): void {
  ): Promise<void> {
    if (this.writeDelegate?.upsertRepoConfig) {
      await this.writeDelegate.upsertRepoConfig(repositoryId, versionId, rules);
      return;
    }

    const now = Math.floor(Date.now() / 1000);
    // Use DELETE + INSERT because ON CONFLICT … DO UPDATE doesn't work reliably
    // with partial unique indexes in all SQLite versions.
    if (versionId === null) {
      this.db
        .prepare(
          `DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
        )
        .prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
        .run(repositoryId);
    } else {
      this.db
        .prepare(
          `DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`
        )
        .prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
        .run(repositoryId, versionId);
    }
    this.db
@@ -36,10 +36,10 @@ function normalizeStatuses(status?: JobStatusFilter): Array<IndexingJob['status'
  return [...new Set(statuses)];
}

function buildJobFilterQuery(options?: {
  repositoryId?: string;
  status?: JobStatusFilter;
}): { where: string; params: unknown[] } {
function buildJobFilterQuery(options?: { repositoryId?: string; status?: JobStatusFilter }): {
  where: string;
  params: unknown[];
} {
  const conditions: string[] = [];
  const params: unknown[] = [];

@@ -164,7 +164,9 @@ export class JobQueue {
   */
  private async processNext(): Promise<void> {
    // Fallback path: no worker pool configured, run directly (used by tests and dev mode)
    console.warn('[JobQueue] Running in fallback mode (no worker pool) — direct pipeline execution.');
    console.warn(
      '[JobQueue] Running in fallback mode (no worker pool) — direct pipeline execution.'
    );

    const rawJob = this.db
      .prepare<[], IndexingJobEntity>(
@@ -176,7 +178,9 @@ export class JobQueue {

    if (!rawJob) return;

    console.warn('[JobQueue] processNext: no pipeline or pool configured — skipping job processing');
    console.warn(
      '[JobQueue] processNext: no pipeline or pool configured — skipping job processing'
    );
  }

  /**

@@ -181,7 +181,9 @@ describe('ProgressBroadcaster', () => {
        concurrency: 2,
        active: 1,
        idle: 1,
        workers: [{ index: 0, state: 'running', jobId: 'job-1', repositoryId: '/repo/1', versionId: null }]
        workers: [
          { index: 0, state: 'running', jobId: 'job-1', repositoryId: '/repo/1', versionId: null }
        ]
      });

      const { value } = await reader.read();

@@ -19,6 +19,7 @@ import { WorkerPool } from './worker-pool.js';
import { initBroadcaster } from './progress-broadcaster.js';
import type { ProgressBroadcaster } from './progress-broadcaster.js';
import path from 'node:path';
import { existsSync } from 'node:fs';
import { fileURLToPath } from 'node:url';

// ---------------------------------------------------------------------------
@@ -57,6 +58,21 @@ let _pipeline: IndexingPipeline | null = null;
let _pool: WorkerPool | null = null;
let _broadcaster: ProgressBroadcaster | null = null;

function resolveWorkerScript(...segments: string[]): string {
  const candidates = [
    path.resolve(process.cwd(), ...segments),
    path.resolve(path.dirname(fileURLToPath(import.meta.url)), '../../../../', ...segments)
  ];

  for (const candidate of candidates) {
    if (existsSync(candidate)) {
      return candidate;
    }
  }

  return candidates[0];
}

/**
 * Initialise (or return the existing) JobQueue + IndexingPipeline pair.
 *
@@ -91,19 +107,17 @@ export function initializePipeline(

  const getRepositoryIdForJob = (jobId: string): string => {
    const row = db
      .prepare<[string], { repository_id: string }>(
        `SELECT repository_id FROM indexing_jobs WHERE id = ?`
      )
      .prepare<
        [string],
        { repository_id: string }
      >(`SELECT repository_id FROM indexing_jobs WHERE id = ?`)
      .get(jobId);
    return row?.repository_id ?? '';
  };

  // Resolve worker script paths relative to this file (build/workers/ directory)
  const __filename = fileURLToPath(import.meta.url);
  const __dirname = path.dirname(__filename);
  const workerScript = path.join(__dirname, '../../../build/workers/worker-entry.mjs');
  const embedWorkerScript = path.join(__dirname, '../../../build/workers/embed-worker-entry.mjs');
  const writeWorkerScript = path.join(__dirname, '../../../build/workers/write-worker-entry.mjs');
  const workerScript = resolveWorkerScript('build', 'workers', 'worker-entry.mjs');
  const embedWorkerScript = resolveWorkerScript('build', 'workers', 'embed-worker-entry.mjs');
  const writeWorkerScript = resolveWorkerScript('build', 'workers', 'write-worker-entry.mjs');

  try {
    _pool = new WorkerPool({
@@ -113,13 +127,6 @@ export function initializePipeline(
      writeWorkerScript,
      dbPath: options.dbPath,
      onProgress: (jobId, msg) => {
        // Update DB with progress
        db.prepare(
          `UPDATE indexing_jobs
           SET stage = ?, stage_detail = ?, progress = ?, processed_files = ?, total_files = ?
           WHERE id = ?`
        ).run(msg.stage, msg.stageDetail ?? null, msg.progress, msg.processedFiles, msg.totalFiles, jobId);

        // Broadcast progress event
        if (_broadcaster) {
          _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-progress', {
@@ -129,11 +136,6 @@ export function initializePipeline(
        }
      },
      onJobDone: (jobId: string) => {
        // Update job status to done
        db.prepare(`UPDATE indexing_jobs SET status = 'done', completed_at = unixepoch() WHERE id = ?`).run(
          jobId
        );

        // Broadcast done event
        if (_broadcaster) {
          _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-done', {
@@ -143,11 +145,6 @@ export function initializePipeline(
        }
      },
      onJobFailed: (jobId: string, error: string) => {
        // Update job status to failed with error message
        db.prepare(
          `UPDATE indexing_jobs SET status = 'failed', error = ?, completed_at = unixepoch() WHERE id = ?`
        ).run(error, jobId);

        // Broadcast failed event
        if (_broadcaster) {
          _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-failed', {
@@ -231,5 +228,3 @@ export function _resetSingletons(): void {
  _pool = null;
  _broadcaster = null;
}

@@ -5,24 +5,175 @@ import { crawl as githubCrawl } from '$lib/server/crawler/github.crawler.js';
import { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
import { IndexingJobEntity, type IndexingJobEntityProps } from '$lib/server/models/indexing-job.js';
import type { ParseWorkerRequest, ParseWorkerResponse, WorkerInitData } from './worker-types.js';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import type {
  ParseWorkerRequest,
  ParseWorkerResponse,
  SerializedDocument,
  SerializedSnippet,
  WorkerInitData
} from './worker-types.js';
import type { IndexingStage } from '$lib/types.js';

const { dbPath } = workerData as WorkerInitData;
const db = new Database(dbPath);
db.pragma('journal_mode = WAL');
db.pragma('foreign_keys = ON');
db.pragma('busy_timeout = 5000');
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('temp_store = MEMORY');
db.pragma('mmap_size = 268435456');
db.pragma('wal_autocheckpoint = 1000');
applySqlitePragmas(db);

const pipeline = new IndexingPipeline(db, githubCrawl, new LocalCrawler(), null);
let pendingWrite: {
  jobId: string;
  resolve: () => void;
  reject: (error: Error) => void;
} | null = null;

function serializeDocument(document: {
  id: string;
  repositoryId: string;
  versionId?: string | null;
  filePath: string;
  title?: string | null;
  language?: string | null;
  tokenCount?: number | null;
  checksum: string;
  indexedAt: Date;
}): SerializedDocument {
  return {
    id: document.id,
    repositoryId: document.repositoryId,
    versionId: document.versionId ?? null,
    filePath: document.filePath,
    title: document.title ?? null,
    language: document.language ?? null,
    tokenCount: document.tokenCount ?? 0,
    checksum: document.checksum,
    indexedAt: Math.floor(document.indexedAt.getTime() / 1000)
  };
}

function serializeSnippet(snippet: {
  id: string;
  documentId: string;
  repositoryId: string;
  versionId?: string | null;
  type: 'code' | 'info';
  title?: string | null;
  content: string;
  language?: string | null;
  breadcrumb?: string | null;
  tokenCount?: number | null;
  createdAt: Date;
}): SerializedSnippet {
  return {
    id: snippet.id,
    documentId: snippet.documentId,
    repositoryId: snippet.repositoryId,
    versionId: snippet.versionId ?? null,
    type: snippet.type,
    title: snippet.title ?? null,
    content: snippet.content,
    language: snippet.language ?? null,
    breadcrumb: snippet.breadcrumb ?? null,
    tokenCount: snippet.tokenCount ?? 0,
    createdAt: Math.floor(snippet.createdAt.getTime() / 1000)
  };
}

function requestWrite(
  message: Extract<
    ParseWorkerResponse,
    {
      type:
        | 'write_replace'
        | 'write_clone'
        | 'write_repo_update'
        | 'write_version_update'
        | 'write_repo_config';
    }
  >
): Promise<void> {
  if (pendingWrite) {
    return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
  }

  return new Promise((resolve, reject) => {
    pendingWrite = {
      jobId: message.jobId,
      resolve: () => {
        pendingWrite = null;
        resolve();
      },
      reject: (error: Error) => {
        pendingWrite = null;
        reject(error);
      }
    };
    parentPort!.postMessage(message);
  });
}

const pipeline = new IndexingPipeline(db, githubCrawl, new LocalCrawler(), null, {
  persistJobUpdates: false,
  replaceSnippets: async (changedDocIds, newDocuments, newSnippets) => {
    await requestWrite({
      type: 'write_replace',
      jobId: currentJobId ?? 'unknown',
      changedDocIds,
      documents: newDocuments.map(serializeDocument),
      snippets: newSnippets.map(serializeSnippet)
    });
  },
  cloneFromAncestor: async (request) => {
    await requestWrite({
      type: 'write_clone',
      jobId: currentJobId ?? 'unknown',
      ancestorVersionId: request.ancestorVersionId,
      targetVersionId: request.targetVersionId,
      repositoryId: request.repositoryId,
      unchangedPaths: request.unchangedPaths
    });
  },
  updateRepo: async (repositoryId, fields) => {
    await requestWrite({
      type: 'write_repo_update',
      jobId: currentJobId ?? 'unknown',
      repositoryId,
      fields
    });
  },
  updateVersion: async (versionId, fields) => {
    await requestWrite({
      type: 'write_version_update',
      jobId: currentJobId ?? 'unknown',
      versionId,
      fields
    });
  },
  upsertRepoConfig: async (repositoryId, versionId, rules) => {
    await requestWrite({
      type: 'write_repo_config',
      jobId: currentJobId ?? 'unknown',
      repositoryId,
      versionId,
      rules
    });
  }
});
let currentJobId: string | null = null;

parentPort!.on('message', async (msg: ParseWorkerRequest) => {
  if (msg.type === 'write_ack') {
    if (pendingWrite?.jobId === msg.jobId) {
      pendingWrite.resolve();
    }
    return;
  }

  if (msg.type === 'write_error') {
    if (pendingWrite?.jobId === msg.jobId) {
      pendingWrite.reject(new Error(msg.error));
    }
    return;
  }

  if (msg.type === 'shutdown') {
    db.close();
    process.exit(0);
@@ -35,11 +186,19 @@ parentPort!.on('message', async (msg: ParseWorkerRequest) => {
  if (!rawJob) {
    throw new Error(`Job ${msg.jobId} not found`);
  }
  const job = IndexingJobMapper.fromEntity(new IndexingJobEntity(rawJob as IndexingJobEntityProps));
  const job = IndexingJobMapper.fromEntity(
    new IndexingJobEntity(rawJob as IndexingJobEntityProps)
  );

  await pipeline.run(
    job,
    (stage: IndexingStage, detail?: string, progress?: number, processedFiles?: number, totalFiles?: number) => {
    (
      stage: IndexingStage,
      detail?: string,
      progress?: number,
      processedFiles?: number,
      totalFiles?: number
    ) => {
      parentPort!.postMessage({
        type: 'progress',
        jobId: msg.jobId,

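The worker entry above keeps at most one write request in flight: `requestWrite` stores a resolve/reject pair, posts the message, and the `write_ack`/`write_error` handlers settle it. A minimal self-contained sketch of that handshake, assuming illustrative names (`WriteChannel` is not in the codebase):

```typescript
// Hypothetical sketch of the single-in-flight write handshake in the worker above.
class WriteChannel {
  private pending: { jobId: string; resolve: () => void; reject: (e: Error) => void } | null =
    null;

  // Send a write request; resolves on ack(), rejects on fail().
  request(jobId: string, send: (jobId: string) => void): Promise<void> {
    if (this.pending) {
      return Promise.reject(new Error(`write already in flight for ${this.pending.jobId}`));
    }
    return new Promise((resolve, reject) => {
      this.pending = {
        jobId,
        resolve: () => {
          this.pending = null;
          resolve();
        },
        reject: (e: Error) => {
          this.pending = null;
          reject(e);
        }
      };
      send(jobId); // stands in for parentPort.postMessage(...)
    });
  }

  ack(jobId: string): void {
    if (this.pending?.jobId === jobId) this.pending.resolve();
  }

  fail(jobId: string, error: string): void {
    if (this.pending?.jobId === jobId) this.pending.reject(new Error(error));
  }
}
```

Clearing `pending` inside both settle paths is what allows the next write to proceed as soon as the previous one is acknowledged.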
@@ -8,7 +8,6 @@

import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { writeFileSync, unlinkSync, existsSync } from 'node:fs';
import { EventEmitter } from 'node:events';

// ---------------------------------------------------------------------------
// Hoist FakeWorker + registry so vi.mock can reference them.
@@ -36,7 +35,7 @@ const { createdWorkers, FakeWorker } = vi.hoisted(() => {
      this.threadId = 0;
    });

    constructor(_script: string, _opts?: unknown) {
    constructor() {
      super();
      createdWorkers.push(this);
    }
@@ -67,6 +66,7 @@ function makeOpts(overrides: Partial<WorkerPoolOptions> = {}): WorkerPoolOptions
    concurrency: 2,
    workerScript: FAKE_SCRIPT,
    embedWorkerScript: MISSING_SCRIPT,
    writeWorkerScript: MISSING_SCRIPT,
    dbPath: ':memory:',
    onProgress: vi.fn(),
    onJobDone: vi.fn(),
@@ -142,6 +142,12 @@ describe('WorkerPool normal mode', () => {
    expect(createdWorkers).toHaveLength(3);
  });

  it('spawns a write worker when writeWorkerScript exists', () => {
    new WorkerPool(makeOpts({ concurrency: 2, writeWorkerScript: FAKE_SCRIPT }));

    expect(createdWorkers).toHaveLength(3);
  });

  // -------------------------------------------------------------------------
  // enqueue dispatches to an idle worker
  // -------------------------------------------------------------------------
@@ -208,8 +214,12 @@ describe('WorkerPool normal mode', () => {
    const runCalls = createdWorkers.flatMap((w) =>
      w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
    );
    expect(runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-1')).toHaveLength(1);
    expect(runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-2')).toHaveLength(0);
    expect(
      runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-1')
    ).toHaveLength(1);
    expect(
      runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-2')
    ).toHaveLength(0);
  });

  it('starts jobs for different repos concurrently', () => {
@@ -227,6 +237,83 @@ describe('WorkerPool normal mode', () => {
    expect(dispatchedIds).toContain('job-beta');
  });

  it('dispatches same-repo jobs concurrently when versionIds differ', () => {
    const pool = new WorkerPool(makeOpts({ concurrency: 2 }));

    pool.enqueue('job-v1', '/repo/same', 'v1');
    pool.enqueue('job-v2', '/repo/same', 'v2');

    const runCalls = createdWorkers.flatMap((w) =>
      w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
    );
    const dispatchedIds = runCalls.map((c) => (c[0] as unknown as { jobId: string }).jobId);
    expect(dispatchedIds).toContain('job-v1');
    expect(dispatchedIds).toContain('job-v2');
  });

  it('forwards write worker acknowledgements back to the originating parse worker', () => {
    new WorkerPool(makeOpts({ concurrency: 1, writeWorkerScript: FAKE_SCRIPT }));
    const parseWorker = createdWorkers[0];
    const writeWorker = createdWorkers[1];
    writeWorker.emit('message', { type: 'ready' });

    parseWorker.emit('message', {
      type: 'write_replace',
      jobId: 'job-write',
      changedDocIds: [],
      documents: [],
      snippets: []
    });
    writeWorker.emit('message', { type: 'write_ack', jobId: 'job-write' });

    expect(writeWorker.postMessage).toHaveBeenCalledWith({
      type: 'write_replace',
      jobId: 'job-write',
      changedDocIds: [],
      documents: [],
      snippets: []
    });
    expect(parseWorker.postMessage).toHaveBeenCalledWith({ type: 'write_ack', jobId: 'job-write' });
  });

  it('forwards write worker acknowledgements back to the embed worker', () => {
    new WorkerPool(
      makeOpts({
        concurrency: 1,
        writeWorkerScript: FAKE_SCRIPT,
        embedWorkerScript: FAKE_SCRIPT,
        embeddingProfileId: 'local-default'
      })
    );
    const parseWorker = createdWorkers[0];
    const embedWorker = createdWorkers[1];
    const writeWorker = createdWorkers[2];
    writeWorker.emit('message', { type: 'ready' });
    embedWorker.emit('message', { type: 'ready' });

    embedWorker.emit('message', {
      type: 'write_embeddings',
      jobId: 'job-embed',
      embeddings: []
    });
    writeWorker.emit('message', { type: 'write_ack', jobId: 'job-embed', embeddingCount: 0 });

    expect(parseWorker.postMessage).not.toHaveBeenCalledWith({
      type: 'write_ack',
      jobId: 'job-embed'
    });
    expect(writeWorker.postMessage).toHaveBeenCalledWith({
      type: 'write_embeddings',
      jobId: 'job-embed',
      embeddings: []
    });
    expect(embedWorker.postMessage).toHaveBeenCalledWith({
      type: 'write_ack',
      jobId: 'job-embed',
      embeddingCount: 0
    });
  });

  // -------------------------------------------------------------------------
  // Worker crash (exit code != 0)
  // -------------------------------------------------------------------------
@@ -248,7 +335,7 @@ describe('WorkerPool normal mode', () => {

  it('does NOT call onJobFailed when a worker exits cleanly (code 0)', () => {
    const opts = makeOpts({ concurrency: 1 });
    const pool = new WorkerPool(opts);
    new WorkerPool(opts);

    // Exit without any running job
    const worker = createdWorkers[0];

@@ -6,9 +6,12 @@ import type {
  EmbedWorkerRequest,
  EmbedWorkerResponse,
  WorkerInitData,
  WriteWorkerRequest,
  WriteWorkerResponse
} from './worker-types.js';

type InFlightWriteRequest = Exclude<WriteWorkerRequest, { type: 'shutdown' }>;

export interface WorkerPoolOptions {
  concurrency: number;
  workerScript: string;
@@ -68,6 +71,7 @@ export class WorkerPool {
  private runningJobs = new Map<Worker, RunningJob>();
  private runningJobKeys = new Set<string>();
  private embedQueue: EmbedQueuedJob[] = [];
  private pendingWriteWorkers = new Map<string, Worker>();
  private options: WorkerPoolOptions;
  private fallbackMode = false;
  private shuttingDown = false;
@@ -179,7 +183,11 @@ export class WorkerPool {
      const job = this.jobQueue.splice(jobIdx, 1)[0];
      const worker = this.idleWorkers.pop()!;

      this.runningJobs.set(worker, { jobId: job.jobId, repositoryId: job.repositoryId, versionId: job.versionId });
      this.runningJobs.set(worker, {
        jobId: job.jobId,
        repositoryId: job.repositoryId,
        versionId: job.versionId
      });
      this.runningJobKeys.add(WorkerPool.jobKey(job.repositoryId, job.versionId));
      statusChanged = true;

@@ -192,14 +200,66 @@ export class WorkerPool {
    }
  }

  private postWriteRequest(request: InFlightWriteRequest, worker?: Worker): void {
    if (!this.writeWorker || !this.writeReady) {
      if (worker) {
        worker.postMessage({
          type: 'write_error',
          jobId: request.jobId,
          error: 'Write worker is not ready'
        } satisfies ParseWorkerRequest);
      }
      return;
    }

    if (worker) {
      this.pendingWriteWorkers.set(request.jobId, worker);
    }

    this.writeWorker.postMessage(request);
  }

  private onWorkerMessage(worker: Worker, msg: ParseWorkerResponse): void {
    if (msg.type === 'progress') {
      this.postWriteRequest({
        type: 'write_job_update',
        jobId: msg.jobId,
        fields: {
          status: 'running',
          startedAt: Math.floor(Date.now() / 1000),
          stage: msg.stage,
          stageDetail: msg.stageDetail ?? null,
          progress: msg.progress,
          processedFiles: msg.processedFiles,
          totalFiles: msg.totalFiles
        }
      });
      this.options.onProgress(msg.jobId, msg);
    } else if (
      msg.type === 'write_replace' ||
      msg.type === 'write_clone' ||
      msg.type === 'write_repo_update' ||
      msg.type === 'write_version_update' ||
      msg.type === 'write_repo_config'
    ) {
      this.postWriteRequest(msg, worker);
    } else if (msg.type === 'done') {
      const runningJob = this.runningJobs.get(worker);
      this.postWriteRequest({
        type: 'write_job_update',
        jobId: msg.jobId,
        fields: {
          status: 'done',
          stage: 'done',
          progress: 100,
          completedAt: Math.floor(Date.now() / 1000)
        }
      });
      if (runningJob) {
        this.runningJobs.delete(worker);
        this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
        this.runningJobKeys.delete(
          WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
        );
      }
      this.idleWorkers.push(worker);
      this.options.onJobDone(msg.jobId);
@@ -207,20 +267,32 @@ export class WorkerPool {

      // If embedding configured, enqueue embed request
      if (this.embedWorker && this.options.embeddingProfileId) {
        const runningJobData = runningJob || { jobId: msg.jobId, repositoryId: '', versionId: null };
        this.enqueueEmbed(
          msg.jobId,
          runningJobData.repositoryId,
          runningJobData.versionId ?? null
        );
        const runningJobData = runningJob || {
          jobId: msg.jobId,
          repositoryId: '',
          versionId: null
        };
        this.enqueueEmbed(msg.jobId, runningJobData.repositoryId, runningJobData.versionId ?? null);
      }

      this.dispatch();
    } else if (msg.type === 'failed') {
      const runningJob = this.runningJobs.get(worker);
      this.postWriteRequest({
        type: 'write_job_update',
        jobId: msg.jobId,
        fields: {
          status: 'failed',
          stage: 'failed',
          error: msg.error,
          completedAt: Math.floor(Date.now() / 1000)
        }
      });
      if (runningJob) {
        this.runningJobs.delete(worker);
        this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
        this.runningJobKeys.delete(
          WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
        );
      }
      this.idleWorkers.push(worker);
      this.options.onJobFailed(msg.jobId, msg.error);
@@ -273,6 +345,22 @@ export class WorkerPool {
      this.embedReady = true;
      // Process any queued embed requests
      this.processEmbedQueue();
    } else if (msg.type === 'write_embeddings') {
      const embedWorker = this.embedWorker;
      if (!embedWorker) {
        return;
      }

      if (!this.writeWorker || !this.writeReady) {
        embedWorker.postMessage({
          type: 'write_error',
          jobId: msg.jobId,
          error: 'Write worker is not ready'
        } satisfies EmbedWorkerRequest);
        return;
      }

      this.postWriteRequest(msg, embedWorker);
    } else if (msg.type === 'embed-progress') {
      // Progress message - could be tracked but not strictly required
    } else if (msg.type === 'embed-done') {
@@ -288,6 +376,12 @@ export class WorkerPool {
      return;
    }

    const worker = this.pendingWriteWorkers.get(msg.jobId);
    if (worker) {
      this.pendingWriteWorkers.delete(msg.jobId);
      worker.postMessage(msg satisfies ParseWorkerRequest);
    }

    if (msg.type === 'write_error') {
      console.error('[WorkerPool] Write worker failed for job:', msg.jobId, msg.error);
    }
@@ -433,6 +527,7 @@ export class WorkerPool {
    this.idleWorkers = [];
    this.embedWorker = null;
    this.writeWorker = null;
    this.pendingWriteWorkers.clear();
    this.emitStatusChanged();
  }


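The pool's new `pendingWriteWorkers` map records which worker originated each write request, so the write worker's ack or error can be forwarded back to exactly that worker and then forgotten. A tiny sketch of that routing idea, with illustrative names (`AckRouter` is not in the codebase):

```typescript
// Hypothetical sketch of the pendingWriteWorkers ack routing above:
// remember who asked for a write, forward the ack back once, then drop the entry.
type Ack = { type: 'write_ack'; jobId: string };

class AckRouter<W> {
  private pending = new Map<string, W>();

  // Called when a worker sends a write request to the write worker.
  request(jobId: string, from: W): void {
    this.pending.set(jobId, from);
  }

  // Called when the write worker acknowledges; returns the worker to notify, if any.
  route(msg: Ack): W | undefined {
    const target = this.pending.get(msg.jobId);
    if (target) this.pending.delete(msg.jobId);
    return target;
  }
}
```

Deleting the entry on delivery keeps a late or duplicate ack for the same job from being forwarded twice.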
@@ -2,29 +2,58 @@ import type { IndexingStage } from '$lib/types.js';

export type ParseWorkerRequest =
  | { type: 'run'; jobId: string }
  | { type: 'write_ack'; jobId: string }
  | { type: 'write_error'; jobId: string; error: string }
  | { type: 'shutdown' };

export type ParseWorkerResponse =
  | { type: 'progress'; jobId: string; stage: IndexingStage; stageDetail?: string; progress: number; processedFiles: number; totalFiles: number }
  | {
      type: 'progress';
      jobId: string;
      stage: IndexingStage;
      stageDetail?: string;
      progress: number;
      processedFiles: number;
      totalFiles: number;
    }
  | { type: 'done'; jobId: string }
  | { type: 'failed'; jobId: string; error: string };
  | { type: 'failed'; jobId: string; error: string }
  | WriteReplaceRequest
  | WriteCloneRequest
  | WriteRepoUpdateRequest
  | WriteVersionUpdateRequest
  | WriteRepoConfigRequest;

export type EmbedWorkerRequest =
  | { type: 'embed'; jobId: string; repositoryId: string; versionId: string | null }
  | {
      type: 'write_ack';
      jobId: string;
      documentCount?: number;
      snippetCount?: number;
      embeddingCount?: number;
    }
  | { type: 'write_error'; jobId: string; error: string }
  | { type: 'shutdown' };

export type EmbedWorkerResponse =
  | { type: 'ready' }
  | { type: 'embed-progress'; jobId: string; done: number; total: number }
  | { type: 'embed-done'; jobId: string }
  | { type: 'embed-failed'; jobId: string; error: string };
  | { type: 'embed-failed'; jobId: string; error: string }
  | WriteEmbeddingsRequest;

export type WriteWorkerRequest = WriteRequest | { type: 'shutdown' };
export type WriteWorkerRequest =
  | ReplaceWriteRequest
  | CloneWriteRequest
  | JobUpdateWriteRequest
  | RepoUpdateWriteRequest
  | VersionUpdateWriteRequest
  | RepoConfigWriteRequest
  | EmbeddingsWriteRequest
  | { type: 'shutdown' };

export type WriteWorkerResponse =
  | { type: 'ready' }
  | WriteAck
  | WriteError;
export type WriteWorkerResponse = { type: 'ready' } | WriteAck | WriteError;

export interface WorkerInitData {
  dbPath: string;
@@ -58,18 +87,84 @@ export interface SerializedSnippet {
  createdAt: number;
}

export type WriteRequest = {
  type: 'write';
export interface SerializedEmbedding {
  snippetId: string;
  profileId: string;
  model: string;
  dimensions: number;
  embedding: Uint8Array;
}

export type SerializedFieldValue = string | number | null;

export type SerializedFields = Record<string, SerializedFieldValue>;

export type ReplaceWriteRequest = {
  type: 'write_replace';
  jobId: string;
  changedDocIds: string[];
  documents: SerializedDocument[];
  snippets: SerializedSnippet[];
};

export type CloneWriteRequest = {
  type: 'write_clone';
  jobId: string;
  ancestorVersionId: string;
  targetVersionId: string;
  repositoryId: string;
  unchangedPaths: string[];
};

export type WriteReplaceRequest = ReplaceWriteRequest;

export type WriteCloneRequest = CloneWriteRequest;

export type EmbeddingsWriteRequest = {
  type: 'write_embeddings';
  jobId: string;
  embeddings: SerializedEmbedding[];
};

export type RepoUpdateWriteRequest = {
  type: 'write_repo_update';
  jobId: string;
  repositoryId: string;
  fields: SerializedFields;
};

export type VersionUpdateWriteRequest = {
  type: 'write_version_update';
  jobId: string;
  versionId: string;
  fields: SerializedFields;
};

export type RepoConfigWriteRequest = {
  type: 'write_repo_config';
  jobId: string;
  repositoryId: string;
  versionId: string | null;
  rules: string[];
};

export type JobUpdateWriteRequest = {
  type: 'write_job_update';
  jobId: string;
  fields: SerializedFields;
};

export type WriteEmbeddingsRequest = EmbeddingsWriteRequest;
export type WriteRepoUpdateRequest = RepoUpdateWriteRequest;
export type WriteVersionUpdateRequest = VersionUpdateWriteRequest;
export type WriteRepoConfigRequest = RepoConfigWriteRequest;

export type WriteAck = {
  type: 'write_ack';
  jobId: string;
  documentCount: number;
  snippetCount: number;
  documentCount?: number;
  snippetCount?: number;
  embeddingCount?: number;
};
|
||||
|
||||
export type WriteError = {
343 src/lib/server/pipeline/write-operations.ts Normal file
@@ -0,0 +1,343 @@
import { randomUUID } from 'node:crypto';
import type Database from 'better-sqlite3';
import type { NewDocument, NewSnippet } from '$lib/types';
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
import type {
	SerializedDocument,
	SerializedEmbedding,
	SerializedFields,
	SerializedSnippet
} from './worker-types.js';

type DocumentLike = Pick<
	NewDocument,
	| 'id'
	| 'repositoryId'
	| 'versionId'
	| 'filePath'
	| 'title'
	| 'language'
	| 'tokenCount'
	| 'checksum'
> & {
	indexedAt: Date | number;
};

type SnippetLike = Pick<
	NewSnippet,
	| 'id'
	| 'documentId'
	| 'repositoryId'
	| 'versionId'
	| 'type'
	| 'title'
	| 'content'
	| 'language'
	| 'breadcrumb'
	| 'tokenCount'
> & {
	createdAt: Date | number;
};

export interface CloneFromAncestorRequest {
	ancestorVersionId: string;
	targetVersionId: string;
	repositoryId: string;
	unchangedPaths: string[];
}

export interface PersistedEmbedding {
	snippetId: string;
	profileId: string;
	model: string;
	dimensions: number;
	embedding: Buffer | Uint8Array;
}

function toEpochSeconds(value: Date | number): number {
	return value instanceof Date ? Math.floor(value.getTime() / 1000) : value;
}

function toSnake(key: string): string {
	return key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);
}

function replaceSnippetsInternal(
	db: Database.Database,
	changedDocIds: string[],
	newDocuments: DocumentLike[],
	newSnippets: SnippetLike[]
): void {
	const sqliteVecStore = new SqliteVecStore(db);
	const insertDoc = db.prepare(
		`INSERT INTO documents
		 (id, repository_id, version_id, file_path, title, language,
		  token_count, checksum, indexed_at)
		 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
	);

	const insertSnippet = db.prepare(
		`INSERT INTO snippets
		 (id, document_id, repository_id, version_id, type, title,
		  content, language, breadcrumb, token_count, created_at)
		 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
	);

	db.transaction(() => {
		sqliteVecStore.deleteEmbeddingsForDocumentIds(changedDocIds);

		if (changedDocIds.length > 0) {
			const placeholders = changedDocIds.map(() => '?').join(',');
			db.prepare(`DELETE FROM documents WHERE id IN (${placeholders})`).run(...changedDocIds);
		}

		for (const doc of newDocuments) {
			insertDoc.run(
				doc.id,
				doc.repositoryId,
				doc.versionId ?? null,
				doc.filePath,
				doc.title ?? null,
				doc.language ?? null,
				doc.tokenCount ?? 0,
				doc.checksum,
				toEpochSeconds(doc.indexedAt)
			);
		}

		for (const snippet of newSnippets) {
			insertSnippet.run(
				snippet.id,
				snippet.documentId,
				snippet.repositoryId,
				snippet.versionId ?? null,
				snippet.type,
				snippet.title ?? null,
				snippet.content,
				snippet.language ?? null,
				snippet.breadcrumb ?? null,
				snippet.tokenCount ?? 0,
				toEpochSeconds(snippet.createdAt)
			);
		}
	})();
}

export function replaceSnippets(
	db: Database.Database,
	changedDocIds: string[],
	newDocuments: NewDocument[],
	newSnippets: NewSnippet[]
): void {
	replaceSnippetsInternal(db, changedDocIds, newDocuments, newSnippets);
}

export function replaceSerializedSnippets(
	db: Database.Database,
	changedDocIds: string[],
	documents: SerializedDocument[],
	snippets: SerializedSnippet[]
): void {
	replaceSnippetsInternal(db, changedDocIds, documents, snippets);
}

export function cloneFromAncestor(db: Database.Database, request: CloneFromAncestorRequest): void {
	const sqliteVecStore = new SqliteVecStore(db);
	const { ancestorVersionId, targetVersionId, repositoryId, unchangedPaths } = request;

	db.transaction(() => {
		const pathList = [...unchangedPaths];
		if (pathList.length === 0) {
			return;
		}

		const placeholders = pathList.map(() => '?').join(',');
		const ancestorDocs = db
			.prepare(`SELECT * FROM documents WHERE version_id = ? AND file_path IN (${placeholders})`)
			.all(ancestorVersionId, ...pathList) as Array<{
			id: string;
			repository_id: string;
			file_path: string;
			title: string | null;
			language: string | null;
			token_count: number;
			checksum: string;
			indexed_at: number;
		}>;

		const docIdMap = new Map<string, string>();
		const nowEpoch = Math.floor(Date.now() / 1000);

		for (const doc of ancestorDocs) {
			const newDocId = randomUUID();
			docIdMap.set(doc.id, newDocId);
			db.prepare(
				`INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
				 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
			).run(
				newDocId,
				repositoryId,
				targetVersionId,
				doc.file_path,
				doc.title,
				doc.language,
				doc.token_count,
				doc.checksum,
				nowEpoch
			);
		}

		if (docIdMap.size === 0) return;

		const oldDocIds = [...docIdMap.keys()];
		const snippetPlaceholders = oldDocIds.map(() => '?').join(',');
		const ancestorSnippets = db
			.prepare(`SELECT * FROM snippets WHERE document_id IN (${snippetPlaceholders})`)
			.all(...oldDocIds) as Array<{
			id: string;
			document_id: string;
			repository_id: string;
			version_id: string | null;
			type: string;
			title: string | null;
			content: string;
			language: string | null;
			breadcrumb: string | null;
			token_count: number;
			created_at: number;
		}>;

		const snippetIdMap = new Map<string, string>();
		for (const snippet of ancestorSnippets) {
			const newSnippetId = randomUUID();
			snippetIdMap.set(snippet.id, newSnippetId);
			const newDocId = docIdMap.get(snippet.document_id)!;
			db.prepare(
				`INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
				 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
			).run(
				newSnippetId,
				newDocId,
				repositoryId,
				targetVersionId,
				snippet.type,
				snippet.title,
				snippet.content,
				snippet.language,
				snippet.breadcrumb,
				snippet.token_count,
				snippet.created_at
			);
		}

		if (snippetIdMap.size === 0) {
			return;
		}

		const oldSnippetIds = [...snippetIdMap.keys()];
		const embPlaceholders = oldSnippetIds.map(() => '?').join(',');
		const ancestorEmbeddings = db
			.prepare(`SELECT * FROM snippet_embeddings WHERE snippet_id IN (${embPlaceholders})`)
			.all(...oldSnippetIds) as Array<{
			snippet_id: string;
			profile_id: string;
			model: string;
			dimensions: number;
			embedding: Buffer;
			created_at: number;
		}>;

		for (const emb of ancestorEmbeddings) {
			const newSnippetId = snippetIdMap.get(emb.snippet_id)!;
			db.prepare(
				`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
				 VALUES (?, ?, ?, ?, ?, ?)`
			).run(newSnippetId, emb.profile_id, emb.model, emb.dimensions, emb.embedding, emb.created_at);
			sqliteVecStore.upsertEmbeddingBuffer(
				emb.profile_id,
				newSnippetId,
				emb.embedding,
				emb.dimensions
			);
		}
	})();
}

export function upsertEmbeddings(db: Database.Database, embeddings: PersistedEmbedding[]): void {
	if (embeddings.length === 0) {
		return;
	}

	const sqliteVecStore = new SqliteVecStore(db);
	const insert = db.prepare<[string, string, string, number, Buffer]>(`
		INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
		VALUES (?, ?, ?, ?, ?, unixepoch())
	`);

	db.transaction(() => {
		for (const item of embeddings) {
			const embeddingBuffer = Buffer.isBuffer(item.embedding)
				? item.embedding
				: Buffer.from(item.embedding);

			insert.run(item.snippetId, item.profileId, item.model, item.dimensions, embeddingBuffer);

			sqliteVecStore.upsertEmbeddingBuffer(
				item.profileId,
				item.snippetId,
				embeddingBuffer,
				item.dimensions
			);
		}
	})();
}

export function upsertSerializedEmbeddings(
	db: Database.Database,
	embeddings: SerializedEmbedding[]
): void {
	upsertEmbeddings(
		db,
		embeddings.map((item) => ({
			snippetId: item.snippetId,
			profileId: item.profileId,
			model: item.model,
			dimensions: item.dimensions,
			embedding: item.embedding
		}))
	);
}

export function updateRepo(
	db: Database.Database,
	repositoryId: string,
	fields: SerializedFields
): void {
	const now = Math.floor(Date.now() / 1000);
	const allFields = { ...fields, updatedAt: now };
	const sets = Object.keys(allFields)
		.map((key) => `${toSnake(key)} = ?`)
		.join(', ');
	const values = [...Object.values(allFields), repositoryId];
	db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
}

export function updateJob(db: Database.Database, jobId: string, fields: SerializedFields): void {
	const sets = Object.keys(fields)
		.map((key) => `${toSnake(key)} = ?`)
		.join(', ');
	const values = [...Object.values(fields), jobId];
	db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
}

export function updateVersion(
	db: Database.Database,
	versionId: string,
	fields: SerializedFields
): void {
	const sets = Object.keys(fields)
		.map((key) => `${toSnake(key)} = ?`)
		.join(', ');
	const values = [...Object.values(fields), versionId];
	db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
}
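The update helpers above build their SQL `SET` clauses by converting camelCase field names to snake_case column names. A minimal standalone sketch of that mapping (the `buildSetClause` helper here is illustrative, not part of the file above; `toSnake` mirrors the one in write-operations.ts):

```typescript
// Sketch of the camelCase -> snake_case column mapping the update helpers rely on.
function toSnake(key: string): string {
	return key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);
}

// Build a SET clause the same way updateRepo/updateJob/updateVersion do:
// one `column = ?` placeholder per field, in key-insertion order.
function buildSetClause(fields: Record<string, string | number | null>): string {
	return Object.keys(fields)
		.map((key) => `${toSnake(key)} = ?`)
		.join(', ');
}
```

Because the column names are interpolated directly into the SQL string, this pattern is only safe while the field keys come from trusted internal callers, which is the case for the worker protocol here.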
@@ -1,67 +1,21 @@
import { workerData, parentPort } from 'node:worker_threads';
import Database from 'better-sqlite3';
import { applySqlitePragmas } from '$lib/server/db/connection.js';
import { loadSqliteVec } from '$lib/server/db/sqlite-vec.js';
import type { WorkerInitData, WriteWorkerRequest, WriteWorkerResponse } from './worker-types.js';
import {
	cloneFromAncestor,
	replaceSerializedSnippets,
	updateJob,
	updateRepo,
	updateVersion,
	upsertSerializedEmbeddings
} from './write-operations.js';

const { dbPath } = workerData as WorkerInitData;
const db = new Database(dbPath);
applySqlitePragmas(db);
loadSqliteVec(db);

parentPort?.postMessage({ type: 'ready' } satisfies WriteWorkerResponse);

@@ -71,23 +25,145 @@ parentPort?.on('message', (msg: WriteWorkerRequest) => {
		process.exit(0);
	}

	if (msg.type === 'write_replace') {
		try {
			replaceSerializedSnippets(db, msg.changedDocIds, msg.documents, msg.snippets);
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId,
				documentCount: msg.documents.length,
				snippetCount: msg.snippets.length
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_clone') {
		try {
			cloneFromAncestor(db, {
				ancestorVersionId: msg.ancestorVersionId,
				targetVersionId: msg.targetVersionId,
				repositoryId: msg.repositoryId,
				unchangedPaths: msg.unchangedPaths
			});
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_embeddings') {
		try {
			upsertSerializedEmbeddings(db, msg.embeddings);
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId,
				embeddingCount: msg.embeddings.length
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_job_update') {
		try {
			updateJob(db, msg.jobId, msg.fields);
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_repo_update') {
		try {
			updateRepo(db, msg.repositoryId, msg.fields);
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_version_update') {
		try {
			updateVersion(db, msg.versionId, msg.fields);
			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
		return;
	}

	if (msg.type === 'write_repo_config') {
		try {
			const now = Math.floor(Date.now() / 1000);
			if (msg.versionId === null) {
				db.prepare(
					`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
				).run(msg.repositoryId);
			} else {
				db.prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`).run(
					msg.repositoryId,
					msg.versionId
				);
			}

			db.prepare(
				`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
				 VALUES (?, ?, ?, ?)`
			).run(msg.repositoryId, msg.versionId, JSON.stringify(msg.rules), now);

			parentPort?.postMessage({
				type: 'write_ack',
				jobId: msg.jobId
			} satisfies WriteWorkerResponse);
		} catch (error) {
			parentPort?.postMessage({
				type: 'write_error',
				jobId: msg.jobId,
				error: error instanceof Error ? error.message : String(error)
			} satisfies WriteWorkerResponse);
		}
	}
});
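The worker above answers every request with a `write_ack` or `write_error` carrying the request's `jobId`. On the dispatching side, one common pattern (a sketch only, not the repository's actual dispatcher; an `EventEmitter` stands in for the `worker_threads` message channel so the pattern is runnable in isolation) is to correlate responses by `jobId`:

```typescript
import { EventEmitter } from 'node:events';

type Ack = { type: 'write_ack'; jobId: string };
type Err = { type: 'write_error'; jobId: string; error: string };

// Resolve when the ack for this jobId arrives; ignore messages for other jobs.
function awaitAck(channel: EventEmitter, jobId: string): Promise<Ack> {
	return new Promise((resolve, reject) => {
		const onMessage = (msg: Ack | Err) => {
			if (msg.jobId !== jobId) return; // another job's response
			channel.off('message', onMessage); // one-shot listener per job
			if (msg.type === 'write_ack') resolve(msg);
			else reject(new Error(msg.error));
		};
		channel.on('message', onMessage);
	});
}
```

Serializing all writes through one worker this way keeps better-sqlite3's synchronous statements off the main thread while guaranteeing a single writer per database file.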
|
||||
@@ -383,7 +383,18 @@ describe('VectorSearch', () => {
|
||||
`INSERT INTO embedding_profiles (id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
|
||||
)
|
||||
.run('secondary-profile', 'local-transformers', 'Secondary', 1, 0, 'test-model', 2, '{}', NOW_S, NOW_S);
|
||||
.run(
|
||||
'secondary-profile',
|
||||
'local-transformers',
|
||||
'Secondary',
|
||||
1,
|
||||
0,
|
||||
'test-model',
|
||||
2,
|
||||
'{}',
|
||||
NOW_S,
|
||||
NOW_S
|
||||
);
|
||||
|
||||
const defaultSnippet = seedSnippet(client, {
|
||||
repositoryId: repoId,
|
||||
|
||||
@@ -90,17 +90,18 @@ export class SqliteVecStore {
|
||||
this.ensureProfileStore(profileId, tables.dimensions);
|
||||
|
||||
const existingRow = this.db
|
||||
.prepare<[string], SnippetRowidRow>(
|
||||
`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`
|
||||
)
|
||||
.prepare<
|
||||
[string],
|
||||
SnippetRowidRow
|
||||
>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
|
||||
.get(snippetId);
|
||||
|
||||
const embeddingBuffer = toEmbeddingBuffer(embedding);
|
||||
if (existingRow) {
|
||||
this.db
|
||||
.prepare<[Buffer, number]>(
|
||||
`UPDATE ${tables.quotedVectorTableName} SET embedding = ? WHERE rowid = ?`
|
||||
)
|
||||
.prepare<
|
||||
[Buffer, number]
|
||||
>(`UPDATE ${tables.quotedVectorTableName} SET embedding = ? WHERE rowid = ?`)
|
||||
.run(embeddingBuffer, existingRow.rowid);
|
||||
return;
|
||||
}
|
||||
@@ -109,9 +110,9 @@ export class SqliteVecStore {
|
||||
.prepare<[Buffer]>(`INSERT INTO ${tables.quotedVectorTableName} (embedding) VALUES (?)`)
|
||||
.run(embeddingBuffer);
|
||||
this.db
|
||||
.prepare<[number, string]>(
|
||||
`INSERT INTO ${tables.quotedRowidTableName} (rowid, snippet_id) VALUES (?, ?)`
|
||||
)
|
||||
.prepare<
|
||||
[number, string]
|
||||
>(`INSERT INTO ${tables.quotedRowidTableName} (rowid, snippet_id) VALUES (?, ?)`)
|
||||
.run(Number(insertResult.lastInsertRowid), snippetId);
|
||||
}
|
||||
|
||||
@@ -134,9 +135,10 @@ export class SqliteVecStore {
|
||||
this.ensureProfileStore(profileId);
|
||||
|
||||
const existingRow = this.db
|
||||
.prepare<[string], SnippetRowidRow>(
|
||||
`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`
|
||||
)
|
||||
.prepare<
|
||||
[string],
|
||||
SnippetRowidRow
|
||||
>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
|
||||
.get(snippetId);
|
||||
|
||||
if (!existingRow) {
|
||||
@@ -280,11 +282,7 @@ export class SqliteVecStore {
|
||||
this.upsertEmbedding(
|
||||
profileId,
|
||||
row.snippet_id,
|
||||
new Float32Array(
|
||||
row.embedding.buffer,
|
||||
row.embedding.byteOffset,
|
||||
tables.dimensions
|
||||
)
|
||||
new Float32Array(row.embedding.buffer, row.embedding.byteOffset, tables.dimensions)
|
||||
);
|
||||
}
|
||||
});
|
||||
@@ -323,9 +321,10 @@ export class SqliteVecStore {
|
||||
loadSqliteVec(this.db);
|
||||
|
||||
const dimensionsRow = this.db
|
||||
.prepare<[string], ProfileDimensionsRow>(
|
||||
'SELECT dimensions FROM embedding_profiles WHERE id = ?'
|
||||
)
|
||||
.prepare<
|
||||
[string],
|
||||
ProfileDimensionsRow
|
||||
>('SELECT dimensions FROM embedding_profiles WHERE id = ?')
|
||||
.get(profileId);
|
||||
if (!dimensionsRow) {
|
||||
throw new Error(`Embedding profile not found: ${profileId}`);
|
||||
@@ -377,10 +376,7 @@ export class SqliteVecStore {
|
||||
throw new Error(`Stored embedding dimensions are missing for profile ${profileId}`);
|
||||
}
|
||||
|
||||
if (
|
||||
preferredDimensions !== undefined &&
|
||||
preferredDimensions !== canonicalDimensions
|
||||
) {
|
||||
if (preferredDimensions !== undefined && preferredDimensions !== canonicalDimensions) {
|
||||
throw new Error(
|
||||
`Embedding dimension mismatch for profile ${profileId}: expected ${canonicalDimensions}, received ${preferredDimensions}`
|
||||
);
|
||||
|
||||
@@ -1,6 +1,9 @@
|
||||
import type Database from 'better-sqlite3';
|
||||
import type { EmbeddingSettingsUpdateDto } from '$lib/dtos/embedding-settings.js';
|
||||
import { createProviderFromProfile, getDefaultLocalProfile } from '$lib/server/embeddings/registry.js';
|
||||
import {
|
||||
createProviderFromProfile,
|
||||
getDefaultLocalProfile
|
||||
} from '$lib/server/embeddings/registry.js';
|
||||
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
|
||||
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';
|
||||
import { EmbeddingSettings } from '$lib/server/models/embedding-settings.js';
|
||||
@@ -94,7 +97,10 @@ export class EmbeddingSettingsService {
|
||||
private getCreatedAt(id: string, fallback: number): number {
|
||||
return (
|
||||
this.db
|
||||
.prepare<[string], { created_at: number }>('SELECT created_at FROM embedding_profiles WHERE id = ?')
|
||||
.prepare<
|
||||
[string],
|
||||
{ created_at: number }
|
||||
>('SELECT created_at FROM embedding_profiles WHERE id = ?')
|
||||
.get(id)?.created_at ?? fallback
|
||||
);
|
||||
}
|
||||
|
||||
@@ -11,7 +11,11 @@ import Database from 'better-sqlite3';
|
||||
import { readFileSync } from 'node:fs';
|
||||
import { join } from 'node:path';
|
||||
import { RepositoryService } from './repository.service';
|
||||
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '$lib/server/db/sqlite-vec.js';
|
||||
import {
|
||||
loadSqliteVec,
|
||||
sqliteVecRowidTableName,
|
||||
sqliteVecTableName
|
||||
} from '$lib/server/db/sqlite-vec.js';
|
||||
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
||||
import {
|
||||
AlreadyExistsError,
|
||||
@@ -465,7 +469,11 @@ describe('RepositoryService.getIndexSummary()', () => {
|
||||
beforeEach(() => {
|
||||
client = createTestDb();
|
||||
service = makeService(client);
|
||||
service.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react', branch: 'main' });
|
||||
service.add({
|
||||
source: 'github',
|
||||
sourceUrl: 'https://github.com/facebook/react',
|
||||
branch: 'main'
|
||||
});
|
||||
});
|
||||
|
||||
it('returns embedding counts and indexed version labels', () => {
|
||||
|
||||
@@ -10,7 +10,11 @@ import { describe, it, expect } from 'vitest';
|
||||
import Database from 'better-sqlite3';
|
||||
import { readFileSync } from 'node:fs';
|
||||
import { join } from 'node:path';
|
||||
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '$lib/server/db/sqlite-vec.js';
|
||||
import {
|
||||
loadSqliteVec,
|
||||
sqliteVecRowidTableName,
|
||||
sqliteVecTableName
|
||||
} from '$lib/server/db/sqlite-vec.js';
|
||||
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
||||
import { VersionService } from './version.service';
|
||||
import { RepositoryService } from './repository.service';
|
||||
@@ -206,18 +210,24 @@ describe('VersionService.remove()', () => {
|
||||
const now = Math.floor(Date.now() / 1000);
|
||||
const vecStore = new SqliteVecStore(client);
|
||||
|
||||
client.prepare(
|
||||
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
|
||||
client
|
||||
.prepare(
|
||||
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
|
||||
VALUES (?, '/facebook/react', ?, 'README.md', 'version-doc', ?)`
|
||||
).run(docId, version.id, now);
|
||||
client.prepare(
|
||||
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
|
||||
)
|
||||
.run(docId, version.id, now);
|
||||
client
|
||||
.prepare(
|
||||
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
|
||||
VALUES (?, ?, '/facebook/react', ?, 'info', 'version snippet', ?)`
|
||||
).run(snippetId, docId, version.id, now);
|
||||
client.prepare(
|
||||
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
|
||||
)
|
||||
.run(snippetId, docId, version.id, now);
|
||||
client
|
||||
.prepare(
|
||||
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
|
||||
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
|
||||
).run(snippetId, Buffer.from(embedding.buffer), now);
|
||||
)
|
||||
.run(snippetId, Buffer.from(embedding.buffer), now);
|
||||
vecStore.upsertEmbedding('local-default', snippetId, embedding);
|
||||
|
||||
versionService.remove('/facebook/react', 'v18.3.0');
|
||||
|
||||
@@ -9,7 +9,10 @@ import { RepositoryVersion } from '$lib/server/models/repository-version.js';
|
||||
// Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
function makeVersion(tag: string, state: RepositoryVersion['state'] = 'indexed'): RepositoryVersion {
|
||||
function makeVersion(
|
||||
tag: string,
|
||||
state: RepositoryVersion['state'] = 'indexed'
|
||||
): RepositoryVersion {
|
||||
return new RepositoryVersion({
|
||||
id: `/facebook/react/${tag}`,
|
||||
repositoryId: '/facebook/react',
|
||||
@@ -42,21 +45,13 @@ describe('findBestAncestorVersion', () => {
|
||||
});
|
||||
|
||||
it('returns the nearest semver predecessor from a list', () => {
|
||||
const candidates = [
|
||||
makeVersion('v1.0.0'),
|
||||
makeVersion('v1.1.0'),
|
||||
makeVersion('v2.0.0')
|
||||
];
|
||||
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.1.0'), makeVersion('v2.0.0')];
|
||||
const result = findBestAncestorVersion('v2.1.0', candidates);
|
||||
expect(result?.tag).toBe('v2.0.0');
|
||||
});
|
||||
|
||||
it('handles v-prefix stripping correctly', () => {
|
||||
const candidates = [
|
||||
makeVersion('v1.0.0'),
|
||||
makeVersion('v1.5.0'),
|
||||
makeVersion('v2.0.0')
|
||||
];
|
||||
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.5.0'), makeVersion('v2.0.0')];
|
||||
const result = findBestAncestorVersion('v2.0.1', candidates);
|
||||
expect(result?.tag).toBe('v2.0.0');
|
||||
});
|
||||
|
||||
@@ -31,7 +31,16 @@ export type RepositorySource = 'github' | 'local';
|
||||
export type RepositoryState = 'pending' | 'indexing' | 'indexed' | 'error';
|
||||
export type SnippetType = 'code' | 'info';
|
||||
export type JobStatus = 'queued' | 'running' | 'done' | 'failed';
|
||||
export type IndexingStage = 'queued' | 'differential' | 'crawling' | 'cloning' | 'parsing' | 'storing' | 'embedding' | 'done' | 'failed';
|
||||
export type IndexingStage =
|
||||
| 'queued'
|
||||
| 'differential'
|
||||
| 'crawling'
|
||||
| 'cloning'
|
||||
| 'parsing'
|
||||
| 'storing'
|
||||
| 'embedding'
|
||||
| 'done'
|
||||
| 'failed';
|
||||
export type VersionState = 'pending' | 'indexing' | 'indexed' | 'error';
|
||||
export type EmbeddingProviderKind = 'local-transformers' | 'openai-compatible';
|
||||
|
||||
|
||||
@@ -38,6 +38,9 @@
|
||||
<a href={resolveRoute('/search')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||
Search
|
||||
</a>
|
||||
<a href={resolveRoute('/admin/jobs')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||
Admin
|
||||
</a>
|
||||
<a href={resolveRoute('/settings')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||
Settings
|
||||
</a>
|
||||
|
@@ -95,7 +95,10 @@
}

function filtersDirty(): boolean {
return repositoryInput.trim() !== appliedRepositoryFilter || !sameStatuses(selectedStatuses, appliedStatuses);
return (
repositoryInput.trim() !== appliedRepositoryFilter ||
!sameStatuses(selectedStatuses, appliedStatuses)
);
}

function isSpecificRepositoryId(repositoryId: string): boolean {
@@ -107,7 +110,8 @@
const repositoryFilter = appliedRepositoryFilter;
const repositoryMatches = isSpecificRepositoryId(repositoryFilter)
? job.repositoryId === repositoryFilter
: job.repositoryId === repositoryFilter || job.repositoryId.startsWith(`${repositoryFilter}/`);
: job.repositoryId === repositoryFilter ||
job.repositoryId.startsWith(`${repositoryFilter}/`);

if (!repositoryMatches) {
return false;
@@ -199,8 +203,8 @@
selectedStatuses = selectedStatuses.includes(status)
? selectedStatuses.filter((candidate) => candidate !== status)
: [...selectedStatuses, status].sort(
(left, right) => filterStatuses.indexOf(left) - filterStatuses.indexOf(right)
);
(left, right) => filterStatuses.indexOf(left) - filterStatuses.indexOf(right)
);
}

function applyFilters(event?: SubmitEvent) {
@@ -316,7 +320,10 @@

<WorkerStatusPanel />

<form class="mb-6 rounded-lg border border-gray-200 bg-white p-4 shadow-sm" onsubmit={applyFilters}>
<form
class="mb-6 rounded-lg border border-gray-200 bg-white p-4 shadow-sm"
onsubmit={applyFilters}
>
<div class="flex flex-col gap-4 lg:flex-row lg:items-end lg:justify-between">
<div class="flex-1">
<label class="mb-2 block text-sm font-medium text-gray-700" for="repository-filter">
@@ -327,10 +334,11 @@
type="text"
bind:value={repositoryInput}
placeholder="/owner or /owner/repo"
class="w-full rounded-md border border-gray-300 px-3 py-2 text-sm text-gray-900 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-200"
class="w-full rounded-md border border-gray-300 px-3 py-2 text-sm text-gray-900 shadow-sm focus:border-blue-500 focus:ring-2 focus:ring-blue-200 focus:outline-none"
/>
<p class="mt-2 text-xs text-gray-500">
Use an owner prefix like <code>/facebook</code> or a full repository ID like <code>/facebook/react</code>.
Use an owner prefix like <code>/facebook</code> or a full repository ID like
<code>/facebook/react</code>.
</p>
</div>

@@ -341,7 +349,9 @@
<button
type="button"
onclick={() => toggleStatusFilter(status)}
class="rounded-full border px-3 py-1 text-xs font-semibold uppercase transition {selectedStatuses.includes(status
class="rounded-full border px-3 py-1 text-xs font-semibold uppercase transition {selectedStatuses.includes(
status
)
? 'border-blue-600 bg-blue-50 text-blue-700'
: 'border-gray-300 text-gray-600 hover:border-gray-400 hover:text-gray-900'}"
>
@@ -370,7 +380,9 @@
</div>
</form>

<div class="mb-4 flex flex-col gap-2 text-sm text-gray-600 md:flex-row md:items-center md:justify-between">
<div
class="mb-4 flex flex-col gap-2 text-sm text-gray-600 md:flex-row md:items-center md:justify-between"
>
<p>
Showing <span class="font-semibold text-gray-900">{jobs.length}</span> of
<span class="font-semibold text-gray-900">{total}</span> jobs
@@ -444,103 +456,105 @@
<JobSkeleton rows={6} />
{:else}
{#each jobs as job (job.id)}
<tr class="hover:bg-gray-50">
<td class="px-6 py-4 text-sm font-medium whitespace-nowrap text-gray-900">
{job.repositoryId}
{#if job.versionId}
<span class="ml-1 text-xs text-gray-500">@{job.versionId}</span>
{/if}
<div class="mt-1 text-xs text-gray-400">{job.id}</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<JobStatusBadge status={job.status} spinning={job.status === 'running'} />
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="flex items-center gap-2">
<span>{getStageLabel(job.stage)}</span>
{#if job.stageDetail}
<span class="text-xs text-gray-400">{job.stageDetail}</span>
<tr class="hover:bg-gray-50">
<td class="px-6 py-4 text-sm font-medium whitespace-nowrap text-gray-900">
{job.repositoryId}
{#if job.versionId}
<span class="ml-1 text-xs text-gray-500">@{job.versionId}</span>
{/if}
</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="space-y-2">
<div class="mt-1 text-xs text-gray-400">{job.id}</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<JobStatusBadge status={job.status} spinning={job.status === 'running'} />
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="flex items-center gap-2">
<span class="w-12 text-right text-xs font-semibold text-gray-600">{job.progress}%</span>
<div class="h-2 w-32 rounded-full bg-gray-200">
<div
class="h-2 rounded-full bg-blue-600 transition-all"
style="width: {job.progress}%"
></div>
</div>
<span>{getStageLabel(job.stage)}</span>
{#if job.stageDetail}
<span class="text-xs text-gray-400">{job.stageDetail}</span>
{/if}
</div>
{#if job.totalFiles > 0}
<div class="text-xs text-gray-400">
{job.processedFiles}/{job.totalFiles} files processed
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
<div class="space-y-2">
<div class="flex items-center gap-2">
<span class="w-12 text-right text-xs font-semibold text-gray-600"
>{job.progress}%</span
>
<div class="h-2 w-32 rounded-full bg-gray-200">
<div
class="h-2 rounded-full bg-blue-600 transition-all"
style="width: {job.progress}%"
></div>
</div>
</div>
{/if}
</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
{formatDate(job.createdAt)}
</td>
<td class="px-6 py-4 text-right text-sm font-medium whitespace-nowrap">
<div class="flex justify-end gap-2">
{#if pendingCancelJobId === job.id}
<button
type="button"
onclick={() => void runJobAction(job, 'cancel')}
disabled={isRowBusy(job.id)}
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'cancel' ? 'Cancelling...' : 'Confirm cancel'}
</button>
<button
type="button"
onclick={() => requestCancel(job.id)}
disabled={isRowBusy(job.id)}
class="rounded border border-gray-300 px-3 py-1 text-xs font-semibold text-gray-700 hover:border-gray-400 hover:text-gray-900 disabled:cursor-not-allowed disabled:opacity-50"
>
Keep job
</button>
{:else}
{#if canPause(job.status)}
{#if job.totalFiles > 0}
<div class="text-xs text-gray-400">
{job.processedFiles}/{job.totalFiles} files processed
</div>
{/if}
</div>
</td>
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
{formatDate(job.createdAt)}
</td>
<td class="px-6 py-4 text-right text-sm font-medium whitespace-nowrap">
<div class="flex justify-end gap-2">
{#if pendingCancelJobId === job.id}
<button
type="button"
onclick={() => void runJobAction(job, 'pause')}
onclick={() => void runJobAction(job, 'cancel')}
disabled={isRowBusy(job.id)}
class="rounded bg-yellow-600 px-3 py-1 text-xs font-semibold text-white hover:bg-yellow-700 disabled:cursor-not-allowed disabled:opacity-50"
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'pause' ? 'Pausing...' : 'Pause'}
{rowActions[job.id] === 'cancel' ? 'Cancelling...' : 'Confirm cancel'}
</button>
{/if}
{#if canResume(job.status)}
<button
type="button"
onclick={() => void runJobAction(job, 'resume')}
disabled={isRowBusy(job.id)}
class="rounded bg-green-600 px-3 py-1 text-xs font-semibold text-white hover:bg-green-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'resume' ? 'Resuming...' : 'Resume'}
</button>
{/if}
{#if canCancel(job.status)}
<button
type="button"
onclick={() => requestCancel(job.id)}
disabled={isRowBusy(job.id)}
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
class="rounded border border-gray-300 px-3 py-1 text-xs font-semibold text-gray-700 hover:border-gray-400 hover:text-gray-900 disabled:cursor-not-allowed disabled:opacity-50"
>
Cancel
Keep job
</button>
{:else}
{#if canPause(job.status)}
<button
type="button"
onclick={() => void runJobAction(job, 'pause')}
disabled={isRowBusy(job.id)}
class="rounded bg-yellow-600 px-3 py-1 text-xs font-semibold text-white hover:bg-yellow-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'pause' ? 'Pausing...' : 'Pause'}
</button>
{/if}
{#if canResume(job.status)}
<button
type="button"
onclick={() => void runJobAction(job, 'resume')}
disabled={isRowBusy(job.id)}
class="rounded bg-green-600 px-3 py-1 text-xs font-semibold text-white hover:bg-green-700 disabled:cursor-not-allowed disabled:opacity-50"
>
{rowActions[job.id] === 'resume' ? 'Resuming...' : 'Resume'}
</button>
{/if}
{#if canCancel(job.status)}
<button
type="button"
onclick={() => requestCancel(job.id)}
disabled={isRowBusy(job.id)}
class="rounded bg-red-600 px-3 py-1 text-xs font-semibold text-white hover:bg-red-700 disabled:cursor-not-allowed disabled:opacity-50"
>
Cancel
</button>
{/if}
{#if !canPause(job.status) && !canResume(job.status) && !canCancel(job.status)}
<span class="text-xs text-gray-400">—</span>
{/if}
{/if}
{#if !canPause(job.status) && !canResume(job.status) && !canCancel(job.status)}
<span class="text-xs text-gray-400">—</span>
{/if}
{/if}
</div>
</td>
</tr>
</div>
</td>
</tr>
{/each}
{/if}
</tbody>
@@ -553,4 +567,4 @@
{/if}
</div>

<Toast bind:toasts={toasts} />
<Toast bind:toasts />
@@ -36,9 +36,10 @@ function getServices(db: ReturnType<typeof getClient>) {

// Load the active embedding profile from the database
const profileRow = db
.prepare<[], EmbeddingProfileEntityProps>(
'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
)
.prepare<
[],
EmbeddingProfileEntityProps
>('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
.get();

const profile = profileRow
@@ -227,10 +228,7 @@ export const GET: RequestHandler = async ({ url }) => {
// Fall back to commit hash prefix match (min 7 chars).
if (!resolvedVersion && parsed.version.length >= 7) {
resolvedVersion = db
.prepare<
[string, string],
RawVersionRow
>(
.prepare<[string, string], RawVersionRow>(
`SELECT id, tag FROM repository_versions
WHERE repository_id = ? AND commit_hash LIKE ?`
)
@@ -261,14 +259,14 @@

const selectedResults = applyTokenBudget
? (() => {
const snippets = searchResults.map((r) => r.snippet);
const selected = selectSnippetsWithinBudget(snippets, maxTokens);
const snippets = searchResults.map((r) => r.snippet);
const selected = selectSnippetsWithinBudget(snippets, maxTokens);

return selected.map((snippet) => {
const found = searchResults.find((r) => r.snippet.id === snippet.id)!;
return found;
});
})()
return selected.map((snippet) => {
const found = searchResults.find((r) => r.snippet.id === snippet.id)!;
return found;
});
})()
: searchResults;

const snippetVersionIds = Array.from(
@@ -22,17 +22,23 @@ const VALID_JOB_STATUSES: ReadonlySet<IndexingJob['status']> = new Set([
'failed'
]);

function parseStatusFilter(searchValue: string | null): IndexingJob['status'] | Array<IndexingJob['status']> | undefined {
function parseStatusFilter(
searchValue: string | null
): IndexingJob['status'] | Array<IndexingJob['status']> | undefined {
if (!searchValue) {
return undefined;
}

const statuses = [...new Set(
searchValue
.split(',')
.map((value) => value.trim())
.filter((value): value is IndexingJob['status'] => VALID_JOB_STATUSES.has(value as IndexingJob['status']))
)];
const statuses = [
...new Set(
searchValue
.split(',')
.map((value) => value.trim())
.filter((value): value is IndexingJob['status'] =>
VALID_JOB_STATUSES.has(value as IndexingJob['status'])
)
)
];

if (statuses.length === 0) {
return undefined;

@@ -51,7 +51,9 @@ export const GET: RequestHandler = ({ params, request }) => {
if (lastEventId) {
const lastEvent = broadcaster.getLastEvent(jobId);
if (lastEvent && lastEvent.id >= parseInt(lastEventId, 10)) {
controller.enqueue(`id: ${lastEvent.id}\nevent: ${lastEvent.event}\ndata: ${lastEvent.data}\n\n`);
controller.enqueue(
`id: ${lastEvent.id}\nevent: ${lastEvent.event}\ndata: ${lastEvent.data}\n\n`
);
}
}

@@ -80,10 +82,7 @@ export const GET: RequestHandler = ({ params, request }) => {
controller.enqueue(value);

// Check if the incoming event indicates job completion
if (
value.includes('event: job-done') ||
value.includes('event: job-failed')
) {
if (value.includes('event: job-done') || value.includes('event: job-failed')) {
controller.close();
break;
}
@@ -111,7 +110,7 @@ export const GET: RequestHandler = ({ params, request }) => {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
Connection: 'keep-alive',
'X-Accel-Buffering': 'no',
'Access-Control-Allow-Origin': '*'
}

@@ -30,7 +30,7 @@ export const GET: RequestHandler = ({ url }) => {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
Connection: 'keep-alive',
'X-Accel-Buffering': 'no',
'Access-Control-Allow-Origin': '*'
}
@@ -124,9 +124,11 @@ describe('POST /api/v1/libs/:id/index', () => {
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
versionService.add('/facebook/react', 'v17.0.0', 'React v17.0.0');

const enqueue = vi.fn().mockImplementation(
(repositoryId: string, versionId?: string) => makeEnqueueJob(repositoryId, versionId)
);
const enqueue = vi
.fn()
.mockImplementation((repositoryId: string, versionId?: string) =>
makeEnqueueJob(repositoryId, versionId)
);
mockQueue = { enqueue };

const response = await postIndex({
@@ -158,9 +160,11 @@ describe('POST /api/v1/libs/:id/index', () => {
repoService.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react' });
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');

const enqueue = vi.fn().mockImplementation(
(repositoryId: string, versionId?: string) => makeEnqueueJob(repositoryId, versionId)
);
const enqueue = vi
.fn()
.mockImplementation((repositoryId: string, versionId?: string) =>
makeEnqueueJob(repositoryId, versionId)
);
mockQueue = { enqueue };

const response = await postIndex({

@@ -49,7 +49,10 @@ function createTestDb(): Database.Database {
const client = new Database(':memory:');
client.pragma('foreign_keys = ON');

const migrationsFolder = join(import.meta.dirname, '../../../../../../../lib/server/db/migrations');
const migrationsFolder = join(
import.meta.dirname,
'../../../../../../../lib/server/db/migrations'
);
const ftsFile = join(import.meta.dirname, '../../../../../../../lib/server/db/fts.sql');

const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
@@ -18,9 +18,10 @@ export const GET: RequestHandler = () => {
try {
const db = getClient();
const row = db
.prepare<[], { value: string }>(
"SELECT value FROM settings WHERE key = 'indexing.concurrency'"
)
.prepare<
[],
{ value: string }
>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
.get();

let concurrency = 2;
@@ -54,13 +55,13 @@ export const PUT: RequestHandler = async ({ request }) => {

// Validate and clamp concurrency
const maxConcurrency = Math.max(os.cpus().length - 1, 1);
const concurrency = Math.max(1, Math.min(parseInt(String(body.concurrency ?? 2), 10), maxConcurrency));
const concurrency = Math.max(
1,
Math.min(parseInt(String(body.concurrency ?? 2), 10), maxConcurrency)
);

if (isNaN(concurrency)) {
return json(
{ error: 'Concurrency must be a valid integer' },
{ status: 400 }
);
return json({ error: 'Concurrency must be a valid integer' }, { status: 400 });
}

const db = getClient();

@@ -18,7 +18,8 @@ import type { ProgressBroadcaster as BroadcasterType } from '$lib/server/pipelin
let db: Database.Database;
// Closed over by the vi.mock factory below.
let mockBroadcaster: BroadcasterType | null = null;
let mockPool: { getStatus: () => object; setMaxConcurrency?: (value: number) => void } | null = null;
let mockPool: { getStatus: () => object; setMaxConcurrency?: (value: number) => void } | null =
null;

vi.mock('$lib/server/db/client', () => ({
getClient: () => db
@@ -39,7 +40,8 @@ vi.mock('$lib/server/pipeline/startup.js', () => ({
}));

vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
const original = await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
const original =
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
return {
...original,
getBroadcaster: () => mockBroadcaster
@@ -47,7 +49,8 @@ vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
});

vi.mock('$lib/server/pipeline/progress-broadcaster.js', async (importOriginal) => {
const original = await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
const original =
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
return {
...original,
getBroadcaster: () => mockBroadcaster
@@ -62,7 +65,10 @@ import { ProgressBroadcaster } from '$lib/server/pipeline/progress-broadcaster.j
import { GET as getJobsList } from './jobs/+server.js';
import { GET as getJobStream } from './jobs/[id]/stream/+server.js';
import { GET as getJobsStream } from './jobs/stream/+server.js';
import { GET as getIndexingSettings, PUT as putIndexingSettings } from './settings/indexing/+server.js';
import {
GET as getIndexingSettings,
PUT as putIndexingSettings
} from './settings/indexing/+server.js';
import { GET as getWorkers } from './workers/+server.js';

// ---------------------------------------------------------------------------
@@ -84,7 +90,10 @@ function createTestDb(): Database.Database {
'0005_fix_stage_defaults.sql'
]) {
const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
for (const stmt of sql.split('--> statement-breakpoint').map((s) => s.trim()).filter(Boolean)) {
for (const stmt of sql
.split('--> statement-breakpoint')
.map((s) => s.trim())
.filter(Boolean)) {
client.exec(stmt);
}
}
@@ -201,9 +210,7 @@ describe('GET /api/v1/jobs/:id/stream', () => {
it('returns 404 when the job does not exist', async () => {
seedRepo(db);

const response = await getJobStream(
makeEvent({ params: { id: 'non-existent-job-id' } })
);
const response = await getJobStream(makeEvent({ params: { id: 'non-existent-job-id' } }));

expect(response.status).toBe(404);
});
@@ -363,7 +370,9 @@ describe('GET /api/v1/jobs/stream', () => {
const subscribeSpy = vi.spyOn(mockBroadcaster!, 'subscribeRepository');

await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream?repositoryId=/test/repo' })
makeEvent<Parameters<typeof getJobsStream>[0]>({
url: 'http://localhost/api/v1/jobs/stream?repositoryId=/test/repo'
})
);

expect(subscribeSpy).toHaveBeenCalledWith('/test/repo');
@@ -383,7 +392,9 @@ describe('GET /api/v1/jobs/stream', () => {
seedRepo(db, '/repo/alpha');

const response = await getJobsStream(
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream?repositoryId=/repo/alpha' })
makeEvent<Parameters<typeof getJobsStream>[0]>({
url: 'http://localhost/api/v1/jobs/stream?repositoryId=/repo/alpha'
})
);

// Broadcast an event for this repository
@@ -521,7 +532,9 @@ describe('GET /api/v1/settings/indexing', () => {
});

it('returns { concurrency: 2 } when no setting exists in DB', async () => {
const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();

expect(response.status).toBe(200);
@@ -533,7 +546,9 @@ describe('GET /api/v1/settings/indexing', () => {
"INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
).run(JSON.stringify(4), NOW_S);

const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();

expect(body.concurrency).toBe(4);
@@ -544,7 +559,9 @@ describe('GET /api/v1/settings/indexing', () => {
"INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
).run(JSON.stringify({ value: 5 }), NOW_S);

const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
const response = await getIndexingSettings(
makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
);
const body = await response.json();

expect(body.concurrency).toBe(5);
@@ -600,9 +617,10 @@ describe('PUT /api/v1/settings/indexing', () => {
await putIndexingSettings(makePutEvent({ concurrency: 3 }));

const row = db
.prepare<[], { value: string }>(
"SELECT value FROM settings WHERE key = 'indexing.concurrency'"
)
.prepare<
[],
{ value: string }
>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
.get();

expect(row).toBeDefined();
@@ -634,9 +652,7 @@ describe('PUT /api/v1/settings/indexing', () => {
// The actual flow: parseInt('abc') => NaN, Math.max(1, Math.min(NaN, max)) => NaN,
// then `if (isNaN(concurrency))` returns 400.
// We pass the raw string directly.
const response = await putIndexingSettings(
makePutEvent({ concurrency: 'not-a-number' })
);
const response = await putIndexingSettings(makePutEvent({ concurrency: 'not-a-number' }));

// parseInt('not-a-number') = NaN, so the handler should return 400
expect(response.status).toBe(400);
||||
@@ -39,8 +39,11 @@
|
||||
indexedAt: string | null;
|
||||
createdAt: string;
|
||||
}
|
||||
type VersionStateFilter = VersionDto['state'] | 'all';
|
||||
let versions = $state<VersionDto[]>([]);
|
||||
let versionsLoading = $state(false);
|
||||
let activeVersionFilter = $state<VersionStateFilter>('all');
|
||||
let bulkReprocessBusy = $state(false);
|
||||
|
||||
// Add version form
|
||||
let addVersionTag = $state('');
|
||||
@@ -49,7 +52,7 @@
|
||||
// Discover tags state
|
||||
let discoverBusy = $state(false);
|
||||
let discoveredTags = $state<Array<{ tag: string; commitHash: string }>>([]);
|
||||
let selectedDiscoveredTags = new SvelteSet<string>();
|
||||
const selectedDiscoveredTags = new SvelteSet<string>();
|
||||
let showDiscoverPanel = $state(false);
|
||||
let registerBusy = $state(false);
|
||||
|
||||
@@ -76,6 +79,14 @@
|
||||
error: 'Error'
|
||||
};
|
||||
|
||||
const versionFilterOptions: Array<{ value: VersionStateFilter; label: string }> = [
|
||||
{ value: 'all', label: 'All' },
|
||||
{ value: 'pending', label: stateLabels.pending },
|
||||
{ value: 'indexing', label: stateLabels.indexing },
|
||||
{ value: 'indexed', label: stateLabels.indexed },
|
||||
{ value: 'error', label: stateLabels.error }
|
||||
];
|
||||
|
||||
const stageLabels: Record<string, string> = {
|
||||
queued: 'Queued',
|
||||
differential: 'Diff',
|
||||
@@ -88,6 +99,20 @@
|
||||
failed: 'Failed'
|
||||
};
|
||||
|
||||
const filteredVersions = $derived(
|
||||
activeVersionFilter === 'all'
|
||||
? versions
|
||||
: versions.filter((version) => version.state === activeVersionFilter)
|
||||
);
|
||||
const actionableErroredTags = $derived(
|
||||
versions
|
||||
.filter((version) => version.state === 'error' && !activeVersionJobs[version.tag])
|
||||
.map((version) => version.tag)
|
||||
);
|
||||
const activeVersionFilterLabel = $derived(
|
||||
versionFilterOptions.find((option) => option.value === activeVersionFilter)?.label ?? 'All'
|
||||
);
|
||||
|
||||
async function refreshRepo() {
|
||||
try {
|
||||
const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}`);
|
||||
@@ -123,9 +148,7 @@
|
||||
if (!repo.id) return;
|
||||
|
||||
let stopped = false;
|
||||
const es = new EventSource(
|
||||
`/api/v1/jobs/stream?repositoryId=${encodeURIComponent(repo.id)}`
|
||||
);
|
||||
const es = new EventSource(`/api/v1/jobs/stream?repositoryId=${encodeURIComponent(repo.id)}`);
|
||||
|
||||
es.addEventListener('job-progress', (event) => {
|
||||
if (stopped) return;
|
||||
@@ -277,23 +300,58 @@
|
||||
async function handleIndexVersion(tag: string) {
|
||||
errorMessage = null;
|
||||
try {
|
||||
const res = await fetch(
|
||||
`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/${encodeURIComponent(tag)}/index`,
|
||||
{ method: 'POST' }
|
||||
);
|
||||
if (!res.ok) {
|
||||
const d = await res.json();
|
||||
throw new Error(d.error ?? 'Failed to queue version indexing');
|
||||
}
|
||||
const d = await res.json();
|
||||
if (d.job?.id) {
|
||||
activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
|
||||
const jobId = await queueVersionIndex(tag);
|
||||
if (jobId) {
|
||||
activeVersionJobs = { ...activeVersionJobs, [tag]: jobId };
|
||||
}
|
||||
} catch (e) {
|
||||
errorMessage = (e as Error).message;
|
||||
}
|
||||
}
|
||||
|
||||
async function queueVersionIndex(tag: string): Promise<string | null> {
|
||||
const res = await fetch(
|
||||
`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/${encodeURIComponent(tag)}/index`,
|
||||
{ method: 'POST' }
|
||||
);
|
||||
if (!res.ok) {
|
||||
const d = await res.json();
|
||||
throw new Error(d.error ?? 'Failed to queue version indexing');
|
||||
}
|
||||
const d = await res.json();
|
||||
return d.job?.id ?? null;
|
||||
}
|
||||
|
||||
async function handleBulkReprocessErroredVersions() {
|
||||
if (actionableErroredTags.length === 0) return;
|
||||
bulkReprocessBusy = true;
|
||||
errorMessage = null;
|
||||
successMessage = null;
|
||||
try {
|
||||
const tags = [...actionableErroredTags];
|
||||
const BATCH_SIZE = 5;
|
||||
let next = { ...activeVersionJobs };
|
||||
|
||||
for (let i = 0; i < tags.length; i += BATCH_SIZE) {
|
||||
const batch = tags.slice(i, i + BATCH_SIZE);
|
||||
const jobIds = await Promise.all(batch.map((versionTag) => queueVersionIndex(versionTag)));
|
||||
for (let j = 0; j < batch.length; j++) {
|
||||
if (jobIds[j]) {
|
||||
next = { ...next, [batch[j]]: jobIds[j] ?? undefined };
|
||||
}
|
||||
}
|
||||
activeVersionJobs = next;
|
||||
}
|
||||
|
||||
successMessage = `Queued ${tags.length} errored tag${tags.length === 1 ? '' : 's'} for reprocessing.`;
|
||||
await loadVersions();
|
||||
} catch (e) {
|
||||
errorMessage = (e as Error).message;
|
||||
} finally {
|
||||
bulkReprocessBusy = false;
|
||||
}
|
||||
}
|
||||
|
||||
async function handleRemoveVersion() {
|
||||
if (!removeTag) return;
|
||||
const tag = removeTag;
|
||||
@@ -318,10 +376,9 @@
|
||||
discoverBusy = true;
|
||||
errorMessage = null;
|
||||
try {
|
||||
const res = await fetch(
|
||||
`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`,
|
||||
{ method: 'POST' }
|
||||
);
|
||||
const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`, {
|
||||
method: 'POST'
|
||||
});
|
||||
if (!res.ok) {
|
||||
const d = await res.json();
|
||||
throw new Error(d.error ?? 'Failed to discover tags');
|
||||
@@ -331,7 +388,10 @@
|
||||
discoveredTags = (d.tags ?? []).filter(
|
||||
(t: { tag: string; commitHash: string }) => !registeredTags.has(t.tag)
|
||||
);
|
||||
selectedDiscoveredTags = new SvelteSet(discoveredTags.map((t) => t.tag));
|
||||
selectedDiscoveredTags.clear();
|
||||
for (const discoveredTag of discoveredTags) {
|
||||
selectedDiscoveredTags.add(discoveredTag.tag);
|
||||
}
|
||||
showDiscoverPanel = true;
|
||||
} catch (e) {
|
||||
errorMessage = (e as Error).message;
|
||||
@@ -380,7 +440,7 @@
    activeVersionJobs = next;
    showDiscoverPanel = false;
    discoveredTags = [];
-   selectedDiscoveredTags = new SvelteSet();
+   selectedDiscoveredTags.clear();
    await loadVersions();
  } catch (e) {
    errorMessage = (e as Error).message;
@@ -498,41 +558,69 @@

<!-- Versions -->
<div class="mt-6 rounded-xl border border-gray-200 bg-white p-5">
- <div class="mb-4 flex flex-wrap items-center justify-between gap-3">
+ <div class="mb-4 flex flex-col gap-3">
+   <div class="flex flex-wrap items-center justify-between gap-3">
+     <div class="flex flex-wrap items-center gap-3">
        <h2 class="text-sm font-semibold text-gray-700">Versions</h2>
+       <div class="flex flex-wrap items-center gap-1 rounded-lg bg-gray-100 p-1">
+         {#each versionFilterOptions as option (option.value)}
+           <button
+             type="button"
+             onclick={() => (activeVersionFilter = option.value)}
+             class="rounded-md px-2.5 py-1 text-xs font-medium transition-colors {activeVersionFilter ===
+             option.value
+               ? 'bg-white text-gray-900 shadow-sm'
+               : 'text-gray-500 hover:text-gray-700'}"
+           >
+             {option.label}
+           </button>
+         {/each}
+       </div>
+     </div>
      <div class="flex flex-wrap items-center gap-2">
+       <button
+         type="button"
+         onclick={handleBulkReprocessErroredVersions}
+         disabled={bulkReprocessBusy || actionableErroredTags.length === 0}
+         class="rounded-lg border border-red-200 px-3 py-1.5 text-sm font-medium text-red-600 hover:bg-red-50 disabled:cursor-not-allowed disabled:opacity-50"
+       >
+         {bulkReprocessBusy
+           ? 'Reprocessing...'
+           : `Reprocess errored${actionableErroredTags.length > 0 ? ` (${actionableErroredTags.length})` : ''}`}
+       </button>
        <!-- Add version inline form -->
        <form
          onsubmit={(e) => {
            e.preventDefault();
            handleAddVersion();
          }}
          class="flex items-center gap-1.5"
        >
          <input
            type="text"
            bind:value={addVersionTag}
            placeholder="e.g. v2.0.0"
            class="rounded-lg border border-gray-200 px-3 py-1.5 text-sm text-gray-900 placeholder-gray-400 focus:border-blue-400 focus:outline-none"
          />
          <button
            type="submit"
            disabled={addVersionBusy || !addVersionTag.trim()}
            class="rounded-lg bg-blue-600 px-3 py-1.5 text-sm font-medium text-white hover:bg-blue-700 disabled:cursor-not-allowed disabled:opacity-50"
          >
            Add
          </button>
        </form>
        <!-- Discover tags button — local repos only -->
        {#if repo.source === 'local'}
          <button
            onclick={handleDiscoverTags}
            disabled={discoverBusy}
            class="rounded-lg border border-gray-200 px-3 py-1.5 text-sm font-medium text-gray-700 hover:bg-gray-50 disabled:cursor-not-allowed disabled:opacity-50"
          >
            {discoverBusy ? 'Discovering...' : 'Discover tags'}
          </button>
        {/if}
      </div>
    </div>
+ </div>

@@ -549,7 +637,7 @@
  onclick={() => {
    showDiscoverPanel = false;
    discoveredTags = [];
-   selectedDiscoveredTags = new SvelteSet();
+   selectedDiscoveredTags.clear();
  }}
  class="text-xs text-blue-600 hover:underline"
>
@@ -567,7 +655,9 @@
    class="rounded border-gray-300"
  />
  <span class="font-mono text-gray-800">{discovered.tag}</span>
  <span class="font-mono text-xs text-gray-400"
    >{discovered.commitHash.slice(0, 8)}</span
  >
</label>
{/each}
</div>
@@ -576,9 +666,7 @@
  disabled={registerBusy || selectedDiscoveredTags.size === 0}
  class="rounded-lg bg-blue-600 px-3 py-1.5 text-sm font-medium text-white hover:bg-blue-700 disabled:cursor-not-allowed disabled:opacity-50"
>
  {registerBusy ? 'Registering...' : `Register ${selectedDiscoveredTags.size} selected`}
</button>
{/if}
</div>
@@ -589,9 +677,15 @@
  <p class="text-sm text-gray-400">Loading versions...</p>
{:else if versions.length === 0}
  <p class="text-sm text-gray-400">No versions registered. Add a tag above to get started.</p>
+{:else if filteredVersions.length === 0}
+  <div class="rounded-lg border border-dashed border-gray-200 bg-gray-50 px-4 py-5">
+    <p class="text-sm text-gray-500">
+      No versions match the {activeVersionFilterLabel.toLowerCase()} filter.
+    </p>
+  </div>
{:else}
  <div class="divide-y divide-gray-100">
-   {#each versions as version (version.id)}
+   {#each filteredVersions as version (version.id)}
      <div class="py-2.5">
        <div class="flex items-center justify-between">
          <div class="flex items-center gap-3">
@@ -609,7 +703,9 @@
  disabled={version.state === 'indexing' || !!activeVersionJobs[version.tag]}
  class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
>
  {version.state === 'indexing' || !!activeVersionJobs[version.tag]
    ? 'Indexing...'
    : 'Index'}
</button>
<button
  onclick={() => (removeTag = version.tag)}
@@ -625,12 +721,8 @@
      version.totalSnippets > 0
        ? { text: `${version.totalSnippets} snippets`, mono: false }
        : null,
      version.commitHash ? { text: version.commitHash.slice(0, 8), mono: true } : null,
      version.indexedAt ? { text: formatDate(version.indexedAt), mono: false } : null
    ] as Array<{ text: string; mono: boolean } | null>
  ).filter((p): p is { text: string; mono: boolean } => p !== null)}
  <div class="mt-1 flex items-center gap-1.5">
@@ -638,7 +730,8 @@
  {#if i > 0}
    <span class="text-xs text-gray-300">·</span>
  {/if}
  <span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span
  >
{/each}
</div>
{/if}
@@ -646,10 +739,12 @@
{@const job = versionJobProgress[activeVersionJobs[version.tag]!]}
<div class="mt-2">
  <div class="flex justify-between text-xs text-gray-500">
    <span>
      {#if job?.stageDetail}{job.stageDetail}{:else}{(
        job?.processedFiles ?? 0
      ).toLocaleString()} / {(job?.totalFiles ?? 0).toLocaleString()} files{/if}
      {#if job?.stage}{' - ' + (stageLabels[job.stage] ?? job.stage)}{/if}
    </span>
    <span>{job?.progress ?? 0}%</span>
  </div>
  <div class="mt-1 h-1.5 w-full rounded-full bg-gray-200">

@@ -20,9 +20,7 @@ export const load: PageServerLoad = async () => {
// Read indexing concurrency setting
let indexingConcurrency = 2;
const concurrencyRow = db
  .prepare<[], { value: string }>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
  .get();

if (concurrencyRow && concurrencyRow.value) {
@@ -199,7 +199,9 @@
}

function getOpenAiProfile(settings: EmbeddingSettingsDto): EmbeddingProfileDto | null {
  return (
    settings.profiles.find((profile) => profile.providerKind === 'openai-compatible') ?? null
  );
}

function resolveProvider(profile: EmbeddingProfileDto | null): 'none' | 'openai' | 'local' {
@@ -210,27 +212,30 @@
}

function resolveBaseUrl(settings: EmbeddingSettingsDto): string {
  const profile =
    settings.activeProfile?.providerKind === 'openai-compatible'
      ? settings.activeProfile
      : getOpenAiProfile(settings);
  return typeof profile?.config.baseUrl === 'string'
    ? profile.config.baseUrl
    : 'https://api.openai.com/v1';
}

function resolveModel(settings: EmbeddingSettingsDto): string {
  const profile =
    settings.activeProfile?.providerKind === 'openai-compatible'
      ? settings.activeProfile
      : getOpenAiProfile(settings);
  return typeof profile?.config.model === 'string'
    ? profile.config.model
    : (profile?.model ?? 'text-embedding-3-small');
}

function resolveDimensions(settings: EmbeddingSettingsDto): number | undefined {
  const profile =
    settings.activeProfile?.providerKind === 'openai-compatible'
      ? settings.activeProfile
      : getOpenAiProfile(settings);
  return profile?.dimensions ?? 1536;
}

@@ -296,34 +301,38 @@
  <dt class="font-medium text-gray-500">Provider</dt>
  <dd class="font-semibold text-gray-900">{activeProfile.providerKind}</dd>
  <dt class="font-medium text-gray-500">Model</dt>
  <dd class="font-semibold break-all text-gray-900">{activeProfile.model}</dd>
  <dt class="font-medium text-gray-500">Dimensions</dt>
  <dd class="font-semibold text-gray-900">{activeProfile.dimensions}</dd>
</div>

<div class="grid grid-cols-[110px_1fr] gap-x-4 gap-y-2 pt-3">
  <dt class="text-gray-500">Enabled</dt>
  <dd class="font-medium text-gray-800">{activeProfile.enabled ? 'Yes' : 'No'}</dd>
  <dt class="text-gray-500">Default</dt>
  <dd class="font-medium text-gray-800">{activeProfile.isDefault ? 'Yes' : 'No'}</dd>
  <dt class="text-gray-500">Updated</dt>
  <dd class="font-medium text-gray-800">{formatTimestamp(activeProfile.updatedAt)}</dd>
</div>
</dl>
</div>

<div class="rounded-lg border border-gray-200 bg-gray-50 p-4">
  <p class="text-sm font-medium text-gray-800">Provider configuration</p>
  <p class="mt-1 mb-3 text-sm text-gray-500">
    These are the provider-specific settings currently saved for the active profile.
  </p>

  {#if activeConfigEntries.length > 0}
    <ul class="space-y-2 text-sm">
      {#each activeConfigEntries as entry (entry.key)}
        <li
          class="flex items-start justify-between gap-4 border-b border-gray-200 pb-2 last:border-b-0 last:pb-0"
        >
          <span class="font-medium text-gray-600">{entry.key}</span>
          <span class={entry.redacted ? 'text-gray-500' : 'text-gray-800'}
            >{entry.value}</span
          >
        </li>
      {/each}
    </ul>
@@ -332,9 +341,9 @@
    No provider-specific configuration is stored for this profile.
  </p>
  <p class="mt-2 text-sm text-gray-500">
    For <span class="font-medium text-gray-700">OpenAI-compatible</span> profiles, edit
    the settings in the <span class="font-medium text-gray-700">Embedding Provider</span>
    form below. The built-in <span class="font-medium text-gray-700">Local Model</span> profile
    does not currently expose extra configurable fields.
  </p>
{/if}
@@ -342,14 +351,17 @@
  </div>
{:else}
  <div class="rounded-lg border border-amber-200 bg-amber-50 p-4 text-sm text-amber-800">
    Embeddings are currently disabled. Keyword search remains available, but no embedding
    profile is active.
  </div>
{/if}
</div>

<div class="rounded-xl border border-gray-200 bg-white p-6">
  <h2 class="mb-1 text-base font-semibold text-gray-900">Profile Inventory</h2>
  <p class="mb-4 text-sm text-gray-500">
    Profiles stored in the database and available for activation.
  </p>
  <div class="grid grid-cols-2 gap-3">
    <StatBadge label="Profiles" value={String(currentSettings.profiles.length)} />
    <StatBadge label="Active" value={activeProfile ? '1' : '0'} />
@@ -363,7 +375,9 @@
  <p class="text-gray-500">{profile.id}</p>
</div>
{#if profile.id === currentSettings.activeProfileId}
  <span class="rounded-full bg-blue-50 px-2 py-0.5 text-xs font-medium text-blue-700"
    >Active</span
  >
{/if}
</div>
</div>
@@ -379,238 +393,234 @@
</p>

<form class="space-y-4" onsubmit={handleSubmit}>
  <!-- Provider selector -->
  <div class="mb-4 flex gap-2">
    {#each ['none', 'openai', 'local'] as p (p)}
      <button
        type="button"
        onclick={() => {
          provider = p as 'none' | 'openai' | 'local';
          testStatus = 'idle';
          testError = null;
        }}
        class={[
          'rounded-lg px-4 py-2 text-sm',
          provider === p
            ? 'bg-blue-600 text-white'
            : 'border border-gray-200 text-gray-700 hover:bg-gray-50'
        ].join(' ')}
      >
-       {p === 'none'
-         ? 'None (FTS5 only)'
-         : p === 'openai'
-           ? 'OpenAI-compatible'
-           : 'Local Model'}
+       {p === 'none' ? 'None (FTS5 only)' : p === 'openai' ? 'OpenAI-compatible' : 'Local Model'}
      </button>
    {/each}
  </div>

  <!-- None warning -->
  {#if provider === 'none'}
    <div class="rounded-lg border border-amber-200 bg-amber-50 p-3 text-sm text-amber-700">
      Search will use keyword matching only. Results may be less relevant for complex questions.
    </div>
  {/if}

  <!-- OpenAI-compatible form -->
  {#if provider === 'openai'}
    <div class="space-y-3">
      <!-- Preset buttons -->
      <div class="flex flex-wrap gap-2">
        {#each PROVIDER_PRESETS as preset (preset.name)}
          <button
            type="button"
            onclick={() => applyPreset(preset)}
            class="rounded border border-gray-200 px-2.5 py-1 text-xs text-gray-600 hover:bg-gray-50"
          >
            {preset.name}
          </button>
        {/each}
      </div>

      <label class="block" for="embedding-base-url">
        <span class="text-sm font-medium text-gray-700">Base URL</span>
        <input
          id="embedding-base-url"
          name="baseUrl"
          type="text"
          autocomplete="url"
          bind:value={baseUrl}
          class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none"
        />
      </label>

      <label class="block" for="embedding-api-key">
        <span class="text-sm font-medium text-gray-700">API Key</span>
        <input
          id="embedding-api-key"
          name="apiKey"
          type="password"
          autocomplete="off"
          bind:value={apiKey}
          placeholder="sk-…"
          class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none"
        />
      </label>

      <label class="block" for="embedding-model">
        <span class="text-sm font-medium text-gray-700">Model</span>
        <input
          id="embedding-model"
          name="model"
          type="text"
          autocomplete="off"
          bind:value={model}
          class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none"
        />
      </label>

      <label class="block" for="embedding-dimensions">
        <span class="text-sm font-medium text-gray-700">Dimensions (optional override)</span>
        <input
          id="embedding-dimensions"
          name="dimensions"
          type="number"
          inputmode="numeric"
          bind:value={dimensions}
          class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none"
        />
      </label>

      <!-- Test connection row -->
      <div class="flex items-center gap-3">
        <button
          type="button"
          onclick={testConnection}
          disabled={testStatus === 'testing'}
          class="rounded-lg border border-gray-300 px-3 py-1.5 text-sm hover:bg-gray-50 disabled:opacity-50"
        >
          {testStatus === 'testing' ? 'Testing…' : 'Test Connection'}
        </button>

        {#if testStatus === 'ok'}
          <span class="text-sm text-green-600">
            Connection successful
            {#if testDimensions}— {testDimensions} dimensions{/if}
          </span>
        {:else if testStatus === 'error'}
          <span class="text-sm text-red-600">
            {testError}
          </span>
        {/if}
      </div>
    </div>
  {/if}

  <!-- Local model section -->
  {#if provider === 'local'}
    <div class="rounded-lg border border-gray-200 bg-gray-50 p-4 text-sm">
      <p class="font-medium text-gray-800">Local ONNX model via @xenova/transformers</p>
      <p class="mt-1 text-gray-500">Model: Xenova/all-MiniLM-L6-v2 · 384 dimensions</p>
      {#if getInitialLocalProviderAvailability()}
        <p class="mt-2 text-green-600">@xenova/transformers is installed and ready.</p>
      {:else}
        <p class="mt-2 text-amber-700">
          @xenova/transformers is not installed. Run
          <code class="rounded bg-amber-100 px-1 py-0.5 font-mono text-xs"
            >npm install @xenova/transformers</code
          >
          to enable local embeddings.
        </p>
      {/if}
    </div>
  {/if}

  <!-- Indexing section -->
  <div class="space-y-3 rounded-lg border border-gray-200 bg-white p-4">
    <div>
      <label for="concurrency" class="block text-sm font-medium text-gray-700">
        Concurrent Workers
      </label>
      <p class="mt-0.5 text-xs text-gray-500">
        Number of parallel indexing workers. Range: 1 to 8.
      </p>
    </div>

    <div class="flex items-center gap-3">
      <input
        id="concurrency"
        type="number"
        min="1"
        max="8"
        inputmode="numeric"
        bind:value={concurrencyInput}
        disabled={concurrencySaving}
        class="w-20 rounded-lg border border-gray-300 px-3 py-2 text-sm focus:border-blue-500 focus:outline-none disabled:opacity-50"
      />
      <button
        type="button"
        onclick={saveConcurrency}
        disabled={concurrencySaving}
        class="rounded-lg bg-blue-600 px-3 py-2 text-sm text-white hover:bg-blue-700 disabled:opacity-50"
      >
        {concurrencySaving ? 'Saving…' : 'Save'}
      </button>

      {#if concurrencySaveStatus === 'ok'}
        <span class="text-sm text-green-600">✓ Saved</span>
      {:else if concurrencySaveStatus === 'error'}
        <span class="text-sm text-red-600">{concurrencySaveError}</span>
      {/if}
    </div>
  </div>

  <!-- Save feedback banners -->
  {#if saveStatus === 'ok'}
    <div
      class="mt-4 flex items-center gap-2 rounded-lg border border-green-200 bg-green-50 px-4 py-3 text-sm font-medium text-green-700"
    >
      <svg
        xmlns="http://www.w3.org/2000/svg"
        class="h-4 w-4 shrink-0"
        viewBox="0 0 20 20"
        fill="currentColor"
        aria-hidden="true"
      >
        <path
          fill-rule="evenodd"
          d="M16.707 5.293a1 1 0 010 1.414l-8 8a1 1 0 01-1.414 0l-4-4a1 1 0 011.414-1.414L8 12.586l7.293-7.293a1 1 0 011.414 0z"
          clip-rule="evenodd"
        />
      </svg>
      Settings saved successfully.
    </div>
  {:else if saveStatus === 'error'}
    <div
      class="mt-4 flex items-center gap-2 rounded-lg border border-red-200 bg-red-50 px-4 py-3 text-sm font-medium text-red-700"
    >
      <svg
        xmlns="http://www.w3.org/2000/svg"
        class="h-4 w-4 shrink-0"
        viewBox="0 0 20 20"
        fill="currentColor"
        aria-hidden="true"
      >
        <path
          fill-rule="evenodd"
          d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7 4a1 1 0 11-2 0 1 1 0 012 0zm-1-9a1 1 0 00-1 1v4a1 1 0 102 0V6a1 1 0 00-1-1z"
          clip-rule="evenodd"
        />
      </svg>
      {saveError}
    </div>
  {/if}

  <!-- Save row -->
  <div class="mt-4 flex items-center justify-end">
    <button
      type="submit"
      disabled={saving}
      class="rounded-lg bg-blue-600 px-4 py-2 text-sm text-white hover:bg-blue-700 disabled:opacity-50"
    >
      {saving ? 'Saving…' : 'Save Settings'}
    </button>
  </div>
</form>
</div>