feature/TRUEREF-0023_libsql_vector_search #2
@@ -215,7 +215,7 @@ For GitHub repositories, TrueRef fetches the file from the default branch root.
### Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `$schema` | string | No | URL to the live JSON Schema for editor validation |
| `projectTitle` | string | No | Display name override (max 100 chars) |
| `description` | string | No | Library description used for search ranking (10–500 chars) |
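The constraints in the field table can be checked mechanically. A minimal validation sketch, assuming a hypothetical `TrueRefConfig` interface and `validateConfig` helper (neither name is from the source); the length limits come directly from the table (`projectTitle` ≤ 100 chars, `description` 10–500 chars):

```typescript
// Hypothetical config shape mirroring the field table above.
interface TrueRefConfig {
  $schema?: string;
  projectTitle?: string;
  description?: string;
}

// Returns a list of constraint violations (empty = valid).
function validateConfig(cfg: TrueRefConfig): string[] {
  const errors: string[] = [];
  if (cfg.projectTitle !== undefined && cfg.projectTitle.length > 100) {
    errors.push('projectTitle exceeds 100 characters');
  }
  if (
    cfg.description !== undefined &&
    (cfg.description.length < 10 || cfg.description.length > 500)
  ) {
    errors.push('description must be 10-500 characters');
  }
  return errors;
}
```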
@@ -335,3 +335,47 @@ Add subsequent research below this section.
- Risks / follow-ups:
  - Iteration 2 task decomposition must treat the current dirty code files from iterations 0 and 1 as the validation baseline; otherwise the executor will keep rediscovering pre-existing worktree drift instead of new task deltas.
  - The sqlite-vec bootstrap helper and the relational cleanup should be planned as one acceptance unit before any downstream vec0, worker-status, or admin-page tasks, because that is the smallest unit that removes the known broken intermediate state.

### 2026-04-01T00:00:00.000Z — TRUEREF-0023 iteration 3 navbar follow-up planning research

- Task: Plan the accepted follow-up request to add an admin route to the main navbar.
- Files inspected:
  - `prompts/TRUEREF-0023/progress.yaml`
  - `prompts/TRUEREF-0023/iteration_2/review_report.yaml`
  - `prompts/TRUEREF-0023/prompt.yaml`
  - `package.json`
  - `src/routes/+layout.svelte`
  - `src/routes/admin/jobs/+page.svelte`
- Findings:
  - The accepted iteration-2 workspace is green: `review_report.yaml` records passing build, passing tests, and no workspace diagnostics, so this request is a narrow additive follow-up rather than a rework of the sqlite-vec/admin jobs implementation.
  - The main navbar is defined entirely in `src/routes/+layout.svelte` and already uses base-aware SvelteKit navigation via `resolve as resolveRoute` from `$app/paths` for the existing `Repositories`, `Search`, and `Settings` links.
  - The existing admin surface already lives at `src/routes/admin/jobs/+page.svelte`, which sets the page title to `Job Queue - TrueRef Admin`; adding a navbar entry can therefore target `/admin/jobs` directly without introducing new routes, loaders, or components.
  - Repository findings from the earlier lint planning work already confirm the codebase expectation to avoid root-relative internal navigation in SvelteKit pages and components, so the new navbar link should follow the existing `resolveRoute('/...')` anchor pattern.
  - No dedicated test file currently covers the shared navbar. The appropriate validation for this follow-up remains repository-level `npm run build` and `npm test` after the single layout edit.
- Risks / follow-ups:
  - The follow-up navigation request should stay isolated to the shared layout so it does not reopen the accepted sqlite-vec implementation surface.
  - Build and test validation remain the appropriate regression checks because no dedicated navbar test currently exists.

### 2026-04-01T12:05:23.000Z — TRUEREF-0023 iteration 5 tabs filter and bulk reprocess planning research

- Task: Plan the follow-up repo-detail UI change to filter version rows in the tabs/tags view and add a bulk action that reprocesses all errored tags without adding a new backend endpoint.
- Files inspected:
  - `prompts/TRUEREF-0023/progress.yaml`
  - `prompts/TRUEREF-0023/prompt.yaml`
  - `prompts/TRUEREF-0023/iteration_2/plan.md`
  - `prompts/TRUEREF-0023/iteration_2/tasks.yaml`
  - `src/routes/repos/[id]/+page.svelte`
  - `src/routes/api/v1/libs/[id]/versions/[tag]/index/+server.ts`
  - `src/routes/api/v1/api-contract.integration.test.ts`
  - `package.json`
- Findings:
  - The relevant UI surface is entirely in `src/routes/repos/[id]/+page.svelte`; the page already loads `versions`, renders per-version state badges, and exposes per-tag `Index` and `Remove` buttons.
  - Version states are concretely `pending`, `indexing`, `indexed`, and `error`, and the page already centralizes their labels and color classes in `stateLabels` and `stateColors`.
  - Existing per-tag reprocessing is implemented by `handleIndexVersion(tag)`, which POSTs to `/api/v1/libs/:id/versions/:tag/index`; the corresponding backend route exists and returns a queued job DTO with status `202`.
  - No bulk reprocess endpoint exists, so the lowest-risk implementation is a UI-only bulk action that iterates the existing per-tag route.
  - The page already contains a bounded batching pattern in `handleRegisterSelected()` with `BATCH_SIZE = 5`, which provides a concrete local precedent for bulk tag operations without inventing a new concurrency model.
  - There is no existing page-component or browser test targeting `src/routes/repos/[id]/+page.svelte`; nearby automated coverage is API-contract focused, so this iteration should rely on `npm run build` and `npm test` regression validation unless a developer discovers an existing Svelte page harness during implementation.
  - Context7 lookup for Svelte and SvelteKit could not be completed in this environment because the configured API key is invalid; planning therefore relied on installed versions from `package.json` (`svelte` `^5.51.0`, `@sveltejs/kit` `^2.50.2`) and the live page patterns already present in the repository.
- Risks / follow-ups:
  - Bulk reprocessing must avoid queuing duplicate jobs for tags already shown as `indexing` or already tracked in `activeVersionJobs`.
  - Filter state should be implemented as local UI state only and must not disturb the existing `onMount(loadVersions)` fetch path or the SSE job-progress flow.
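The planned bulk action can be sketched as a pure helper. This is a hypothetical illustration, not the page's actual code: the `VersionRow` shape, the injectable `post` parameter, and the helper name are assumptions; the endpoint path, the `error`/`indexing` skip rule, the `activeVersionJobs` guard, and `BATCH_SIZE = 5` follow the findings above.

```typescript
// Hypothetical UI-only bulk reprocess: batch POSTs to the existing
// per-tag index route, skipping tags that are already indexing or
// already tracked by an active job.
interface VersionRow {
  tag: string;
  state: 'pending' | 'indexing' | 'indexed' | 'error';
}

async function reprocessErroredTags(
  libId: string,
  versions: VersionRow[],
  activeVersionJobs: Set<string>,
  post: (url: string) => Promise<unknown> = (url) => fetch(url, { method: 'POST' }),
  BATCH_SIZE = 5
): Promise<string[]> {
  // Only errored tags without an in-flight job qualify.
  const targets = versions
    .filter((v) => v.state === 'error' && !activeVersionJobs.has(v.tag))
    .map((v) => v.tag);

  // Bounded batching, mirroring the handleRegisterSelected() precedent.
  for (let i = 0; i < targets.length; i += BATCH_SIZE) {
    const batch = targets.slice(i, i + BATCH_SIZE);
    await Promise.all(
      batch.map((tag) =>
        post(`/api/v1/libs/${libId}/versions/${encodeURIComponent(tag)}/index`)
      )
    );
  }
  return targets;
}
```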
@@ -47,7 +47,7 @@ Executed in `IndexingPipeline.run()` before the crawl, when the job has a `versi
|
|||||||
containing shell metacharacters).
|
containing shell metacharacters).
|
||||||
|
|
||||||
3. **Path partitioning**: The changed-file list is split into `changedPaths` (added + modified
|
3. **Path partitioning**: The changed-file list is split into `changedPaths` (added + modified
|
||||||
+ renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
|
- renamed-destination) and `deletedPaths`. `unchangedPaths` is derived as
|
||||||
`ancestorFilePaths − changedPaths − deletedPaths`.
|
`ancestorFilePaths − changedPaths − deletedPaths`.
|
||||||
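The partitioning step is a set difference. A minimal sketch, assuming a hypothetical `ChangeEntry` shape (the real diff-entry type is not shown in this excerpt); the derived names follow the text:

```typescript
// Split a changed-file list into changed / deleted / unchanged sets.
interface ChangeEntry {
  status: 'A' | 'M' | 'D' | 'R'; // added, modified, deleted, renamed
  path: string; // destination path for renames
}

function partitionPaths(ancestorFilePaths: string[], changes: ChangeEntry[]) {
  const changedPaths = changes
    .filter((c) => c.status === 'A' || c.status === 'M' || c.status === 'R')
    .map((c) => c.path);
  const deletedPaths = changes.filter((c) => c.status === 'D').map((c) => c.path);

  // unchangedPaths = ancestorFilePaths − changedPaths − deletedPaths
  const removed = new Set([...changedPaths, ...deletedPaths]);
  const unchangedPaths = ancestorFilePaths.filter((p) => !removed.has(p));

  return { changedPaths, deletedPaths, unchangedPaths };
}
```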

4. **Guard**: Returns `null` when no indexed ancestor exists, when the ancestor has no indexed
@@ -75,7 +75,7 @@ matching files are returned. This minimises GitHub API requests and local I/O.
## API Surface Changes

| Symbol | Location | Change |
| --- | --- | --- |
| `buildDifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — async function |
| `DifferentialPlan` | `pipeline/differential-strategy.ts` | **New** — interface |
| `findBestAncestorVersion` | `utils/tag-order.ts` | **New** — pure function |
@@ -88,6 +88,7 @@ The UI currently polls `GET /api/v1/jobs?repositoryId=...` every 2 seconds. This
#### Worker Thread lifecycle

Each worker is a long-lived `node:worker_threads` `Worker` instance that:

1. Opens its own `better-sqlite3` connection to the same database file.
2. Listens for `{ type: 'run', jobId }` messages from the main thread.
3. Runs `IndexingPipeline.run(job)`, emitting `postMessage` progress events at each stage boundary and every N files.
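The message protocol can be captured in types. A hedged sketch: only `{ type: 'run', jobId }` is taken from the text; the other message variants and field names are illustrative assumptions about what the worker might emit:

```typescript
// Hypothetical typings for the worker message protocol described above.
type MainToWorker = { type: 'run'; jobId: string };

type WorkerToMain =
  | { type: 'progress'; jobId: string; stage: 'crawl' | 'parse' | 'store' | 'embed'; filesDone: number }
  | { type: 'done'; jobId: string }
  | { type: 'failed'; jobId: string; error: string };

// Narrowing helper the main thread could use when dispatching messages.
function isTerminal(msg: WorkerToMain): boolean {
  return msg.type === 'done' || msg.type === 'failed';
}
```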
@@ -120,12 +121,14 @@ Workers are kept alive across jobs. If a worker crashes (non-zero exit), the poo
#### Parallelism and write contention

With WAL mode enabled (already the case), SQLite supports:

- **One concurrent writer** (the transaction lock)
- **Many concurrent readers**

The `replaceSnippets` transaction for different repositories never contends — they write different rows. The `cloneFromAncestor` operation writes to the same tables but different `version_id` values, so WAL checkpoint logic keeps them non-overlapping at the page level.

Two jobs on the **same repository** (e.g. `/my-lib/v1.0.0` and `/my-lib/v2.0.0`) can run in parallel because:

- Differential indexing (TRUEREF-0021) ensures `v2.0.0` reads from `v1.0.0`'s already-committed rows.
- The write transactions for each version touch disjoint `version_id` partitions.
@@ -134,6 +137,7 @@ If write contention still occurs under parallel load, `busy_timeout = 5000` (alr
#### Concurrency limit per repository

To prevent a user from queuing 500 tags and overwhelming the worker pool, the pool enforces:

- **Max 1 running job per repository** for the default branch (re-index).
- **Max `concurrency` total running jobs** across all repositories.
- Version jobs for the same repository are serialised within the pool (the queue picks the oldest queued version job for a given repo only when no other version job for that repo is running).
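The pick rule above can be sketched as a pure function. This is a simplified model, not the actual `WorkerPool`/`JobQueue` code: the `QueuedJob` shape and `pickNextJob` name are assumptions; the global cap, per-repo exclusion, and oldest-first ordering follow the bullets.

```typescript
// Model of the queue-pick rule: oldest queued job whose repository
// has no running job, subject to the global concurrency cap.
interface QueuedJob {
  id: string;
  repositoryId: string;
  kind: 'default-branch' | 'version';
  queuedAt: number;
}

function pickNextJob(
  queued: QueuedJob[],
  running: QueuedJob[],
  concurrency: number
): QueuedJob | null {
  if (running.length >= concurrency) return null; // global cap reached
  const busyRepos = new Set(running.map((j) => j.repositoryId));
  return (
    [...queued]
      .sort((a, b) => a.queuedAt - b.queuedAt) // oldest first
      .find((j) => !busyRepos.has(j.repositoryId)) ?? null
  );
}
```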
@@ -183,11 +187,13 @@ interface ProgressMessage {
```

Workers emit this message:

- On every stage transition (crawl start, parse start, store start, embed start).
- Every `PROGRESS_EMIT_EVERY = 10` files during the parse loop.
- On job completion or failure.

The main thread receives these messages and does two things:

1. Writes the update to `indexing_jobs` in SQLite (batched — one write per message, not per file).
2. Pushes the payload to any open SSE channels for that jobId.
@@ -198,6 +204,7 @@ The main thread receives these messages and does two things:
### `GET /api/v1/jobs/:id/stream`

Opens an SSE connection for a specific job. The server:

1. Sends the current job state as the first event immediately (no initial lag).
2. Pushes `ProgressMessage` events as the worker emits them.
3. Sends a final `event: done` or `event: failed` event, then closes the connection.
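On the wire, each push is standard SSE framing. A minimal sketch (the helper name is an assumption) showing the `event:`/`data:` format used for progress and for the final `done`/`failed` events:

```typescript
// Frame one SSE event: named event line, JSON payload, blank-line terminator.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```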
@@ -363,7 +370,7 @@ The embedding stage must **not** run inside the same Worker Thread as the crawl/
### Why a dedicated embedding worker

| Concern | Per-parse-worker model | Dedicated embedding worker |
| --- | --- | --- |
| Memory | N × ~100 MB (model weights + WASM heap) per worker | 1 × ~100 MB regardless of concurrency |
| Model warm-up | Paid once per worker spawn; cold starts slow | Paid once at server startup |
| Batch size | Each worker batches only its own job's snippets | All in-flight jobs queue to one worker → larger batches → higher WASM throughput |
@@ -415,6 +422,7 @@ Instead, the existing `findSnippetIdsMissingEmbeddings` query is the handshake:
5. Main thread routes this to the SSE broadcaster → UI updates the embedding progress slice.

This means:

- The embedding worker reads snippet text from the DB itself (no IPC serialisation of content).
- The model is loaded once, stays warm, and processes batches from all repositories in FIFO order.
- Parse workers are never blocked waiting for embeddings — they complete their job stages and exit immediately.
docs/features/TRUEREF-0023.md (new file, 955 lines)
@@ -0,0 +1,955 @@
# TRUEREF-0023 — libSQL Migration, Native Vector Search, Parallel Tag Indexing, and Performance Hardening

**Priority:** P1
**Status:** Draft
**Depends On:** TRUEREF-0001, TRUEREF-0022
**Blocks:** —

---

## Overview

TrueRef currently uses `better-sqlite3` for all database access. This creates three compounding performance problems:

1. **Vector search does not scale.** `VectorSearch.vectorSearch()` loads the entire `snippet_embeddings` table for a repository into Node.js memory and computes cosine similarity in a JavaScript loop. A repository with 100k snippets at 1536 OpenAI dimensions allocates ~600 MB per query and ties up the worker thread for seconds before returning results.
2. **Missing composite indexes cause table scans on every query.** The schema defines FK columns used in every search and embedding filter, but declares zero composite or covering indexes on them. Every call to `searchSnippets`, `findSnippetIdsMissingEmbeddings`, and `cloneFromAncestor` performs full or near-full table scans.
3. **SQLite connection is under-configured.** Critical pragmas (`synchronous`, `cache_size`, `mmap_size`, `temp_store`) are absent, leaving significant I/O throughput on the table.

The solution is to replace `better-sqlite3` with `@libsql/better-sqlite3` — an embeddable, drop-in synchronous replacement that is a superset of the better-sqlite3 API and exposes libSQL's native vector index (`libsql_vector_idx`). Because the API is identical, no service layer or ORM code changes are needed beyond import statements and the vector search implementation.

Three additional structural improvements are delivered in the same feature:

4. **Per-repo job serialization is too coarse.** `WorkerPool` prevents any two jobs sharing the same `repositoryId` from running in parallel. This means indexing 200 tags of a single library is fully sequential — one tag at a time — even though different tags write to entirely disjoint row sets. The constraint should track `(repositoryId, versionId)` pairs instead.
5. **Write lock contention under parallel indexing.** When multiple parse workers flush parsed snippets simultaneously they all compete for the SQLite write lock, spending most of their time in `busy_timeout` back-off. A single dedicated write worker eliminates this: parse workers become pure CPU workers (crawl → parse → send batches over `postMessage`) and the write worker is the sole DB writer.
6. **Admin UI is unusable under load.** The job queue page has no status or repository filters, no worker status panel, no skeleton loading, uses blocking `alert()` / `confirm()` dialogs, and `IndexingProgress` still polls every 2 seconds instead of consuming the existing SSE stream.

---
## Goals

1. Replace `better-sqlite3` with `@libsql/better-sqlite3` with minimal code churn — import paths only.
2. Add a libSQL vector index on `snippet_embeddings` so that KNN queries execute inside SQLite instead of in a JavaScript loop.
3. Add the six composite and covering indexes required by the hot query paths.
4. Tune the SQLite pragma configuration for I/O performance.
5. Eliminate the leading cause of OOM risk during semantic search.
6. Keep a single embedded database file — no external server, no network.
7. Allow multiple tags of the same repository to index in parallel (unrelated version rows, no write conflict).
8. Eliminate write-lock contention between parallel parse workers by introducing a single dedicated write worker.
9. Rebuild the admin jobs page with full filtering (status, repository, free-text), a live worker status panel, skeleton loading on initial fetch, per-action inline spinners, non-blocking toast notifications, and SSE-driven real-time updates throughout.

---

## Non-Goals

- Migrating to the async `@libsql/client` package (HTTP/embedded-replica mode).
- Changing the Drizzle ORM adapter (`drizzle-orm/better-sqlite3` stays unchanged).
- Changing `drizzle.config.ts` dialect (`sqlite` is still correct for embedded libSQL).
- Adding hybrid/approximate indexing beyond the default HNSW strategy provided by `libsql_vector_idx`.
- Parallelizing embedding batches across providers (separate feature).
- Horizontally scaling across processes.
- Allowing more than one job for the exact same `(repositoryId, versionId)` pair to run concurrently (still serialized — duplicate detection in `JobQueue` is unchanged).
- A full admin authentication system (out of scope).
- Mobile-responsive redesign of the entire admin section (out of scope).

---
## Problem Detail

### 1. Vector Search — Full Table Scan in JavaScript

**File:** `src/lib/server/search/vector.search.ts`

```typescript
// Current: no LIMIT, loads ALL embeddings for repo into memory
const rows = this.db.prepare<unknown[], RawEmbeddingRow>(sql).all(...params);

const scored: VectorSearchResult[] = rows.map((row) => {
  const embedding = new Float32Array(
    row.embedding.buffer,
    row.embedding.byteOffset,
    row.embedding.byteLength / 4
  );
  return { snippetId: row.snippet_id, score: cosineSimilarity(queryEmbedding, embedding) };
});

return scored.sort((a, b) => b.score - a.score).slice(0, limit);
```

For a repo with N snippets and D dimensions, this allocates `N × D × 4` bytes per query. At N=100k and D=1536, that is ~600 MB allocated synchronously. The result is sorted entirely in JS before the top-k is returned. With a native vector index, SQLite returns only the top-k rows.
### 2. Missing Composite Indexes

The `snippets`, `documents`, and `snippet_embeddings` tables are queried with multi-column WHERE predicates in every hot path, but no composite indexes exist:

| Table | Filter columns | Used in |
| --- | --- | --- |
| `snippets` | `(repository_id, version_id)` | All search, diff, clone |
| `snippets` | `(repository_id, type)` | Type-filtered queries |
| `documents` | `(repository_id, version_id)` | Diff strategy, clone |
| `snippet_embeddings` | `(profile_id, snippet_id)` | `findSnippetIdsMissingEmbeddings` LEFT JOIN |
| `repositories` | `(state)` | `searchRepositories` WHERE `state = 'indexed'` |
| `indexing_jobs` | `(repository_id, status)` | Job status lookups |

Without these indexes, SQLite performs a B-tree scan of the primary key and filters rows in memory. On a 500k-row `snippets` table this is the dominant cost of every search.
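The six requirements can be written down as DDL. A sketch only: the index names below are hypothetical (the feature's actual migration may name them differently); the column orders follow the filter columns in the table above.

```typescript
// Hypothetical DDL for the six missing indexes.
const COMPOSITE_INDEXES: string[] = [
  'CREATE INDEX IF NOT EXISTS idx_snippets_repo_version ON snippets(repository_id, version_id)',
  'CREATE INDEX IF NOT EXISTS idx_snippets_repo_type ON snippets(repository_id, type)',
  'CREATE INDEX IF NOT EXISTS idx_documents_repo_version ON documents(repository_id, version_id)',
  'CREATE INDEX IF NOT EXISTS idx_embeddings_profile_snippet ON snippet_embeddings(profile_id, snippet_id)',
  'CREATE INDEX IF NOT EXISTS idx_repositories_state ON repositories(state)',
  'CREATE INDEX IF NOT EXISTS idx_jobs_repo_status ON indexing_jobs(repository_id, status)'
];
```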
### 3. Under-configured SQLite Connection

**File:** `src/lib/server/db/client.ts` and `src/lib/server/db/index.ts`

Current pragmas:

```typescript
client.pragma('journal_mode = WAL');
client.pragma('foreign_keys = ON');
client.pragma('busy_timeout = 5000');
```

Missing:

- `synchronous = NORMAL` — halves fsync overhead vs the default FULL; safe with WAL
- `cache_size = -65536` — 64 MB page cache; default is 2 MB
- `temp_store = MEMORY` — temp tables and sort spills stay in RAM
- `mmap_size = 268435456` — 256 MB memory-mapped read path; bypasses system call overhead for reads
- `wal_autocheckpoint = 1000` — more frequent checkpoints prevent WAL growth

---

### 4. Admin UI — Current Problems

**File:** `src/routes/admin/jobs/+page.svelte`, `src/lib/components/IndexingProgress.svelte`

| Problem | Location | Impact |
| --- | --- | --- |
| `IndexingProgress` polls every 2 s via `setInterval` + `fetch` | `IndexingProgress.svelte` | Constant HTTP traffic; progress lags by up to 2 s |
| No status or repository filter controls | `admin/jobs/+page.svelte` | With 200 tag jobs, finding a specific one requires scrolling |
| No worker status panel | — (no endpoint exists) | Operator cannot see which workers are busy or idle |
| `alert()` for errors, `confirm()` for cancel | `admin/jobs/+page.svelte` — `showToast()` | Blocks the entire browser tab; unusable under parallel jobs |
| `actionInProgress` is a single string, not per-job | `admin/jobs/+page.svelte` | Pausing job A disables buttons on all other jobs |
| No skeleton loading — blank + spinner on first load | `admin/jobs/+page.svelte` | Layout shift; no structural preview while data loads |
| Hard-coded `limit=50` query, no pagination | `admin/jobs/+page.svelte:fetchJobs()` | Page truncates silently for large queues |
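The current and missing pragmas from the connection section can be consolidated into one bootstrap. A sketch under stated assumptions: the `configureConnection` helper and the minimal `PragmaClient` interface are hypothetical (both `better-sqlite3` and `@libsql/better-sqlite3` do expose a `pragma()` method); the pragma values themselves come from the lists above.

```typescript
// Minimal surface of the database client this sketch relies on.
interface PragmaClient {
  pragma(statement: string): unknown;
}

const CONNECTION_PRAGMAS = [
  'journal_mode = WAL',
  'foreign_keys = ON',
  'busy_timeout = 5000',
  'synchronous = NORMAL',      // safe with WAL
  'cache_size = -65536',       // 64 MB page cache
  'temp_store = MEMORY',       // sort spills stay in RAM
  'mmap_size = 268435456',     // 256 MB mmap read path
  'wal_autocheckpoint = 1000'  // bound WAL growth
];

function configureConnection(client: PragmaClient): void {
  for (const p of CONNECTION_PRAGMAS) client.pragma(p);
}
```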
---

## Architecture

### Drop-In Replacement: `@libsql/better-sqlite3`

`@libsql/better-sqlite3` is published by Turso and implemented as a Node.js native addon wrapping the libSQL embedded engine. The exported class is API-compatible with `better-sqlite3`:

```typescript
// before
import Database from 'better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);

// after — identical code
import Database from '@libsql/better-sqlite3';
const db = new Database('/path/to/file.db');
db.pragma('journal_mode = WAL');
const rows = db.prepare('SELECT ...').all(...params);
```

All of the following continue to work unchanged:

- `drizzle-orm/better-sqlite3` adapter and `migrate` helper
- `drizzle-kit` with `dialect: 'sqlite'`
- Prepared statements, transactions, WAL pragmas, foreign keys
- Worker thread per-thread connections (`worker-entry.ts`, `embed-worker-entry.ts`)
- All `type Database from 'better-sqlite3'` type imports (replaced in lock-step)
### Vector Index Design

libSQL provides `libsql_vector_idx()` — a virtual index type stored in a shadow table alongside the main table. Once indexed, KNN queries use a SQL `vector_top_k()` function:

```sql
-- KNN: return top-k snippet IDs closest to the query vector
SELECT snippet_id
FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?)
```

`vector_from_float32(blob)` accepts the same raw little-endian Float32 bytes currently stored in the `embedding` blob column. **No data migration is needed** — the existing blob column can be re-indexed with `libsql_vector_idx` pointing at the bytes-stored column.

The index strategy:

1. Add a generated `vec_embedding` column of type `F32_BLOB(dimensions)` to `snippet_embeddings`, populated from the existing `embedding` blob via a migration trigger.
2. Create the vector index: `CREATE INDEX idx_snippet_embeddings_vec ON snippet_embeddings(libsql_vector_idx(vec_embedding))`.
3. Rewrite `VectorSearch.vectorSearch()` to use `vector_top_k()` with a two-step join instead of the in-memory loop.
4. Update `EmbeddingService.embedSnippets()` to write `vec_embedding` on insert.

Dimensions are profile-specific. Because the index is per-column, a separate index is needed per embedding dimensionality. For v1, a single index covering the default profile's dimensions is sufficient; multi-profile KNN can be handled with a `WHERE profile_id = ?` pre-filter on the `vector_top_k` results.
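The four strategy steps could be carried by a migration like the following. This is a heavily hedged sketch, not the feature's actual migration: the exact libSQL DDL (the `F32_BLOB` column form and index expression) should be verified against the libSQL documentation, dimension 1536 is illustrative only, and a one-shot `UPDATE` backfill stands in here for the migration trigger the text mentions.

```typescript
// Hypothetical migration statements for the vector index strategy.
const VEC_MIGRATION: string[] = [
  // Step 1: add the typed vector column (dimension is illustrative).
  'ALTER TABLE snippet_embeddings ADD COLUMN vec_embedding F32_BLOB(1536)',
  // Backfill from the existing raw little-endian Float32 blob column.
  'UPDATE snippet_embeddings SET vec_embedding = embedding',
  // Step 2: create the vector index over the new column.
  'CREATE INDEX idx_snippet_embeddings_vec ON snippet_embeddings(libsql_vector_idx(vec_embedding))'
];
```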
### Updated Vector Search Query

```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
  const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;

  // Encode query vector as raw bytes (same format as stored blobs)
  const queryBytes = Buffer.from(queryEmbedding.buffer);

  // Use libSQL vector_top_k for ANN — returns ordered (rowid, distance) pairs
  let sql = `
    SELECT se.snippet_id,
           vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS score
    FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
    JOIN snippet_embeddings se ON se.rowid = knn.id
    JOIN snippets s ON s.id = se.snippet_id
    WHERE s.repository_id = ?
      AND se.profile_id = ?
  `;
  const params: unknown[] = [queryBytes, queryBytes, limit * 4, repositoryId, profileId];

  if (versionId) {
    sql += ' AND s.version_id = ?';
    params.push(versionId);
  }

  sql += ' ORDER BY score ASC LIMIT ?';
  params.push(limit);

  return this.db
    .prepare<unknown[], { snippet_id: string; score: number }>(sql)
    .all(...params)
    .map((row) => ({ snippetId: row.snippet_id, score: 1 - row.score }));
}
```

`vector_distance_cos` returns distance (0 = identical), so `1 - distance` gives a similarity score in [0, 1] matching the existing `VectorSearchResult.score` contract.
---

## Implementation Plan

### Phase 1 — Package Swap (no logic changes)

**Files touched:** `package.json`, all `.ts` files that import `better-sqlite3`

1. In `package.json`:
   - Remove `"better-sqlite3": "^12.6.2"` from `dependencies`
   - Add `"@libsql/better-sqlite3": "^0.4.0"` to `dependencies`
   - Remove `"@types/better-sqlite3": "^7.6.13"` from `devDependencies`
     - `@libsql/better-sqlite3` ships its own TypeScript declarations

2. Replace all import statements (35 occurrences across 19 files):

   | Old import | New import |
   | --- | --- |
   | `import Database from 'better-sqlite3'` | `import Database from '@libsql/better-sqlite3'` |
   | `import type Database from 'better-sqlite3'` | `import type Database from '@libsql/better-sqlite3'` |
   | `import { drizzle } from 'drizzle-orm/better-sqlite3'` | unchanged |
   | `import { migrate } from 'drizzle-orm/better-sqlite3/migrator'` | unchanged |
Affected production files:
|
||||||
|
- `src/lib/server/db/index.ts`
|
||||||
|
- `src/lib/server/db/client.ts`
|
||||||
|
- `src/lib/server/embeddings/embedding.service.ts`
|
||||||
|
- `src/lib/server/pipeline/indexing.pipeline.ts`
|
||||||
|
- `src/lib/server/pipeline/job-queue.ts`
|
||||||
|
- `src/lib/server/pipeline/startup.ts`
|
||||||
|
- `src/lib/server/pipeline/worker-entry.ts`
|
||||||
|
- `src/lib/server/pipeline/embed-worker-entry.ts`
|
||||||
|
- `src/lib/server/pipeline/differential-strategy.ts`
|
||||||
|
- `src/lib/server/search/vector.search.ts`
|
||||||
|
- `src/lib/server/search/hybrid.search.service.ts`
|
||||||
|
- `src/lib/server/search/search.service.ts`
|
||||||
|
- `src/lib/server/services/repository.service.ts`
|
||||||
|
- `src/lib/server/services/version.service.ts`
|
||||||
|
- `src/lib/server/services/embedding-settings.service.ts`
|
||||||
|
|
||||||
|
Affected test files (same mechanical replacement):
|
||||||
|
- `src/routes/api/v1/api-contract.integration.test.ts`
|
||||||
|
- `src/routes/api/v1/sse-and-settings.integration.test.ts`
|
||||||
|
- `src/routes/settings/page.server.test.ts`
|
||||||
|
- `src/lib/server/db/schema.test.ts`
|
||||||
|
- `src/lib/server/embeddings/embedding.service.test.ts`
|
||||||
|
- `src/lib/server/pipeline/indexing.pipeline.test.ts`
|
||||||
|
- `src/lib/server/pipeline/differential-strategy.test.ts`
|
||||||
|
- `src/lib/server/search/search.service.test.ts`
|
||||||
|
- `src/lib/server/search/hybrid.search.service.test.ts`
|
||||||
|
- `src/lib/server/services/repository.service.test.ts`
|
||||||
|
- `src/lib/server/services/version.service.test.ts`
|
||||||
|
- `src/routes/api/v1/settings/embedding/server.test.ts`
|
||||||
|
- `src/routes/api/v1/libs/[id]/index/server.test.ts`
|
||||||
|
- `src/routes/api/v1/libs/[id]/versions/discover/server.test.ts`
|
||||||
|
|
||||||
|
3. Run all tests — they should pass with zero logic changes: `npm test`
|
||||||
|
|
||||||
|
### Phase 2 — Pragma Hardening

**Files touched:** `src/lib/server/db/client.ts`, `src/lib/server/db/index.ts`

Add the following pragmas to both connection factories (the raw client and `initializeDatabase()`):

```typescript
client.pragma('synchronous = NORMAL');
client.pragma('cache_size = -65536'); // 64 MB
client.pragma('temp_store = MEMORY');
client.pragma('mmap_size = 268435456'); // 256 MB
client.pragma('wal_autocheckpoint = 1000');
```

Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) open their own connections — apply the same pragmas there.
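
To keep the pragma list from drifting across the four connection sites (both factories plus both worker entries), the list could live in one shared helper. The sketch below is illustrative, not existing code: the name `applyTuningPragmas` is hypothetical, and the demo uses a stub connection so the behavior is checkable without a real database.

```typescript
// Anything with a better-sqlite3-style pragma() method can be tuned.
interface PragmaRunner {
  pragma(statement: string): unknown;
}

// Single source of truth for the Phase 2 tuning pragmas.
const TUNING_PRAGMAS = [
  'synchronous = NORMAL',
  'cache_size = -65536', // 64 MB
  'temp_store = MEMORY',
  'mmap_size = 268435456', // 256 MB
  'wal_autocheckpoint = 1000'
];

function applyTuningPragmas(db: PragmaRunner): void {
  for (const p of TUNING_PRAGMAS) db.pragma(p);
}

// Demo with a stub connection that records what was applied.
const applied: string[] = [];
const stub: PragmaRunner = { pragma: (s) => void applied.push(s) };
applyTuningPragmas(stub);
console.log(applied.length);
```

In practice the helper would be exported from `db/` and called right after each `new Database(...)`.
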

### Phase 3 — Composite Indexes (Drizzle migration)

**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL file

Add indexes in `schema.ts` using Drizzle's `index()` helper:

```typescript
// snippets table
export const snippets = sqliteTable(
  'snippets',
  {
    /* unchanged */
  },
  (t) => [
    index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
    index('idx_snippets_repo_type').on(t.repositoryId, t.type)
  ]
);

// documents table
export const documents = sqliteTable(
  'documents',
  {
    /* unchanged */
  },
  (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
);

// snippet_embeddings table
export const snippetEmbeddings = sqliteTable(
  'snippet_embeddings',
  {
    /* unchanged */
  },
  (table) => [
    primaryKey({ columns: [table.snippetId, table.profileId] }), // unchanged
    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
  ]
);

// repositories table
export const repositories = sqliteTable(
  'repositories',
  {
    /* unchanged */
  },
  (t) => [index('idx_repositories_state').on(t.state)]
);

// indexing_jobs table
export const indexingJobs = sqliteTable(
  'indexing_jobs',
  {
    /* unchanged */
  },
  (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
);
```

Generate and apply the migration: `npm run db:generate && npm run db:migrate`

### Phase 4 — Vector Column and Index (Drizzle migration)

**Files touched:** `src/lib/server/db/schema.ts`, new migration SQL, `src/lib/server/search/vector.search.ts`, `src/lib/server/embeddings/embedding.service.ts`

#### 4a. Schema: add `vec_embedding` column

Add `vec_embedding` to `snippet_embeddings`. Drizzle does not have an `F32_BLOB` column type helper, so define one with `customType`:

```typescript
import { customType } from 'drizzle-orm/sqlite-core';

const f32Blob = (name: string, dimensions: number) =>
  customType<{ data: Buffer }>({
    dataType() {
      return `F32_BLOB(${dimensions})`;
    }
  })(name);

export const snippetEmbeddings = sqliteTable(
  'snippet_embeddings',
  {
    snippetId: text('snippet_id')
      .notNull()
      .references(() => snippets.id, { onDelete: 'cascade' }),
    profileId: text('profile_id')
      .notNull()
      .references(() => embeddingProfiles.id, { onDelete: 'cascade' }),
    model: text('model').notNull(),
    dimensions: integer('dimensions').notNull(),
    embedding: blob('embedding').notNull(), // existing blob — kept for backward compat
    vecEmbedding: f32Blob('vec_embedding', 1536), // libSQL vector column (nullable during migration fill)
    createdAt: integer('created_at').notNull()
  },
  (table) => [
    primaryKey({ columns: [table.snippetId, table.profileId] }),
    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
  ]
);
```

Because dimensionality is fixed per model, `F32_BLOB(1536)` covers OpenAI `text-embedding-3-small` (1536 dimensions); `text-embedding-3-large` defaults to 3072, so a follow-up should parameterize the dimension per profile.
#### 4b. Migration SQL: populate `vec_embedding` from the existing `embedding` blob and create the vector index

The vector index cannot be expressed in SQL DDL that Drizzle generates portably — it must be applied in the FTS-style custom SQL file (`src/lib/server/db/fts.sql` or an equivalent `vectors.sql`):

```sql
-- Backfill vec_embedding from existing raw blob data
UPDATE snippet_embeddings
SET vec_embedding = vector_from_float32(embedding)
WHERE vec_embedding IS NULL AND embedding IS NOT NULL;

-- Create the ANN vector index (libSQL libsql_vector_idx syntax)
CREATE INDEX IF NOT EXISTS idx_snippet_embeddings_vec
ON snippet_embeddings (
  libsql_vector_idx(vec_embedding, 'metric=cosine', 'compress_neighbors=float8', 'max_neighbors=20')
);
```

Add a call to this SQL in `initializeDatabase()` alongside the existing `fts.sql` execution:

```typescript
const vectorSql = readFileSync(join(__dirname, 'vectors.sql'), 'utf-8');
client.exec(vectorSql);
```
#### 4c. Update `EmbeddingService.embedSnippets()`

When inserting a new embedding, write both the blob and the vector column:

```typescript
const insert = this.db.prepare<[string, string, string, number, Buffer, Buffer]>(`
  INSERT OR REPLACE INTO snippet_embeddings
    (snippet_id, profile_id, model, dimensions, embedding, vec_embedding, created_at)
  VALUES (?, ?, ?, ?, ?, vector_from_float32(?), unixepoch())
`);

// inside the transaction:
insert.run(
  snippet.id,
  this.profileId,
  embedding.model,
  embedding.dimensions,
  embeddingBuffer,
  embeddingBuffer // same bytes — vector_from_float32() interprets them
);
```
#### 4d. Rewrite `VectorSearch.vectorSearch()`

Replace the full-scan JS loop with `vector_top_k()`:

```typescript
vectorSearch(queryEmbedding: Float32Array, options: VectorSearchOptions): VectorSearchResult[] {
  const { repositoryId, versionId, profileId = 'local-default', limit = 50 } = options;

  const queryBytes = Buffer.from(queryEmbedding.buffer);
  const candidatePool = limit * 4; // over-fetch to survive the post-filter

  let sql = `
    SELECT se.snippet_id,
           vector_distance_cos(se.vec_embedding, vector_from_float32(?)) AS distance
    FROM vector_top_k('idx_snippet_embeddings_vec', vector_from_float32(?), ?) AS knn
    JOIN snippet_embeddings se ON se.rowid = knn.id
    JOIN snippets s ON s.id = se.snippet_id
    WHERE s.repository_id = ?
      AND se.profile_id = ?
  `;
  const params: unknown[] = [queryBytes, queryBytes, candidatePool, repositoryId, profileId];

  if (versionId) {
    sql += ' AND s.version_id = ?';
    params.push(versionId);
  }

  sql += ' ORDER BY distance ASC LIMIT ?';
  params.push(limit);

  return this.db
    .prepare<unknown[], { snippet_id: string; distance: number }>(sql)
    .all(...params)
    .map((row) => ({ snippetId: row.snippet_id, score: 1 - row.distance }));
}
```

The `score` contract is preserved (1 = identical, 0 = orthogonal). The `cosineSimilarity` helper is no longer called at runtime but can be kept for unit tests.
### Phase 5 — Per-Job Serialization Key Fix

**Files touched:** `src/lib/server/pipeline/worker-pool.ts`

The current serialization guard keys on the bare `repositoryId`:

```typescript
// current
private runningRepoIds = new Set<string>();
// blocks any job whose repositoryId is already in the set
const jobIdx = this.jobQueue.findIndex((j) => !this.runningRepoIds.has(j.repositoryId));
```

Different tags of the same repository write to completely disjoint rows (`version_id`-partitioned documents, snippets, and embeddings). The only genuine conflict is two jobs for the same `(repositoryId, versionId)` pair, which `JobQueue.enqueue()` already prevents via its `status IN ('queued', 'running')` deduplication check.

Change the guard to key on the compound pair:

```typescript
// replace the repo-keyed set with one keyed on the compound pair
private runningJobKeys = new Set<string>();

private jobKey(repositoryId: string, versionId?: string | null): string {
  return `${repositoryId}|${versionId ?? ''}`;
}
```

Update all four sites that read or write `runningRepoIds`:

| Location | Old | New |
| ------------------------------------ | ----------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `dispatch()` find | `!this.runningRepoIds.has(j.repositoryId)` | `!this.runningJobKeys.has(this.jobKey(j.repositoryId, j.versionId))` |
| `dispatch()` add | `this.runningRepoIds.add(job.repositoryId)` | `this.runningJobKeys.add(this.jobKey(job.repositoryId, job.versionId))` |
| `onWorkerMessage` done/failed delete | `this.runningRepoIds.delete(runningJob.repositoryId)` | `this.runningJobKeys.delete(this.jobKey(runningJob.repositoryId, runningJob.versionId))` |
| `onWorkerExit` delete | same | same |

The `QueuedJob` and `RunningJob` interfaces already carry `versionId` — no type changes needed.

The only case that remains serialized is `versionId = null` (default-branch re-index) paired with itself, which maps to the stable key `"repositoryId|"` — correctly deduplicated.
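
The keying behavior above can be checked in isolation; the sketch duplicates the `jobKey` logic as a free function purely for illustration:

```typescript
// Same keying logic as the proposed WorkerPool.jobKey(), as a free function.
function jobKey(repositoryId: string, versionId?: string | null): string {
  return `${repositoryId}|${versionId ?? ''}`;
}

// Two tags of the same repo get distinct keys, so they can run concurrently
console.log(jobKey('/facebook/react', 'v18.3.0') !== jobKey('/facebook/react', 'v17.0.2'));

// The same (repo, version) pair collides, matching the enqueue() dedup behavior
console.log(jobKey('/facebook/react', 'v18.3.0') === jobKey('/facebook/react', 'v18.3.0'));

// Default-branch jobs (versionId = null) map to the stable key "repo|"
console.log(jobKey('/facebook/react', null));
```
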

---

### Phase 6 — Dedicated Write Worker (Single-Writer Pattern)

**Files touched:** `src/lib/server/pipeline/worker-types.ts`, `src/lib/server/pipeline/write-worker-entry.ts` (new), `src/lib/server/pipeline/worker-entry.ts`, `src/lib/server/pipeline/worker-pool.ts`

#### Motivation

With Phase 5 in place, N tags of the same library can index in parallel. Each parse worker currently opens its own DB connection and holds the write lock while storing parsed snippets, so under N concurrent writers each worker spends most of its wall-clock time waiting in `busy_timeout` back-off. The fix is the single-writer pattern: one dedicated write worker owns the only writable DB connection, and parse workers become stateless CPU workers that send write batches over `postMessage`.

```
Parse Worker 1 ──┐    WriteRequest (docs[], snippets[])     ┌── WriteAck
Parse Worker 2 ──┼─────────────────────────────────────────► Write Worker (sole DB writer)
Parse Worker N ──┘                                          └── single better-sqlite3 connection
```

#### New message types (`worker-types.ts`)

```typescript
export interface WriteRequest {
  type: 'write';
  jobId: string;
  documents: SerializedDocument[];
  snippets: SerializedSnippet[];
}

export interface WriteAck {
  type: 'write_ack';
  jobId: string;
  documentCount: number;
  snippetCount: number;
}

export interface WriteError {
  type: 'write_error';
  jobId: string;
  error: string;
}

// SerializedDocument / SerializedSnippet mirror the DB column shapes
// (plain objects, safe to transfer via structured clone)
```

#### Write worker (`write-worker-entry.ts`)

The write worker:

- Opens its own `Database` connection (WAL mode, all pragmas from Phase 2)
- Listens for `WriteRequest` messages
- Wraps each batch in a single transaction
- Posts `WriteAck` or `WriteError` back to the parent, which forwards the ack to the originating parse worker by `jobId`

```typescript
import Database from '@libsql/better-sqlite3';
import { workerData, parentPort } from 'node:worker_threads';
import type { WorkerInitData, WriteRequest, WriteAck, WriteError } from './worker-types.js';

const db = new Database((workerData as WorkerInitData).dbPath);
db.pragma('journal_mode = WAL');
db.pragma('synchronous = NORMAL');
db.pragma('cache_size = -65536');
db.pragma('foreign_keys = ON');

const insertDoc = db.prepare(`INSERT OR REPLACE INTO documents (...) VALUES (...)`);
const insertSnippet = db.prepare(`INSERT OR REPLACE INTO snippets (...) VALUES (...)`);

const writeBatch = db.transaction((req: WriteRequest) => {
  for (const doc of req.documents) insertDoc.run(doc);
  for (const snip of req.snippets) insertSnippet.run(snip);
});

parentPort!.on('message', (req: WriteRequest) => {
  try {
    writeBatch(req);
    const ack: WriteAck = {
      type: 'write_ack',
      jobId: req.jobId,
      documentCount: req.documents.length,
      snippetCount: req.snippets.length
    };
    parentPort!.postMessage(ack);
  } catch (err) {
    const fail: WriteError = { type: 'write_error', jobId: req.jobId, error: String(err) };
    parentPort!.postMessage(fail);
  }
});
```

#### Parse worker changes (`worker-entry.ts`)

Parse workers lose their DB connection. `IndexingPipeline` receives a `sendWrite` callback instead of a `db` instance. After parsing each file batch, the worker calls `sendWrite({ type: 'write', jobId, documents, snippets })` and awaits the `WriteAck` before continuing. This provides natural back-pressure: a slow write worker throttles the parse workers without additional semaphores.

#### WorkerPool changes

- Spawn one write worker at startup (always, regardless of embedding config)
- Route incoming `write_ack` / `write_error` messages to the correct waiting parse worker via a `Map<jobId, resolve>` promise registry
- The write worker is separate from the embed worker — embed writes (`snippet_embeddings`) can either go through the write worker via a new `EmbedWriteRequest` message type, or stay in the embed worker, since embedding runs after parsing completes (no lock contention with active parse jobs)
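
The ack-routing registry mentioned above can be sketched as follows. Names (`pendingWrites`, `awaitAck`, `onWriteWorkerMessage`) are illustrative, not taken from the codebase; the message shapes mirror the `worker-types.ts` interfaces, and the demo simulates one round-trip without real worker threads:

```typescript
// Message shapes mirroring worker-types.ts
interface WriteAck { type: 'write_ack'; jobId: string; documentCount: number; snippetCount: number; }
interface WriteError { type: 'write_error'; jobId: string; error: string; }

// jobId → promise resolvers for in-flight write batches
const pendingWrites = new Map<string, { resolve: (a: WriteAck) => void; reject: (e: Error) => void }>();

// Called by the pool when forwarding a parse worker's WriteRequest to the write worker.
function awaitAck(jobId: string): Promise<WriteAck> {
  return new Promise((resolve, reject) => pendingWrites.set(jobId, { resolve, reject }));
}

// Called from the write worker's 'message' handler on the main thread.
function onWriteWorkerMessage(msg: WriteAck | WriteError): void {
  const pending = pendingWrites.get(msg.jobId);
  if (!pending) return; // ack for a job that already failed or was cancelled
  pendingWrites.delete(msg.jobId);
  if (msg.type === 'write_ack') pending.resolve(msg);
  else pending.reject(new Error(msg.error));
}

// Demo: simulate one round-trip
const p = awaitAck('job-1');
onWriteWorkerMessage({ type: 'write_ack', jobId: 'job-1', documentCount: 3, snippetCount: 12 });
p.then((ack) => console.log(ack.snippetCount));
```
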

#### Conflict analysis with Phase 5

Phases 5 and 6 compose cleanly:

- Phase 5 allows multiple `(repo, versionId)` jobs to run concurrently
- Phase 6 ensures all those concurrent jobs share a single write path — contention is eliminated by design
- The write worker is stateless with respect to job identity; it executes batches in arrival order (Node.js `postMessage` delivery is FIFO)
- The embed worker remains a separate worker thread (it runs after parse completes, so it never overlaps with active parse writes for the same job)

---
### Phase 7 — Admin UI Overhaul

**Files touched:**

- `src/routes/admin/jobs/+page.svelte` — rebuilt
- `src/routes/api/v1/workers/+server.ts` — new endpoint
- `src/lib/components/admin/JobStatusBadge.svelte` — extend with a spinner variant
- `src/lib/components/admin/JobSkeleton.svelte` — new
- `src/lib/components/admin/WorkerStatusPanel.svelte` — new
- `src/lib/components/admin/Toast.svelte` — new
- `src/lib/components/IndexingProgress.svelte` — switch to SSE

#### 7a. New API endpoint: `GET /api/v1/workers`

The `WorkerPool` singleton tracks running jobs in `runningJobs: Map<Worker, RunningJob>` and idle workers in `idleWorkers: Worker[]`. Expose this state as a lightweight REST snapshot:

```typescript
// GET /api/v1/workers
// Response shape:
interface WorkersResponse {
  concurrency: number; // configured max workers
  active: number; // workers with a running job
  idle: number; // workers waiting for work
  workers: WorkerStatus[]; // one entry per spawned parse worker
}

interface WorkerStatus {
  index: number; // worker slot (0-based)
  state: 'idle' | 'running'; // current state
  jobId: string | null; // null when idle
  repositoryId: string | null;
  versionId: string | null;
}
```

The route handler calls `getPool().getStatus()` — add a `getStatus(): WorkersResponse` method to `WorkerPool` that reads `runningJobs` and `idleWorkers` without any DB call. This is read-only and runs on the main thread.
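
A sketch of what `getStatus()` could compute is below. The pool internals are stubbed with plain arrays and counts so the response shape is checkable in isolation; the real method would read `runningJobs` and `idleWorkers` instead of taking parameters:

```typescript
// Field names follow the WorkersResponse / WorkerStatus interfaces above.
interface RunningJob { jobId: string; repositoryId: string; versionId: string | null; }
interface WorkerStatus {
  index: number;
  state: 'idle' | 'running';
  jobId: string | null;
  repositoryId: string | null;
  versionId: string | null;
}
interface WorkersResponse { concurrency: number; active: number; idle: number; workers: WorkerStatus[]; }

function getStatus(concurrency: number, running: RunningJob[], idleCount: number): WorkersResponse {
  const workers: WorkerStatus[] = running.map((job, i) => ({
    index: i,
    state: 'running',
    jobId: job.jobId,
    repositoryId: job.repositoryId,
    versionId: job.versionId
  }));
  // Idle slots carry nulls for all job fields
  for (let i = 0; i < idleCount; i++) {
    workers.push({ index: running.length + i, state: 'idle', jobId: null, repositoryId: null, versionId: null });
  }
  return { concurrency, active: running.length, idle: idleCount, workers };
}

const snapshot = getStatus(4, [{ jobId: 'j1', repositoryId: '/facebook/react', versionId: 'v18.3.0' }], 3);
console.log(snapshot.active, snapshot.idle, snapshot.workers.length);
```
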

The SSE stream at `/api/v1/jobs/stream` should emit a new `worker-status` event type whenever a worker transitions idle ↔ running (on `dispatch()` and on job completion). This lets the worker panel update in real time without polling the REST endpoint.

#### 7b. `GET /api/v1/jobs` — add repository prefix and multi-status filters

The existing endpoint already accepts `repositoryId` (exact match) and `status` (single value). Extend:

- `repositoryId` to also support prefix match (e.g. `?repositoryId=/facebook` returns all `/facebook/*` repos)
- `status` to accept comma-separated values: `?status=queued,running`
- `page` and `pageSize` query params (default `pageSize=50`, max 200), keeping `limit` for backwards compatibility

Return `{ jobs, total, page, pageSize }`, where `total` reflects the filtered count before pagination is applied.
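
A sketch of the extended query-string parsing, using only the standard `URLSearchParams` API; the helper name `parseJobsQuery` and the `JobsQuery` shape are illustrative:

```typescript
const ALLOWED_STATUSES = ['queued', 'running', 'paused', 'cancelled', 'done', 'failed'] as const;
type JobStatus = (typeof ALLOWED_STATUSES)[number];

interface JobsQuery { repositoryPrefix: string | null; statuses: JobStatus[]; page: number; pageSize: number; }

function parseJobsQuery(params: URLSearchParams): JobsQuery {
  // Comma-separated multi-status, unknown values dropped
  const statuses = (params.get('status') ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s): s is JobStatus => (ALLOWED_STATUSES as readonly string[]).includes(s));
  // pageSize defaults to 50, capped at 200
  const pageSize = Math.min(Number(params.get('pageSize') ?? 50) || 50, 200);
  return {
    repositoryPrefix: params.get('repositoryId'),
    statuses,
    page: Math.max(Number(params.get('page') ?? 1) || 1, 1),
    pageSize
  };
}

const q = parseJobsQuery(new URLSearchParams('repositoryId=/facebook&status=queued,running&pageSize=500'));
console.log(q.statuses.length, q.pageSize, q.repositoryPrefix);
```
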

#### 7c. New component: `JobSkeleton.svelte`

A set of skeleton rows matching the job table structure, shown during the initial fetch before any data arrives. Uses Tailwind's `animate-pulse`:

```svelte
<!-- renders N skeleton rows -->
<script lang="ts">
  let { rows = 5 }: { rows?: number } = $props();
</script>

{#each Array(rows) as _, i (i)}
  <tr>
    <td class="px-6 py-4">
      <div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>
      <div class="mt-1 h-3 w-24 animate-pulse rounded bg-gray-100"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-5 w-16 animate-pulse rounded-full bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-4 w-20 animate-pulse rounded bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-2 w-32 animate-pulse rounded-full bg-gray-200"></div>
    </td>
    <td class="px-6 py-4">
      <div class="h-4 w-28 animate-pulse rounded bg-gray-200"></div>
    </td>
    <td class="px-6 py-4 text-right">
      <div class="ml-auto h-7 w-20 animate-pulse rounded bg-gray-200"></div>
    </td>
  </tr>
{/each}
```

#### 7d. New component: `Toast.svelte`

Replaces all `alert()` / `console.log()` calls on the jobs page. Renders a fixed-position stack in the bottom-right corner. Each toast auto-dismisses after 4 seconds and can be closed manually:

```svelte
<!-- Usage: bind a toasts array and push({ message, type }) onto it -->
<script lang="ts">
  export interface ToastItem {
    id: string;
    message: string;
    type: 'success' | 'error' | 'info';
  }

  let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();

  function dismiss(id: string) {
    toasts = toasts.filter((t) => t.id !== id);
  }
</script>

<div class="fixed right-4 bottom-4 z-50 flex flex-col gap-2">
  {#each toasts as toast (toast.id)}
    <!-- color by type, close button, auto-dismiss via onMount timer -->
  {/each}
</div>
```

The jobs page replaces `showToast()` with pushes onto the bound `toasts` array. The `confirm()` for cancel is replaced with per-job inline confirmation state (`pendingCancelId`) that shows "Confirm cancel?" / "Yes" / "No" buttons inside the row.

#### 7e. New component: `WorkerStatusPanel.svelte`

A compact panel displayed above the job table showing worker pool health. It subscribes to the `worker-status` SSE events and falls back to polling `GET /api/v1/workers` every 5 s on SSE error:

```
┌─────────────────────────────────────────────────────────┐
│ Workers  [2 / 4 active]  ████░░░░ 50%                   │
│ Worker 0  ● running   /facebook/react / v18.3.0         │
│ Worker 1  ● running   /facebook/react / v17.0.2         │
│ Worker 2  ○ idle                                        │
│ Worker 3  ○ idle                                        │
└─────────────────────────────────────────────────────────┘
```

Each worker row shows: slot index, status dot (animated green pulse when running), repository ID, version tag, and a link to the job's row in the table below.

#### 7f. Filter bar on the jobs page

Add a filter strip between the page header and the table:

```
[ Repository: _______________ ]  [ Status: ▾ all ]  [ 🔍 Apply ]  [ ↺ Reset ]
```

- **Repository field**: free-text input matching the `repositoryId` prefix (e.g. `/facebook` shows all `/facebook/*`)
- **Status dropdown**: multi-select checkboxes for `queued`, `running`, `paused`, `cancelled`, `done`, `failed`; default = all
- Filters are applied client-side against the loaded `jobs` array for instant feedback, and re-fetched from the API on Apply to get the correct total count
- Filter state is mirrored to URL search params (`?repo=...&status=...`) so the view is bookmarkable and survives refresh
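
The URL mirroring can be kept framework-independent by isolating the pure serialization step (in SvelteKit the result would be passed to `goto`). A sketch, with the `repo`/`status` param names from above and illustrative helper names:

```typescript
// Serialize filter state to a ?repo=...&status=... search string and back.
interface FilterState { repo: string; statuses: string[]; }

function filtersToSearch(f: FilterState): string {
  const params = new URLSearchParams();
  if (f.repo) params.set('repo', f.repo);
  if (f.statuses.length) params.set('status', f.statuses.join(','));
  return params.toString();
}

function filtersFromSearch(search: string): FilterState {
  const params = new URLSearchParams(search);
  return {
    repo: params.get('repo') ?? '',
    statuses: (params.get('status') ?? '').split(',').filter(Boolean)
  };
}

// Round-trip check
const search = filtersToSearch({ repo: '/facebook/react', statuses: ['running'] });
const restored = filtersFromSearch(search);
console.log(restored.repo, restored.statuses.join(','));
```
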

#### 7g. Per-job action spinner and disabled state

Replace the single `actionInProgress: string | null` with a per-job map:

```typescript
let actionInProgress = $state(new Map<string, 'pausing' | 'resuming' | 'cancelling'>());
```

Each action button shows an inline spinner (a small `animate-spin` circle) and is disabled only for its own row; other rows stay fully interactive during the action. On completion the entry is deleted from the map.

#### 7h. `IndexingProgress.svelte` — switch from polling to SSE

The component currently polls with `setInterval` + `fetch` every 2 s. Replace this with the per-job SSE stream already available at `/api/v1/jobs/{id}/stream`:

```typescript
// replace the $effect body
$effect(() => {
  job = null;
  const es = new EventSource(`/api/v1/jobs/${jobId}/stream`);

  es.addEventListener('job-progress', (event) => {
    const data = JSON.parse(event.data);
    job = { ...job, ...data };
  });

  es.addEventListener('job-done', () => {
    void fetch(`/api/v1/jobs/${jobId}`)
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
        oncomplete?.();
      });
    es.close();
  });

  es.addEventListener('job-failed', (event) => {
    const data = JSON.parse(event.data);
    job = { ...job, status: 'failed', error: data.error };
    oncomplete?.();
    es.close();
  });

  es.onerror = () => {
    // on SSE failure, fall back to a single fetch for the current state
    es.close();
    void fetch(`/api/v1/jobs/${jobId}`)
      .then((r) => r.json())
      .then((d) => {
        job = d.job;
      });
  };

  return () => es.close();
});
```

This cuts network traffic from one request every 2 s to zero requests during active indexing — updates arrive as server-push events.

#### 7i. Pagination on the jobs page

Replace the hard-coded `?limit=50` fetch with paginated requests:

```typescript
let currentPage = $state(1);
const PAGE_SIZE = 50;

async function fetchJobs() {
  const params = new URLSearchParams({
    page: String(currentPage),
    pageSize: String(PAGE_SIZE),
    ...(filterRepo ? { repositoryId: filterRepo } : {}),
    ...(filterStatuses.length ? { status: filterStatuses.join(',') } : {})
  });
  const data = await fetch(`/api/v1/jobs?${params}`).then((r) => r.json());
  jobs = data.jobs;
  total = data.total;
}
```

Render a simple `« Prev  Page N of M  Next »` control below the table, hidden when `total <= PAGE_SIZE`.

---

## Acceptance Criteria

- [ ] `npm install` with `@libsql/better-sqlite3` succeeds; `better-sqlite3` is absent from `node_modules`
- [ ] All existing unit and integration tests pass after the Phase 1 import swap
- [ ] `npm run db:migrate` applies the composite index migration cleanly against an existing database
- [ ] `npm run db:migrate` applies the vector column migration cleanly; `SELECT vec_embedding FROM snippet_embeddings LIMIT 1` returns a non-NULL value for any previously embedded snippet
- [ ] `GET /api/v1/context?libraryId=...&query=...` in semantic or hybrid mode returns results in ≤ 200 ms on a repository with 50k+ snippets (vs the previous multi-second response)
- [ ] Memory profiling during a `/context` request shows no allocation spike proportional to repository size
- [ ] `EXPLAIN QUERY PLAN` on the `snippets` search query shows `SEARCH snippets USING INDEX idx_snippets_repo_version` instead of `SCAN snippets`
- [ ] Worker threads (`worker-entry.ts`, `embed-worker-entry.ts`) start and complete an indexing job successfully after the package swap
- [ ] `drizzle-kit studio` connects to and browses the migrated database
- [ ] Re-indexing a repository after the migration correctly populates `vec_embedding` on all new snippets
- [ ] `cosineSimilarity` unit tests still pass (the function is kept)
- [ ] Starting two indexing jobs for different tags of the same repository simultaneously results in both jobs reaching `running` state concurrently (not one waiting for the other)
- [ ] Starting two indexing jobs for the **same** `(repositoryId, versionId)` pair returns the existing job (deduplication unchanged)
- [ ] With 4 parse workers and 4 concurrent tag jobs, zero `SQLITE_BUSY` errors appear in the logs
- [ ] The write worker is present in the process list during active indexing (the `worker_threads` inspector shows `write-worker-entry`)
- [ ] A `WriteError` from the write worker marks the originating job as `failed`, with the error message propagated to the SSE stream
- [ ] `GET /api/v1/workers` returns a `WorkersResponse` JSON object with correct `active`, `idle`, and `workers[]` fields while jobs are in flight
- [ ] The `worker-status` SSE event is emitted by `/api/v1/jobs/stream` whenever a worker transitions state
- [ ] The admin jobs page shows skeleton rows (not a blank screen) during the initial `fetchJobs()` call
- [ ] No `alert()` or `confirm()` calls remain in `admin/jobs/+page.svelte`; all notifications go through `Toast.svelte`
- [ ] Pausing job A while job B is also in progress does not disable job B's action buttons
- [ ] The status filter multi-select correctly restricts the visible job list; the URL updates to reflect the filter state
- [ ] The repository prefix filter `?repositoryId=/facebook` returns all jobs whose `repositoryId` starts with `/facebook`
- [ ] Paginating past page 1 fetches the next batch from the API, not from the client-side array
- [ ] `IndexingProgress.svelte` has no `setInterval` call; it uses `EventSource` for progress updates
- [ ] The `WorkerStatusPanel` shows the correct number of running workers live during a multi-tag indexing run
- [ ] Refreshing the jobs page with `?repo=/facebook/react&status=running` pre-populates the filters and fetches with those params
|
||||||
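The `cosineSimilarity` item in the checklist refers to a helper that is deliberately kept after the migration. A minimal sketch of what such a helper typically looks like (the signature and edge-case convention here are assumptions, not the project's actual code):

```typescript
// Hypothetical sketch of a cosineSimilarity helper; the real signature
// in the codebase may differ.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine similarity is undefined for zero vectors; return 0 by convention.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Keeping a pure-TypeScript similarity function around is useful even after vec0 takes over KNN, because unit tests can validate ranking behavior without a database.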
---

## Migration Safety

### Backward Compatibility
The `embedding` blob column is kept. The `vec_embedding` column is nullable during the backfill window and is populated in two ways:

1. The `UPDATE` in `vectors.sql` fills all existing rows on startup
2. New embeddings populate it at insert time

If `vec_embedding IS NULL` for a row (e.g., a row inserted before the migration runs), the vector index silently omits that row from results. The fallback in `HybridSearchService` to FTS-only mode still applies when no embeddings exist, so degraded-but-correct behavior is preserved.
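Assuming embeddings are stored as raw little-endian float32 BLOBs (the common convention for both a legacy `embedding` blob column and sqlite-vec vectors; the helper names below are illustrative, not taken from the source), the backfill is a byte-level reinterpretation rather than a re-embedding:

```typescript
// Illustrative helpers (hypothetical names): convert a Float32Array embedding
// to a BLOB-compatible byte array and back. Assumes float32 layout.
function embeddingToBlob(vec: Float32Array): Uint8Array {
  // Slice so we only capture this view's bytes, not a larger shared buffer.
  return new Uint8Array(vec.buffer.slice(vec.byteOffset, vec.byteOffset + vec.byteLength));
}

function blobToEmbedding(blob: Uint8Array): Float32Array {
  // Copy into a fresh buffer so 4-byte alignment is guaranteed.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}
```

Because the representation round-trips losslessly, the startup `UPDATE` can copy blobs straight across without touching the embedding model.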
### Rollback

Rollback before Phase 4 (vector column): remove `@libsql/better-sqlite3`, restore `better-sqlite3`, restore imports. No schema changes have been made at that point.

Rollback after Phase 4: the schema now has the `vec_embedding` column. Drop the column with a migration reversal and restore imports. The `embedding` blob is intact throughout — no data loss.
### SQLite File Compatibility

libSQL embedded mode reads and writes standard SQLite 3 files. The WAL file, page size, and encoding are unchanged. An existing production database opened with `@libsql/better-sqlite3` is fully readable and writable. The vector index is stored in a shadow table `idx_snippet_embeddings_vec_shadow`, which better-sqlite3 would ignore if rolled back (it is a regular table with a special name).
---

## Dependencies

| Package                  | Action                        | Reason                                          |
| ------------------------ | ----------------------------- | ----------------------------------------------- |
| `better-sqlite3`         | Remove from `dependencies`    | Replaced                                        |
| `@types/better-sqlite3`  | Remove from `devDependencies` | `@libsql/better-sqlite3` ships its own types    |
| `@libsql/better-sqlite3` | Add to `dependencies`         | Drop-in libSQL node addon                       |
| `drizzle-orm`            | No change                     | `better-sqlite3` adapter works unchanged        |
| `drizzle-kit`            | No change                     | `dialect: 'sqlite'` correct for embedded libSQL |

No new runtime dependencies beyond the package replacement.
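The table above amounts to a `package.json` change along these lines (a sketch in the diff style used elsewhere in this PR; version ranges are elided because they are not stated in the source):

```diff
 "dependencies": {
-  "better-sqlite3": "…",
+  "@libsql/better-sqlite3": "…",
 },
 "devDependencies": {
-  "@types/better-sqlite3": "…",
 }
```

Note that imports of `'better-sqlite3'` in the diffs below remain unchanged, so the swap presumably relies on the replacement package being resolvable under the same specifier.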
---

## Testing Strategy

### Unit Tests

- `src/lib/server/search/vector.search.ts`: add a test asserting KNN results are correct for a seeded 3-vector table; verify memory is not proportional to table size (mock `db.prepare` to assert no unbounded `.all()` is called)
- `src/lib/server/embeddings/embedding.service.ts`: existing tests cover insert round-trips; verify the `vec_embedding` column is non-NULL after `embedSnippets()`
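For the seeded 3-vector KNN test, an exact brute-force oracle is handy to compare `vectorSearch()` against. A sketch under the assumption that the search orders by cosine distance (1 − cosine similarity); the row shape and function name are illustrative:

```typescript
interface Row {
  id: number;
  vec: number[];
}

// Exact top-k by cosine distance. O(n log n) per query: fine as a test
// oracle for a handful of seeded rows, not for production search.
function knnReference(rows: Row[], query: number[], k: number): number[] {
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
  };
  return rows
    .map((r) => ({ id: r.id, dist: 1 - cosine(r.vec, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((r) => r.id);
}
```

The unit test can then assert that the ids returned by the real `vectorSearch()` match this reference for the seeded table.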
### Integration Tests

- `api-contract.integration.test.ts`: existing tests already use `new Database(':memory:')` — these continue to work with `@libsql/better-sqlite3` because the in-memory path is identical
- Add one test to `api-contract.integration.test.ts`: seed a repository + multiple embeddings, call `/api/v1/context` in semantic mode, assert non-empty results and a response time < 500 ms on the in-memory DB
### UI Tests

- `src/routes/admin/jobs/+page.svelte`: add Vitest browser tests (Playwright) verifying:
  - Skeleton rows appear before the first fetch resolves (mock `fetch` to delay 200 ms)
  - Status filter restricts displayed rows; URL param updates
  - Pausing job A leaves job B's buttons enabled
  - Toast appears and auto-dismisses on successful pause
  - Cancel confirm flow shows inline confirmation, not `window.confirm`
- `src/lib/components/IndexingProgress.svelte`: unit test that no `setInterval` is created; verify `EventSource` is opened with the correct URL
### Performance Regression Gate

Add a benchmark script `scripts/bench-vector-search.mjs` that:

1. Creates an in-memory libSQL database
2. Seeds 10000 snippet embeddings (random Float32Array, 1536 dims)
3. Runs 100 `vectorSearch()` calls
4. Asserts p99 < 50 ms

This benchmark gates CI on both the correctness and the speed of Phase 4.
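The p99 assertion in step 4 needs a percentile helper; since the benchmark script itself is not shown in this diff, the following nearest-rank sketch uses illustrative names:

```typescript
// Nearest-rank percentile: p in (0, 1]. Sorts a copy of the samples;
// perfectly adequate for the benchmark's 100 timing samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}
```

With 100 timing samples, `percentile(samples, 0.99)` picks the 99th sorted value, which the CI gate would compare against the 50 ms budget.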
@@ -8,7 +8,7 @@ const entries = [
];

try {
-  const existing = entries.filter(e => existsSync(e));
+  const existing = entries.filter((e) => existsSync(e));
  if (existing.length === 0) {
    console.log('[build-workers] No worker entry files found yet, skipping.');
    process.exit(0);
@@ -23,7 +23,7 @@ try {
    outdir: 'build/workers',
    outExtension: { '.js': '.mjs' },
    alias: {
-      '$lib': './src/lib',
+      $lib: './src/lib',
      '$lib/server': './src/lib/server'
    },
    external: ['better-sqlite3', '@xenova/transformers'],
@@ -33,9 +33,10 @@ try {
  try {
    const db = getClient();
    const activeProfileRow = db
-      .prepare<[], EmbeddingProfileEntityProps>(
-        'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
-      )
+      .prepare<
+        [],
+        EmbeddingProfileEntityProps
+      >('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
      .get();

  let embeddingService: EmbeddingService | null = null;
@@ -55,9 +56,10 @@ try {
    let concurrency = 2; // default
    if (dbPath) {
      const concurrencyRow = db
-        .prepare<[], { value: string }>(
-          "SELECT value FROM settings WHERE key = 'indexing.concurrency' LIMIT 1"
-        )
+        .prepare<
+          [],
+          { value: string }
+        >("SELECT value FROM settings WHERE key = 'indexing.concurrency' LIMIT 1")
        .get();
      if (concurrencyRow) {
        try {
@@ -16,21 +16,29 @@

  es.addEventListener('job-done', () => {
    void fetch(`/api/v1/jobs/${jobId}`)
-      .then(r => r.json())
-      .then(d => { job = d.job; oncomplete?.(); });
+      .then((r) => r.json())
+      .then((d) => {
+        job = d.job;
+        oncomplete?.();
+      });
    es.close();
  });

  es.addEventListener('job-failed', (event) => {
    const data = JSON.parse(event.data);
-    if (job) job = { ...job, status: 'failed', error: data.error ?? 'Unknown error' } as IndexingJob;
+    if (job)
+      job = { ...job, status: 'failed', error: data.error ?? 'Unknown error' } as IndexingJob;
    oncomplete?.();
    es.close();
  });

  es.onerror = () => {
    es.close();
-    void fetch(`/api/v1/jobs/${jobId}`).then(r => r.json()).then(d => { job = d.job; });
+    void fetch(`/api/v1/jobs/${jobId}`)
+      .then((r) => r.json())
+      .then((d) => {
+        job = d.job;
+      });
  };

  return () => es.close();
@@ -1,8 +1,9 @@
<script lang="ts">
  let { rows = 5 }: { rows?: number } = $props();
+  const rowIndexes = $derived(Array.from({ length: rows }, (_, index) => index));
</script>

-{#each Array(rows) as _, i (i)}
+{#each rowIndexes as i (i)}
  <tr>
    <td class="px-6 py-4">
      <div class="h-4 w-48 animate-pulse rounded bg-gray-200"></div>
@@ -1,5 +1,6 @@
<script lang="ts">
  import { onDestroy } from 'svelte';
+  import { SvelteMap } from 'svelte/reactivity';

  export interface ToastItem {
    id: string;
@@ -8,7 +9,7 @@
  }

  let { toasts = $bindable([]) }: { toasts: ToastItem[] } = $props();
-  const timers = new Map<string, ReturnType<typeof setTimeout>>();
+  const timers = new SvelteMap<string, ReturnType<typeof setTimeout>>();

  $effect(() => {
    for (const toast of toasts) {
@@ -70,8 +71,7 @@
          class="ml-2 text-xs opacity-70 hover:opacity-100"
        >
          x
-        </button
-        >
+        </button>
      </div>
  {/each}
</div>
@@ -10,9 +10,7 @@ import { GitHubApiError } from './github-tags.js';
// ---------------------------------------------------------------------------

function mockFetch(status: number, body: unknown): void {
-  vi.spyOn(global, 'fetch').mockResolvedValueOnce(
-    new Response(JSON.stringify(body), { status })
-  );
+  vi.spyOn(global, 'fetch').mockResolvedValueOnce(new Response(JSON.stringify(body), { status }));
}

beforeEach(() => {
@@ -105,9 +103,9 @@ describe('fetchGitHubChangedFiles', () => {

  it('throws GitHubApiError on 422 unprocessable entity', async () => {
    mockFetch(422, { message: 'Unprocessable Entity' });
-    await expect(
-      fetchGitHubChangedFiles('owner', 'repo', 'bad-ref', 'v1.1.0')
-    ).rejects.toThrow(GitHubApiError);
+    await expect(fetchGitHubChangedFiles('owner', 'repo', 'bad-ref', 'v1.1.0')).rejects.toThrow(
+      GitHubApiError
+    );
  });

  it('returns empty array when files property is missing', async () => {
@@ -141,7 +139,9 @@
  });

  it('sends Authorization header when token is provided', async () => {
-    const fetchSpy = vi.spyOn(global, 'fetch').mockResolvedValueOnce(
+    const fetchSpy = vi
+      .spyOn(global, 'fetch')
+      .mockResolvedValueOnce(
      new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
    );
    await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0', 'my-token');
@@ -151,7 +151,9 @@
  });

  it('does not send Authorization header when no token provided', async () => {
-    const fetchSpy = vi.spyOn(global, 'fetch').mockResolvedValueOnce(
+    const fetchSpy = vi
+      .spyOn(global, 'fetch')
+      .mockResolvedValueOnce(
      new Response(JSON.stringify({ status: 'ahead', files: [] }), { status: 200 })
    );
    await fetchGitHubChangedFiles('owner', 'repo', 'v1.0.0', 'v1.1.0');
@@ -4,6 +4,7 @@
 */
import Database from 'better-sqlite3';
import { env } from '$env/dynamic/private';
+import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';

let _client: Database.Database | null = null;
@@ -12,14 +13,7 @@ export function getClient(): Database.Database {
  if (!_client) {
    if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');
    _client = new Database(env.DATABASE_URL);
-    _client.pragma('journal_mode = WAL');
-    _client.pragma('foreign_keys = ON');
-    _client.pragma('busy_timeout = 5000');
-    _client.pragma('synchronous = NORMAL');
-    _client.pragma('cache_size = -65536');
-    _client.pragma('temp_store = MEMORY');
-    _client.pragma('mmap_size = 268435456');
-    _client.pragma('wal_autocheckpoint = 1000');
+    applySqlitePragmas(_client);
    loadSqliteVec(_client);
  }
  return _client;
src/lib/server/db/connection.ts (new file, +14)
@@ -0,0 +1,14 @@
+import type Database from 'better-sqlite3';
+
+export const SQLITE_BUSY_TIMEOUT_MS = 30000;
+
+export function applySqlitePragmas(db: Database.Database): void {
+  db.pragma('journal_mode = WAL');
+  db.pragma('foreign_keys = ON');
+  db.pragma(`busy_timeout = ${SQLITE_BUSY_TIMEOUT_MS}`);
+  db.pragma('synchronous = NORMAL');
+  db.pragma('cache_size = -65536');
+  db.pragma('temp_store = MEMORY');
+  db.pragma('mmap_size = 268435456');
+  db.pragma('wal_autocheckpoint = 1000');
+}
@@ -5,6 +5,7 @@ import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { join, dirname } from 'node:path';
import * as schema from './schema';
+import { applySqlitePragmas } from './connection';
import { loadSqliteVec } from './sqlite-vec';
import { env } from '$env/dynamic/private';

@@ -12,19 +13,7 @@ if (!env.DATABASE_URL) throw new Error('DATABASE_URL is not set');

const client = new Database(env.DATABASE_URL);

-// Enable WAL mode for better concurrent read performance.
-client.pragma('journal_mode = WAL');
-// Enforce foreign key constraints.
-client.pragma('foreign_keys = ON');
-// Wait up to 5 s when the DB is locked instead of failing immediately.
-// Prevents SQLITE_BUSY errors when the indexing pipeline holds the write lock
-// and an HTTP request arrives simultaneously.
-client.pragma('busy_timeout = 5000');
-client.pragma('synchronous = NORMAL');
-client.pragma('cache_size = -65536');
-client.pragma('temp_store = MEMORY');
-client.pragma('mmap_size = 268435456');
-client.pragma('wal_autocheckpoint = 1000');
+applySqlitePragmas(client);
loadSqliteVec(client);

export const db = drizzle(client, { schema });
@@ -78,12 +78,8 @@
        "name": "documents_repository_id_repositories_id_fk",
        "tableFrom": "documents",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -91,12 +87,8 @@
        "name": "documents_version_id_repository_versions_id_fk",
        "tableFrom": "documents",
        "tableTo": "repository_versions",
-        "columnsFrom": [
-          "version_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["version_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -293,12 +285,8 @@
        "name": "indexing_jobs_repository_id_repositories_id_fk",
        "tableFrom": "indexing_jobs",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -512,18 +500,13 @@
      "indexes": {
        "uniq_repo_config_base": {
          "name": "uniq_repo_config_base",
-          "columns": [
-            "repository_id"
-          ],
+          "columns": ["repository_id"],
          "isUnique": true,
          "where": "\"repository_configs\".\"version_id\" IS NULL"
        },
        "uniq_repo_config_version": {
          "name": "uniq_repo_config_version",
-          "columns": [
-            "repository_id",
-            "version_id"
-          ],
+          "columns": ["repository_id", "version_id"],
          "isUnique": true,
          "where": "\"repository_configs\".\"version_id\" IS NOT NULL"
        }
@@ -533,12 +516,8 @@
        "name": "repository_configs_repository_id_repositories_id_fk",
        "tableFrom": "repository_configs",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -622,12 +601,8 @@
        "name": "repository_versions_repository_id_repositories_id_fk",
        "tableFrom": "repository_versions",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -719,12 +694,8 @@
        "name": "snippet_embeddings_snippet_id_snippets_id_fk",
        "tableFrom": "snippet_embeddings",
        "tableTo": "snippets",
-        "columnsFrom": [
-          "snippet_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["snippet_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -732,22 +703,15 @@
        "name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
        "tableFrom": "snippet_embeddings",
        "tableTo": "embedding_profiles",
-        "columnsFrom": [
-          "profile_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["profile_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
    },
    "compositePrimaryKeys": {
      "snippet_embeddings_snippet_id_profile_id_pk": {
-        "columns": [
-          "snippet_id",
-          "profile_id"
-        ],
+        "columns": ["snippet_id", "profile_id"],
        "name": "snippet_embeddings_snippet_id_profile_id_pk"
      }
    },
@@ -842,12 +806,8 @@
        "name": "snippets_document_id_documents_id_fk",
        "tableFrom": "snippets",
        "tableTo": "documents",
-        "columnsFrom": [
-          "document_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["document_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -855,12 +815,8 @@
        "name": "snippets_repository_id_repositories_id_fk",
        "tableFrom": "snippets",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -868,12 +824,8 @@
        "name": "snippets_version_id_repository_versions_id_fk",
        "tableFrom": "snippets",
        "tableTo": "repository_versions",
-        "columnsFrom": [
-          "version_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["version_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -75,10 +75,7 @@
      "indexes": {
        "idx_documents_repo_version": {
          "name": "idx_documents_repo_version",
-          "columns": [
-            "repository_id",
-            "version_id"
-          ],
+          "columns": ["repository_id", "version_id"],
          "isUnique": false
        }
      },
@@ -87,12 +84,8 @@
        "name": "documents_repository_id_repositories_id_fk",
        "tableFrom": "documents",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -100,12 +93,8 @@
        "name": "documents_version_id_repository_versions_id_fk",
        "tableFrom": "documents",
        "tableTo": "repository_versions",
-        "columnsFrom": [
-          "version_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["version_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -299,10 +288,7 @@
      "indexes": {
        "idx_jobs_repo_status": {
          "name": "idx_jobs_repo_status",
-          "columns": [
-            "repository_id",
-            "status"
-          ],
+          "columns": ["repository_id", "status"],
          "isUnique": false
        }
      },
@@ -311,12 +297,8 @@
        "name": "indexing_jobs_repository_id_repositories_id_fk",
        "tableFrom": "indexing_jobs",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -450,9 +432,7 @@
      "indexes": {
        "idx_repositories_state": {
          "name": "idx_repositories_state",
-          "columns": [
-            "state"
-          ],
+          "columns": ["state"],
          "isUnique": false
        }
      },
@@ -538,18 +518,13 @@
      "indexes": {
        "uniq_repo_config_base": {
          "name": "uniq_repo_config_base",
-          "columns": [
-            "repository_id"
-          ],
+          "columns": ["repository_id"],
          "isUnique": true,
          "where": "\"repository_configs\".\"version_id\" IS NULL"
        },
        "uniq_repo_config_version": {
          "name": "uniq_repo_config_version",
-          "columns": [
-            "repository_id",
-            "version_id"
-          ],
+          "columns": ["repository_id", "version_id"],
          "isUnique": true,
          "where": "\"repository_configs\".\"version_id\" IS NOT NULL"
        }
@@ -559,12 +534,8 @@
        "name": "repository_configs_repository_id_repositories_id_fk",
        "tableFrom": "repository_configs",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -648,12 +619,8 @@
        "name": "repository_versions_repository_id_repositories_id_fk",
        "tableFrom": "repository_versions",
        "tableTo": "repositories",
-        "columnsFrom": [
-          "repository_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["repository_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
@@ -742,10 +709,7 @@
      "indexes": {
        "idx_embeddings_profile": {
          "name": "idx_embeddings_profile",
-          "columns": [
-            "profile_id",
-            "snippet_id"
-          ],
+          "columns": ["profile_id", "snippet_id"],
          "isUnique": false
        }
      },
@@ -754,12 +718,8 @@
        "name": "snippet_embeddings_snippet_id_snippets_id_fk",
        "tableFrom": "snippet_embeddings",
        "tableTo": "snippets",
-        "columnsFrom": [
-          "snippet_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["snippet_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      },
@@ -767,22 +727,15 @@
        "name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
        "tableFrom": "snippet_embeddings",
        "tableTo": "embedding_profiles",
-        "columnsFrom": [
-          "profile_id"
-        ],
-        "columnsTo": [
-          "id"
-        ],
+        "columnsFrom": ["profile_id"],
+        "columnsTo": ["id"],
        "onDelete": "cascade",
        "onUpdate": "no action"
      }
    },
    "compositePrimaryKeys": {
      "snippet_embeddings_snippet_id_profile_id_pk": {
-        "columns": [
-          "snippet_id",
-          "profile_id"
-        ],
+        "columns": ["snippet_id", "profile_id"],
        "name": "snippet_embeddings_snippet_id_profile_id_pk"
      }
    },
@@ -874,18 +827,12 @@
      "indexes": {
        "idx_snippets_repo_version": {
          "name": "idx_snippets_repo_version",
-          "columns": [
-            "repository_id",
-            "version_id"
-          ],
+          "columns": ["repository_id", "version_id"],
          "isUnique": false
        },
        "idx_snippets_repo_type": {
          "name": "idx_snippets_repo_type",
-          "columns": [
-            "repository_id",
-            "type"
-          ],
+          "columns": ["repository_id", "type"],
          "isUnique": false
        }
      },
@@ -894,12 +841,8 @@
        "name": "snippets_document_id_documents_id_fk",
        "tableFrom": "snippets",
        "tableTo": "documents",
-        "columnsFrom": [
-          "document_id"
+        "columnsFrom": ["document_id"],
|
"columnsTo": ["id"],
|
||||||
],
|
|
||||||
"columnsTo": [
|
|
||||||
"id"
|
|
||||||
],
|
|
||||||
"onDelete": "cascade",
|
"onDelete": "cascade",
|
||||||
"onUpdate": "no action"
|
"onUpdate": "no action"
|
||||||
},
|
},
|
||||||
@@ -907,12 +850,8 @@
|
|||||||
"name": "snippets_repository_id_repositories_id_fk",
|
"name": "snippets_repository_id_repositories_id_fk",
|
||||||
"tableFrom": "snippets",
|
"tableFrom": "snippets",
|
||||||
"tableTo": "repositories",
|
"tableTo": "repositories",
|
||||||
"columnsFrom": [
|
"columnsFrom": ["repository_id"],
|
||||||
"repository_id"
|
"columnsTo": ["id"],
|
||||||
],
|
|
||||||
"columnsTo": [
|
|
||||||
"id"
|
|
||||||
],
|
|
||||||
"onDelete": "cascade",
|
"onDelete": "cascade",
|
||||||
"onUpdate": "no action"
|
"onUpdate": "no action"
|
||||||
},
|
},
|
||||||
@@ -920,12 +859,8 @@
|
|||||||
"name": "snippets_version_id_repository_versions_id_fk",
|
"name": "snippets_version_id_repository_versions_id_fk",
|
||||||
"tableFrom": "snippets",
|
"tableFrom": "snippets",
|
||||||
"tableTo": "repository_versions",
|
"tableTo": "repository_versions",
|
||||||
"columnsFrom": [
|
"columnsFrom": ["version_id"],
|
||||||
"version_id"
|
"columnsTo": ["id"],
|
||||||
],
|
|
||||||
"columnsTo": [
|
|
||||||
"id"
|
|
||||||
],
|
|
||||||
"onDelete": "cascade",
|
"onDelete": "cascade",
|
||||||
"onUpdate": "no action"
|
"onUpdate": "no action"
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -349,14 +349,14 @@ describe('snippet_embeddings table', () => {
   });

   it('keeps the relational schema free of vec_embedding and retains the profile index', () => {
-    const columns = client
-      .prepare("PRAGMA table_info('snippet_embeddings')")
-      .all() as Array<{ name: string }>;
+    const columns = client.prepare("PRAGMA table_info('snippet_embeddings')").all() as Array<{
+      name: string;
+    }>;
     expect(columns.map((column) => column.name)).not.toContain('vec_embedding');

-    const indexes = client
-      .prepare("PRAGMA index_list('snippet_embeddings')")
-      .all() as Array<{ name: string }>;
+    const indexes = client.prepare("PRAGMA index_list('snippet_embeddings')").all() as Array<{
+      name: string;
+    }>;
     expect(indexes.map((index) => index.name)).toContain('idx_embeddings_profile');
   });
@@ -13,7 +13,9 @@ import {
 // ---------------------------------------------------------------------------
 // repositories
 // ---------------------------------------------------------------------------
-export const repositories = sqliteTable('repositories', {
+export const repositories = sqliteTable(
+  'repositories',
+  {
   id: text('id').primaryKey(), // e.g. "/facebook/react" or "/local/my-sdk"
   title: text('title').notNull(),
   description: text('description'),
@@ -35,7 +37,9 @@ export const repositories = sqliteTable('repositories', {
   lastIndexedAt: integer('last_indexed_at', { mode: 'timestamp' }),
   createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
   updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
-}, (t) => [index('idx_repositories_state').on(t.state)]);
+  },
+  (t) => [index('idx_repositories_state').on(t.state)]
+);

 // ---------------------------------------------------------------------------
 // repository_versions
@@ -61,7 +65,9 @@ export const repositoryVersions = sqliteTable('repository_versions', {
 // ---------------------------------------------------------------------------
 // documents
 // ---------------------------------------------------------------------------
-export const documents = sqliteTable('documents', {
+export const documents = sqliteTable(
+  'documents',
+  {
   id: text('id').primaryKey(), // UUID
   repositoryId: text('repository_id')
     .notNull()
@@ -73,12 +79,16 @@ export const documents = sqliteTable('documents', {
   tokenCount: integer('token_count').default(0),
   checksum: text('checksum').notNull(), // SHA-256 of file content
   indexedAt: integer('indexed_at', { mode: 'timestamp' }).notNull()
-}, (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]);
+  },
+  (t) => [index('idx_documents_repo_version').on(t.repositoryId, t.versionId)]
+);

 // ---------------------------------------------------------------------------
 // snippets
 // ---------------------------------------------------------------------------
-export const snippets = sqliteTable('snippets', {
+export const snippets = sqliteTable(
+  'snippets',
+  {
   id: text('id').primaryKey(), // UUID
   documentId: text('document_id')
     .notNull()
@@ -94,10 +104,12 @@ export const snippets = sqliteTable('snippets', {
   breadcrumb: text('breadcrumb'), // e.g. "Installation > Getting Started"
   tokenCount: integer('token_count').default(0),
   createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
-}, (t) => [
+  },
+  (t) => [
   index('idx_snippets_repo_version').on(t.repositoryId, t.versionId),
-  index('idx_snippets_repo_type').on(t.repositoryId, t.type),
-]);
+  index('idx_snippets_repo_type').on(t.repositoryId, t.type)
+  ]
+);

 // ---------------------------------------------------------------------------
 // embedding_profiles
@@ -134,14 +146,16 @@ export const snippetEmbeddings = sqliteTable(
   },
   (table) => [
     primaryKey({ columns: [table.snippetId, table.profileId] }),
-    index('idx_embeddings_profile').on(table.profileId, table.snippetId),
+    index('idx_embeddings_profile').on(table.profileId, table.snippetId)
   ]
 );

 // ---------------------------------------------------------------------------
 // indexing_jobs
 // ---------------------------------------------------------------------------
-export const indexingJobs = sqliteTable('indexing_jobs', {
+export const indexingJobs = sqliteTable(
+  'indexing_jobs',
+  {
   id: text('id').primaryKey(), // UUID
   repositoryId: text('repository_id')
     .notNull()
@@ -155,13 +169,29 @@ export const indexingJobs = sqliteTable('indexing_jobs', {
   progress: integer('progress').default(0), // 0–100
   totalFiles: integer('total_files').default(0),
   processedFiles: integer('processed_files').default(0),
-  stage: text('stage', { enum: ['queued', 'differential', 'crawling', 'cloning', 'parsing', 'storing', 'embedding', 'done', 'failed'] }).notNull().default('queued'),
+  stage: text('stage', {
+    enum: [
+      'queued',
+      'differential',
+      'crawling',
+      'cloning',
+      'parsing',
+      'storing',
+      'embedding',
+      'done',
+      'failed'
+    ]
+  })
+    .notNull()
+    .default('queued'),
   stageDetail: text('stage_detail'),
   error: text('error'),
   startedAt: integer('started_at', { mode: 'timestamp' }),
   completedAt: integer('completed_at', { mode: 'timestamp' }),
   createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
-}, (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]);
+  },
+  (t) => [index('idx_jobs_repo_status').on(t.repositoryId, t.status)]
+);

 // ---------------------------------------------------------------------------
 // repository_configs
@@ -12,11 +12,7 @@ import { migrate } from 'drizzle-orm/better-sqlite3/migrator';
 import { readFileSync } from 'node:fs';
 import { join } from 'node:path';
 import * as schema from '../db/schema.js';
-import {
-  loadSqliteVec,
-  sqliteVecRowidTableName,
-  sqliteVecTableName
-} from '../db/sqlite-vec.js';
+import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '../db/sqlite-vec.js';
 import { SqliteVecStore } from '../search/sqlite-vec.store.js';

 import { NoopEmbeddingProvider, EmbeddingError, type EmbeddingVector } from './provider.js';
@@ -424,6 +420,25 @@ describe('EmbeddingService', () => {
     expect(embedding![2]).toBeCloseTo(0.2, 5);
   });

+  it('can delegate embedding persistence to an injected writer', async () => {
+    const snippetId = seedSnippet(db, client);
+    const provider = makeProvider(4);
+    const persistEmbeddings = vi.fn().mockResolvedValue(undefined);
+    const service = new EmbeddingService(client, provider, 'local-default', {
+      persistEmbeddings
+    });
+
+    await service.embedSnippets([snippetId]);
+
+    expect(persistEmbeddings).toHaveBeenCalledTimes(1);
+    const rows = client
+      .prepare(
+        'SELECT COUNT(*) AS cnt FROM snippet_embeddings WHERE snippet_id = ? AND profile_id = ?'
+      )
+      .get(snippetId, 'local-default') as { cnt: number };
+    expect(rows.cnt).toBe(0);
+  });
+
   it('stores embeddings under the configured profile ID', async () => {
     client
       .prepare(
@@ -431,16 +446,7 @@ describe('EmbeddingService', () => {
         (id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
         VALUES (?, ?, ?, ?, ?, ?, ?, ?, unixepoch(), unixepoch())`
       )
-      .run(
-        'openai-custom',
-        'openai-compatible',
-        'OpenAI Custom',
-        1,
-        0,
-        'test-model',
-        4,
-        '{}'
-      );
+      .run('openai-custom', 'openai-compatible', 'OpenAI Custom', 1, 0, 'test-model', 4, '{}');

     const snippetId = seedSnippet(db, client);
     const provider = makeProvider(4, 'test-model');
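The new test above injects a `persistEmbeddings` writer and asserts that the default relational write path is skipped. A minimal standalone sketch of that optional-delegate pattern, with illustrative names (`persist`, `defaultWrite`, `Persisted`) that are not the project's actual API:

```typescript
type Persisted = { snippetId: string; profileId: string };

// Default writer used when no delegate is injected.
function defaultWrite(rows: Persisted[], log: string[]): void {
  for (const row of rows) log.push(`local:${row.snippetId}`);
}

// Service-style helper: prefer the injected delegate, fall back to the default.
async function persist(
  rows: Persisted[],
  log: string[],
  delegate?: { persistEmbeddings?: (rows: Persisted[]) => Promise<void> }
): Promise<void> {
  if (delegate?.persistEmbeddings) {
    await delegate.persistEmbeddings(rows);
  } else {
    defaultWrite(rows, log);
  }
}

const log: string[] = [];
const rows: Persisted[] = [{ snippetId: 's1', profileId: 'local-default' }];

// Without a delegate, the default writer runs.
await persist(rows, log);
// With a delegate, the default writer is bypassed entirely.
await persist(rows, log, {
  persistEmbeddings: async (r) => {
    for (const row of r) log.push(`delegate:${row.snippetId}`);
  }
});

console.log(log); // → [ 'local:s1', 'delegate:s1' ]
```

Making the delegate's inner method optional (`persistEmbeddings?`) lets callers pass a partial options object while the service keeps a single fallback branch.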
@@ -6,6 +6,10 @@
 import type Database from 'better-sqlite3';
 import type { EmbeddingProvider } from './provider.js';
 import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
+import {
+  upsertEmbeddings,
+  type PersistedEmbedding
+} from '$lib/server/pipeline/write-operations.js';

 interface SnippetRow {
   id: string;
@@ -23,7 +27,10 @@ export class EmbeddingService {
   constructor(
     private readonly db: Database.Database,
     private readonly provider: EmbeddingProvider,
-    private readonly profileId: string = 'local-default'
+    private readonly profileId: string = 'local-default',
+    private readonly persistenceDelegate?: {
+      persistEmbeddings?: (embeddings: PersistedEmbedding[]) => Promise<void>;
+    }
   ) {
     this.sqliteVecStore = new SqliteVecStore(db);
   }
@@ -94,37 +101,31 @@
       [s.title, s.breadcrumb, s.content].filter(Boolean).join('\n').slice(0, TEXT_MAX_CHARS)
     );

-    const insert = this.db.prepare<[string, string, string, number, Buffer]>(`
-      INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
-      VALUES (?, ?, ?, ?, ?, unixepoch())
-    `);
-
     for (let i = 0; i < snippets.length; i += BATCH_SIZE) {
       const batchSnippets = snippets.slice(i, i + BATCH_SIZE);
       const batchTexts = texts.slice(i, i + BATCH_SIZE);

       const embeddings = await this.provider.embed(batchTexts);
-      const insertMany = this.db.transaction(() => {
-        for (let j = 0; j < batchSnippets.length; j++) {
-          const snippet = batchSnippets[j];
-          const embedding = embeddings[j];
-          insert.run(
-            snippet.id,
-            this.profileId,
-            embedding.model,
-            embedding.dimensions,
-            Buffer.from(
+      const persistedEmbeddings: PersistedEmbedding[] = batchSnippets.map((snippet, index) => {
+        const embedding = embeddings[index];
+        return {
+          snippetId: snippet.id,
+          profileId: this.profileId,
+          model: embedding.model,
+          dimensions: embedding.dimensions,
+          embedding: Buffer.from(
             embedding.values.buffer,
             embedding.values.byteOffset,
             embedding.values.byteLength
           )
-          );
-          this.sqliteVecStore.upsertEmbedding(this.profileId, snippet.id, embedding.values);
-        }
+        };
       });
-      insertMany();
+      if (this.persistenceDelegate?.persistEmbeddings) {
+        await this.persistenceDelegate.persistEmbeddings(persistedEmbeddings);
+      } else {
+        upsertEmbeddings(this.db, persistedEmbeddings);
+      }

       onProgress?.(Math.min(i + BATCH_SIZE, snippets.length), snippets.length);
     }
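The hunk above serializes each embedding's `Float32Array` into a `Buffer` via `Buffer.from(buffer, byteOffset, byteLength)`. A small sketch of why the offset and length arguments matter when the typed array is a subview of a larger `ArrayBuffer`:

```typescript
// Serialize a Float32Array view into a Buffer covering exactly its bytes.
// Passing (buffer, byteOffset, byteLength) is important: a Float32Array may be
// a subview of a larger ArrayBuffer, so Buffer.from(values.buffer) alone would
// capture unrelated bytes before and after the view.
function serializeEmbedding(values: Float32Array): Buffer {
  return Buffer.from(values.buffer, values.byteOffset, values.byteLength);
}

const backing = new Float32Array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]);
const view = backing.subarray(2, 5); // 3 floats, non-zero byteOffset (8 bytes)

const buf = serializeEmbedding(view);
console.log(buf.byteLength); // 3 floats * 4 bytes = 12

// Round-trip back to floats to confirm the right region was captured.
const restored = new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4);
console.log(restored.length); // 3
```

Note that `Buffer.from(arrayBuffer, offset, length)` shares memory with the source rather than copying it, so a caller that mutates the array afterwards would also change the buffer.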
@@ -1,7 +1,4 @@
-import {
-  EmbeddingProfile,
-  EmbeddingProfileEntity
-} from '$lib/server/models/embedding-profile.js';
+import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';

 function parseConfig(config: Record<string, unknown> | string | null): Record<string, unknown> {
   if (!config) {
@@ -44,7 +44,10 @@ function createTestDb(): Database.Database {
   '0004_complete_sentry.sql'
 ]) {
   const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
-  for (const stmt of sql.split('--> statement-breakpoint').map((s) => s.trim()).filter(Boolean)) {
+  for (const stmt of sql
+    .split('--> statement-breakpoint')
+    .map((s) => s.trim())
+    .filter(Boolean)) {
     client.exec(stmt);
   }
 }
@@ -113,9 +116,10 @@ function insertDocument(db: Database.Database, versionId: string, filePath: stri
   .run(
     id,
     db
-      .prepare<[string], { repository_id: string }>(
-        `SELECT repository_id FROM repository_versions WHERE id = ?`
-      )
+      .prepare<
+        [string],
+        { repository_id: string }
+      >(`SELECT repository_id FROM repository_versions WHERE id = ?`)
       .get(versionId)?.repository_id ?? '/test/repo',
     versionId,
     filePath,
@@ -280,9 +284,9 @@ describe('buildDifferentialPlan', () => {
   insertDocument(db, v1Id, 'packages/react/index.js');
   insertDocument(db, v1Id, 'packages/react-dom/index.js');

-  const fetchFn = vi.fn().mockResolvedValue([
-    { path: 'packages/react/index.js', status: 'modified' as const }
-  ]);
+  const fetchFn = vi
+    .fn()
+    .mockResolvedValue([{ path: 'packages/react/index.js', status: 'modified' as const }]);

   const plan = await buildDifferentialPlan({
     repo,
@@ -292,13 +296,7 @@ describe('buildDifferentialPlan', () => {
   });

   expect(fetchFn).toHaveBeenCalledOnce();
-  expect(fetchFn).toHaveBeenCalledWith(
-    'facebook',
-    'react',
-    'v18.0.0',
-    'v18.1.0',
-    'ghp_test123'
-  );
+  expect(fetchFn).toHaveBeenCalledWith('facebook', 'react', 'v18.0.0', 'v18.1.0', 'ghp_test123');

   expect(plan).not.toBeNull();
   expect(plan!.changedPaths.has('packages/react/index.js')).toBe(true);
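The test helper above splits raw Drizzle migration SQL on the `--> statement-breakpoint` marker before executing each statement, since SQLite's `exec` is fed one statement at a time here. A standalone sketch of that split:

```typescript
// Split a Drizzle-style migration file into individual SQL statements.
// Drizzle separates statements with a `--> statement-breakpoint` marker.
function splitMigration(sql: string): string[] {
  return sql
    .split('--> statement-breakpoint')
    .map((s) => s.trim())
    .filter(Boolean); // drop empty fragments, e.g. after a trailing marker
}

const migration = `
CREATE TABLE a (id TEXT PRIMARY KEY);
--> statement-breakpoint
CREATE INDEX idx_a ON a (id);
--> statement-breakpoint
`;

const statements = splitMigration(migration);
console.log(statements.length); // 2
console.log(statements[0]); // CREATE TABLE a (id TEXT PRIMARY KEY);
```

The trailing `filter(Boolean)` is what makes a file ending in a breakpoint marker safe: the final empty fragment is discarded instead of being passed to the database.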
@@ -41,9 +41,7 @@ export async function buildDifferentialPlan(params: {
   try {
     // 1. Load all indexed versions for this repository
     const rows = db
-      .prepare(
-        `SELECT * FROM repository_versions WHERE repository_id = ? AND state = 'indexed'`
-      )
+      .prepare(`SELECT * FROM repository_versions WHERE repository_id = ? AND state = 'indexed'`)
      .all(repo.id) as RepositoryVersionEntity[];

     const indexedVersions: RepositoryVersion[] = rows.map((row) =>
@@ -1,10 +1,19 @@
 import { workerData, parentPort } from 'node:worker_threads';
 import Database from 'better-sqlite3';
 import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
+import { applySqlitePragmas } from '$lib/server/db/connection.js';
 import { createProviderFromProfile } from '$lib/server/embeddings/registry.js';
 import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
-import { EmbeddingProfileEntity, type EmbeddingProfileEntityProps } from '$lib/server/models/embedding-profile.js';
-import type { EmbedWorkerRequest, EmbedWorkerResponse, WorkerInitData } from './worker-types.js';
+import {
+  EmbeddingProfileEntity,
+  type EmbeddingProfileEntityProps
+} from '$lib/server/models/embedding-profile.js';
+import type {
+  EmbedWorkerRequest,
+  EmbedWorkerResponse,
+  SerializedEmbedding,
+  WorkerInitData
+} from './worker-types.js';

 const { dbPath, embeddingProfileId } = workerData as WorkerInitData;

@@ -18,17 +27,12 @@ if (!embeddingProfileId) {
 }

 const db = new Database(dbPath);
-db.pragma('journal_mode = WAL');
-db.pragma('foreign_keys = ON');
-db.pragma('busy_timeout = 5000');
-db.pragma('synchronous = NORMAL');
-db.pragma('cache_size = -65536');
-db.pragma('temp_store = MEMORY');
-db.pragma('mmap_size = 268435456');
-db.pragma('wal_autocheckpoint = 1000');
+applySqlitePragmas(db);

 // Load the embedding profile from DB
-const rawProfile = db.prepare('SELECT * FROM embedding_profiles WHERE id = ?').get(embeddingProfileId);
+const rawProfile = db
+  .prepare('SELECT * FROM embedding_profiles WHERE id = ?')
+  .get(embeddingProfileId);

 if (!rawProfile) {
   db.close();
@@ -43,9 +47,55 @@
 const profileEntity = new EmbeddingProfileEntity(rawProfile as EmbeddingProfileEntityProps);
 const profile = EmbeddingProfileMapper.fromEntity(profileEntity);

+let pendingWrite: {
+  jobId: string;
+  resolve: () => void;
+  reject: (error: Error) => void;
+} | null = null;
+let currentJobId: string | null = null;
+
+function requestWrite(
+  message: Extract<EmbedWorkerResponse, { type: 'write_embeddings' }>
+): Promise<void> {
+  if (pendingWrite) {
+    return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
+  }
+
+  return new Promise((resolve, reject) => {
+    pendingWrite = {
+      jobId: message.jobId,
+      resolve: () => {
+        pendingWrite = null;
+        resolve();
+      },
+      reject: (error: Error) => {
+        pendingWrite = null;
+        reject(error);
+      }
+    };
+    parentPort!.postMessage(message);
+  });
+}
+
 // Create provider and embedding service
 const provider = createProviderFromProfile(profile);
-const embeddingService = new EmbeddingService(db, provider, embeddingProfileId);
+const embeddingService = new EmbeddingService(db, provider, embeddingProfileId, {
+  persistEmbeddings: async (embeddings) => {
+    const serializedEmbeddings: SerializedEmbedding[] = embeddings.map((item) => ({
+      snippetId: item.snippetId,
+      profileId: item.profileId,
+      model: item.model,
+      dimensions: item.dimensions,
+      embedding: Uint8Array.from(item.embedding)
+    }));
+
+    await requestWrite({
+      type: 'write_embeddings',
+      jobId: currentJobId ?? 'unknown',
+      embeddings: serializedEmbeddings
+    });
+  }
+});

 // Signal ready after service initialization
 parentPort!.postMessage({
@@ -53,12 +103,27 @@
 } satisfies EmbedWorkerResponse);

 parentPort!.on('message', async (msg: EmbedWorkerRequest) => {
+  if (msg.type === 'write_ack') {
+    if (pendingWrite?.jobId === msg.jobId) {
+      pendingWrite.resolve();
+    }
+    return;
+  }
+
+  if (msg.type === 'write_error') {
+    if (pendingWrite?.jobId === msg.jobId) {
+      pendingWrite.reject(new Error(msg.error));
+    }
+    return;
+  }
+
   if (msg.type === 'shutdown') {
     db.close();
     process.exit(0);
   }

   if (msg.type === 'embed') {
+    currentJobId = msg.jobId;
     try {
       const snippetIds = embeddingService.findSnippetIdsMissingEmbeddings(
         msg.repositoryId,
@@ -84,6 +149,8 @@ parentPort!.on('message', async (msg: EmbedWorkerRequest) => {
         jobId: msg.jobId,
         error: err instanceof Error ? err.message : String(err)
       } satisfies EmbedWorkerResponse);
+    } finally {
+      currentJobId = null;
     }
   }
 });
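The worker above serializes writes through a single pending promise that is settled when the parent process acks or rejects the request. A minimal sketch of that request/ack handshake with the message transport stubbed out as an array (the names mirror the diff, but the `sent` queue stands in for `parentPort.postMessage`):

```typescript
type WriteMsg = { type: 'write_embeddings'; jobId: string };
type AckMsg = { type: 'write_ack' | 'write_error'; jobId: string; error?: string };

let pendingWrite: { jobId: string; resolve: () => void; reject: (e: Error) => void } | null = null;
const sent: WriteMsg[] = [];

// Worker side: post the write request and wait for the parent to ack it.
// Only one request may be in flight at a time.
function requestWrite(message: WriteMsg): Promise<void> {
  if (pendingWrite) {
    return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
  }
  return new Promise((resolve, reject) => {
    pendingWrite = {
      jobId: message.jobId,
      resolve: () => {
        pendingWrite = null;
        resolve();
      },
      reject: (error) => {
        pendingWrite = null;
        reject(error);
      }
    };
    sent.push(message); // stands in for parentPort.postMessage(message)
  });
}

// Parent-message handler: settle the matching in-flight request.
function onMessage(msg: AckMsg): void {
  if (msg.type === 'write_ack' && pendingWrite?.jobId === msg.jobId) {
    pendingWrite.resolve();
  } else if (msg.type === 'write_error' && pendingWrite?.jobId === msg.jobId) {
    pendingWrite.reject(new Error(msg.error));
  }
}

const write = requestWrite({ type: 'write_embeddings', jobId: 'job-1' });
onMessage({ type: 'write_ack', jobId: 'job-1' });
await write; // resolves once the ack for job-1 arrives
console.log(sent.length); // 1
```

Matching on `jobId` before settling means a stale or duplicate ack for a different job is ignored rather than resolving the wrong promise.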
@@ -466,12 +466,15 @@ describe('IndexingPipeline', () => {
     const job1 = makeJob();
     await pipeline.run(job1 as never);

-    const firstSnippetIds = (db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as { id: string }[])
-      .map((row) => row.id);
+    const firstSnippetIds = (
+      db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as { id: string }[]
+    ).map((row) => row.id);
     expect(firstSnippetIds.length).toBeGreaterThan(0);

     const firstEmbeddingCount = (
-      db.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`).get() as {
+      db
+        .prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
+        .get() as {
         n: number;
       }
     ).n;
@@ -483,11 +486,15 @@ describe('IndexingPipeline', () => {
     const job2 = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(job2Id) as never;
     await pipeline.run(job2);

-    const secondSnippetIds = (db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as {
-      id: string;
-    }[]).map((row) => row.id);
+    const secondSnippetIds = (
+      db.prepare(`SELECT id FROM snippets ORDER BY id`).all() as {
+        id: string;
+      }[]
+    ).map((row) => row.id);
     const secondEmbeddingCount = (
-      db.prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`).get() as {
+      db
+        .prepare(`SELECT COUNT(*) as n FROM snippet_embeddings WHERE profile_id = 'local-default'`)
+        .get() as {
         n: number;
       }
     ).n;
@@ -918,9 +925,9 @@ describe('IndexingPipeline', () => {

     await pipeline.run(job as never);

-    const docs = db
-      .prepare(`SELECT file_path FROM documents ORDER BY file_path`)
-      .all() as { file_path: string }[];
+    const docs = db.prepare(`SELECT file_path FROM documents ORDER BY file_path`).all() as {
+      file_path: string;
+    }[];
     const filePaths = docs.map((d) => d.file_path);

     // migration-guide.md and docs/legacy-api.md must be absent.
@@ -956,7 +963,10 @@ describe('IndexingPipeline', () => {

     expect(row).toBeDefined();
     const rules = JSON.parse(row!.rules);
-    expect(rules).toEqual(['Always use TypeScript strict mode', 'Prefer async/await over callbacks']);
+    expect(rules).toEqual([
+      'Always use TypeScript strict mode',
+      'Prefer async/await over callbacks'
+    ]);
   });

   it('persists version-specific rules under (repositoryId, versionId) when job has versionId', async () => {
@@ -1219,12 +1229,7 @@ describe('differential indexing', () => {
     insertSnippet(db, doc1Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
|
||||||
insertSnippet(db, doc2Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
|
insertSnippet(db, doc2Id, { repository_id: '/test/repo', version_id: ancestorVersionId });
|
||||||
|
|
||||||
const pipeline = new IndexingPipeline(
|
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
|
||||||
db,
|
|
||||||
vi.fn() as never,
|
|
||||||
{ crawl: vi.fn() } as never,
|
|
||||||
null
|
|
||||||
);
|
|
||||||
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
|
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
|
||||||
ancestorVersionId,
|
ancestorVersionId,
|
||||||
targetVersionId,
|
targetVersionId,
|
||||||
@@ -1236,9 +1241,7 @@ describe('differential indexing', () => {
|
|||||||
.prepare(`SELECT * FROM documents WHERE version_id = ?`)
|
.prepare(`SELECT * FROM documents WHERE version_id = ?`)
|
||||||
.all(targetVersionId) as { id: string; file_path: string }[];
|
.all(targetVersionId) as { id: string; file_path: string }[];
|
||||||
expect(targetDocs).toHaveLength(2);
|
expect(targetDocs).toHaveLength(2);
|
||||||
expect(targetDocs.map((d) => d.file_path).sort()).toEqual(
|
expect(targetDocs.map((d) => d.file_path).sort()).toEqual(['README.md', 'src/index.ts'].sort());
|
||||||
['README.md', 'src/index.ts'].sort()
|
|
||||||
);
|
|
||||||
// New IDs must differ from ancestor doc IDs.
|
// New IDs must differ from ancestor doc IDs.
|
||||||
const targetDocIds = targetDocs.map((d) => d.id);
|
const targetDocIds = targetDocs.map((d) => d.id);
|
||||||
expect(targetDocIds).not.toContain(doc1Id);
|
expect(targetDocIds).not.toContain(doc1Id);
|
||||||
@@ -1261,12 +1264,7 @@ describe('differential indexing', () => {
|
|||||||
checksum: 'sha-main'
|
checksum: 'sha-main'
|
||||||
});
|
});
|
||||||
|
|
||||||
const pipeline = new IndexingPipeline(
|
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl: vi.fn() } as never, null);
|
||||||
db,
|
|
||||||
vi.fn() as never,
|
|
||||||
{ crawl: vi.fn() } as never,
|
|
||||||
null
|
|
||||||
);
|
|
||||||
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
|
(pipeline as unknown as PipelineInternals).cloneFromAncestor(
|
||||||
ancestorVersionId,
|
ancestorVersionId,
|
||||||
targetVersionId,
|
targetVersionId,
|
||||||
@@ -1323,9 +1321,9 @@ describe('differential indexing', () => {
|
|||||||
|
|
||||||
await pipeline.run(job);
|
await pipeline.run(job);
|
||||||
|
|
||||||
const updatedJob = db
|
const updatedJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
|
||||||
.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`)
|
status: string;
|
||||||
.get(jobId) as { status: string };
|
};
|
||||||
expect(updatedJob.status).toBe('done');
|
expect(updatedJob.status).toBe('done');
|
||||||
|
|
||||||
const docs = db
|
const docs = db
|
||||||
@@ -1375,9 +1373,7 @@ describe('differential indexing', () => {
|
|||||||
deletedPaths: new Set<string>(),
|
deletedPaths: new Set<string>(),
|
||||||
unchangedPaths: new Set(['unchanged.md'])
|
unchangedPaths: new Set(['unchanged.md'])
|
||||||
};
|
};
|
||||||
const spy = vi
|
const spy = vi.spyOn(diffStrategy, 'buildDifferentialPlan').mockResolvedValueOnce(mockPlan);
|
||||||
.spyOn(diffStrategy, 'buildDifferentialPlan')
|
|
||||||
.mockResolvedValueOnce(mockPlan);
|
|
||||||
|
|
||||||
const pipeline = new IndexingPipeline(
|
const pipeline = new IndexingPipeline(
|
||||||
db,
|
db,
|
||||||
@@ -1398,9 +1394,9 @@ describe('differential indexing', () => {
|
|||||||
spy.mockRestore();
|
spy.mockRestore();
|
||||||
|
|
||||||
// 6. Assert job completed and both docs exist under the target version.
|
// 6. Assert job completed and both docs exist under the target version.
|
||||||
const finalJob = db
|
const finalJob = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(jobId) as {
|
||||||
.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`)
|
status: string;
|
||||||
.get(jobId) as { status: string };
|
};
|
||||||
expect(finalJob.status).toBe('done');
|
expect(finalJob.status).toBe('done');
|
||||||
|
|
||||||
const targetDocs = db
|
const targetDocs = db
|
||||||
|
@@ -28,6 +28,14 @@ import { parseFile } from '$lib/server/parser/index.js';
 import { computeTrustScore } from '$lib/server/search/trust-score.js';
 import { computeDiff } from './diff.js';
 import { buildDifferentialPlan, type DifferentialPlan } from './differential-strategy.js';
+import {
+  cloneFromAncestor as cloneFromAncestorInDatabase,
+  replaceSnippets as replaceSnippetsInDatabase,
+  updateRepo as updateRepoInDatabase,
+  updateVersion as updateVersionInDatabase,
+  type CloneFromAncestorRequest
+} from './write-operations.js';
+import type { SerializedFields } from './worker-types.js';
 
 // ---------------------------------------------------------------------------
 // Progress calculation
@@ -70,7 +78,23 @@ export class IndexingPipeline {
     private readonly db: Database.Database,
     private readonly githubCrawl: typeof GithubCrawlFn,
     private readonly localCrawler: LocalCrawler,
-    private readonly embeddingService: EmbeddingService | null
+    private readonly embeddingService: EmbeddingService | null,
+    private readonly writeDelegate?: {
+      persistJobUpdates?: boolean;
+      replaceSnippets?: (
+        changedDocIds: string[],
+        newDocuments: NewDocument[],
+        newSnippets: NewSnippet[]
+      ) => Promise<void>;
+      cloneFromAncestor?: (request: CloneFromAncestorRequest) => Promise<void>;
+      updateRepo?: (repositoryId: string, fields: SerializedFields) => Promise<void>;
+      updateVersion?: (versionId: string, fields: SerializedFields) => Promise<void>;
+      upsertRepoConfig?: (
+        repositoryId: string,
+        versionId: string | null,
+        rules: string[]
+      ) => Promise<void>;
+    }
   ) {
     this.sqliteVecStore = new SqliteVecStore(db);
   }
@@ -117,14 +141,12 @@ export class IndexingPipeline {
     if (!repo) throw new Error(`Repository ${repositoryId} not found`);
 
     // Mark repo as actively indexing.
-    this.updateRepo(repo.id, { state: 'indexing' });
+    await this.updateRepo(repo.id, { state: 'indexing' });
     if (normJob.versionId) {
-      this.updateVersion(normJob.versionId, { state: 'indexing' });
+      await this.updateVersion(normJob.versionId, { state: 'indexing' });
     }
 
-    const versionTag = normJob.versionId
-      ? this.getVersionTag(normJob.versionId)
-      : undefined;
+    const versionTag = normJob.versionId ? this.getVersionTag(normJob.versionId) : undefined;
 
     // ---- Stage 0: Differential strategy (TRUEREF-0021) ----------------------
     // When indexing a tagged version, check if we can inherit unchanged files
@@ -147,12 +169,12 @@ export class IndexingPipeline {
     // If a differential plan exists, clone unchanged files from ancestor.
     if (differentialPlan && differentialPlan.unchangedPaths.size > 0) {
       reportStage('cloning');
-      this.cloneFromAncestor(
-        differentialPlan.ancestorVersionId,
-        normJob.versionId!,
-        repo.id,
-        differentialPlan.unchangedPaths
-      );
+      await this.cloneFromAncestor({
+        ancestorVersionId: differentialPlan.ancestorVersionId,
+        targetVersionId: normJob.versionId!,
+        repositoryId: repo.id,
+        unchangedPaths: [...differentialPlan.unchangedPaths]
+      });
       console.info(
         `[IndexingPipeline] Differential indexing: cloned ${differentialPlan.unchangedPaths.size} unchanged files from ${differentialPlan.ancestorTag}`
       );
@@ -174,7 +196,11 @@ export class IndexingPipeline {
     if (crawlResult.config) {
       // Config was pre-parsed by the crawler — wrap it in a ParsedConfig
       // shell so the rest of the pipeline can use it uniformly.
-      parsedConfig = { config: crawlResult.config, source: 'trueref.json', warnings: [] } satisfies ParsedConfig;
+      parsedConfig = {
+        config: crawlResult.config,
+        source: 'trueref.json',
+        warnings: []
+      } satisfies ParsedConfig;
     } else {
       const configFile = crawlResult.files.find(
         (f) => f.path === 'trueref.json' || f.path === 'context7.json'
@@ -189,7 +215,10 @@ export class IndexingPipeline {
     const filteredFiles =
       excludeFiles.length > 0
         ? crawlResult.files.filter(
-            (f) => !excludeFiles.some((pattern) => IndexingPipeline.matchesExcludePattern(f.path, pattern))
+            (f) =>
+              !excludeFiles.some((pattern) =>
+                IndexingPipeline.matchesExcludePattern(f.path, pattern)
+              )
           )
         : crawlResult.files;
 
@@ -303,7 +332,13 @@ export class IndexingPipeline {
           this.embeddingService !== null
         );
         this.updateJob(job.id, { processedFiles: totalProcessed, progress });
-        reportStage('parsing', `${totalProcessed} / ${totalFiles} files`, progress, totalProcessed, totalFiles);
+        reportStage(
+          'parsing',
+          `${totalProcessed} / ${totalFiles} files`,
+          progress,
+          totalProcessed,
+          totalFiles
+        );
       }
     }
 
@@ -312,7 +347,7 @@ export class IndexingPipeline {
 
     // ---- Stage 3: Atomic replacement ------------------------------------
     reportStage('storing');
-    this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);
+    await this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);
 
     // ---- Stage 4: Embeddings (if provider is configured) ----------------
     if (this.embeddingService) {
@@ -350,7 +385,7 @@ export class IndexingPipeline {
       state: 'indexed'
     });
 
-    this.updateRepo(repo.id, {
+    await this.updateRepo(repo.id, {
       state: 'indexed',
       totalSnippets: stats.totalSnippets,
       totalTokens: stats.totalTokens,
@@ -360,7 +395,7 @@ export class IndexingPipeline {
 
     if (normJob.versionId) {
       const versionStats = this.computeVersionStats(normJob.versionId);
-      this.updateVersion(normJob.versionId, {
+      await this.updateVersion(normJob.versionId, {
         state: 'indexed',
         totalSnippets: versionStats.totalSnippets,
         indexedAt: Math.floor(Date.now() / 1000)
@@ -371,12 +406,12 @@ export class IndexingPipeline {
     if (parsedConfig?.config.rules?.length) {
       if (!normJob.versionId) {
         // Main-branch job: write the repo-wide entry only.
-        this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
+        await this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
       } else {
         // Version job: write only the version-specific entry.
         // Writing to the NULL row here would overwrite repo-wide rules
         // with whatever the last-indexed version happened to carry.
-        this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
+        await this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
       }
     }
 
@@ -398,9 +433,9 @@ export class IndexingPipeline {
     });
 
     // Restore repo to error state but preserve any existing indexed data.
-    this.updateRepo(repositoryId, { state: 'error' });
+    await this.updateRepo(repositoryId, { state: 'error' });
     if (normJob.versionId) {
-      this.updateVersion(normJob.versionId, { state: 'error' });
+      await this.updateVersion(normJob.versionId, { state: 'error' });
     }
 
     throw error;
@@ -411,7 +446,11 @@ export class IndexingPipeline {
   // Private — crawl
   // -------------------------------------------------------------------------
 
-  private async crawl(repo: Repository, ref?: string, allowedPaths?: Set<string>): Promise<{
+  private async crawl(
+    repo: Repository,
+    ref?: string,
+    allowedPaths?: Set<string>
+  ): Promise<{
     files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
     totalFiles: number;
     /** Pre-parsed trueref.json / context7.json, or undefined when absent. */
@@ -473,219 +512,50 @@ export class IndexingPipeline {
    *
    * Runs in a single SQLite transaction for atomicity.
    */
-  private cloneFromAncestor(
-    ancestorVersionId: string,
-    targetVersionId: string,
-    repositoryId: string,
-    unchangedPaths: Set<string>
-  ): void {
-    this.db.transaction(() => {
-      const pathList = [...unchangedPaths];
-      const placeholders = pathList.map(() => '?').join(',');
-      const ancestorDocs = this.db
-        .prepare(
-          `SELECT * FROM documents WHERE version_id = ? AND file_path IN (${placeholders})`
-        )
-        .all(ancestorVersionId, ...pathList) as Array<{
-        id: string;
-        repository_id: string;
-        file_path: string;
-        title: string | null;
-        language: string | null;
-        token_count: number;
-        checksum: string;
-        indexed_at: number;
-      }>;
-
-      const docIdMap = new Map<string, string>();
-      const nowEpoch = Math.floor(Date.now() / 1000);
-
-      for (const doc of ancestorDocs) {
-        const newDocId = randomUUID();
-        docIdMap.set(doc.id, newDocId);
-        this.db
-          .prepare(
-            `INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
-             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
-          )
-          .run(
-            newDocId,
-            repositoryId,
-            targetVersionId,
-            doc.file_path,
-            doc.title,
-            doc.language,
-            doc.token_count,
-            doc.checksum,
-            nowEpoch
-          );
-      }
-
-      if (docIdMap.size === 0) return;
-
-      const oldDocIds = [...docIdMap.keys()];
-      const snippetPlaceholders = oldDocIds.map(() => '?').join(',');
-      const ancestorSnippets = this.db
-        .prepare(
-          `SELECT * FROM snippets WHERE document_id IN (${snippetPlaceholders})`
-        )
-        .all(...oldDocIds) as Array<{
-        id: string;
-        document_id: string;
-        repository_id: string;
-        version_id: string | null;
-        type: string;
-        title: string | null;
-        content: string;
-        language: string | null;
-        breadcrumb: string | null;
-        token_count: number;
-        created_at: number;
-      }>;
-
-      const snippetIdMap = new Map<string, string>();
-      for (const snippet of ancestorSnippets) {
-        const newSnippetId = randomUUID();
-        snippetIdMap.set(snippet.id, newSnippetId);
-        const newDocId = docIdMap.get(snippet.document_id)!;
-        this.db
-          .prepare(
-            `INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
-             VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
-          )
-          .run(
-            newSnippetId,
-            newDocId,
-            repositoryId,
-            targetVersionId,
-            snippet.type,
-            snippet.title,
-            snippet.content,
-            snippet.language,
-            snippet.breadcrumb,
-            snippet.token_count,
-            snippet.created_at
-          );
-      }
-
-      if (snippetIdMap.size > 0) {
-        const oldSnippetIds = [...snippetIdMap.keys()];
-        const embPlaceholders = oldSnippetIds.map(() => '?').join(',');
-        const ancestorEmbeddings = this.db
-          .prepare(
-            `SELECT * FROM snippet_embeddings WHERE snippet_id IN (${embPlaceholders})`
-          )
-          .all(...oldSnippetIds) as Array<{
-          snippet_id: string;
-          profile_id: string;
-          model: string;
-          dimensions: number;
-          embedding: Buffer;
-          created_at: number;
-        }>;
-        for (const emb of ancestorEmbeddings) {
-          const newSnippetId = snippetIdMap.get(emb.snippet_id)!;
-          this.db
-            .prepare(
-              `INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
-               VALUES (?, ?, ?, ?, ?, ?)`
-            )
-            .run(
-              newSnippetId,
-              emb.profile_id,
-              emb.model,
-              emb.dimensions,
-              emb.embedding,
-              emb.created_at
-            );
-          this.sqliteVecStore.upsertEmbeddingBuffer(
-            emb.profile_id,
-            newSnippetId,
-            emb.embedding,
-            emb.dimensions
-          );
-        }
-      }
-    })();
+  private async cloneFromAncestor(
+    requestOrAncestorVersionId: CloneFromAncestorRequest | string,
+    targetVersionId?: string,
+    repositoryId?: string,
+    unchangedPaths?: Set<string>
+  ): Promise<void> {
+    const request: CloneFromAncestorRequest =
+      typeof requestOrAncestorVersionId === 'string'
+        ? {
+            ancestorVersionId: requestOrAncestorVersionId,
+            targetVersionId: targetVersionId!,
+            repositoryId: repositoryId!,
+            unchangedPaths: [...(unchangedPaths ?? new Set<string>())]
+          }
+        : requestOrAncestorVersionId;
+
+    if (request.unchangedPaths.length === 0) {
+      return;
+    }
+
+    if (this.writeDelegate?.cloneFromAncestor) {
+      await this.writeDelegate.cloneFromAncestor(request);
+      return;
+    }
+
+    cloneFromAncestorInDatabase(this.db, request);
   }
 
   // -------------------------------------------------------------------------
   // Private — atomic snippet replacement
   // -------------------------------------------------------------------------
 
-  private replaceSnippets(
+  private async replaceSnippets(
     _repositoryId: string,
     changedDocIds: string[],
     newDocuments: NewDocument[],
     newSnippets: NewSnippet[]
-  ): void {
-    const insertDoc = this.db.prepare(
-      `INSERT INTO documents
-        (id, repository_id, version_id, file_path, title, language,
-         token_count, checksum, indexed_at)
-       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
-    );
-
-    const insertSnippet = this.db.prepare(
-      `INSERT INTO snippets
-        (id, document_id, repository_id, version_id, type, title,
-         content, language, breadcrumb, token_count, created_at)
-       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
-    );
-
-    this.db.transaction(() => {
-      this.sqliteVecStore.deleteEmbeddingsForDocumentIds(changedDocIds);
-
-      // Delete stale documents (cascade deletes their snippets via FK).
-      if (changedDocIds.length > 0) {
-        const placeholders = changedDocIds.map(() => '?').join(',');
-        this.db
-          .prepare(`DELETE FROM documents WHERE id IN (${placeholders})`)
-          .run(...changedDocIds);
-      }
-
-      // Insert new documents.
-      for (const doc of newDocuments) {
-        const indexedAtSeconds =
-          doc.indexedAt instanceof Date
-            ? Math.floor(doc.indexedAt.getTime() / 1000)
-            : Math.floor(Date.now() / 1000);
-
-        insertDoc.run(
-          doc.id,
-          doc.repositoryId,
-          doc.versionId ?? null,
-          doc.filePath,
-          doc.title ?? null,
-          doc.language ?? null,
-          doc.tokenCount ?? 0,
-          doc.checksum,
-          indexedAtSeconds
-        );
-      }
-
-      // Insert new snippets.
-      for (const snippet of newSnippets) {
-        const createdAtSeconds =
-          snippet.createdAt instanceof Date
-            ? Math.floor(snippet.createdAt.getTime() / 1000)
-            : Math.floor(Date.now() / 1000);
-
-        insertSnippet.run(
-          snippet.id,
-          snippet.documentId,
-          snippet.repositoryId,
-          snippet.versionId ?? null,
-          snippet.type,
-          snippet.title ?? null,
-          snippet.content,
-          snippet.language ?? null,
-          snippet.breadcrumb ?? null,
-          snippet.tokenCount ?? 0,
-          createdAtSeconds
-        );
-      }
-    })();
+  ): Promise<void> {
+    if (this.writeDelegate?.replaceSnippets) {
+      await this.writeDelegate.replaceSnippets(changedDocIds, newDocuments, newSnippets);
+      return;
+    }
+
+    replaceSnippetsInDatabase(this.db, changedDocIds, newDocuments, newSnippets);
   }
 
   // -------------------------------------------------------------------------
@@ -709,9 +579,10 @@ export class IndexingPipeline {
 
   private computeVersionStats(versionId: string): { totalSnippets: number } {
     const row = this.db
-      .prepare<[string], { total_snippets: number }>(
-        `SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`
-      )
+      .prepare<
+        [string],
+        { total_snippets: number }
+      >(`SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`)
      .get(versionId);
 
     return { totalSnippets: row?.total_snippets ?? 0 };
@@ -750,6 +621,10 @@ export class IndexingPipeline {
   }
 
   private updateJob(id: string, fields: Record<string, unknown>): void {
+    if (this.writeDelegate?.persistJobUpdates === false) {
+      return;
+    }
+
     const sets = Object.keys(fields)
       .map((k) => `${toSnake(k)} = ?`)
       .join(', ');
@@ -757,43 +632,44 @@ export class IndexingPipeline {
     this.db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
   }
 
-  private updateRepo(id: string, fields: Record<string, unknown>): void {
-    const now = Math.floor(Date.now() / 1000);
-    const allFields = { ...fields, updatedAt: now };
-    const sets = Object.keys(allFields)
-      .map((k) => `${toSnake(k)} = ?`)
-      .join(', ');
-    const values = [...Object.values(allFields), id];
-    this.db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
+  private async updateRepo(id: string, fields: SerializedFields): Promise<void> {
+    if (this.writeDelegate?.updateRepo) {
+      await this.writeDelegate.updateRepo(id, fields);
+      return;
+    }
+
+    updateRepoInDatabase(this.db, id, fields);
   }
 
-  private updateVersion(id: string, fields: Record<string, unknown>): void {
-    const sets = Object.keys(fields)
-      .map((k) => `${toSnake(k)} = ?`)
-      .join(', ');
-    const values = [...Object.values(fields), id];
-    this.db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
+  private async updateVersion(id: string, fields: SerializedFields): Promise<void> {
+    if (this.writeDelegate?.updateVersion) {
+      await this.writeDelegate.updateVersion(id, fields);
+      return;
+    }
+
+    updateVersionInDatabase(this.db, id, fields);
   }
 
-  private upsertRepoConfig(
+  private async upsertRepoConfig(
     repositoryId: string,
     versionId: string | null,
     rules: string[]
-  ): void {
+  ): Promise<void> {
+    if (this.writeDelegate?.upsertRepoConfig) {
+      await this.writeDelegate.upsertRepoConfig(repositoryId, versionId, rules);
+      return;
+    }
+
     const now = Math.floor(Date.now() / 1000);
     // Use DELETE + INSERT because ON CONFLICT … DO UPDATE doesn't work reliably
     // with partial unique indexes in all SQLite versions.
     if (versionId === null) {
       this.db
-        .prepare(
-          `DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
-        )
+        .prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
         .run(repositoryId);
     } else {
       this.db
-        .prepare(
-          `DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`
-        )
+        .prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
        .run(repositoryId, versionId);
     }
     this.db
|||||||
@@ -36,10 +36,10 @@ function normalizeStatuses(status?: JobStatusFilter): Array<IndexingJob['status'
|
|||||||
return [...new Set(statuses)];
|
return [...new Set(statuses)];
|
||||||
}
|
}
|
||||||
|
|
||||||
function buildJobFilterQuery(options?: {
|
function buildJobFilterQuery(options?: { repositoryId?: string; status?: JobStatusFilter }): {
|
||||||
repositoryId?: string;
|
where: string;
|
||||||
status?: JobStatusFilter;
|
params: unknown[];
|
||||||
}): { where: string; params: unknown[] } {
|
} {
|
||||||
const conditions: string[] = [];
|
const conditions: string[] = [];
|
||||||
const params: unknown[] = [];
|
const params: unknown[] = [];
|
||||||
|
|
||||||
```diff
@@ -164,7 +164,9 @@ export class JobQueue {
    */
   private async processNext(): Promise<void> {
     // Fallback path: no worker pool configured, run directly (used by tests and dev mode)
-    console.warn('[JobQueue] Running in fallback mode (no worker pool) — direct pipeline execution.');
+    console.warn(
+      '[JobQueue] Running in fallback mode (no worker pool) — direct pipeline execution.'
+    );

     const rawJob = this.db
       .prepare<[], IndexingJobEntity>(
@@ -176,7 +178,9 @@ export class JobQueue {

     if (!rawJob) return;

-    console.warn('[JobQueue] processNext: no pipeline or pool configured — skipping job processing');
+    console.warn(
+      '[JobQueue] processNext: no pipeline or pool configured — skipping job processing'
+    );
   }

   /**
```
```diff
@@ -181,7 +181,9 @@ describe('ProgressBroadcaster', () => {
       concurrency: 2,
       active: 1,
       idle: 1,
-      workers: [{ index: 0, state: 'running', jobId: 'job-1', repositoryId: '/repo/1', versionId: null }]
+      workers: [
+        { index: 0, state: 'running', jobId: 'job-1', repositoryId: '/repo/1', versionId: null }
+      ]
     });

     const { value } = await reader.read();
```
```diff
@@ -19,6 +19,7 @@ import { WorkerPool } from './worker-pool.js';
 import { initBroadcaster } from './progress-broadcaster.js';
 import type { ProgressBroadcaster } from './progress-broadcaster.js';
 import path from 'node:path';
+import { existsSync } from 'node:fs';
 import { fileURLToPath } from 'node:url';

 // ---------------------------------------------------------------------------
@@ -57,6 +58,21 @@ let _pipeline: IndexingPipeline | null = null;
 let _pool: WorkerPool | null = null;
 let _broadcaster: ProgressBroadcaster | null = null;

+function resolveWorkerScript(...segments: string[]): string {
+  const candidates = [
+    path.resolve(process.cwd(), ...segments),
+    path.resolve(path.dirname(fileURLToPath(import.meta.url)), '../../../../', ...segments)
+  ];
+
+  for (const candidate of candidates) {
+    if (existsSync(candidate)) {
+      return candidate;
+    }
+  }
+
+  return candidates[0];
+}
+
 /**
  * Initialise (or return the existing) JobQueue + IndexingPipeline pair.
  *
@@ -91,19 +107,17 @@ export function initializePipeline(

   const getRepositoryIdForJob = (jobId: string): string => {
     const row = db
-      .prepare<[string], { repository_id: string }>(
-        `SELECT repository_id FROM indexing_jobs WHERE id = ?`
-      )
+      .prepare<
+        [string],
+        { repository_id: string }
+      >(`SELECT repository_id FROM indexing_jobs WHERE id = ?`)
       .get(jobId);
     return row?.repository_id ?? '';
   };

-  // Resolve worker script paths relative to this file (build/workers/ directory)
-  const __filename = fileURLToPath(import.meta.url);
-  const __dirname = path.dirname(__filename);
-  const workerScript = path.join(__dirname, '../../../build/workers/worker-entry.mjs');
-  const embedWorkerScript = path.join(__dirname, '../../../build/workers/embed-worker-entry.mjs');
-  const writeWorkerScript = path.join(__dirname, '../../../build/workers/write-worker-entry.mjs');
+  const workerScript = resolveWorkerScript('build', 'workers', 'worker-entry.mjs');
+  const embedWorkerScript = resolveWorkerScript('build', 'workers', 'embed-worker-entry.mjs');
+  const writeWorkerScript = resolveWorkerScript('build', 'workers', 'write-worker-entry.mjs');

   try {
     _pool = new WorkerPool({
@@ -113,13 +127,6 @@ export function initializePipeline(
       writeWorkerScript,
       dbPath: options.dbPath,
       onProgress: (jobId, msg) => {
-        // Update DB with progress
-        db.prepare(
-          `UPDATE indexing_jobs
-           SET stage = ?, stage_detail = ?, progress = ?, processed_files = ?, total_files = ?
-           WHERE id = ?`
-        ).run(msg.stage, msg.stageDetail ?? null, msg.progress, msg.processedFiles, msg.totalFiles, jobId);
-
         // Broadcast progress event
         if (_broadcaster) {
           _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-progress', {
@@ -129,11 +136,6 @@ export function initializePipeline(
         }
       },
       onJobDone: (jobId: string) => {
-        // Update job status to done
-        db.prepare(`UPDATE indexing_jobs SET status = 'done', completed_at = unixepoch() WHERE id = ?`).run(
-          jobId
-        );
-
         // Broadcast done event
         if (_broadcaster) {
           _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-done', {
@@ -143,11 +145,6 @@ export function initializePipeline(
         }
       },
       onJobFailed: (jobId: string, error: string) => {
-        // Update job status to failed with error message
-        db.prepare(
-          `UPDATE indexing_jobs SET status = 'failed', error = ?, completed_at = unixepoch() WHERE id = ?`
-        ).run(error, jobId);
-
         // Broadcast failed event
         if (_broadcaster) {
           _broadcaster.broadcast(jobId, getRepositoryIdForJob(jobId), 'job-failed', {
@@ -231,5 +228,3 @@ export function _resetSingletons(): void {
   _pool = null;
   _broadcaster = null;
 }
-
-
```
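The `resolveWorkerScript` helper in the hunk above replaces hard-coded `__dirname`-relative paths with an ordered list of candidates. The core idea can be isolated from `node:fs` by injecting the existence check, which is how this sketch stays self-contained (`resolveFirstExisting` is a hypothetical name, not from the diff):

```typescript
// Try each candidate path in order; fall back to the first candidate so the
// eventual "file not found" error still names a sensible, predictable path.
function resolveFirstExisting(candidates: string[], exists: (p: string) => boolean): string {
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  // Nothing found: return the primary candidate rather than throwing here,
  // so the caller fails later with a concrete path in the error message.
  return candidates[0];
}
```

In the real helper the predicate is `existsSync` and the candidates are the cwd-relative and module-relative build paths.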
```diff
@@ -5,24 +5,175 @@ import { crawl as githubCrawl } from '$lib/server/crawler/github.crawler.js';
 import { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
 import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
 import { IndexingJobEntity, type IndexingJobEntityProps } from '$lib/server/models/indexing-job.js';
-import type { ParseWorkerRequest, ParseWorkerResponse, WorkerInitData } from './worker-types.js';
+import { applySqlitePragmas } from '$lib/server/db/connection.js';
+import type {
+  ParseWorkerRequest,
+  ParseWorkerResponse,
+  SerializedDocument,
+  SerializedSnippet,
+  WorkerInitData
+} from './worker-types.js';
 import type { IndexingStage } from '$lib/types.js';

 const { dbPath } = workerData as WorkerInitData;
 const db = new Database(dbPath);
-db.pragma('journal_mode = WAL');
-db.pragma('foreign_keys = ON');
-db.pragma('busy_timeout = 5000');
-db.pragma('synchronous = NORMAL');
-db.pragma('cache_size = -65536');
-db.pragma('temp_store = MEMORY');
-db.pragma('mmap_size = 268435456');
-db.pragma('wal_autocheckpoint = 1000');
+applySqlitePragmas(db);

-const pipeline = new IndexingPipeline(db, githubCrawl, new LocalCrawler(), null);
+let pendingWrite: {
+  jobId: string;
+  resolve: () => void;
+  reject: (error: Error) => void;
+} | null = null;
+
+function serializeDocument(document: {
+  id: string;
+  repositoryId: string;
+  versionId?: string | null;
+  filePath: string;
+  title?: string | null;
+  language?: string | null;
+  tokenCount?: number | null;
+  checksum: string;
+  indexedAt: Date;
+}): SerializedDocument {
+  return {
+    id: document.id,
+    repositoryId: document.repositoryId,
+    versionId: document.versionId ?? null,
+    filePath: document.filePath,
+    title: document.title ?? null,
+    language: document.language ?? null,
+    tokenCount: document.tokenCount ?? 0,
+    checksum: document.checksum,
+    indexedAt: Math.floor(document.indexedAt.getTime() / 1000)
+  };
+}
+
+function serializeSnippet(snippet: {
+  id: string;
+  documentId: string;
+  repositoryId: string;
+  versionId?: string | null;
+  type: 'code' | 'info';
+  title?: string | null;
+  content: string;
+  language?: string | null;
+  breadcrumb?: string | null;
+  tokenCount?: number | null;
+  createdAt: Date;
+}): SerializedSnippet {
+  return {
+    id: snippet.id,
+    documentId: snippet.documentId,
+    repositoryId: snippet.repositoryId,
+    versionId: snippet.versionId ?? null,
+    type: snippet.type,
+    title: snippet.title ?? null,
+    content: snippet.content,
+    language: snippet.language ?? null,
+    breadcrumb: snippet.breadcrumb ?? null,
+    tokenCount: snippet.tokenCount ?? 0,
+    createdAt: Math.floor(snippet.createdAt.getTime() / 1000)
+  };
+}
+
+function requestWrite(
+  message: Extract<
+    ParseWorkerResponse,
+    {
+      type:
+        | 'write_replace'
+        | 'write_clone'
+        | 'write_repo_update'
+        | 'write_version_update'
+        | 'write_repo_config';
+    }
+  >
+): Promise<void> {
+  if (pendingWrite) {
+    return Promise.reject(new Error(`write request already in flight for ${pendingWrite.jobId}`));
+  }
+
+  return new Promise((resolve, reject) => {
+    pendingWrite = {
+      jobId: message.jobId,
+      resolve: () => {
+        pendingWrite = null;
+        resolve();
+      },
+      reject: (error: Error) => {
+        pendingWrite = null;
+        reject(error);
+      }
+    };
+    parentPort!.postMessage(message);
+  });
+}
+
+const pipeline = new IndexingPipeline(db, githubCrawl, new LocalCrawler(), null, {
+  persistJobUpdates: false,
+  replaceSnippets: async (changedDocIds, newDocuments, newSnippets) => {
+    await requestWrite({
+      type: 'write_replace',
+      jobId: currentJobId ?? 'unknown',
+      changedDocIds,
+      documents: newDocuments.map(serializeDocument),
+      snippets: newSnippets.map(serializeSnippet)
+    });
+  },
+  cloneFromAncestor: async (request) => {
+    await requestWrite({
+      type: 'write_clone',
+      jobId: currentJobId ?? 'unknown',
+      ancestorVersionId: request.ancestorVersionId,
+      targetVersionId: request.targetVersionId,
+      repositoryId: request.repositoryId,
+      unchangedPaths: request.unchangedPaths
+    });
+  },
+  updateRepo: async (repositoryId, fields) => {
+    await requestWrite({
+      type: 'write_repo_update',
+      jobId: currentJobId ?? 'unknown',
+      repositoryId,
+      fields
+    });
+  },
+  updateVersion: async (versionId, fields) => {
+    await requestWrite({
+      type: 'write_version_update',
+      jobId: currentJobId ?? 'unknown',
+      versionId,
+      fields
+    });
+  },
+  upsertRepoConfig: async (repositoryId, versionId, rules) => {
+    await requestWrite({
+      type: 'write_repo_config',
+      jobId: currentJobId ?? 'unknown',
+      repositoryId,
+      versionId,
+      rules
+    });
+  }
+});
 let currentJobId: string | null = null;

 parentPort!.on('message', async (msg: ParseWorkerRequest) => {
+  if (msg.type === 'write_ack') {
+    if (pendingWrite?.jobId === msg.jobId) {
+      pendingWrite.resolve();
+    }
+    return;
+  }
+
+  if (msg.type === 'write_error') {
+    if (pendingWrite?.jobId === msg.jobId) {
+      pendingWrite.reject(new Error(msg.error));
+    }
+    return;
+  }
+
   if (msg.type === 'shutdown') {
     db.close();
     process.exit(0);
@@ -35,11 +186,19 @@ parentPort!.on('message', async (msg: ParseWorkerRequest) => {
   if (!rawJob) {
     throw new Error(`Job ${msg.jobId} not found`);
   }
-  const job = IndexingJobMapper.fromEntity(new IndexingJobEntity(rawJob as IndexingJobEntityProps));
+  const job = IndexingJobMapper.fromEntity(
+    new IndexingJobEntity(rawJob as IndexingJobEntityProps)
+  );

   await pipeline.run(
     job,
-    (stage: IndexingStage, detail?: string, progress?: number, processedFiles?: number, totalFiles?: number) => {
+    (
+      stage: IndexingStage,
+      detail?: string,
+      progress?: number,
+      processedFiles?: number,
+      totalFiles?: number
+    ) => {
       parentPort!.postMessage({
         type: 'progress',
         jobId: msg.jobId,
```
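The `pendingWrite` / `requestWrite` machinery above implements a single-in-flight request gate: the parse worker posts one write request, parks a promise, and resolves or rejects it when the matching `write_ack` or `write_error` comes back. A stand-alone sketch of that gate, with the message posting injected (`WriteGate` is an illustrative name, not the actual class):

```typescript
// Single-in-flight request gate: at most one write request may be awaiting
// acknowledgement; a second request while one is pending rejects immediately.
class WriteGate {
  private pending: { jobId: string; resolve: () => void; reject: (e: Error) => void } | null = null;

  /** True while a write request is awaiting acknowledgement. */
  isBusy(): boolean {
    return this.pending !== null;
  }

  /** Post a request and return a promise settled by ack()/fail(). */
  request(jobId: string, post: (jobId: string) => void): Promise<void> {
    if (this.pending) {
      return Promise.reject(new Error(`write request already in flight for ${this.pending.jobId}`));
    }
    return new Promise((resolve, reject) => {
      this.pending = {
        jobId,
        resolve: () => {
          this.pending = null;
          resolve();
        },
        reject: (e: Error) => {
          this.pending = null;
          reject(e);
        }
      };
      post(jobId);
    });
  }

  /** Called when the write worker acknowledges the given job. */
  ack(jobId: string): void {
    if (this.pending?.jobId === jobId) this.pending.resolve();
  }

  /** Called when the write worker reports an error for the given job. */
  fail(jobId: string, error: string): void {
    if (this.pending?.jobId === jobId) this.pending.reject(new Error(error));
  }
}
```

Matching on `jobId` before settling means a stray or late acknowledgement for a different job cannot release the gate.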
```diff
@@ -8,7 +8,6 @@

 import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
 import { writeFileSync, unlinkSync, existsSync } from 'node:fs';
-import { EventEmitter } from 'node:events';

 // ---------------------------------------------------------------------------
 // Hoist FakeWorker + registry so vi.mock can reference them.
@@ -36,7 +35,7 @@ const { createdWorkers, FakeWorker } = vi.hoisted(() => {
     this.threadId = 0;
   });

-  constructor(_script: string, _opts?: unknown) {
+  constructor() {
     super();
     createdWorkers.push(this);
   }
@@ -67,6 +66,7 @@ function makeOpts(overrides: Partial<WorkerPoolOptions> = {}): WorkerPoolOptions
   concurrency: 2,
   workerScript: FAKE_SCRIPT,
   embedWorkerScript: MISSING_SCRIPT,
+  writeWorkerScript: MISSING_SCRIPT,
   dbPath: ':memory:',
   onProgress: vi.fn(),
   onJobDone: vi.fn(),
@@ -142,6 +142,12 @@ describe('WorkerPool normal mode', () => {
     expect(createdWorkers).toHaveLength(3);
   });

+  it('spawns a write worker when writeWorkerScript exists', () => {
+    new WorkerPool(makeOpts({ concurrency: 2, writeWorkerScript: FAKE_SCRIPT }));
+
+    expect(createdWorkers).toHaveLength(3);
+  });
+
   // -------------------------------------------------------------------------
   // enqueue dispatches to an idle worker
   // -------------------------------------------------------------------------
@@ -208,8 +214,12 @@ describe('WorkerPool normal mode', () => {
     const runCalls = createdWorkers.flatMap((w) =>
       w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
     );
-    expect(runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-1')).toHaveLength(1);
-    expect(runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-2')).toHaveLength(0);
+    expect(
+      runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-1')
+    ).toHaveLength(1);
+    expect(
+      runCalls.filter((c) => (c[0] as unknown as { jobId: string }).jobId === 'job-2')
+    ).toHaveLength(0);
   });

   it('starts jobs for different repos concurrently', () => {
@@ -227,6 +237,83 @@ describe('WorkerPool normal mode', () => {
     expect(dispatchedIds).toContain('job-beta');
   });

+  it('dispatches same-repo jobs concurrently when versionIds differ', () => {
+    const pool = new WorkerPool(makeOpts({ concurrency: 2 }));
+
+    pool.enqueue('job-v1', '/repo/same', 'v1');
+    pool.enqueue('job-v2', '/repo/same', 'v2');
+
+    const runCalls = createdWorkers.flatMap((w) =>
+      w.postMessage.mock.calls.filter((c) => (c[0] as { type: string })?.type === 'run')
+    );
+    const dispatchedIds = runCalls.map((c) => (c[0] as unknown as { jobId: string }).jobId);
+    expect(dispatchedIds).toContain('job-v1');
+    expect(dispatchedIds).toContain('job-v2');
+  });
+
+  it('forwards write worker acknowledgements back to the originating parse worker', () => {
+    new WorkerPool(makeOpts({ concurrency: 1, writeWorkerScript: FAKE_SCRIPT }));
+    const parseWorker = createdWorkers[0];
+    const writeWorker = createdWorkers[1];
+    writeWorker.emit('message', { type: 'ready' });
+
+    parseWorker.emit('message', {
+      type: 'write_replace',
+      jobId: 'job-write',
+      changedDocIds: [],
+      documents: [],
+      snippets: []
+    });
+    writeWorker.emit('message', { type: 'write_ack', jobId: 'job-write' });
+
+    expect(writeWorker.postMessage).toHaveBeenCalledWith({
+      type: 'write_replace',
+      jobId: 'job-write',
+      changedDocIds: [],
+      documents: [],
+      snippets: []
+    });
+    expect(parseWorker.postMessage).toHaveBeenCalledWith({ type: 'write_ack', jobId: 'job-write' });
+  });
+
+  it('forwards write worker acknowledgements back to the embed worker', () => {
+    new WorkerPool(
+      makeOpts({
+        concurrency: 1,
+        writeWorkerScript: FAKE_SCRIPT,
+        embedWorkerScript: FAKE_SCRIPT,
+        embeddingProfileId: 'local-default'
+      })
+    );
+    const parseWorker = createdWorkers[0];
+    const embedWorker = createdWorkers[1];
+    const writeWorker = createdWorkers[2];
+    writeWorker.emit('message', { type: 'ready' });
+    embedWorker.emit('message', { type: 'ready' });
+
+    embedWorker.emit('message', {
+      type: 'write_embeddings',
+      jobId: 'job-embed',
+      embeddings: []
+    });
+    writeWorker.emit('message', { type: 'write_ack', jobId: 'job-embed', embeddingCount: 0 });
+
+    expect(parseWorker.postMessage).not.toHaveBeenCalledWith({
+      type: 'write_ack',
+      jobId: 'job-embed'
+    });
+    expect(writeWorker.postMessage).toHaveBeenCalledWith({
+      type: 'write_embeddings',
+      jobId: 'job-embed',
+      embeddings: []
+    });
+    expect(embedWorker.postMessage).toHaveBeenCalledWith({
+      type: 'write_ack',
+      jobId: 'job-embed',
+      embeddingCount: 0
+    });
+  });
+
   // -------------------------------------------------------------------------
   // Worker crash (exit code != 0)
   // -------------------------------------------------------------------------
@@ -248,7 +335,7 @@ describe('WorkerPool normal mode', () => {

   it('does NOT call onJobFailed when a worker exits cleanly (code 0)', () => {
     const opts = makeOpts({ concurrency: 1 });
-    const pool = new WorkerPool(opts);
+    new WorkerPool(opts);

     // Exit without any running job
     const worker = createdWorkers[0];
```
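The new "same-repo jobs with differing versionIds" test above exercises the pool's dedup rule: two jobs may run concurrently only when their `(repositoryId, versionId)` keys differ. A self-contained sketch of that rule, assuming the pool tracks keys in a `Set` as `runningJobKeys` suggests (`RunningKeySet` and the `::` key encoding are illustrative, not from the diff):

```typescript
// Encode versionId explicitly so a null version cannot collide with a
// repository whose id happens to end in the same string.
function jobKey(repositoryId: string, versionId: string | null): string {
  return `${repositoryId}::${versionId ?? '<null>'}`;
}

class RunningKeySet {
  private keys = new Set<string>();

  /** Try to claim a slot; returns false if an identical job is already running. */
  tryStart(repositoryId: string, versionId: string | null): boolean {
    const key = jobKey(repositoryId, versionId);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }

  /** Release the slot when the job completes or fails. */
  finish(repositoryId: string, versionId: string | null): void {
    this.keys.delete(jobKey(repositoryId, versionId));
  }
}
```

Releasing the key on both the `done` and `failed` paths is what keeps a crashed job from permanently blocking its repository/version pair.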
```diff
@@ -6,9 +6,12 @@ import type {
   EmbedWorkerRequest,
   EmbedWorkerResponse,
   WorkerInitData,
+  WriteWorkerRequest,
   WriteWorkerResponse
 } from './worker-types.js';

+type InFlightWriteRequest = Exclude<WriteWorkerRequest, { type: 'shutdown' }>;
+
 export interface WorkerPoolOptions {
   concurrency: number;
   workerScript: string;
@@ -68,6 +71,7 @@ export class WorkerPool {
   private runningJobs = new Map<Worker, RunningJob>();
   private runningJobKeys = new Set<string>();
   private embedQueue: EmbedQueuedJob[] = [];
+  private pendingWriteWorkers = new Map<string, Worker>();
   private options: WorkerPoolOptions;
   private fallbackMode = false;
   private shuttingDown = false;
@@ -179,7 +183,11 @@ export class WorkerPool {
       const job = this.jobQueue.splice(jobIdx, 1)[0];
       const worker = this.idleWorkers.pop()!;

-      this.runningJobs.set(worker, { jobId: job.jobId, repositoryId: job.repositoryId, versionId: job.versionId });
+      this.runningJobs.set(worker, {
+        jobId: job.jobId,
+        repositoryId: job.repositoryId,
+        versionId: job.versionId
+      });
       this.runningJobKeys.add(WorkerPool.jobKey(job.repositoryId, job.versionId));
       statusChanged = true;

@@ -192,14 +200,66 @@ export class WorkerPool {
     }
   }

+  private postWriteRequest(request: InFlightWriteRequest, worker?: Worker): void {
+    if (!this.writeWorker || !this.writeReady) {
+      if (worker) {
+        worker.postMessage({
+          type: 'write_error',
+          jobId: request.jobId,
+          error: 'Write worker is not ready'
+        } satisfies ParseWorkerRequest);
+      }
+      return;
+    }
+
+    if (worker) {
+      this.pendingWriteWorkers.set(request.jobId, worker);
+    }
+
+    this.writeWorker.postMessage(request);
+  }
+
   private onWorkerMessage(worker: Worker, msg: ParseWorkerResponse): void {
     if (msg.type === 'progress') {
+      this.postWriteRequest({
+        type: 'write_job_update',
+        jobId: msg.jobId,
+        fields: {
+          status: 'running',
+          startedAt: Math.floor(Date.now() / 1000),
+          stage: msg.stage,
+          stageDetail: msg.stageDetail ?? null,
+          progress: msg.progress,
+          processedFiles: msg.processedFiles,
+          totalFiles: msg.totalFiles
+        }
+      });
       this.options.onProgress(msg.jobId, msg);
+    } else if (
+      msg.type === 'write_replace' ||
+      msg.type === 'write_clone' ||
+      msg.type === 'write_repo_update' ||
+      msg.type === 'write_version_update' ||
+      msg.type === 'write_repo_config'
+    ) {
+      this.postWriteRequest(msg, worker);
     } else if (msg.type === 'done') {
       const runningJob = this.runningJobs.get(worker);
+      this.postWriteRequest({
+        type: 'write_job_update',
+        jobId: msg.jobId,
+        fields: {
+          status: 'done',
+          stage: 'done',
+          progress: 100,
+          completedAt: Math.floor(Date.now() / 1000)
+        }
+      });
       if (runningJob) {
         this.runningJobs.delete(worker);
-        this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
+        this.runningJobKeys.delete(
+          WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
+        );
       }
       this.idleWorkers.push(worker);
       this.options.onJobDone(msg.jobId);
@@ -207,20 +267,32 @@ export class WorkerPool {

       // If embedding configured, enqueue embed request
       if (this.embedWorker && this.options.embeddingProfileId) {
-        const runningJobData = runningJob || { jobId: msg.jobId, repositoryId: '', versionId: null };
-        this.enqueueEmbed(
-          msg.jobId,
-          runningJobData.repositoryId,
-          runningJobData.versionId ?? null
-        );
+        const runningJobData = runningJob || {
+          jobId: msg.jobId,
+          repositoryId: '',
+          versionId: null
+        };
+        this.enqueueEmbed(msg.jobId, runningJobData.repositoryId, runningJobData.versionId ?? null);
       }

       this.dispatch();
     } else if (msg.type === 'failed') {
       const runningJob = this.runningJobs.get(worker);
+      this.postWriteRequest({
+        type: 'write_job_update',
+        jobId: msg.jobId,
+        fields: {
+          status: 'failed',
+          stage: 'failed',
+          error: msg.error,
+          completedAt: Math.floor(Date.now() / 1000)
+        }
+      });
       if (runningJob) {
         this.runningJobs.delete(worker);
-        this.runningJobKeys.delete(WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId));
+        this.runningJobKeys.delete(
+          WorkerPool.jobKey(runningJob.repositoryId, runningJob.versionId)
+        );
       }
       this.idleWorkers.push(worker);
       this.options.onJobFailed(msg.jobId, msg.error);
@@ -273,6 +345,22 @@ export class WorkerPool {
       this.embedReady = true;
       // Process any queued embed requests
       this.processEmbedQueue();
+    } else if (msg.type === 'write_embeddings') {
+      const embedWorker = this.embedWorker;
+      if (!embedWorker) {
+        return;
+      }
+
+      if (!this.writeWorker || !this.writeReady) {
+        embedWorker.postMessage({
+          type: 'write_error',
+          jobId: msg.jobId,
+          error: 'Write worker is not ready'
+        } satisfies EmbedWorkerRequest);
+        return;
+      }
+
+      this.postWriteRequest(msg, embedWorker);
     } else if (msg.type === 'embed-progress') {
       // Progress message - could be tracked but not strictly required
     } else if (msg.type === 'embed-done') {
@@ -288,6 +376,12 @@ export class WorkerPool {
       return;
     }

+    const worker = this.pendingWriteWorkers.get(msg.jobId);
+    if (worker) {
+      this.pendingWriteWorkers.delete(msg.jobId);
+      worker.postMessage(msg satisfies ParseWorkerRequest);
+    }
+
     if (msg.type === 'write_error') {
       console.error('[WorkerPool] Write worker failed for job:', msg.jobId, msg.error);
     }
@@ -433,6 +527,7 @@ export class WorkerPool {
     this.idleWorkers = [];
     this.embedWorker = null;
     this.writeWorker = null;
+    this.pendingWriteWorkers.clear();
     this.emitStatusChanged();
   }

```
|
|||||||
@@ -2,29 +2,58 @@ import type { IndexingStage } from '$lib/types.js';
 
 export type ParseWorkerRequest =
 	| { type: 'run'; jobId: string }
+	| { type: 'write_ack'; jobId: string }
+	| { type: 'write_error'; jobId: string; error: string }
 	| { type: 'shutdown' };
 
 export type ParseWorkerResponse =
-	| { type: 'progress'; jobId: string; stage: IndexingStage; stageDetail?: string; progress: number; processedFiles: number; totalFiles: number }
+	| {
+			type: 'progress';
+			jobId: string;
+			stage: IndexingStage;
+			stageDetail?: string;
+			progress: number;
+			processedFiles: number;
+			totalFiles: number;
+	  }
 	| { type: 'done'; jobId: string }
-	| { type: 'failed'; jobId: string; error: string };
+	| { type: 'failed'; jobId: string; error: string }
+	| WriteReplaceRequest
+	| WriteCloneRequest
+	| WriteRepoUpdateRequest
+	| WriteVersionUpdateRequest
+	| WriteRepoConfigRequest;
 
 export type EmbedWorkerRequest =
 	| { type: 'embed'; jobId: string; repositoryId: string; versionId: string | null }
+	| {
+			type: 'write_ack';
+			jobId: string;
+			documentCount?: number;
+			snippetCount?: number;
+			embeddingCount?: number;
+	  }
+	| { type: 'write_error'; jobId: string; error: string }
 	| { type: 'shutdown' };
 
 export type EmbedWorkerResponse =
 	| { type: 'ready' }
 	| { type: 'embed-progress'; jobId: string; done: number; total: number }
 	| { type: 'embed-done'; jobId: string }
-	| { type: 'embed-failed'; jobId: string; error: string };
+	| { type: 'embed-failed'; jobId: string; error: string }
+	| WriteEmbeddingsRequest;
 
-export type WriteWorkerRequest = WriteRequest | { type: 'shutdown' };
+export type WriteWorkerRequest =
+	| ReplaceWriteRequest
+	| CloneWriteRequest
+	| JobUpdateWriteRequest
+	| RepoUpdateWriteRequest
+	| VersionUpdateWriteRequest
+	| RepoConfigWriteRequest
+	| EmbeddingsWriteRequest
+	| { type: 'shutdown' };
 
-export type WriteWorkerResponse =
-	| { type: 'ready' }
-	| WriteAck
-	| WriteError;
+export type WriteWorkerResponse = { type: 'ready' } | WriteAck | WriteError;
 
 export interface WorkerInitData {
 	dbPath: string;
@@ -58,18 +87,84 @@ export interface SerializedSnippet {
 	createdAt: number;
 }
 
-export type WriteRequest = {
-	type: 'write';
+export interface SerializedEmbedding {
+	snippetId: string;
+	profileId: string;
+	model: string;
+	dimensions: number;
+	embedding: Uint8Array;
+}
+
+export type SerializedFieldValue = string | number | null;
+
+export type SerializedFields = Record<string, SerializedFieldValue>;
+
+export type ReplaceWriteRequest = {
+	type: 'write_replace';
 	jobId: string;
+	changedDocIds: string[];
 	documents: SerializedDocument[];
 	snippets: SerializedSnippet[];
 };
 
+export type CloneWriteRequest = {
+	type: 'write_clone';
+	jobId: string;
+	ancestorVersionId: string;
+	targetVersionId: string;
+	repositoryId: string;
+	unchangedPaths: string[];
+};
+
+export type WriteReplaceRequest = ReplaceWriteRequest;
+
+export type WriteCloneRequest = CloneWriteRequest;
+
+export type EmbeddingsWriteRequest = {
+	type: 'write_embeddings';
+	jobId: string;
+	embeddings: SerializedEmbedding[];
+};
+
+export type RepoUpdateWriteRequest = {
+	type: 'write_repo_update';
+	jobId: string;
+	repositoryId: string;
+	fields: SerializedFields;
+};
+
+export type VersionUpdateWriteRequest = {
+	type: 'write_version_update';
+	jobId: string;
+	versionId: string;
+	fields: SerializedFields;
+};
+
+export type RepoConfigWriteRequest = {
+	type: 'write_repo_config';
+	jobId: string;
+	repositoryId: string;
+	versionId: string | null;
+	rules: string[];
+};
+
+export type JobUpdateWriteRequest = {
+	type: 'write_job_update';
+	jobId: string;
+	fields: SerializedFields;
+};
+
+export type WriteEmbeddingsRequest = EmbeddingsWriteRequest;
+export type WriteRepoUpdateRequest = RepoUpdateWriteRequest;
+export type WriteVersionUpdateRequest = VersionUpdateWriteRequest;
+export type WriteRepoConfigRequest = RepoConfigWriteRequest;
+
 export type WriteAck = {
 	type: 'write_ack';
 	jobId: string;
-	documentCount: number;
-	snippetCount: number;
+	documentCount?: number;
+	snippetCount?: number;
+	embeddingCount?: number;
 };
 
 export type WriteError = {
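All of the request and response unions above are discriminated on a string `type` field, so a consumer can narrow each message with a plain switch and get exhaustiveness checking for free. A minimal sketch of that dispatch pattern — the type here is a cut-down stand-in for the diff's `WriteWorkerRequest`, keeping only the discriminants, and `describeRequest` is a hypothetical helper, not code from the PR:

```typescript
// Cut-down stand-in for the diff's WriteWorkerRequest union
// (assumption: only the discriminant fields are shown here).
type WriteWorkerRequestSketch =
	| { type: 'write_replace'; jobId: string }
	| { type: 'write_embeddings'; jobId: string }
	| { type: 'shutdown' };

// Narrowing on `type` gives each branch the full member shape;
// omitting a case would be a compile error because every branch returns.
function describeRequest(msg: WriteWorkerRequestSketch): string {
	switch (msg.type) {
		case 'write_replace':
			return `replace:${msg.jobId}`;
		case 'write_embeddings':
			return `embeddings:${msg.jobId}`;
		case 'shutdown':
			return 'shutdown';
	}
}
```

This is why the worker's message handler in the diff can branch on `msg.type === 'write_clone'` etc. and access branch-specific fields without casts.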
src/lib/server/pipeline/write-operations.ts (new file, +343)
@@ -0,0 +1,343 @@
+import { randomUUID } from 'node:crypto';
+import type Database from 'better-sqlite3';
+import type { NewDocument, NewSnippet } from '$lib/types';
+import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
+import type {
+	SerializedDocument,
+	SerializedEmbedding,
+	SerializedFields,
+	SerializedSnippet
+} from './worker-types.js';
+
+type DocumentLike = Pick<
+	NewDocument,
+	| 'id'
+	| 'repositoryId'
+	| 'versionId'
+	| 'filePath'
+	| 'title'
+	| 'language'
+	| 'tokenCount'
+	| 'checksum'
+> & {
+	indexedAt: Date | number;
+};
+
+type SnippetLike = Pick<
+	NewSnippet,
+	| 'id'
+	| 'documentId'
+	| 'repositoryId'
+	| 'versionId'
+	| 'type'
+	| 'title'
+	| 'content'
+	| 'language'
+	| 'breadcrumb'
+	| 'tokenCount'
+> & {
+	createdAt: Date | number;
+};
+
+export interface CloneFromAncestorRequest {
+	ancestorVersionId: string;
+	targetVersionId: string;
+	repositoryId: string;
+	unchangedPaths: string[];
+}
+
+export interface PersistedEmbedding {
+	snippetId: string;
+	profileId: string;
+	model: string;
+	dimensions: number;
+	embedding: Buffer | Uint8Array;
+}
+
+function toEpochSeconds(value: Date | number): number {
+	return value instanceof Date ? Math.floor(value.getTime() / 1000) : value;
+}
+
+function toSnake(key: string): string {
+	return key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);
+}
+
+function replaceSnippetsInternal(
+	db: Database.Database,
+	changedDocIds: string[],
+	newDocuments: DocumentLike[],
+	newSnippets: SnippetLike[]
+): void {
+	const sqliteVecStore = new SqliteVecStore(db);
+	const insertDoc = db.prepare(
+		`INSERT INTO documents
+		 (id, repository_id, version_id, file_path, title, language,
+		  token_count, checksum, indexed_at)
+		 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
+	);
+
+	const insertSnippet = db.prepare(
+		`INSERT INTO snippets
+		 (id, document_id, repository_id, version_id, type, title,
+		  content, language, breadcrumb, token_count, created_at)
+		 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+	);
+
+	db.transaction(() => {
+		sqliteVecStore.deleteEmbeddingsForDocumentIds(changedDocIds);
+
+		if (changedDocIds.length > 0) {
+			const placeholders = changedDocIds.map(() => '?').join(',');
+			db.prepare(`DELETE FROM documents WHERE id IN (${placeholders})`).run(...changedDocIds);
+		}
+
+		for (const doc of newDocuments) {
+			insertDoc.run(
+				doc.id,
+				doc.repositoryId,
+				doc.versionId ?? null,
+				doc.filePath,
+				doc.title ?? null,
+				doc.language ?? null,
+				doc.tokenCount ?? 0,
+				doc.checksum,
+				toEpochSeconds(doc.indexedAt)
+			);
+		}
+
+		for (const snippet of newSnippets) {
+			insertSnippet.run(
+				snippet.id,
+				snippet.documentId,
+				snippet.repositoryId,
+				snippet.versionId ?? null,
+				snippet.type,
+				snippet.title ?? null,
+				snippet.content,
+				snippet.language ?? null,
+				snippet.breadcrumb ?? null,
+				snippet.tokenCount ?? 0,
+				toEpochSeconds(snippet.createdAt)
+			);
+		}
+	})();
+}
+
+export function replaceSnippets(
+	db: Database.Database,
+	changedDocIds: string[],
+	newDocuments: NewDocument[],
+	newSnippets: NewSnippet[]
+): void {
+	replaceSnippetsInternal(db, changedDocIds, newDocuments, newSnippets);
+}
+
+export function replaceSerializedSnippets(
+	db: Database.Database,
+	changedDocIds: string[],
+	documents: SerializedDocument[],
+	snippets: SerializedSnippet[]
+): void {
+	replaceSnippetsInternal(db, changedDocIds, documents, snippets);
+}
+
+export function cloneFromAncestor(db: Database.Database, request: CloneFromAncestorRequest): void {
+	const sqliteVecStore = new SqliteVecStore(db);
+	const { ancestorVersionId, targetVersionId, repositoryId, unchangedPaths } = request;
+
+	db.transaction(() => {
+		const pathList = [...unchangedPaths];
+		if (pathList.length === 0) {
+			return;
+		}
+
+		const placeholders = pathList.map(() => '?').join(',');
+		const ancestorDocs = db
+			.prepare(`SELECT * FROM documents WHERE version_id = ? AND file_path IN (${placeholders})`)
+			.all(ancestorVersionId, ...pathList) as Array<{
+			id: string;
+			repository_id: string;
+			file_path: string;
+			title: string | null;
+			language: string | null;
+			token_count: number;
+			checksum: string;
+			indexed_at: number;
+		}>;
+
+		const docIdMap = new Map<string, string>();
+		const nowEpoch = Math.floor(Date.now() / 1000);
+
+		for (const doc of ancestorDocs) {
+			const newDocId = randomUUID();
+			docIdMap.set(doc.id, newDocId);
+			db.prepare(
+				`INSERT INTO documents (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
+				 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
+			).run(
+				newDocId,
+				repositoryId,
+				targetVersionId,
+				doc.file_path,
+				doc.title,
+				doc.language,
+				doc.token_count,
+				doc.checksum,
+				nowEpoch
+			);
+		}
+
+		if (docIdMap.size === 0) return;
+
+		const oldDocIds = [...docIdMap.keys()];
+		const snippetPlaceholders = oldDocIds.map(() => '?').join(',');
+		const ancestorSnippets = db
+			.prepare(`SELECT * FROM snippets WHERE document_id IN (${snippetPlaceholders})`)
+			.all(...oldDocIds) as Array<{
+			id: string;
+			document_id: string;
+			repository_id: string;
+			version_id: string | null;
+			type: string;
+			title: string | null;
+			content: string;
+			language: string | null;
+			breadcrumb: string | null;
+			token_count: number;
+			created_at: number;
+		}>;
+
+		const snippetIdMap = new Map<string, string>();
+		for (const snippet of ancestorSnippets) {
+			const newSnippetId = randomUUID();
+			snippetIdMap.set(snippet.id, newSnippetId);
+			const newDocId = docIdMap.get(snippet.document_id)!;
+			db.prepare(
+				`INSERT INTO snippets (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
+				 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+			).run(
+				newSnippetId,
+				newDocId,
+				repositoryId,
+				targetVersionId,
+				snippet.type,
+				snippet.title,
+				snippet.content,
+				snippet.language,
+				snippet.breadcrumb,
+				snippet.token_count,
+				snippet.created_at
+			);
+		}
+
+		if (snippetIdMap.size === 0) {
+			return;
+		}
+
+		const oldSnippetIds = [...snippetIdMap.keys()];
+		const embPlaceholders = oldSnippetIds.map(() => '?').join(',');
+		const ancestorEmbeddings = db
+			.prepare(`SELECT * FROM snippet_embeddings WHERE snippet_id IN (${embPlaceholders})`)
+			.all(...oldSnippetIds) as Array<{
+			snippet_id: string;
+			profile_id: string;
+			model: string;
+			dimensions: number;
+			embedding: Buffer;
+			created_at: number;
+		}>;
+
+		for (const emb of ancestorEmbeddings) {
+			const newSnippetId = snippetIdMap.get(emb.snippet_id)!;
+			db.prepare(
+				`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
+				 VALUES (?, ?, ?, ?, ?, ?)`
+			).run(newSnippetId, emb.profile_id, emb.model, emb.dimensions, emb.embedding, emb.created_at);
+			sqliteVecStore.upsertEmbeddingBuffer(
+				emb.profile_id,
+				newSnippetId,
+				emb.embedding,
+				emb.dimensions
+			);
+		}
+	})();
+}
+
+export function upsertEmbeddings(db: Database.Database, embeddings: PersistedEmbedding[]): void {
+	if (embeddings.length === 0) {
+		return;
+	}
+
+	const sqliteVecStore = new SqliteVecStore(db);
+	const insert = db.prepare<[string, string, string, number, Buffer]>(`
+		INSERT OR REPLACE INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
+		VALUES (?, ?, ?, ?, ?, unixepoch())
+	`);
+
+	db.transaction(() => {
+		for (const item of embeddings) {
+			const embeddingBuffer = Buffer.isBuffer(item.embedding)
+				? item.embedding
+				: Buffer.from(item.embedding);
+
+			insert.run(item.snippetId, item.profileId, item.model, item.dimensions, embeddingBuffer);
+
+			sqliteVecStore.upsertEmbeddingBuffer(
+				item.profileId,
+				item.snippetId,
+				embeddingBuffer,
+				item.dimensions
+			);
+		}
+	})();
+}
+
+export function upsertSerializedEmbeddings(
+	db: Database.Database,
+	embeddings: SerializedEmbedding[]
+): void {
+	upsertEmbeddings(
+		db,
+		embeddings.map((item) => ({
+			snippetId: item.snippetId,
+			profileId: item.profileId,
+			model: item.model,
+			dimensions: item.dimensions,
+			embedding: item.embedding
+		}))
+	);
+}
+
+export function updateRepo(
+	db: Database.Database,
+	repositoryId: string,
+	fields: SerializedFields
+): void {
+	const now = Math.floor(Date.now() / 1000);
+	const allFields = { ...fields, updatedAt: now };
+	const sets = Object.keys(allFields)
+		.map((key) => `${toSnake(key)} = ?`)
+		.join(', ');
+	const values = [...Object.values(allFields), repositoryId];
+	db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
+}
+
+export function updateJob(db: Database.Database, jobId: string, fields: SerializedFields): void {
+	const sets = Object.keys(fields)
+		.map((key) => `${toSnake(key)} = ?`)
+		.join(', ');
+	const values = [...Object.values(fields), jobId];
+	db.prepare(`UPDATE indexing_jobs SET ${sets} WHERE id = ?`).run(...values);
+}
+
+export function updateVersion(
+	db: Database.Database,
+	versionId: string,
+	fields: SerializedFields
+): void {
+	const sets = Object.keys(fields)
+		.map((key) => `${toSnake(key)} = ?`)
+		.join(', ');
+	const values = [...Object.values(fields), versionId];
+	db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
+}
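The `updateRepo`/`updateJob`/`updateVersion` helpers in this new file all build their `SET` clause dynamically by converting camelCase field keys to snake_case column names. A minimal self-contained sketch of that pattern (the `toSnake` body mirrors the diff's helper; `buildUpdate` is a hypothetical illustration, not an export from the PR):

```typescript
// camelCase -> snake_case, matching the toSnake helper in the diff.
function toSnake(key: string): string {
	return key.replace(/[A-Z]/g, (char) => `_${char.toLowerCase()}`);
}

// Compose a parameterized UPDATE from a field map, the way
// updateRepo/updateJob/updateVersion build their statements.
function buildUpdate(
	table: string,
	idColumn: string,
	id: string,
	fields: Record<string, string | number | null>
): { sql: string; values: (string | number | null)[] } {
	const sets = Object.keys(fields)
		.map((key) => `${toSnake(key)} = ?`)
		.join(', ');
	// The id parameter is appended last, to bind the WHERE placeholder.
	return {
		sql: `UPDATE ${table} SET ${sets} WHERE ${idColumn} = ?`,
		values: [...Object.values(fields), id]
	};
}
```

Only the keys are interpolated into SQL; every value stays a `?` placeholder, which keeps the dynamic statement safe as long as field names come from trusted code.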
@@ -1,67 +1,21 @@
 import { workerData, parentPort } from 'node:worker_threads';
 import Database from 'better-sqlite3';
-import type {
-	SerializedDocument,
-	SerializedSnippet,
-	WorkerInitData,
-	WriteWorkerRequest,
-	WriteWorkerResponse
-} from './worker-types.js';
+import { applySqlitePragmas } from '$lib/server/db/connection.js';
+import { loadSqliteVec } from '$lib/server/db/sqlite-vec.js';
+import type { WorkerInitData, WriteWorkerRequest, WriteWorkerResponse } from './worker-types.js';
+import {
+	cloneFromAncestor,
+	replaceSerializedSnippets,
+	updateJob,
+	updateRepo,
+	updateVersion,
+	upsertSerializedEmbeddings
+} from './write-operations.js';
 
 const { dbPath } = workerData as WorkerInitData;
 const db = new Database(dbPath);
-db.pragma('journal_mode = WAL');
-db.pragma('foreign_keys = ON');
-db.pragma('busy_timeout = 5000');
-db.pragma('synchronous = NORMAL');
-db.pragma('cache_size = -65536');
-db.pragma('temp_store = MEMORY');
-db.pragma('mmap_size = 268435456');
-db.pragma('wal_autocheckpoint = 1000');
-
-const insertDocument = db.prepare(
-	`INSERT OR REPLACE INTO documents
-	 (id, repository_id, version_id, file_path, title, language, token_count, checksum, indexed_at)
-	 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
-);
-
-const insertSnippet = db.prepare(
-	`INSERT OR REPLACE INTO snippets
-	 (id, document_id, repository_id, version_id, type, title, content, language, breadcrumb, token_count, created_at)
-	 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
-);
-
-const writeBatch = db.transaction((documents: SerializedDocument[], snippets: SerializedSnippet[]) => {
-	for (const document of documents) {
-		insertDocument.run(
-			document.id,
-			document.repositoryId,
-			document.versionId,
-			document.filePath,
-			document.title,
-			document.language,
-			document.tokenCount,
-			document.checksum,
-			document.indexedAt
-		);
-	}
-
-	for (const snippet of snippets) {
-		insertSnippet.run(
-			snippet.id,
-			snippet.documentId,
-			snippet.repositoryId,
-			snippet.versionId,
-			snippet.type,
-			snippet.title,
-			snippet.content,
-			snippet.language,
-			snippet.breadcrumb,
-			snippet.tokenCount,
-			snippet.createdAt
-		);
-	}
-});
+applySqlitePragmas(db);
+loadSqliteVec(db);
 
 parentPort?.postMessage({ type: 'ready' } satisfies WriteWorkerResponse);
 
@@ -71,12 +25,9 @@ parentPort?.on('message', (msg: WriteWorkerRequest) => {
 		process.exit(0);
 	}
 
-	if (msg.type !== 'write') {
-		return;
-	}
-
+	if (msg.type === 'write_replace') {
 		try {
-			writeBatch(msg.documents, msg.snippets);
+			replaceSerializedSnippets(db, msg.changedDocIds, msg.documents, msg.snippets);
 			parentPort?.postMessage({
 				type: 'write_ack',
 				jobId: msg.jobId,
@@ -90,4 +41,129 @@ parentPort?.on('message', (msg: WriteWorkerRequest) => {
 				error: error instanceof Error ? error.message : String(error)
 			} satisfies WriteWorkerResponse);
 		}
+		return;
+	}
+
+	if (msg.type === 'write_clone') {
+		try {
+			cloneFromAncestor(db, {
+				ancestorVersionId: msg.ancestorVersionId,
+				targetVersionId: msg.targetVersionId,
+				repositoryId: msg.repositoryId,
+				unchangedPaths: msg.unchangedPaths
+			});
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+		return;
+	}
+
+	if (msg.type === 'write_embeddings') {
+		try {
+			upsertSerializedEmbeddings(db, msg.embeddings);
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId,
+				embeddingCount: msg.embeddings.length
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+		return;
+	}
+
+	if (msg.type === 'write_job_update') {
+		try {
+			updateJob(db, msg.jobId, msg.fields);
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+		return;
+	}
+
+	if (msg.type === 'write_repo_update') {
+		try {
+			updateRepo(db, msg.repositoryId, msg.fields);
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+		return;
+	}
+
+	if (msg.type === 'write_version_update') {
+		try {
+			updateVersion(db, msg.versionId, msg.fields);
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+		return;
+	}
+
+	if (msg.type === 'write_repo_config') {
+		try {
+			const now = Math.floor(Date.now() / 1000);
+			if (msg.versionId === null) {
+				db.prepare(
+					`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
+				).run(msg.repositoryId);
+			} else {
+				db.prepare(`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`).run(
+					msg.repositoryId,
+					msg.versionId
+				);
+			}
+
+			db.prepare(
+				`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
+				 VALUES (?, ?, ?, ?)`
+			).run(msg.repositoryId, msg.versionId, JSON.stringify(msg.rules), now);
+
+			parentPort?.postMessage({
+				type: 'write_ack',
+				jobId: msg.jobId
+			} satisfies WriteWorkerResponse);
+		} catch (error) {
+			parentPort?.postMessage({
+				type: 'write_error',
+				jobId: msg.jobId,
+				error: error instanceof Error ? error.message : String(error)
+			} satisfies WriteWorkerResponse);
+		}
+	}
 });
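Every catch block in the write worker serializes the thrown value with the same `error instanceof Error ? error.message : String(error)` idiom before posting `write_error`, since arbitrary values can be thrown in JavaScript and only strings survive the message channel cleanly. As a small standalone sketch of that normalization (the function name is a hypothetical extraction, not an export in the PR):

```typescript
// Normalize an unknown thrown value to a string message, the idiom
// repeated in each catch block of the write worker.
function toErrorMessage(error: unknown): string {
	return error instanceof Error ? error.message : String(error);
}
```

Hoisting it into a shared helper would shrink the six near-identical branches, at the cost of one more import.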
@@ -383,7 +383,18 @@ describe('VectorSearch', () => {
 			`INSERT INTO embedding_profiles (id, provider_kind, title, enabled, is_default, model, dimensions, config, created_at, updated_at)
 			 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
 		)
-			.run('secondary-profile', 'local-transformers', 'Secondary', 1, 0, 'test-model', 2, '{}', NOW_S, NOW_S);
+			.run(
+				'secondary-profile',
+				'local-transformers',
+				'Secondary',
+				1,
+				0,
+				'test-model',
+				2,
+				'{}',
+				NOW_S,
+				NOW_S
+			);
 
 		const defaultSnippet = seedSnippet(client, {
 			repositoryId: repoId,
@@ -90,17 +90,18 @@ export class SqliteVecStore {
 		this.ensureProfileStore(profileId, tables.dimensions);
 
 		const existingRow = this.db
-			.prepare<[string], SnippetRowidRow>(
-				`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`
-			)
+			.prepare<
+				[string],
+				SnippetRowidRow
+			>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
 			.get(snippetId);
 
 		const embeddingBuffer = toEmbeddingBuffer(embedding);
 		if (existingRow) {
 			this.db
-				.prepare<[Buffer, number]>(
-					`UPDATE ${tables.quotedVectorTableName} SET embedding = ? WHERE rowid = ?`
-				)
+				.prepare<
+					[Buffer, number]
+				>(`UPDATE ${tables.quotedVectorTableName} SET embedding = ? WHERE rowid = ?`)
 				.run(embeddingBuffer, existingRow.rowid);
 			return;
 		}
@@ -109,9 +110,9 @@ export class SqliteVecStore {
 			.prepare<[Buffer]>(`INSERT INTO ${tables.quotedVectorTableName} (embedding) VALUES (?)`)
 			.run(embeddingBuffer);
 		this.db
-			.prepare<[number, string]>(
-				`INSERT INTO ${tables.quotedRowidTableName} (rowid, snippet_id) VALUES (?, ?)`
-			)
+			.prepare<
+				[number, string]
+			>(`INSERT INTO ${tables.quotedRowidTableName} (rowid, snippet_id) VALUES (?, ?)`)
 			.run(Number(insertResult.lastInsertRowid), snippetId);
 	}
 
@@ -134,9 +135,10 @@ export class SqliteVecStore {
 		this.ensureProfileStore(profileId);
 
 		const existingRow = this.db
-			.prepare<[string], SnippetRowidRow>(
-				`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`
-			)
+			.prepare<
+				[string],
+				SnippetRowidRow
+			>(`SELECT rowid FROM ${tables.quotedRowidTableName} WHERE snippet_id = ?`)
 			.get(snippetId);
 
 		if (!existingRow) {
@@ -280,11 +282,7 @@ export class SqliteVecStore {
 				this.upsertEmbedding(
 					profileId,
 					row.snippet_id,
-					new Float32Array(
-						row.embedding.buffer,
-						row.embedding.byteOffset,
-						tables.dimensions
-					)
+					new Float32Array(row.embedding.buffer, row.embedding.byteOffset, tables.dimensions)
 				);
 			}
 		});
@@ -323,9 +321,10 @@ export class SqliteVecStore {
 		loadSqliteVec(this.db);
 
 		const dimensionsRow = this.db
-			.prepare<[string], ProfileDimensionsRow>(
-				'SELECT dimensions FROM embedding_profiles WHERE id = ?'
-			)
+			.prepare<
+				[string],
+				ProfileDimensionsRow
+			>('SELECT dimensions FROM embedding_profiles WHERE id = ?')
 			.get(profileId);
 		if (!dimensionsRow) {
 			throw new Error(`Embedding profile not found: ${profileId}`);
@@ -377,10 +376,7 @@ export class SqliteVecStore {
 			throw new Error(`Stored embedding dimensions are missing for profile ${profileId}`);
 		}
 
-		if (
-			preferredDimensions !== undefined &&
-			preferredDimensions !== canonicalDimensions
-		) {
+		if (preferredDimensions !== undefined && preferredDimensions !== canonicalDimensions) {
 			throw new Error(
 				`Embedding dimension mismatch for profile ${profileId}: expected ${canonicalDimensions}, received ${preferredDimensions}`
 			);
@@ -1,6 +1,9 @@
|
|||||||
import type Database from 'better-sqlite3';
|
import type Database from 'better-sqlite3';
|
||||||
import type { EmbeddingSettingsUpdateDto } from '$lib/dtos/embedding-settings.js';
|
import type { EmbeddingSettingsUpdateDto } from '$lib/dtos/embedding-settings.js';
|
||||||
import { createProviderFromProfile, getDefaultLocalProfile } from '$lib/server/embeddings/registry.js';
|
import {
|
||||||
|
createProviderFromProfile,
|
||||||
|
getDefaultLocalProfile
|
||||||
|
} from '$lib/server/embeddings/registry.js';
|
||||||
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
|
import { EmbeddingProfileMapper } from '$lib/server/mappers/embedding-profile.mapper.js';
|
||||||
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';
|
import { EmbeddingProfile, EmbeddingProfileEntity } from '$lib/server/models/embedding-profile.js';
|
||||||
import { EmbeddingSettings } from '$lib/server/models/embedding-settings.js';
|
import { EmbeddingSettings } from '$lib/server/models/embedding-settings.js';
|
||||||
@@ -94,7 +97,10 @@ export class EmbeddingSettingsService {
|
|||||||
private getCreatedAt(id: string, fallback: number): number {
|
private getCreatedAt(id: string, fallback: number): number {
|
||||||
return (
|
return (
|
||||||
this.db
|
this.db
|
||||||
.prepare<[string], { created_at: number }>('SELECT created_at FROM embedding_profiles WHERE id = ?')
|
.prepare<
|
||||||
|
[string],
|
||||||
|
{ created_at: number }
|
||||||
|
>('SELECT created_at FROM embedding_profiles WHERE id = ?')
|
||||||
.get(id)?.created_at ?? fallback
|
.get(id)?.created_at ?? fallback
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -11,7 +11,11 @@ import Database from 'better-sqlite3';
|
|||||||
import { readFileSync } from 'node:fs';
|
import { readFileSync } from 'node:fs';
|
||||||
import { join } from 'node:path';
|
import { join } from 'node:path';
|
||||||
import { RepositoryService } from './repository.service';
|
import { RepositoryService } from './repository.service';
|
||||||
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '$lib/server/db/sqlite-vec.js';
|
import {
|
||||||
|
loadSqliteVec,
|
||||||
|
sqliteVecRowidTableName,
|
||||||
|
sqliteVecTableName
|
||||||
|
} from '$lib/server/db/sqlite-vec.js';
|
||||||
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
||||||
import {
|
import {
|
||||||
AlreadyExistsError,
|
AlreadyExistsError,
|
||||||
@@ -465,7 +469,11 @@ describe('RepositoryService.getIndexSummary()', () => {
|
|||||||
beforeEach(() => {
|
beforeEach(() => {
|
||||||
client = createTestDb();
|
client = createTestDb();
|
||||||
service = makeService(client);
|
service = makeService(client);
|
||||||
service.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react', branch: 'main' });
|
service.add({
|
||||||
|
source: 'github',
|
||||||
|
sourceUrl: 'https://github.com/facebook/react',
|
||||||
|
branch: 'main'
|
||||||
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
it('returns embedding counts and indexed version labels', () => {
|
it('returns embedding counts and indexed version labels', () => {
|
||||||
|
|||||||
@@ -10,7 +10,11 @@ import { describe, it, expect } from 'vitest';
|
|||||||
import Database from 'better-sqlite3';
|
import Database from 'better-sqlite3';
|
||||||
import { readFileSync } from 'node:fs';
|
import { readFileSync } from 'node:fs';
|
||||||
import { join } from 'node:path';
|
import { join } from 'node:path';
|
||||||
import { loadSqliteVec, sqliteVecRowidTableName, sqliteVecTableName } from '$lib/server/db/sqlite-vec.js';
|
import {
|
||||||
|
loadSqliteVec,
|
||||||
|
sqliteVecRowidTableName,
|
||||||
|
sqliteVecTableName
|
||||||
|
} from '$lib/server/db/sqlite-vec.js';
|
||||||
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
import { SqliteVecStore } from '$lib/server/search/sqlite-vec.store.js';
|
||||||
import { VersionService } from './version.service';
|
import { VersionService } from './version.service';
|
||||||
import { RepositoryService } from './repository.service';
|
import { RepositoryService } from './repository.service';
|
||||||
@@ -206,18 +210,24 @@ describe('VersionService.remove()', () => {
|
|||||||
const now = Math.floor(Date.now() / 1000);
|
const now = Math.floor(Date.now() / 1000);
|
||||||
const vecStore = new SqliteVecStore(client);
|
const vecStore = new SqliteVecStore(client);
|
||||||
|
|
||||||
client.prepare(
|
client
|
||||||
|
.prepare(
|
||||||
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
|
`INSERT INTO documents (id, repository_id, version_id, file_path, checksum, indexed_at)
|
||||||
VALUES (?, '/facebook/react', ?, 'README.md', 'version-doc', ?)`
|
VALUES (?, '/facebook/react', ?, 'README.md', 'version-doc', ?)`
|
||||||
).run(docId, version.id, now);
|
)
|
||||||
client.prepare(
|
.run(docId, version.id, now);
|
||||||
|
client
|
||||||
|
.prepare(
|
||||||
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
|
`INSERT INTO snippets (id, document_id, repository_id, version_id, type, content, created_at)
|
||||||
VALUES (?, ?, '/facebook/react', ?, 'info', 'version snippet', ?)`
|
VALUES (?, ?, '/facebook/react', ?, 'info', 'version snippet', ?)`
|
||||||
).run(snippetId, docId, version.id, now);
|
)
|
||||||
client.prepare(
|
.run(snippetId, docId, version.id, now);
|
||||||
|
client
|
||||||
|
.prepare(
|
||||||
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
|
`INSERT INTO snippet_embeddings (snippet_id, profile_id, model, dimensions, embedding, created_at)
|
||||||
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
|
VALUES (?, 'local-default', 'test-model', 3, ?, ?)`
|
||||||
).run(snippetId, Buffer.from(embedding.buffer), now);
|
)
|
||||||
|
.run(snippetId, Buffer.from(embedding.buffer), now);
|
||||||
vecStore.upsertEmbedding('local-default', snippetId, embedding);
|
vecStore.upsertEmbedding('local-default', snippetId, embedding);
|
||||||
|
|
||||||
versionService.remove('/facebook/react', 'v18.3.0');
|
versionService.remove('/facebook/react', 'v18.3.0');
|
||||||
|
|||||||
@@ -9,7 +9,10 @@ import { RepositoryVersion } from '$lib/server/models/repository-version.js';
|
|||||||
// Helpers
|
// Helpers
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
function makeVersion(tag: string, state: RepositoryVersion['state'] = 'indexed'): RepositoryVersion {
|
function makeVersion(
|
||||||
|
tag: string,
|
||||||
|
state: RepositoryVersion['state'] = 'indexed'
|
||||||
|
): RepositoryVersion {
|
||||||
return new RepositoryVersion({
|
return new RepositoryVersion({
|
||||||
id: `/facebook/react/${tag}`,
|
id: `/facebook/react/${tag}`,
|
||||||
repositoryId: '/facebook/react',
|
repositoryId: '/facebook/react',
|
||||||
@@ -42,21 +45,13 @@ describe('findBestAncestorVersion', () => {
|
|||||||
});
|
});
|
||||||
|
|
||||||
it('returns the nearest semver predecessor from a list', () => {
|
it('returns the nearest semver predecessor from a list', () => {
|
||||||
const candidates = [
|
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.1.0'), makeVersion('v2.0.0')];
|
||||||
makeVersion('v1.0.0'),
|
|
||||||
makeVersion('v1.1.0'),
|
|
||||||
makeVersion('v2.0.0')
|
|
||||||
];
|
|
||||||
const result = findBestAncestorVersion('v2.1.0', candidates);
|
const result = findBestAncestorVersion('v2.1.0', candidates);
|
||||||
expect(result?.tag).toBe('v2.0.0');
|
expect(result?.tag).toBe('v2.0.0');
|
||||||
});
|
});
|
||||||
|
|
||||||
it('handles v-prefix stripping correctly', () => {
|
it('handles v-prefix stripping correctly', () => {
|
||||||
const candidates = [
|
const candidates = [makeVersion('v1.0.0'), makeVersion('v1.5.0'), makeVersion('v2.0.0')];
|
||||||
makeVersion('v1.0.0'),
|
|
||||||
makeVersion('v1.5.0'),
|
|
||||||
makeVersion('v2.0.0')
|
|
||||||
];
|
|
||||||
const result = findBestAncestorVersion('v2.0.1', candidates);
|
const result = findBestAncestorVersion('v2.0.1', candidates);
|
||||||
expect(result?.tag).toBe('v2.0.0');
|
expect(result?.tag).toBe('v2.0.0');
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -31,7 +31,16 @@ export type RepositorySource = 'github' | 'local';
|
|||||||
export type RepositoryState = 'pending' | 'indexing' | 'indexed' | 'error';
|
export type RepositoryState = 'pending' | 'indexing' | 'indexed' | 'error';
|
||||||
export type SnippetType = 'code' | 'info';
|
export type SnippetType = 'code' | 'info';
|
||||||
export type JobStatus = 'queued' | 'running' | 'done' | 'failed';
|
export type JobStatus = 'queued' | 'running' | 'done' | 'failed';
|
||||||
export type IndexingStage = 'queued' | 'differential' | 'crawling' | 'cloning' | 'parsing' | 'storing' | 'embedding' | 'done' | 'failed';
|
export type IndexingStage =
|
||||||
|
| 'queued'
|
||||||
|
| 'differential'
|
||||||
|
| 'crawling'
|
||||||
|
| 'cloning'
|
||||||
|
| 'parsing'
|
||||||
|
| 'storing'
|
||||||
|
| 'embedding'
|
||||||
|
| 'done'
|
||||||
|
| 'failed';
|
||||||
export type VersionState = 'pending' | 'indexing' | 'indexed' | 'error';
|
export type VersionState = 'pending' | 'indexing' | 'indexed' | 'error';
|
||||||
export type EmbeddingProviderKind = 'local-transformers' | 'openai-compatible';
|
export type EmbeddingProviderKind = 'local-transformers' | 'openai-compatible';
|
||||||
|
|
||||||
|
|||||||
@@ -38,6 +38,9 @@
|
|||||||
<a href={resolveRoute('/search')} class="text-sm text-gray-600 hover:text-gray-900">
|
<a href={resolveRoute('/search')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||||
Search
|
Search
|
||||||
</a>
|
</a>
|
||||||
|
<a href={resolveRoute('/admin/jobs')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||||
|
Admin
|
||||||
|
</a>
|
||||||
<a href={resolveRoute('/settings')} class="text-sm text-gray-600 hover:text-gray-900">
|
<a href={resolveRoute('/settings')} class="text-sm text-gray-600 hover:text-gray-900">
|
||||||
Settings
|
Settings
|
||||||
</a>
|
</a>
|
||||||
|
|||||||
@@ -95,7 +95,10 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
function filtersDirty(): boolean {
|
function filtersDirty(): boolean {
|
||||||
return repositoryInput.trim() !== appliedRepositoryFilter || !sameStatuses(selectedStatuses, appliedStatuses);
|
return (
|
||||||
|
repositoryInput.trim() !== appliedRepositoryFilter ||
|
||||||
|
!sameStatuses(selectedStatuses, appliedStatuses)
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
function isSpecificRepositoryId(repositoryId: string): boolean {
|
function isSpecificRepositoryId(repositoryId: string): boolean {
|
||||||
@@ -107,7 +110,8 @@
|
|||||||
const repositoryFilter = appliedRepositoryFilter;
|
const repositoryFilter = appliedRepositoryFilter;
|
||||||
const repositoryMatches = isSpecificRepositoryId(repositoryFilter)
|
const repositoryMatches = isSpecificRepositoryId(repositoryFilter)
|
||||||
? job.repositoryId === repositoryFilter
|
? job.repositoryId === repositoryFilter
|
||||||
: job.repositoryId === repositoryFilter || job.repositoryId.startsWith(`${repositoryFilter}/`);
|
: job.repositoryId === repositoryFilter ||
|
||||||
|
job.repositoryId.startsWith(`${repositoryFilter}/`);
|
||||||
|
|
||||||
if (!repositoryMatches) {
|
if (!repositoryMatches) {
|
||||||
return false;
|
return false;
|
||||||
@@ -316,7 +320,10 @@
|
|||||||
|
|
||||||
<WorkerStatusPanel />
|
<WorkerStatusPanel />
|
||||||
|
|
||||||
<form class="mb-6 rounded-lg border border-gray-200 bg-white p-4 shadow-sm" onsubmit={applyFilters}>
|
<form
|
||||||
|
class="mb-6 rounded-lg border border-gray-200 bg-white p-4 shadow-sm"
|
||||||
|
onsubmit={applyFilters}
|
||||||
|
>
|
||||||
<div class="flex flex-col gap-4 lg:flex-row lg:items-end lg:justify-between">
|
<div class="flex flex-col gap-4 lg:flex-row lg:items-end lg:justify-between">
|
||||||
<div class="flex-1">
|
<div class="flex-1">
|
||||||
<label class="mb-2 block text-sm font-medium text-gray-700" for="repository-filter">
|
<label class="mb-2 block text-sm font-medium text-gray-700" for="repository-filter">
|
||||||
@@ -327,10 +334,11 @@
|
|||||||
type="text"
|
type="text"
|
||||||
bind:value={repositoryInput}
|
bind:value={repositoryInput}
|
||||||
placeholder="/owner or /owner/repo"
|
placeholder="/owner or /owner/repo"
|
||||||
class="w-full rounded-md border border-gray-300 px-3 py-2 text-sm text-gray-900 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-2 focus:ring-blue-200"
|
class="w-full rounded-md border border-gray-300 px-3 py-2 text-sm text-gray-900 shadow-sm focus:border-blue-500 focus:ring-2 focus:ring-blue-200 focus:outline-none"
|
||||||
/>
|
/>
|
||||||
<p class="mt-2 text-xs text-gray-500">
|
<p class="mt-2 text-xs text-gray-500">
|
||||||
Use an owner prefix like <code>/facebook</code> or a full repository ID like <code>/facebook/react</code>.
|
Use an owner prefix like <code>/facebook</code> or a full repository ID like
|
||||||
|
<code>/facebook/react</code>.
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
@@ -341,7 +349,9 @@
|
|||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
onclick={() => toggleStatusFilter(status)}
|
onclick={() => toggleStatusFilter(status)}
|
||||||
class="rounded-full border px-3 py-1 text-xs font-semibold uppercase transition {selectedStatuses.includes(status)
|
class="rounded-full border px-3 py-1 text-xs font-semibold uppercase transition {selectedStatuses.includes(
|
||||||
|
status
|
||||||
|
)
|
||||||
? 'border-blue-600 bg-blue-50 text-blue-700'
|
? 'border-blue-600 bg-blue-50 text-blue-700'
|
||||||
: 'border-gray-300 text-gray-600 hover:border-gray-400 hover:text-gray-900'}"
|
: 'border-gray-300 text-gray-600 hover:border-gray-400 hover:text-gray-900'}"
|
||||||
>
|
>
|
||||||
@@ -370,7 +380,9 @@
|
|||||||
</div>
|
</div>
|
||||||
</form>
|
</form>
|
||||||
|
|
||||||
<div class="mb-4 flex flex-col gap-2 text-sm text-gray-600 md:flex-row md:items-center md:justify-between">
|
<div
|
||||||
|
class="mb-4 flex flex-col gap-2 text-sm text-gray-600 md:flex-row md:items-center md:justify-between"
|
||||||
|
>
|
||||||
<p>
|
<p>
|
||||||
Showing <span class="font-semibold text-gray-900">{jobs.length}</span> of
|
Showing <span class="font-semibold text-gray-900">{jobs.length}</span> of
|
||||||
<span class="font-semibold text-gray-900">{total}</span> jobs
|
<span class="font-semibold text-gray-900">{total}</span> jobs
|
||||||
@@ -466,7 +478,9 @@
|
|||||||
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
|
<td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
|
||||||
<div class="space-y-2">
|
<div class="space-y-2">
|
||||||
<div class="flex items-center gap-2">
|
<div class="flex items-center gap-2">
|
||||||
<span class="w-12 text-right text-xs font-semibold text-gray-600">{job.progress}%</span>
|
<span class="w-12 text-right text-xs font-semibold text-gray-600"
|
||||||
|
>{job.progress}%</span
|
||||||
|
>
|
||||||
<div class="h-2 w-32 rounded-full bg-gray-200">
|
<div class="h-2 w-32 rounded-full bg-gray-200">
|
||||||
<div
|
<div
|
||||||
class="h-2 rounded-full bg-blue-600 transition-all"
|
class="h-2 rounded-full bg-blue-600 transition-all"
|
||||||
@@ -553,4 +567,4 @@
|
|||||||
{/if}
|
{/if}
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<Toast bind:toasts={toasts} />
|
<Toast bind:toasts />
|
||||||
|
|||||||
@@ -36,9 +36,10 @@ function getServices(db: ReturnType<typeof getClient>) {
|
|||||||
|
|
||||||
// Load the active embedding profile from the database
|
// Load the active embedding profile from the database
|
||||||
const profileRow = db
|
const profileRow = db
|
||||||
.prepare<[], EmbeddingProfileEntityProps>(
|
.prepare<
|
||||||
'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
|
[],
|
||||||
)
|
EmbeddingProfileEntityProps
|
||||||
|
>('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
|
||||||
.get();
|
.get();
|
||||||
|
|
||||||
const profile = profileRow
|
const profile = profileRow
|
||||||
@@ -227,10 +228,7 @@ export const GET: RequestHandler = async ({ url }) => {
|
|||||||
// Fall back to commit hash prefix match (min 7 chars).
|
// Fall back to commit hash prefix match (min 7 chars).
|
||||||
if (!resolvedVersion && parsed.version.length >= 7) {
|
if (!resolvedVersion && parsed.version.length >= 7) {
|
||||||
resolvedVersion = db
|
resolvedVersion = db
|
||||||
.prepare<
|
.prepare<[string, string], RawVersionRow>(
|
||||||
[string, string],
|
|
||||||
RawVersionRow
|
|
||||||
>(
|
|
||||||
`SELECT id, tag FROM repository_versions
|
`SELECT id, tag FROM repository_versions
|
||||||
WHERE repository_id = ? AND commit_hash LIKE ?`
|
WHERE repository_id = ? AND commit_hash LIKE ?`
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -22,17 +22,23 @@ const VALID_JOB_STATUSES: ReadonlySet<IndexingJob['status']> = new Set([
|
|||||||
'failed'
|
'failed'
|
||||||
]);
|
]);
|
||||||
|
|
||||||
function parseStatusFilter(searchValue: string | null): IndexingJob['status'] | Array<IndexingJob['status']> | undefined {
|
function parseStatusFilter(
|
||||||
|
searchValue: string | null
|
||||||
|
): IndexingJob['status'] | Array<IndexingJob['status']> | undefined {
|
||||||
if (!searchValue) {
|
if (!searchValue) {
|
||||||
return undefined;
|
return undefined;
|
||||||
}
|
}
|
||||||
|
|
||||||
const statuses = [...new Set(
|
const statuses = [
|
||||||
|
...new Set(
|
||||||
searchValue
|
searchValue
|
||||||
.split(',')
|
.split(',')
|
||||||
.map((value) => value.trim())
|
.map((value) => value.trim())
|
||||||
.filter((value): value is IndexingJob['status'] => VALID_JOB_STATUSES.has(value as IndexingJob['status']))
|
.filter((value): value is IndexingJob['status'] =>
|
||||||
)];
|
VALID_JOB_STATUSES.has(value as IndexingJob['status'])
|
||||||
|
)
|
||||||
|
)
|
||||||
|
];
|
||||||
|
|
||||||
if (statuses.length === 0) {
|
if (statuses.length === 0) {
|
||||||
return undefined;
|
return undefined;
|
||||||
|
|||||||
@@ -51,7 +51,9 @@ export const GET: RequestHandler = ({ params, request }) => {
|
|||||||
if (lastEventId) {
|
if (lastEventId) {
|
||||||
const lastEvent = broadcaster.getLastEvent(jobId);
|
const lastEvent = broadcaster.getLastEvent(jobId);
|
||||||
if (lastEvent && lastEvent.id >= parseInt(lastEventId, 10)) {
|
if (lastEvent && lastEvent.id >= parseInt(lastEventId, 10)) {
|
||||||
controller.enqueue(`id: ${lastEvent.id}\nevent: ${lastEvent.event}\ndata: ${lastEvent.data}\n\n`);
|
controller.enqueue(
|
||||||
|
`id: ${lastEvent.id}\nevent: ${lastEvent.event}\ndata: ${lastEvent.data}\n\n`
|
||||||
|
);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -80,10 +82,7 @@ export const GET: RequestHandler = ({ params, request }) => {
|
|||||||
controller.enqueue(value);
|
controller.enqueue(value);
|
||||||
|
|
||||||
// Check if the incoming event indicates job completion
|
// Check if the incoming event indicates job completion
|
||||||
if (
|
if (value.includes('event: job-done') || value.includes('event: job-failed')) {
|
||||||
value.includes('event: job-done') ||
|
|
||||||
value.includes('event: job-failed')
|
|
||||||
) {
|
|
||||||
controller.close();
|
controller.close();
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
@@ -111,7 +110,7 @@ export const GET: RequestHandler = ({ params, request }) => {
|
|||||||
headers: {
|
headers: {
|
||||||
'Content-Type': 'text/event-stream',
|
'Content-Type': 'text/event-stream',
|
||||||
'Cache-Control': 'no-cache',
|
'Cache-Control': 'no-cache',
|
||||||
'Connection': 'keep-alive',
|
Connection: 'keep-alive',
|
||||||
'X-Accel-Buffering': 'no',
|
'X-Accel-Buffering': 'no',
|
||||||
'Access-Control-Allow-Origin': '*'
|
'Access-Control-Allow-Origin': '*'
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -30,7 +30,7 @@ export const GET: RequestHandler = ({ url }) => {
|
|||||||
headers: {
|
headers: {
|
||||||
'Content-Type': 'text/event-stream',
|
'Content-Type': 'text/event-stream',
|
||||||
'Cache-Control': 'no-cache',
|
'Cache-Control': 'no-cache',
|
||||||
'Connection': 'keep-alive',
|
Connection: 'keep-alive',
|
||||||
'X-Accel-Buffering': 'no',
|
'X-Accel-Buffering': 'no',
|
||||||
'Access-Control-Allow-Origin': '*'
|
'Access-Control-Allow-Origin': '*'
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -124,8 +124,10 @@ describe('POST /api/v1/libs/:id/index', () => {
|
|||||||
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
|
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
|
||||||
versionService.add('/facebook/react', 'v17.0.0', 'React v17.0.0');
|
versionService.add('/facebook/react', 'v17.0.0', 'React v17.0.0');
|
||||||
|
|
||||||
const enqueue = vi.fn().mockImplementation(
|
const enqueue = vi
|
||||||
(repositoryId: string, versionId?: string) => makeEnqueueJob(repositoryId, versionId)
|
.fn()
|
||||||
|
.mockImplementation((repositoryId: string, versionId?: string) =>
|
||||||
|
makeEnqueueJob(repositoryId, versionId)
|
||||||
);
|
);
|
||||||
mockQueue = { enqueue };
|
mockQueue = { enqueue };
|
||||||
|
|
||||||
@@ -158,8 +160,10 @@ describe('POST /api/v1/libs/:id/index', () => {
|
|||||||
repoService.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react' });
|
repoService.add({ source: 'github', sourceUrl: 'https://github.com/facebook/react' });
|
||||||
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
|
versionService.add('/facebook/react', 'v18.3.0', 'React v18.3.0');
|
||||||
|
|
||||||
const enqueue = vi.fn().mockImplementation(
|
const enqueue = vi
|
||||||
(repositoryId: string, versionId?: string) => makeEnqueueJob(repositoryId, versionId)
|
.fn()
|
||||||
|
.mockImplementation((repositoryId: string, versionId?: string) =>
|
||||||
|
makeEnqueueJob(repositoryId, versionId)
|
||||||
);
|
);
|
||||||
mockQueue = { enqueue };
|
mockQueue = { enqueue };
|
||||||
|
|
||||||
|
|||||||
@@ -49,7 +49,10 @@ function createTestDb(): Database.Database {
|
|||||||
const client = new Database(':memory:');
|
const client = new Database(':memory:');
|
||||||
client.pragma('foreign_keys = ON');
|
client.pragma('foreign_keys = ON');
|
||||||
|
|
||||||
const migrationsFolder = join(import.meta.dirname, '../../../../../../../lib/server/db/migrations');
|
const migrationsFolder = join(
|
||||||
|
import.meta.dirname,
|
||||||
|
'../../../../../../../lib/server/db/migrations'
|
||||||
|
);
|
||||||
const ftsFile = join(import.meta.dirname, '../../../../../../../lib/server/db/fts.sql');
|
const ftsFile = join(import.meta.dirname, '../../../../../../../lib/server/db/fts.sql');
|
||||||
|
|
||||||
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
|
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
|
||||||
|
|||||||
@@ -18,9 +18,10 @@ export const GET: RequestHandler = () => {
|
|||||||
try {
|
try {
|
||||||
const db = getClient();
|
const db = getClient();
|
||||||
const row = db
|
const row = db
|
||||||
.prepare<[], { value: string }>(
|
.prepare<
|
||||||
"SELECT value FROM settings WHERE key = 'indexing.concurrency'"
|
[],
|
||||||
)
|
{ value: string }
|
||||||
|
>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
|
||||||
.get();
|
.get();
|
||||||
|
|
||||||
let concurrency = 2;
|
let concurrency = 2;
|
||||||
@@ -54,13 +55,13 @@ export const PUT: RequestHandler = async ({ request }) => {
|
|||||||
|
|
||||||
// Validate and clamp concurrency
|
// Validate and clamp concurrency
|
||||||
const maxConcurrency = Math.max(os.cpus().length - 1, 1);
|
const maxConcurrency = Math.max(os.cpus().length - 1, 1);
|
||||||
const concurrency = Math.max(1, Math.min(parseInt(String(body.concurrency ?? 2), 10), maxConcurrency));
|
const concurrency = Math.max(
|
||||||
|
1,
|
||||||
|
Math.min(parseInt(String(body.concurrency ?? 2), 10), maxConcurrency)
|
||||||
|
);
|
||||||
|
|
||||||
if (isNaN(concurrency)) {
|
if (isNaN(concurrency)) {
|
||||||
return json(
|
return json({ error: 'Concurrency must be a valid integer' }, { status: 400 });
|
||||||
{ error: 'Concurrency must be a valid integer' },
|
|
||||||
{ status: 400 }
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
const db = getClient();
|
const db = getClient();
|
||||||
|
|||||||
@@ -18,7 +18,8 @@ import type { ProgressBroadcaster as BroadcasterType } from '$lib/server/pipelin
|
|||||||
let db: Database.Database;
|
let db: Database.Database;
|
||||||
// Closed over by the vi.mock factory below.
|
// Closed over by the vi.mock factory below.
|
||||||
let mockBroadcaster: BroadcasterType | null = null;
|
let mockBroadcaster: BroadcasterType | null = null;
|
||||||
let mockPool: { getStatus: () => object; setMaxConcurrency?: (value: number) => void } | null = null;
|
let mockPool: { getStatus: () => object; setMaxConcurrency?: (value: number) => void } | null =
|
||||||
|
null;
|
||||||
|
|
||||||
vi.mock('$lib/server/db/client', () => ({
|
vi.mock('$lib/server/db/client', () => ({
|
||||||
getClient: () => db
|
getClient: () => db
|
||||||
@@ -39,7 +40,8 @@ vi.mock('$lib/server/pipeline/startup.js', () => ({
|
|||||||
}));
|
}));
|
||||||
|
|
||||||
vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
|
vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
|
||||||
const original = await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
|
const original =
|
||||||
|
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
|
||||||
return {
|
return {
|
||||||
...original,
|
...original,
|
||||||
getBroadcaster: () => mockBroadcaster
|
getBroadcaster: () => mockBroadcaster
|
||||||
@@ -47,7 +49,8 @@ vi.mock('$lib/server/pipeline/progress-broadcaster', async (importOriginal) => {
|
|||||||
});
|
});
|
||||||
|
|
||||||
vi.mock('$lib/server/pipeline/progress-broadcaster.js', async (importOriginal) => {
|
vi.mock('$lib/server/pipeline/progress-broadcaster.js', async (importOriginal) => {
|
||||||
const original = await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
|
const original =
|
||||||
|
await importOriginal<typeof import('$lib/server/pipeline/progress-broadcaster.js')>();
|
||||||
return {
|
return {
|
||||||
...original,
|
...original,
|
||||||
getBroadcaster: () => mockBroadcaster
|
getBroadcaster: () => mockBroadcaster
|
||||||
@@ -62,7 +65,10 @@ import { ProgressBroadcaster } from '$lib/server/pipeline/progress-broadcaster.j
|
|||||||
import { GET as getJobsList } from './jobs/+server.js';
|
import { GET as getJobsList } from './jobs/+server.js';
|
||||||
import { GET as getJobStream } from './jobs/[id]/stream/+server.js';
|
import { GET as getJobStream } from './jobs/[id]/stream/+server.js';
|
||||||
import { GET as getJobsStream } from './jobs/stream/+server.js';
|
import { GET as getJobsStream } from './jobs/stream/+server.js';
|
||||||
import { GET as getIndexingSettings, PUT as putIndexingSettings } from './settings/indexing/+server.js';
|
import {
|
||||||
|
GET as getIndexingSettings,
|
||||||
|
PUT as putIndexingSettings
|
||||||
|
} from './settings/indexing/+server.js';
|
||||||
import { GET as getWorkers } from './workers/+server.js';
|
import { GET as getWorkers } from './workers/+server.js';
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
@@ -84,7 +90,10 @@ function createTestDb(): Database.Database {
|
|||||||
'0005_fix_stage_defaults.sql'
|
'0005_fix_stage_defaults.sql'
|
||||||
]) {
|
]) {
|
||||||
const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
|
const sql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
|
||||||
for (const stmt of sql.split('--> statement-breakpoint').map((s) => s.trim()).filter(Boolean)) {
|
for (const stmt of sql
|
||||||
|
.split('--> statement-breakpoint')
|
||||||
|
.map((s) => s.trim())
|
||||||
|
.filter(Boolean)) {
|
||||||
client.exec(stmt);
|
client.exec(stmt);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -201,9 +210,7 @@ describe('GET /api/v1/jobs/:id/stream', () => {
|
|||||||
it('returns 404 when the job does not exist', async () => {
|
it('returns 404 when the job does not exist', async () => {
|
||||||
seedRepo(db);
|
seedRepo(db);
|
||||||
|
|
||||||
const response = await getJobStream(
|
const response = await getJobStream(makeEvent({ params: { id: 'non-existent-job-id' } }));
|
||||||
makeEvent({ params: { id: 'non-existent-job-id' } })
|
|
||||||
);
|
|
||||||
|
|
||||||
expect(response.status).toBe(404);
|
expect(response.status).toBe(404);
|
||||||
});
|
});
|
||||||
@@ -363,7 +370,9 @@ describe('GET /api/v1/jobs/stream', () => {
|
|||||||
const subscribeSpy = vi.spyOn(mockBroadcaster!, 'subscribeRepository');
|
const subscribeSpy = vi.spyOn(mockBroadcaster!, 'subscribeRepository');
|
||||||
|
|
||||||
await getJobsStream(
|
await getJobsStream(
|
||||||
makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream?repositoryId=/test/repo' })
|
makeEvent<Parameters<typeof getJobsStream>[0]>({
|
||||||
|
url: 'http://localhost/api/v1/jobs/stream?repositoryId=/test/repo'
|
||||||
|
})
|
||||||
);
|
);
|
||||||
|
|
||||||
expect(subscribeSpy).toHaveBeenCalledWith('/test/repo');
|
     expect(subscribeSpy).toHaveBeenCalledWith('/test/repo');
@@ -383,7 +392,9 @@ describe('GET /api/v1/jobs/stream', () => {
     seedRepo(db, '/repo/alpha');

     const response = await getJobsStream(
-      makeEvent<Parameters<typeof getJobsStream>[0]>({ url: 'http://localhost/api/v1/jobs/stream?repositoryId=/repo/alpha' })
+      makeEvent<Parameters<typeof getJobsStream>[0]>({
+        url: 'http://localhost/api/v1/jobs/stream?repositoryId=/repo/alpha'
+      })
     );

     // Broadcast an event for this repository
@@ -521,7 +532,9 @@ describe('GET /api/v1/settings/indexing', () => {
   });

   it('returns { concurrency: 2 } when no setting exists in DB', async () => {
-    const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
+    const response = await getIndexingSettings(
+      makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
+    );
     const body = await response.json();

     expect(response.status).toBe(200);
@@ -533,7 +546,9 @@ describe('GET /api/v1/settings/indexing', () => {
       "INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
     ).run(JSON.stringify(4), NOW_S);

-    const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
+    const response = await getIndexingSettings(
+      makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
+    );
     const body = await response.json();

     expect(body.concurrency).toBe(4);
@@ -544,7 +559,9 @@ describe('GET /api/v1/settings/indexing', () => {
       "INSERT INTO settings (key, value, updated_at) VALUES ('indexing.concurrency', ?, ?)"
     ).run(JSON.stringify({ value: 5 }), NOW_S);

-    const response = await getIndexingSettings(makeEvent<Parameters<typeof getIndexingSettings>[0]>({}));
+    const response = await getIndexingSettings(
+      makeEvent<Parameters<typeof getIndexingSettings>[0]>({})
+    );
     const body = await response.json();

     expect(body.concurrency).toBe(5);
@@ -600,9 +617,10 @@ describe('PUT /api/v1/settings/indexing', () => {
     await putIndexingSettings(makePutEvent({ concurrency: 3 }));

     const row = db
-      .prepare<[], { value: string }>(
-        "SELECT value FROM settings WHERE key = 'indexing.concurrency'"
-      )
+      .prepare<
+        [],
+        { value: string }
+      >("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
       .get();

     expect(row).toBeDefined();
@@ -634,9 +652,7 @@ describe('PUT /api/v1/settings/indexing', () => {
     // The actual flow: parseInt('abc') => NaN, Math.max(1, Math.min(NaN, max)) => NaN,
     // then `if (isNaN(concurrency))` returns 400.
     // We pass the raw string directly.
-    const response = await putIndexingSettings(
-      makePutEvent({ concurrency: 'not-a-number' })
-    );
+    const response = await putIndexingSettings(makePutEvent({ concurrency: 'not-a-number' }));

     // parseInt('not-a-number') = NaN, so the handler should return 400
     expect(response.status).toBe(400);
@@ -39,8 +39,11 @@
     indexedAt: string | null;
     createdAt: string;
   }
+  type VersionStateFilter = VersionDto['state'] | 'all';
   let versions = $state<VersionDto[]>([]);
   let versionsLoading = $state(false);
+  let activeVersionFilter = $state<VersionStateFilter>('all');
+  let bulkReprocessBusy = $state(false);

   // Add version form
   let addVersionTag = $state('');
@@ -49,7 +52,7 @@
   // Discover tags state
   let discoverBusy = $state(false);
   let discoveredTags = $state<Array<{ tag: string; commitHash: string }>>([]);
-  let selectedDiscoveredTags = new SvelteSet<string>();
+  const selectedDiscoveredTags = new SvelteSet<string>();
   let showDiscoverPanel = $state(false);
   let registerBusy = $state(false);

@@ -76,6 +79,14 @@
     error: 'Error'
   };

+  const versionFilterOptions: Array<{ value: VersionStateFilter; label: string }> = [
+    { value: 'all', label: 'All' },
+    { value: 'pending', label: stateLabels.pending },
+    { value: 'indexing', label: stateLabels.indexing },
+    { value: 'indexed', label: stateLabels.indexed },
+    { value: 'error', label: stateLabels.error }
+  ];
+
   const stageLabels: Record<string, string> = {
     queued: 'Queued',
     differential: 'Diff',
@@ -88,6 +99,20 @@
     failed: 'Failed'
   };

+  const filteredVersions = $derived(
+    activeVersionFilter === 'all'
+      ? versions
+      : versions.filter((version) => version.state === activeVersionFilter)
+  );
+  const actionableErroredTags = $derived(
+    versions
+      .filter((version) => version.state === 'error' && !activeVersionJobs[version.tag])
+      .map((version) => version.tag)
+  );
+  const activeVersionFilterLabel = $derived(
+    versionFilterOptions.find((option) => option.value === activeVersionFilter)?.label ?? 'All'
+  );
+
   async function refreshRepo() {
     try {
       const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}`);
@@ -123,9 +148,7 @@
     if (!repo.id) return;

     let stopped = false;
-    const es = new EventSource(
-      `/api/v1/jobs/stream?repositoryId=${encodeURIComponent(repo.id)}`
-    );
+    const es = new EventSource(`/api/v1/jobs/stream?repositoryId=${encodeURIComponent(repo.id)}`);

     es.addEventListener('job-progress', (event) => {
       if (stopped) return;
@@ -277,6 +300,16 @@
   async function handleIndexVersion(tag: string) {
     errorMessage = null;
     try {
+      const jobId = await queueVersionIndex(tag);
+      if (jobId) {
+        activeVersionJobs = { ...activeVersionJobs, [tag]: jobId };
+      }
+    } catch (e) {
+      errorMessage = (e as Error).message;
+    }
+  }
+
+  async function queueVersionIndex(tag: string): Promise<string | null> {
     const res = await fetch(
       `/api/v1/libs/${encodeURIComponent(repo.id)}/versions/${encodeURIComponent(tag)}/index`,
       { method: 'POST' }
@@ -286,11 +319,36 @@
       throw new Error(d.error ?? 'Failed to queue version indexing');
     }
     const d = await res.json();
-    if (d.job?.id) {
-      activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
+    return d.job?.id ?? null;
   }

+  async function handleBulkReprocessErroredVersions() {
+    if (actionableErroredTags.length === 0) return;
+    bulkReprocessBusy = true;
+    errorMessage = null;
+    successMessage = null;
+    try {
+      const tags = [...actionableErroredTags];
+      const BATCH_SIZE = 5;
+      let next = { ...activeVersionJobs };
+
+      for (let i = 0; i < tags.length; i += BATCH_SIZE) {
+        const batch = tags.slice(i, i + BATCH_SIZE);
+        const jobIds = await Promise.all(batch.map((versionTag) => queueVersionIndex(versionTag)));
+        for (let j = 0; j < batch.length; j++) {
+          if (jobIds[j]) {
+            next = { ...next, [batch[j]]: jobIds[j] ?? undefined };
+          }
+        }
+        activeVersionJobs = next;
+      }
+
+      successMessage = `Queued ${tags.length} errored tag${tags.length === 1 ? '' : 's'} for reprocessing.`;
+      await loadVersions();
     } catch (e) {
       errorMessage = (e as Error).message;
+    } finally {
+      bulkReprocessBusy = false;
     }
   }
+
@@ -318,10 +376,9 @@
     discoverBusy = true;
     errorMessage = null;
     try {
-      const res = await fetch(
-        `/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`,
-        { method: 'POST' }
-      );
+      const res = await fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions/discover`, {
+        method: 'POST'
+      });
       if (!res.ok) {
         const d = await res.json();
         throw new Error(d.error ?? 'Failed to discover tags');
@@ -331,7 +388,10 @@
       discoveredTags = (d.tags ?? []).filter(
         (t: { tag: string; commitHash: string }) => !registeredTags.has(t.tag)
       );
-      selectedDiscoveredTags = new SvelteSet(discoveredTags.map((t) => t.tag));
+      selectedDiscoveredTags.clear();
+      for (const discoveredTag of discoveredTags) {
+        selectedDiscoveredTags.add(discoveredTag.tag);
+      }
       showDiscoverPanel = true;
     } catch (e) {
       errorMessage = (e as Error).message;
@@ -380,7 +440,7 @@
       activeVersionJobs = next;
       showDiscoverPanel = false;
       discoveredTags = [];
-      selectedDiscoveredTags = new SvelteSet();
+      selectedDiscoveredTags.clear();
       await loadVersions();
     } catch (e) {
       errorMessage = (e as Error).message;
@@ -498,9 +558,36 @@

   <!-- Versions -->
   <div class="mt-6 rounded-xl border border-gray-200 bg-white p-5">
-    <div class="mb-4 flex flex-wrap items-center justify-between gap-3">
+    <div class="mb-4 flex flex-col gap-3">
+      <div class="flex flex-wrap items-center justify-between gap-3">
+        <div class="flex flex-wrap items-center gap-3">
           <h2 class="text-sm font-semibold text-gray-700">Versions</h2>
+          <div class="flex flex-wrap items-center gap-1 rounded-lg bg-gray-100 p-1">
+            {#each versionFilterOptions as option (option.value)}
+              <button
+                type="button"
+                onclick={() => (activeVersionFilter = option.value)}
+                class="rounded-md px-2.5 py-1 text-xs font-medium transition-colors {activeVersionFilter ===
+                option.value
+                  ? 'bg-white text-gray-900 shadow-sm'
+                  : 'text-gray-500 hover:text-gray-700'}"
+              >
+                {option.label}
+              </button>
+            {/each}
+          </div>
+        </div>
         <div class="flex flex-wrap items-center gap-2">
+          <button
+            type="button"
+            onclick={handleBulkReprocessErroredVersions}
+            disabled={bulkReprocessBusy || actionableErroredTags.length === 0}
+            class="rounded-lg border border-red-200 px-3 py-1.5 text-sm font-medium text-red-600 hover:bg-red-50 disabled:cursor-not-allowed disabled:opacity-50"
+          >
+            {bulkReprocessBusy
+              ? 'Reprocessing...'
+              : `Reprocess errored${actionableErroredTags.length > 0 ? ` (${actionableErroredTags.length})` : ''}`}
+          </button>
           <!-- Add version inline form -->
           <form
             onsubmit={(e) => {
@@ -535,6 +622,7 @@
       {/if}
       </div>
     </div>
+  </div>

   <!-- Discover panel -->
   {#if showDiscoverPanel}
@@ -549,7 +637,7 @@
           onclick={() => {
             showDiscoverPanel = false;
             discoveredTags = [];
-            selectedDiscoveredTags = new SvelteSet();
+            selectedDiscoveredTags.clear();
           }}
           class="text-xs text-blue-600 hover:underline"
         >
@@ -567,7 +655,9 @@
             class="rounded border-gray-300"
           />
           <span class="font-mono text-gray-800">{discovered.tag}</span>
-          <span class="font-mono text-xs text-gray-400">{discovered.commitHash.slice(0, 8)}</span>
+          <span class="font-mono text-xs text-gray-400"
+            >{discovered.commitHash.slice(0, 8)}</span
+          >
         </label>
       {/each}
     </div>
@@ -576,9 +666,7 @@
       disabled={registerBusy || selectedDiscoveredTags.size === 0}
       class="rounded-lg bg-blue-600 px-3 py-1.5 text-sm font-medium text-white hover:bg-blue-700 disabled:cursor-not-allowed disabled:opacity-50"
     >
-      {registerBusy
-        ? 'Registering...'
-        : `Register ${selectedDiscoveredTags.size} selected`}
+      {registerBusy ? 'Registering...' : `Register ${selectedDiscoveredTags.size} selected`}
     </button>
   {/if}
 </div>
@@ -589,9 +677,15 @@
     <p class="text-sm text-gray-400">Loading versions...</p>
   {:else if versions.length === 0}
     <p class="text-sm text-gray-400">No versions registered. Add a tag above to get started.</p>
+  {:else if filteredVersions.length === 0}
+    <div class="rounded-lg border border-dashed border-gray-200 bg-gray-50 px-4 py-5">
+      <p class="text-sm text-gray-500">
+        No versions match the {activeVersionFilterLabel.toLowerCase()} filter.
+      </p>
+    </div>
   {:else}
     <div class="divide-y divide-gray-100">
-      {#each versions as version (version.id)}
+      {#each filteredVersions as version (version.id)}
        <div class="py-2.5">
          <div class="flex items-center justify-between">
            <div class="flex items-center gap-3">
@@ -609,7 +703,9 @@
            disabled={version.state === 'indexing' || !!activeVersionJobs[version.tag]}
            class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
          >
-            {version.state === 'indexing' || !!activeVersionJobs[version.tag] ? 'Indexing...' : 'Index'}
+            {version.state === 'indexing' || !!activeVersionJobs[version.tag]
+              ? 'Indexing...'
+              : 'Index'}
          </button>
          <button
            onclick={() => (removeTag = version.tag)}
@@ -625,12 +721,8 @@
              version.totalSnippets > 0
                ? { text: `${version.totalSnippets} snippets`, mono: false }
                : null,
-              version.commitHash
-                ? { text: version.commitHash.slice(0, 8), mono: true }
-                : null,
-              version.indexedAt
-                ? { text: formatDate(version.indexedAt), mono: false }
-                : null
+              version.commitHash ? { text: version.commitHash.slice(0, 8), mono: true } : null,
+              version.indexedAt ? { text: formatDate(version.indexedAt), mono: false } : null
            ] as Array<{ text: string; mono: boolean } | null>
          ).filter((p): p is { text: string; mono: boolean } => p !== null)}
          <div class="mt-1 flex items-center gap-1.5">
@@ -638,7 +730,8 @@
            {#if i > 0}
              <span class="text-xs text-gray-300">·</span>
            {/if}
-            <span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span>
+            <span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span
+            >
          {/each}
        </div>
      {/if}
@@ -647,7 +740,9 @@
        <div class="mt-2">
          <div class="flex justify-between text-xs text-gray-500">
            <span>
-              {#if job?.stageDetail}{job.stageDetail}{:else}{(job?.processedFiles ?? 0).toLocaleString()} / {(job?.totalFiles ?? 0).toLocaleString()} files{/if}
+              {#if job?.stageDetail}{job.stageDetail}{:else}{(
+                job?.processedFiles ?? 0
+              ).toLocaleString()} / {(job?.totalFiles ?? 0).toLocaleString()} files{/if}
              {#if job?.stage}{' - ' + (stageLabels[job.stage] ?? job.stage)}{/if}
            </span>
            <span>{job?.progress ?? 0}%</span>
@@ -20,9 +20,7 @@ export const load: PageServerLoad = async () => {
   // Read indexing concurrency setting
   let indexingConcurrency = 2;
   const concurrencyRow = db
-    .prepare<[], { value: string }>(
-      "SELECT value FROM settings WHERE key = 'indexing.concurrency'"
-    )
+    .prepare<[], { value: string }>("SELECT value FROM settings WHERE key = 'indexing.concurrency'")
     .get();

   if (concurrencyRow && concurrencyRow.value) {
@@ -199,7 +199,9 @@
   }

   function getOpenAiProfile(settings: EmbeddingSettingsDto): EmbeddingProfileDto | null {
-    return settings.profiles.find((profile) => profile.providerKind === 'openai-compatible') ?? null;
+    return (
+      settings.profiles.find((profile) => profile.providerKind === 'openai-compatible') ?? null
+    );
   }

   function resolveProvider(profile: EmbeddingProfileDto | null): 'none' | 'openai' | 'local' {
@@ -210,7 +212,8 @@
   }

   function resolveBaseUrl(settings: EmbeddingSettingsDto): string {
-    const profile = settings.activeProfile?.providerKind === 'openai-compatible'
+    const profile =
+      settings.activeProfile?.providerKind === 'openai-compatible'
        ? settings.activeProfile
        : getOpenAiProfile(settings);
     return typeof profile?.config.baseUrl === 'string'
@@ -219,16 +222,18 @@
   }

   function resolveModel(settings: EmbeddingSettingsDto): string {
-    const profile = settings.activeProfile?.providerKind === 'openai-compatible'
+    const profile =
+      settings.activeProfile?.providerKind === 'openai-compatible'
        ? settings.activeProfile
        : getOpenAiProfile(settings);
     return typeof profile?.config.model === 'string'
       ? profile.config.model
-      : profile?.model ?? 'text-embedding-3-small';
+      : (profile?.model ?? 'text-embedding-3-small');
   }

   function resolveDimensions(settings: EmbeddingSettingsDto): number | undefined {
-    const profile = settings.activeProfile?.providerKind === 'openai-compatible'
+    const profile =
+      settings.activeProfile?.providerKind === 'openai-compatible'
        ? settings.activeProfile
        : getOpenAiProfile(settings);
     return profile?.dimensions ?? 1536;
@@ -296,7 +301,7 @@
       <dt class="font-medium text-gray-500">Provider</dt>
       <dd class="font-semibold text-gray-900">{activeProfile.providerKind}</dd>
       <dt class="font-medium text-gray-500">Model</dt>
-      <dd class="break-all font-semibold text-gray-900">{activeProfile.model}</dd>
+      <dd class="font-semibold break-all text-gray-900">{activeProfile.model}</dd>
       <dt class="font-medium text-gray-500">Dimensions</dt>
       <dd class="font-semibold text-gray-900">{activeProfile.dimensions}</dd>
     </div>
@@ -314,16 +319,20 @@

   <div class="rounded-lg border border-gray-200 bg-gray-50 p-4">
     <p class="text-sm font-medium text-gray-800">Provider configuration</p>
-    <p class="mb-3 mt-1 text-sm text-gray-500">
+    <p class="mt-1 mb-3 text-sm text-gray-500">
       These are the provider-specific settings currently saved for the active profile.
     </p>

     {#if activeConfigEntries.length > 0}
       <ul class="space-y-2 text-sm">
         {#each activeConfigEntries as entry (entry.key)}
-          <li class="flex items-start justify-between gap-4 border-b border-gray-200 pb-2 last:border-b-0 last:pb-0">
+          <li
+            class="flex items-start justify-between gap-4 border-b border-gray-200 pb-2 last:border-b-0 last:pb-0"
+          >
            <span class="font-medium text-gray-600">{entry.key}</span>
-            <span class={entry.redacted ? 'text-gray-500' : 'text-gray-800'}>{entry.value}</span>
+            <span class={entry.redacted ? 'text-gray-500' : 'text-gray-800'}
+              >{entry.value}</span
+            >
          </li>
        {/each}
      </ul>
@@ -332,9 +341,9 @@
       No provider-specific configuration is stored for this profile.
     </p>
     <p class="mt-2 text-sm text-gray-500">
-      For <span class="font-medium text-gray-700">OpenAI-compatible</span> profiles, edit the
-      settings in the <span class="font-medium text-gray-700">Embedding Provider</span> form
-      below. The built-in <span class="font-medium text-gray-700">Local Model</span> profile
+      For <span class="font-medium text-gray-700">OpenAI-compatible</span> profiles, edit
+      the settings in the <span class="font-medium text-gray-700">Embedding Provider</span>
+      form below. The built-in <span class="font-medium text-gray-700">Local Model</span> profile
       does not currently expose extra configurable fields.
     </p>
   {/if}
@@ -342,14 +351,17 @@
   </div>
 {:else}
   <div class="rounded-lg border border-amber-200 bg-amber-50 p-4 text-sm text-amber-800">
-    Embeddings are currently disabled. Keyword search remains available, but no embedding profile is active.
+    Embeddings are currently disabled. Keyword search remains available, but no embedding
+    profile is active.
   </div>
 {/if}
 </div>

 <div class="rounded-xl border border-gray-200 bg-white p-6">
   <h2 class="mb-1 text-base font-semibold text-gray-900">Profile Inventory</h2>
-  <p class="mb-4 text-sm text-gray-500">Profiles stored in the database and available for activation.</p>
+  <p class="mb-4 text-sm text-gray-500">
+    Profiles stored in the database and available for activation.
+  </p>
   <div class="grid grid-cols-2 gap-3">
     <StatBadge label="Profiles" value={String(currentSettings.profiles.length)} />
     <StatBadge label="Active" value={activeProfile ? '1' : '0'} />
@@ -363,7 +375,9 @@
           <p class="text-gray-500">{profile.id}</p>
         </div>
         {#if profile.id === currentSettings.activeProfileId}
-          <span class="rounded-full bg-blue-50 px-2 py-0.5 text-xs font-medium text-blue-700">Active</span>
+          <span class="rounded-full bg-blue-50 px-2 py-0.5 text-xs font-medium text-blue-700"
+            >Active</span
+          >
         {/if}
       </div>
     </div>
@@ -396,11 +410,7 @@
           : 'border border-gray-200 text-gray-700 hover:bg-gray-50'
       ].join(' ')}
     >
-      {p === 'none'
-        ? 'None (FTS5 only)'
-        : p === 'openai'
-        ? 'OpenAI-compatible'
-        : 'Local Model'}
+      {p === 'none' ? 'None (FTS5 only)' : p === 'openai' ? 'OpenAI-compatible' : 'Local Model'}
     </button>
   {/each}
 </div>