chore(TRUEREF-0022): fix lint errors and update architecture docs
- Fix 15 ESLint errors across pipeline workers, SSE endpoints, and UI
- Replace explicit any with proper entity types in worker entries
- Remove unused imports and variables (basename, SSEEvent, getBroadcasterFn, seedRules)
- Use empty catch clauses instead of unused error variables
- Use SvelteSet for reactive Set state in repository page
- Fix operator precedence in nullish coalescing expression
- Replace $state+$effect with $derived for concurrency input
- Use resolve() directly in href for navigation lint rule
- Update ARCHITECTURE.md and FINDINGS.md for worker-thread architecture
@@ -1,15 +1,16 @@
# Architecture
Last Updated: 2026-03-27T00:24:13.000Z
Last Updated: 2026-03-30T00:00:00.000Z
## Overview
TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a server-side indexing pipeline backed by SQLite via better-sqlite3 and Drizzle ORM.
TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a multi-threaded server-side indexing pipeline backed by SQLite via better-sqlite3 and Drizzle ORM.
- Primary language: TypeScript (110 files) with a small amount of JavaScript configuration (2 files)
- Application type: Full-stack SvelteKit application with server-side indexing and retrieval services
- Primary language: TypeScript (141 files) with a small amount of JavaScript configuration (2 files)
- Application type: Full-stack SvelteKit application with worker-threaded indexing and retrieval services
- Runtime framework: SvelteKit with adapter-node
- Storage: SQLite with Drizzle-managed schema plus hand-written FTS5 setup
- Storage: SQLite (WAL mode) with Drizzle-managed schema plus hand-written FTS5 setup
- Concurrency: Node.js worker_threads for parse and embedding work
- Testing: Vitest with separate client and server projects
## Project Structure
@@ -25,7 +26,7 @@ TrueRef is a TypeScript-first, self-hosted documentation retrieval platform buil
### src/routes
Contains the UI entry points and API routes. The API tree under src/routes/api/v1 is the public HTTP contract for repository management, indexing jobs, search/context retrieval, settings, filesystem browsing, and JSON schema discovery.
Contains the UI entry points and API routes. The API tree under src/routes/api/v1 is the public HTTP contract for repository management, indexing jobs, search/context retrieval, settings, filesystem browsing, JSON schema discovery, real-time SSE progress streaming, and job control (pause/resume/cancel).
### src/lib/server/db
@@ -33,7 +34,15 @@ Owns SQLite schema definitions, migration bootstrapping, and FTS initialization.
### src/lib/server/pipeline
Coordinates crawl, parse, chunk, store, and optional embedding generation work. Startup recovery marks stale jobs as failed, resets repositories stuck in indexing state, initializes singleton queue/pipeline instances, and drains queued work after restart.
Coordinates crawl, parse, chunk, store, and optional embedding generation work using a worker thread pool. The pipeline module consists of:
- **WorkerPool** (`worker-pool.ts`): Manages a configurable number of Node.js `worker_threads` for parse jobs and an optional dedicated embed worker. Dispatches jobs round-robin to idle workers, enforces per-repository serialisation (one active job per repo), auto-respawns crashed workers, and supports runtime concurrency adjustment via `setMaxConcurrency()`. Falls back to main-thread execution when worker scripts are not found.
- **Parse worker** (`worker-entry.ts`): Runs in a worker thread. Opens its own `better-sqlite3` connection (WAL mode, `busy_timeout = 5000`), constructs a local `IndexingPipeline` instance, and processes jobs by posting `progress`, `done`, or `failed` messages back to the parent.
- **Embed worker** (`embed-worker-entry.ts`): Dedicated worker for embedding generation. Loads the embedding profile from the database, creates an `EmbeddingService`, and processes embed requests after the parse worker finishes a job.
- **ProgressBroadcaster** (`progress-broadcaster.ts`): Server-side pub/sub for real-time SSE streaming. Supports per-job, per-repository, and global subscriptions. Caches the last event per job for reconnect support.
- **Worker types** (`worker-types.ts`): Shared TypeScript discriminated union types for `ParseWorkerRequest`/`ParseWorkerResponse` and `EmbedWorkerRequest`/`EmbedWorkerResponse` message protocols.
- **Startup** (`startup.ts`): Recovers stale jobs, constructs singleton `JobQueue`, `IndexingPipeline`, `WorkerPool`, and `ProgressBroadcaster` instances, reads concurrency settings from the database, and drains queued work after restart.
- **JobQueue** (`job-queue.ts`): SQLite-backed queue that delegates to the `WorkerPool` when available, with pause/resume/cancel support.
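
The dispatch rules described for the WorkerPool (a concurrency cap plus one active job per repository) can be sketched without any threads. This is a minimal illustration under stated assumptions, not the code in `worker-pool.ts`: `RepoSerialisingDispatcher`, `QueuedJob`, and their fields are hypothetical names, and only `setMaxConcurrency()` is taken from the list above.

```typescript
// Hypothetical sketch: names other than setMaxConcurrency() are assumptions.
interface QueuedJob {
  jobId: string;
  repositoryId: string;
}

class RepoSerialisingDispatcher {
  private readonly pending: QueuedJob[] = [];
  private readonly activeRepos = new Set<string>();
  private running = 0;
  private maxConcurrency: number;

  constructor(maxConcurrency: number) {
    this.maxConcurrency = Math.max(1, Math.floor(maxConcurrency));
  }

  enqueue(job: QueuedJob): void {
    this.pending.push(job);
  }

  // Returns the jobs allowed to start now and marks them as running.
  dispatch(): QueuedJob[] {
    const started: QueuedJob[] = [];
    let i = 0;
    while (i < this.pending.length && this.running < this.maxConcurrency) {
      const job = this.pending[i];
      if (job === undefined) break;
      if (this.activeRepos.has(job.repositoryId)) {
        i++; // repository already has an active job: keep this one queued
        continue;
      }
      this.pending.splice(i, 1); // remove from the queue; do not advance i
      this.activeRepos.add(job.repositoryId);
      this.running++;
      started.push(job);
    }
    return started;
  }

  // Frees the repository slot so the next job for that repository can start.
  complete(job: QueuedJob): void {
    if (this.activeRepos.delete(job.repositoryId)) this.running--;
  }

  setMaxConcurrency(n: number): void {
    this.maxConcurrency = Math.max(1, Math.floor(n));
  }
}
```

In this sketch a second job for a busy repository stays queued even when a worker slot is free, while a job for a different repository is allowed to overtake it.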
### src/lib/server/search
@@ -49,16 +58,18 @@ Provides a thin compatibility layer over the HTTP API. The MCP server exposes re
## Design Patterns
- No explicit design patterns detected from semantic analysis.
- The implementation does consistently use service classes such as RepositoryService, SearchService, and HybridSearchService for business logic.
- Mapping and entity layers separate raw database rows from domain objects through mapper/entity pairs such as RepositoryMapper and RepositoryEntity.
- Pipeline startup uses module-level singleton state for JobQueue and IndexingPipeline lifecycle management.
- The WorkerPool implements an **observer/callback pattern**: the pool owner provides `onProgress`, `onJobDone`, `onJobFailed`, `onEmbedDone`, and `onEmbedFailed` callbacks at construction time, and the pool invokes them when workers post messages.
- ProgressBroadcaster implements a **pub/sub pattern** with three subscription tiers (per-job, per-repository, global) and last-event caching for SSE reconnect.
- The implementation consistently uses **service classes** such as RepositoryService, SearchService, and HybridSearchService for business logic.
- Mapping and entity layers separate raw database rows from domain objects through **mapper/entity pairs** such as RepositoryMapper and RepositoryEntity.
- Pipeline startup uses **module-level singletons** for JobQueue, IndexingPipeline, WorkerPool, and ProgressBroadcaster lifecycle management, with accessor functions (getQueue, getPool, getBroadcaster) for route handlers.
- Worker message protocols use **TypeScript discriminated unions** (`type` field) for type-safe worker ↔ parent communication.
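
The callback and discriminated-union patterns listed above combine naturally: the parent narrows each worker message on its `type` field and forwards it to the matching callback. The sketch below is illustrative only. The type name `ParseWorkerResponse` and the callback names come from this document, but every field (`jobId`, `processed`, `total`, `error`) and the helper `routeWorkerMessage` are assumptions, not the contents of `worker-types.ts`.

```typescript
// Assumed message fields; the real worker-types.ts may define different ones.
type ParseWorkerResponse =
  | { type: "progress"; jobId: string; processed: number; total: number }
  | { type: "done"; jobId: string }
  | { type: "failed"; jobId: string; error: string };

// Callbacks supplied by the pool owner (a subset of those named in the doc).
interface PoolCallbacks {
  onProgress(jobId: string, processed: number, total: number): void;
  onJobDone(jobId: string): void;
  onJobFailed(jobId: string, error: string): void;
}

// Hypothetical helper: narrows on `type` and invokes the matching callback.
function routeWorkerMessage(msg: ParseWorkerResponse, cb: PoolCallbacks): void {
  switch (msg.type) {
    case "progress":
      cb.onProgress(msg.jobId, msg.processed, msg.total);
      break;
    case "done":
      cb.onJobDone(msg.jobId);
      break;
    case "failed":
      cb.onJobFailed(msg.jobId, msg.error);
      break;
    default: {
      // Compile-time exhaustiveness check: adding a new variant to the
      // union makes this assignment a type error until it is handled.
      const unreachable: never = msg;
      throw new Error(`Unhandled worker message: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is what makes the protocol type-safe in practice: a forgotten message variant fails at compile time rather than being dropped silently at runtime.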
## Key Components
### SvelteKit server bootstrap
src/hooks.server.ts initializes the database, loads persisted embedding configuration, creates the optional EmbeddingService, starts the indexing pipeline, and applies CORS headers to all /api routes.
src/hooks.server.ts initializes the database, loads persisted embedding configuration, creates the optional EmbeddingService, reads indexing concurrency settings from the database, starts the indexing pipeline with WorkerPool and ProgressBroadcaster via `initializePipeline(db, embeddingService, { concurrency, dbPath })`, and applies CORS headers to all /api routes.
### Database layer
@@ -80,6 +91,22 @@ src/lib/server/services/repository.service.ts provides CRUD and statistics for i
src/mcp/index.ts creates the MCP server, registers the two supported tools, and exposes them over stdio or streamable HTTP.
### Worker thread pool
src/lib/server/pipeline/worker-pool.ts manages a pool of Node.js worker threads. Parse workers run the full crawl → parse → store pipeline inside isolated threads with their own better-sqlite3 connections (WAL mode enables concurrent readers). An optional embed worker handles embedding generation in a separate thread. The pool enforces per-repository serialisation, auto-respawns crashed workers, and supports runtime concurrency changes persisted through the settings table.
### SSE streaming
src/lib/server/pipeline/progress-broadcaster.ts provides real-time Server-Sent Event streaming of indexing progress. Route handlers in src/routes/api/v1/jobs/stream and src/routes/api/v1/jobs/[id]/stream expose SSE endpoints. The broadcaster supports per-job, per-repository, and global subscriptions, with last-event caching for reconnect via the `Last-Event-ID` header.
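
A rough sketch of the three subscription tiers and the last-event cache may help when reading the broadcaster. Everything here is an assumption made for illustration: the class `ProgressBroadcasterSketch`, the event fields, the method names, and the replay rule (resend the cached event only if its id is newer than the client's `Last-Event-ID`). Only the SSE wire format in `formatSse` follows the standard `id:` / `event:` / `data:` framing.

```typescript
// Hypothetical sketch: event fields and method names are assumptions.
interface IndexingProgressEvent {
  id: number; // increasing id, sent as the SSE event id
  jobId: string;
  repositoryId: string;
  progress: number; // 0..100
}

type ProgressListener = (event: IndexingProgressEvent) => void;

class ProgressBroadcasterSketch {
  private readonly byJob = new Map<string, Set<ProgressListener>>();
  private readonly byRepo = new Map<string, Set<ProgressListener>>();
  private readonly all = new Set<ProgressListener>();
  private readonly lastByJob = new Map<string, IndexingProgressEvent>();

  subscribeJob(jobId: string, fn: ProgressListener): () => void {
    return this.add(this.byJob, jobId, fn);
  }

  subscribeRepository(repositoryId: string, fn: ProgressListener): () => void {
    return this.add(this.byRepo, repositoryId, fn);
  }

  subscribeAll(fn: ProgressListener): () => void {
    this.all.add(fn);
    return () => { this.all.delete(fn); };
  }

  // Caches the event, then fans it out to job, repository, and global tiers.
  publish(event: IndexingProgressEvent): void {
    this.lastByJob.set(event.jobId, event);
    this.byJob.get(event.jobId)?.forEach((fn) => fn(event));
    this.byRepo.get(event.repositoryId)?.forEach((fn) => fn(event));
    this.all.forEach((fn) => fn(event));
  }

  // On reconnect, return the cached event only if the client has not seen it.
  replayFor(jobId: string, lastEventId: number | null): IndexingProgressEvent | undefined {
    const cached = this.lastByJob.get(jobId);
    if (!cached) return undefined;
    return lastEventId === null || cached.id > lastEventId ? cached : undefined;
  }

  private add(
    map: Map<string, Set<ProgressListener>>,
    key: string,
    fn: ProgressListener
  ): () => void {
    let listeners = map.get(key);
    if (!listeners) {
      listeners = new Set<ProgressListener>();
      map.set(key, listeners);
    }
    listeners.add(fn);
    const owned = listeners;
    return () => { owned.delete(fn); };
  }
}

// Serialises one event in the Server-Sent Events text format.
function formatSse(event: IndexingProgressEvent): string {
  return `id: ${event.id}\nevent: progress\ndata: ${JSON.stringify(event)}\n\n`;
}
```

Each `subscribe*` method returns an unsubscribe function, which a route handler would typically call when the client closes the stream.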
### Job control
src/routes/api/v1/jobs/[id]/pause, resume, and cancel endpoints allow runtime control of indexing jobs. The JobQueue supports pause/resume/cancel state transitions persisted to SQLite.
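
The pause/resume/cancel transitions can be pictured as a small state table. The status names and the allowed transitions below are assumptions chosen for illustration (the document only names the three actions), and `applyJobAction` is a hypothetical helper rather than a JobQueue method.

```typescript
// Assumed status values; the real job table may use different ones.
type JobStatus = "queued" | "running" | "paused" | "cancelled" | "done" | "failed";
type JobAction = "pause" | "resume" | "cancel";

interface TransitionRule {
  from: readonly JobStatus[];
  to: JobStatus;
}

// Assumed rules: only unfinished jobs can be paused or cancelled,
// and a resumed job goes back to the queue rather than straight to running.
const jobTransitions: Record<JobAction, TransitionRule> = {
  pause: { from: ["queued", "running"], to: "paused" },
  resume: { from: ["paused"], to: "queued" },
  cancel: { from: ["queued", "running", "paused"], to: "cancelled" },
};

// Returns the next status, or throws if the action is not valid right now.
function applyJobAction(status: JobStatus, action: JobAction): JobStatus {
  const rule = jobTransitions[action];
  if (rule.from.indexOf(status) === -1) {
    throw new Error(`Cannot ${action} a job that is ${status}`);
  }
  return rule.to;
}
```

An endpoint built on a table like this would persist the returned status to SQLite and could map the thrown error to an HTTP conflict response, though how the real routes report invalid transitions is not stated here.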
### Indexing settings
src/routes/api/v1/settings/indexing exposes GET and PUT for indexing concurrency. PUT validates and clamps the value to `max(cpus - 1, 1)`, persists it to the settings table, and live-updates the WorkerPool via `setMaxConcurrency()`.
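
The clamping rule can be written as a pure function. This is a sketch, assuming a lower bound of 1 and integer rounding, neither of which is stated above; `clampConcurrency` is a hypothetical name, and the CPU count is passed in as a parameter so the function stays independent of the runtime.

```typescript
// Hypothetical helper: clamps a requested concurrency to [1, max(cpus - 1, 1)].
function clampConcurrency(requested: number, cpuCount: number): number {
  if (!Number.isFinite(requested)) {
    throw new RangeError("concurrency must be a finite number");
  }
  // Leave one core for the main thread, but never go below one worker.
  const upper = Math.max(cpuCount - 1, 1);
  return Math.min(Math.max(Math.floor(requested), 1), upper);
}
```

On an 8-core machine, a request for 16 workers would be clamped to 7, and on a single-core machine every request resolves to 1. In Node.js the CPU count would typically come from `os.cpus().length`.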
## Dependencies
### Production
@@ -93,6 +120,7 @@ src/mcp/index.ts creates the MCP server, registers the two supported tools, and
- @sveltejs/kit and @sveltejs/adapter-node: application framework and Node deployment target
- drizzle-kit and drizzle-orm: schema management and typed database access
- esbuild: worker thread entry point bundling (build/workers/)
- vite and @tailwindcss/vite: bundling and Tailwind integration
- vitest and @vitest/browser-playwright: server and browser test execution
- eslint, typescript-eslint, eslint-plugin-svelte, prettier, prettier-plugin-svelte, prettier-plugin-tailwindcss: linting and formatting
@@ -116,12 +144,13 @@ The frontend and backend share the same SvelteKit repository, but most non-UI be
### Indexing flow
1. Server startup runs initializeDatabase() and initializePipeline() from src/hooks.server.ts.
2. The pipeline recovers stale jobs, initializes crawler/parser infrastructure, and resumes queued work.
3. Crawlers ingest GitHub or local repository contents.
4. Parsers split files into document and snippet records with token counts and metadata.
5. Database modules persist repositories, documents, snippets, versions, configs, and job state.
6. If an embedding provider is configured, embedding services generate vectors for snippet search.
1. Server startup runs initializeDatabase() and initializePipeline() from src/hooks.server.ts, which creates the WorkerPool, ProgressBroadcaster, and JobQueue singletons.
2. The pipeline recovers stale jobs (marks running → failed, indexing → error), reads concurrency settings, and resumes queued work.
3. When a job is enqueued, the JobQueue delegates to the WorkerPool, which dispatches work to an idle parse worker thread.
4. Each parse worker opens its own better-sqlite3 connection (WAL mode) and runs the full crawl → parse → store pipeline, posting progress messages back to the parent thread.
5. The parent thread updates job progress in the database and broadcasts SSE events through the ProgressBroadcaster.
6. On parse completion, if an embedding provider is configured, the WorkerPool enqueues an embed request to the dedicated embed worker, which generates vectors in its own thread.
7. Job control endpoints allow pausing, resuming, or cancelling jobs at runtime.
### Retrieval flow
@@ -135,7 +164,8 @@ The frontend and backend share the same SvelteKit repository, but most non-UI be
## Build System
- Build command: npm run build
- Build command: npm run build (runs `vite build` then `node scripts/build-workers.mjs`)
- Worker bundling: scripts/build-workers.mjs uses esbuild to compile worker-entry.ts and embed-worker-entry.ts into build/workers/ as ESM bundles (.mjs), with $lib path aliases resolved and better-sqlite3/@xenova/transformers marked external
- Test command: npm run test
- Primary local run command from package.json: npm run dev
- MCP entry points: npm run mcp:start and npm run mcp:http