chore(TRUEREF-0022): fix lint errors and update architecture docs

- Fix 15 ESLint errors across pipeline workers, SSE endpoints, and UI
- Replace explicit any with proper entity types in worker entries
- Remove unused imports and variables (basename, SSEEvent, getBroadcasterFn, seedRules)
- Use empty catch clauses instead of unused error variables
- Use SvelteSet for reactive Set state in repository page
- Fix operator precedence in nullish coalescing expression
- Replace $state+$effect with $derived for concurrency input
- Use resolve() directly in href to satisfy the navigation lint rule
- Update ARCHITECTURE.md and FINDINGS.md for worker-thread architecture
Giancarmine Salucci
2026-03-30 17:28:38 +02:00
parent 7630740403
commit 6297edf109
11 changed files with 85 additions and 69 deletions


@@ -1,15 +1,16 @@
# Architecture
Last Updated: 2026-03-27T00:24:13.000Z
Last Updated: 2026-03-30T00:00:00.000Z
## Overview
TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a server-side indexing pipeline backed by SQLite via better-sqlite3 and Drizzle ORM.
TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a multi-threaded server-side indexing pipeline backed by SQLite via better-sqlite3 and Drizzle ORM.
- Primary language: TypeScript (110 files) with a small amount of JavaScript configuration (2 files)
- Application type: Full-stack SvelteKit application with server-side indexing and retrieval services
- Primary language: TypeScript (141 files) with a small amount of JavaScript configuration (2 files)
- Application type: Full-stack SvelteKit application with worker-threaded indexing and retrieval services
- Runtime framework: SvelteKit with adapter-node
- Storage: SQLite with Drizzle-managed schema plus hand-written FTS5 setup
- Storage: SQLite (WAL mode) with Drizzle-managed schema plus hand-written FTS5 setup
- Concurrency: Node.js worker_threads for parse and embedding work
- Testing: Vitest with separate client and server projects
## Project Structure
@@ -25,7 +26,7 @@ TrueRef is a TypeScript-first, self-hosted documentation retrieval platform buil
### src/routes
Contains the UI entry points and API routes. The API tree under src/routes/api/v1 is the public HTTP contract for repository management, indexing jobs, search/context retrieval, settings, filesystem browsing, and JSON schema discovery.
Contains the UI entry points and API routes. The API tree under src/routes/api/v1 is the public HTTP contract for repository management, indexing jobs, search/context retrieval, settings, filesystem browsing, JSON schema discovery, real-time SSE progress streaming, and job control (pause/resume/cancel).
### src/lib/server/db
@@ -33,7 +34,15 @@ Owns SQLite schema definitions, migration bootstrapping, and FTS initialization.
### src/lib/server/pipeline
Coordinates crawl, parse, chunk, store, and optional embedding generation work. Startup recovery marks stale jobs as failed, resets repositories stuck in indexing state, initializes singleton queue/pipeline instances, and drains queued work after restart.
Coordinates crawl, parse, chunk, store, and optional embedding generation work using a worker thread pool. The pipeline module consists of:
- **WorkerPool** (`worker-pool.ts`): Manages a configurable number of Node.js `worker_threads` for parse jobs and an optional dedicated embed worker. Dispatches jobs round-robin to idle workers, enforces per-repository serialisation (one active job per repo), auto-respawns crashed workers, and supports runtime concurrency adjustment via `setMaxConcurrency()`. Falls back to main-thread execution when worker scripts are not found.
- **Parse worker** (`worker-entry.ts`): Runs in a worker thread. Opens its own `better-sqlite3` connection (WAL mode, `busy_timeout = 5000`), constructs a local `IndexingPipeline` instance, and processes jobs by posting `progress`, `done`, or `failed` messages back to the parent.
- **Embed worker** (`embed-worker-entry.ts`): Dedicated worker for embedding generation. Loads the embedding profile from the database, creates an `EmbeddingService`, and processes embed requests after the parse worker finishes a job.
- **ProgressBroadcaster** (`progress-broadcaster.ts`): Server-side pub/sub for real-time SSE streaming. Supports per-job, per-repository, and global subscriptions. Caches the last event per job for reconnect support.
- **Worker types** (`worker-types.ts`): Shared TypeScript discriminated union types for `ParseWorkerRequest`/`ParseWorkerResponse` and `EmbedWorkerRequest`/`EmbedWorkerResponse` message protocols.
- **Startup** (`startup.ts`): Recovers stale jobs, constructs singleton `JobQueue`, `IndexingPipeline`, `WorkerPool`, and `ProgressBroadcaster` instances, reads concurrency settings from the database, and drains queued work after restart.
- **JobQueue** (`job-queue.ts`): SQLite-backed queue that delegates to the `WorkerPool` when available, with pause/resume/cancel support.
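
A minimal sketch of how the parse worker message protocol might look as a discriminated union. Only the union name `ParseWorkerResponse` and the `progress`, `done`, and `failed` message kinds come from the text above; the payload fields (`jobId`, `processed`, `total`, `error`) and the `describeResponse` helper are hypothetical.

```typescript
// Hypothetical payload fields; only the `type` tags and the union name are documented.
type ParseWorkerResponse =
	| { type: 'progress'; jobId: string; processed: number; total: number }
	| { type: 'done'; jobId: string }
	| { type: 'failed'; jobId: string; error: string };

// Exhaustive handler: the `never` assignment makes the compiler reject
// any variant that is added to the union but not handled here.
function describeResponse(msg: ParseWorkerResponse): string {
	switch (msg.type) {
		case 'progress':
			return `job ${msg.jobId}: ${msg.processed}/${msg.total}`;
		case 'done':
			return `job ${msg.jobId}: done`;
		case 'failed':
			return `job ${msg.jobId}: failed (${msg.error})`;
		default: {
			const unreachable: never = msg;
			throw new Error(`unexpected message: ${JSON.stringify(unreachable)}`);
		}
	}
}
```

Narrowing on the `type` field gives each branch access only to the fields of its own variant, which is what makes the worker ↔ parent exchange type-safe.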
### src/lib/server/search
@@ -49,16 +58,18 @@ Provides a thin compatibility layer over the HTTP API. The MCP server exposes re
## Design Patterns
- No explicit design patterns detected from semantic analysis.
- The implementation does consistently use service classes such as RepositoryService, SearchService, and HybridSearchService for business logic.
- Mapping and entity layers separate raw database rows from domain objects through mapper/entity pairs such as RepositoryMapper and RepositoryEntity.
- Pipeline startup uses module-level singleton state for JobQueue and IndexingPipeline lifecycle management.
- The WorkerPool implements an **observer/callback pattern**: the pool owner provides `onProgress`, `onJobDone`, `onJobFailed`, `onEmbedDone`, and `onEmbedFailed` callbacks at construction time, and the pool invokes them when workers post messages.
- ProgressBroadcaster implements a **pub/sub pattern** with three subscription tiers (per-job, per-repository, global) and last-event caching for SSE reconnect.
- The implementation consistently uses **service classes** such as RepositoryService, SearchService, and HybridSearchService for business logic.
- Mapping and entity layers separate raw database rows from domain objects through **mapper/entity pairs** such as RepositoryMapper and RepositoryEntity.
- Pipeline startup uses **module-level singletons** for JobQueue, IndexingPipeline, WorkerPool, and ProgressBroadcaster lifecycle management, with accessor functions (getQueue, getPool, getBroadcaster) for route handlers.
- Worker message protocols use **TypeScript discriminated unions** (`type` field) for type-safe worker ↔ parent communication.
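
The three-tier pub/sub with last-event caching could be sketched roughly as follows. This is an illustration under assumptions, not the actual `ProgressBroadcaster` API: the class name `SketchBroadcaster`, the `ProgressEvent` shape, and all method names are hypothetical.

```typescript
// Hypothetical event shape and method names, for illustration only.
interface ProgressEvent {
	jobId: string;
	repositoryId: string;
	progress: number;
}
type Listener = (event: ProgressEvent) => void;

class SketchBroadcaster {
	private byJob = new Map<string, Set<Listener>>();
	private byRepo = new Map<string, Set<Listener>>();
	private global = new Set<Listener>();
	private lastEvent = new Map<string, ProgressEvent>();

	// Registers a listener under a key and returns an unsubscribe function.
	private add(map: Map<string, Set<Listener>>, key: string, fn: Listener): () => void {
		const set = map.get(key) ?? new Set<Listener>();
		map.set(key, set);
		set.add(fn);
		return () => {
			set.delete(fn);
		};
	}

	subscribeJob(jobId: string, fn: Listener): () => void {
		// Replay the cached last event so a reconnecting client sees current state.
		const cached = this.lastEvent.get(jobId);
		if (cached) fn(cached);
		return this.add(this.byJob, jobId, fn);
	}

	subscribeRepository(repositoryId: string, fn: Listener): () => void {
		return this.add(this.byRepo, repositoryId, fn);
	}

	subscribeAll(fn: Listener): () => void {
		this.global.add(fn);
		return () => {
			this.global.delete(fn);
		};
	}

	publish(event: ProgressEvent): void {
		this.lastEvent.set(event.jobId, event);
		for (const fn of this.byJob.get(event.jobId) ?? []) fn(event);
		for (const fn of this.byRepo.get(event.repositoryId) ?? []) fn(event);
		for (const fn of this.global) fn(event);
	}
}
```

Caching only the most recent event per job keeps memory bounded while still letting a late or reconnecting subscriber catch up immediately.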
## Key Components
### SvelteKit server bootstrap
src/hooks.server.ts initializes the database, loads persisted embedding configuration, creates the optional EmbeddingService, starts the indexing pipeline, and applies CORS headers to all /api routes.
src/hooks.server.ts initializes the database, loads persisted embedding configuration, creates the optional EmbeddingService, reads indexing concurrency settings from the database, starts the indexing pipeline with WorkerPool and ProgressBroadcaster via `initializePipeline(db, embeddingService, { concurrency, dbPath })`, and applies CORS headers to all /api routes.
### Database layer
@@ -80,6 +91,22 @@ src/lib/server/services/repository.service.ts provides CRUD and statistics for i
src/mcp/index.ts creates the MCP server, registers the two supported tools, and exposes them over stdio or streamable HTTP.
### Worker thread pool
src/lib/server/pipeline/worker-pool.ts manages a pool of Node.js worker threads. Parse workers run the full crawl → parse → store pipeline inside isolated threads with their own better-sqlite3 connections (WAL mode enables concurrent readers). An optional embed worker handles embedding generation in a separate thread. The pool enforces per-repository serialisation, auto-respawns crashed workers, and supports runtime concurrency changes persisted through the settings table.
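
Per-repository serialisation can be illustrated with a small selection function. This is a sketch of the idea only; `QueuedJob`, `pickNextJob`, and the parameters are hypothetical names, and the real pool may track active repositories differently.

```typescript
// Hypothetical job shape for illustration.
interface QueuedJob {
	id: string;
	repositoryId: string;
}

// Returns the first queued job whose repository has no active job,
// or undefined when no worker is idle or every candidate is blocked.
function pickNextJob(
	queue: QueuedJob[],
	activeRepositories: Set<string>,
	idleWorkers: number
): QueuedJob | undefined {
	if (idleWorkers <= 0) return undefined;
	return queue.find((job) => !activeRepositories.has(job.repositoryId));
}
```

Skipping over blocked jobs rather than stopping at the head of the queue means one busy repository does not stall work for the others.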
### SSE streaming
src/lib/server/pipeline/progress-broadcaster.ts provides real-time Server-Sent Event streaming of indexing progress. Route handlers in src/routes/api/v1/jobs/stream and src/routes/api/v1/jobs/[id]/stream expose SSE endpoints. The broadcaster supports per-job, per-repository, and global subscriptions, with last-event caching for reconnect via the `Last-Event-ID` header.
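
For reference, a Server-Sent Events frame carrying an event ID might be produced as below. The frame layout (`id:`, `event:`, `data:`, blank-line terminator) follows the SSE wire format; the numeric ID scheme, the `progress` event name used in the example, and the `shouldReplay` rule are assumptions rather than the project's documented contract.

```typescript
// Serialises one SSE frame. JSON.stringify emits a single line,
// so the payload fits in one `data:` field.
function formatSseFrame(id: number, event: string, data: unknown): string {
	return `id: ${id}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Assumed replay rule: resend the cached event unless the client's
// Last-Event-ID shows it has already received that event.
function shouldReplay(lastEventIdHeader: string | null, cachedId: number): boolean {
	if (lastEventIdHeader === null) return true;
	const seen = Number.parseInt(lastEventIdHeader, 10);
	return Number.isNaN(seen) || seen < cachedId;
}
```

A browser `EventSource` sends the last `id:` it received back as `Last-Event-ID` when it reconnects, which is what lets the server decide whether to replay.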
### Job control
The pause, resume, and cancel endpoints under src/routes/api/v1/jobs/[id]/ allow runtime control of indexing jobs. The JobQueue supports pause/resume/cancel state transitions persisted to SQLite.
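
One way to model the pause/resume/cancel transitions is a lookup table keyed by action and current status. The status names and the allowed transitions below are assumptions for illustration; the source only states that the three actions exist and that their state is persisted.

```typescript
// Hypothetical status set; the actual JobQueue statuses may differ.
type JobStatus = 'queued' | 'running' | 'paused' | 'cancelled' | 'done' | 'failed';
type JobAction = 'pause' | 'resume' | 'cancel';

// Assumed transition table: each action maps the statuses it accepts
// to the status that results.
const transitions: Record<JobAction, Partial<Record<JobStatus, JobStatus>>> = {
	pause: { queued: 'paused', running: 'paused' },
	resume: { paused: 'queued' },
	cancel: { queued: 'cancelled', running: 'cancelled', paused: 'cancelled' }
};

// Returns the next status, or null when the action is not valid
// for the job's current status.
function applyAction(status: JobStatus, action: JobAction): JobStatus | null {
	return transitions[action][status] ?? null;
}
```

A route handler could map a `null` result to a conflict response instead of writing an invalid state to the database.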
### Indexing settings
src/routes/api/v1/settings/indexing exposes GET and PUT for indexing concurrency. PUT validates the value, clamps it to at most `max(cpus - 1, 1)`, persists it to the settings table, and live-updates the WorkerPool via `setMaxConcurrency()`.
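
The clamping rule can be written as a small pure function. The upper bound `max(cpus - 1, 1)` is taken from the text; the lower bound of 1, the truncation of fractional input, and the name `clampConcurrency` are assumptions.

```typescript
// Clamps a requested concurrency to the range [1, max(cpuCount - 1, 1)].
// The lower bound and integer truncation are assumed, not documented.
function clampConcurrency(requested: number, cpuCount: number): number {
	const upper = Math.max(cpuCount - 1, 1);
	if (!Number.isFinite(requested)) return 1;
	return Math.min(Math.max(Math.trunc(requested), 1), upper);
}
```

In Node the CPU count would typically come from `os.cpus().length`; passing it in as a parameter keeps the function easy to test. Leaving one core free keeps the main thread responsive for HTTP and SSE traffic.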
## Dependencies
### Production
@@ -93,6 +120,7 @@ src/mcp/index.ts creates the MCP server, registers the two supported tools, and
- @sveltejs/kit and @sveltejs/adapter-node: application framework and Node deployment target
- drizzle-kit and drizzle-orm: schema management and typed database access
- esbuild: worker thread entry point bundling (build/workers/)
- vite and @tailwindcss/vite: bundling and Tailwind integration
- vitest and @vitest/browser-playwright: server and browser test execution
- eslint, typescript-eslint, eslint-plugin-svelte, prettier, prettier-plugin-svelte, prettier-plugin-tailwindcss: linting and formatting
@@ -116,12 +144,13 @@ The frontend and backend share the same SvelteKit repository, but most non-UI be
### Indexing flow
1. Server startup runs initializeDatabase() and initializePipeline() from src/hooks.server.ts.
2. The pipeline recovers stale jobs, initializes crawler/parser infrastructure, and resumes queued work.
3. Crawlers ingest GitHub or local repository contents.
4. Parsers split files into document and snippet records with token counts and metadata.
5. Database modules persist repositories, documents, snippets, versions, configs, and job state.
6. If an embedding provider is configured, embedding services generate vectors for snippet search.
1. Server startup runs initializeDatabase() and initializePipeline() from src/hooks.server.ts, which creates the WorkerPool, ProgressBroadcaster, and JobQueue singletons.
2. The pipeline recovers stale jobs (marks running jobs as failed and resets repositories stuck in indexing to error), reads concurrency settings, and resumes queued work.
3. When a job is enqueued, the JobQueue delegates to the WorkerPool, which dispatches work to an idle parse worker thread.
4. Each parse worker opens its own better-sqlite3 connection (WAL mode) and runs the full crawl → parse → store pipeline, posting progress messages back to the parent thread.
5. The parent thread updates job progress in the database and broadcasts SSE events through the ProgressBroadcaster.
6. On parse completion, if an embedding provider is configured, the WorkerPool enqueues an embed request to the dedicated embed worker, which generates vectors in its own thread.
7. Job control endpoints allow pausing, resuming, or cancelling jobs at runtime.
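
The stale-state recovery in step 2 amounts to two status rewrites. The sketch below works on in-memory rows to show the mapping only; the row shapes and the `recoverStale` name are hypothetical, and the real startup code operates on SQLite.

```typescript
// Hypothetical row shapes for illustration.
interface JobRow {
	id: string;
	status: string;
}
interface RepoRow {
	id: string;
	state: string;
}

// Jobs left in `running` become `failed`; repositories left in
// `indexing` become `error`. All other rows are returned unchanged.
function recoverStale(jobs: JobRow[], repos: RepoRow[]): { jobs: JobRow[]; repos: RepoRow[] } {
	return {
		jobs: jobs.map((j) => (j.status === 'running' ? { ...j, status: 'failed' } : j)),
		repos: repos.map((r) => (r.state === 'indexing' ? { ...r, state: 'error' } : r))
	};
}
```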
### Retrieval flow
@@ -135,7 +164,8 @@ The frontend and backend share the same SvelteKit repository, but most non-UI be
## Build System
- Build command: npm run build
- Build command: npm run build (runs `vite build` then `node scripts/build-workers.mjs`)
- Worker bundling: scripts/build-workers.mjs uses esbuild to compile worker-entry.ts and embed-worker-entry.ts into build/workers/ as ESM bundles (.mjs), with $lib path aliases resolved and better-sqlite3/@xenova/transformers marked external
- Test command: npm run test
- Primary local run command from package.json: npm run dev
- MCP entry points: npm run mcp:start and npm run mcp:http
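
The worker bundling step described above might correspond to esbuild options along these lines. This is a sketch, not the contents of scripts/build-workers.mjs: the entry-point paths are inferred from the pipeline directory, the `$lib` target assumes the SvelteKit default of `src/lib`, and the constant name `workerBuildOptions` is hypothetical.

```typescript
// Hypothetical options object; the real build script may differ.
const workerBuildOptions = {
	entryPoints: [
		'src/lib/server/pipeline/worker-entry.ts',
		'src/lib/server/pipeline/embed-worker-entry.ts'
	],
	bundle: true,
	platform: 'node',
	format: 'esm',
	outdir: 'build/workers',
	// Emit .mjs so Node treats the bundles as ES modules.
	outExtension: { '.js': '.mjs' },
	// Resolve the SvelteKit `$lib` alias outside of Vite (assumed target).
	alias: { $lib: './src/lib' },
	// Left unbundled so they load from node_modules at runtime.
	external: ['better-sqlite3', '@xenova/transformers']
};
```

An object like this would be passed to esbuild's `build()` function. Marking the two packages external keeps them out of the bundle, so they are resolved from node_modules when the worker starts.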


@@ -1,25 +1,29 @@
# Findings
Last Updated: 2026-03-27T00:24:13.000Z
Last Updated: 2026-03-30T00:00:00.000Z
## Initializer Summary
- JIRA: FEEDBACK-0001
- JIRA: TRUEREF-0022
- Refresh mode: REFRESH_IF_REQUIRED
- Result: refreshed affected documentation only. ARCHITECTURE.md and FINDINGS.md were updated from current repository analysis; CODE_STYLE.md remained trusted and unchanged because the documented conventions still match the codebase.
- Result: Refreshed ARCHITECTURE.md and FINDINGS.md. CODE_STYLE.md remained trusted — new worker thread code follows established conventions.
## Research Performed
- Discovered source-language distribution, dependency manifest, import patterns, and project structure.
- Read the retrieval, formatter, token-budget, parser, mapper, and response-model modules affected by the latest implementation changes.
- Compared the trusted cache state with current behavior to identify which documentation files were actually stale.
- Confirmed package scripts for build and test.
- Confirmed Linux-native md5sum availability for documentation trust metadata.
- Discovered 141 TypeScript/JavaScript source files (up from 110), with new pipeline worker, broadcaster, and SSE endpoint files.
- Read worker-pool.ts, worker-entry.ts, embed-worker-entry.ts, worker-types.ts, progress-broadcaster.ts, startup.ts, and job-queue.ts to understand the new worker thread architecture.
- Read SSE endpoints (jobs/stream, jobs/[id]/stream) and job control endpoints (pause, resume, cancel).
- Read indexing settings endpoint and hooks.server.ts to verify startup wiring changes.
- Read build-workers.mjs and package.json to verify build system and dependency changes.
- Compared trusted cache state with current codebase to identify ARCHITECTURE.md as stale.
- Confirmed CODE_STYLE.md conventions still match the codebase — new code uses PascalCase classes, camelCase functions, tab indentation, ESM imports, and TypeScript discriminated unions consistent with existing style.
## Open Questions For Planner
- Verify whether the retrieval response contract should document the new repository and version metadata fields formally in a public API reference beyond the architecture summary.
- Verify whether parser chunking should evolve further from file-level and declaration-level boundaries to member-level semantic chunks for class-heavy codebases.
- Verify whether the SSE streaming contract (event names, data shapes) should be documented in a dedicated API reference for external consumers.
- Assess whether the WorkerPool fallback mode (main-thread execution when worker scripts are missing) needs explicit test coverage or should be removed in favour of a hard build requirement.
## Planner Notes Template