mozempk/trueref

Giancarmine Salucci 16436bfab2 fix(FEEDBACK-0001): complete iteration 0 - harden context search

2026-03-27 01:25:46 +01:00

Architecture

Last Updated: 2026-03-27T00:24:13.000Z

Overview

TrueRef is a TypeScript-first, self-hosted documentation retrieval platform built on SvelteKit. The repository contains a Node-targeted web application, a REST API, a Model Context Protocol server, and a server-side indexing pipeline backed by SQLite via better-sqlite3 and Drizzle ORM.

Primary language: TypeScript (110 files) with a small amount of JavaScript configuration (2 files)
Application type: Full-stack SvelteKit application with server-side indexing and retrieval services
Runtime framework: SvelteKit with adapter-node
Storage: SQLite with Drizzle-managed schema plus hand-written FTS5 setup
Testing: Vitest with separate client and server projects

Project Structure

src/routes: SvelteKit pages and HTTP endpoints, including the public UI and /api/v1 surface
src/lib/server: Backend implementation grouped by concern: api, config, crawler, db, embeddings, mappers, models, parser, pipeline, search, services, utils
src/mcp: Standalone MCP server entry point and tool handlers
static: Static assets such as robots.txt
docs/features: Feature-level implementation notes and product documentation
build: Generated SvelteKit output

Key Directories

src/routes

Contains the UI entry points and API routes. The API tree under src/routes/api/v1 is the public HTTP contract for repository management, indexing jobs, search/context retrieval, settings, filesystem browsing, and JSON schema discovery.

src/lib/server/db

Owns SQLite schema definitions, migration bootstrapping, and FTS initialization. Database startup runs through initializeDatabase(), which executes Drizzle migrations and then applies FTS5 SQL that cannot be expressed directly in the ORM.

src/lib/server/pipeline

Coordinates crawl, parse, chunk, store, and optional embedding generation work. Startup recovery marks stale jobs as failed, resets repositories stuck in indexing state, initializes singleton queue/pipeline instances, and drains queued work after restart.
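The startup recovery step can be sketched as a pure function over in-memory records. This is a minimal illustration, not the project's implementation: the `Job` and `Repo` shapes, the status names, the error message, and the choice of `"pending"` as the reset target are all assumptions made for the example.

```typescript
// Hypothetical in-memory shapes; the real records live in SQLite.
type JobStatus = "queued" | "running" | "done" | "failed";
type RepoState = "pending" | "indexing" | "indexed" | "error";

interface Job {
  id: string;
  repositoryId: string;
  status: JobStatus;
  error?: string;
}

interface Repo {
  id: string;
  state: RepoState;
}

// Right after boot no worker can still be active, so any job that claims
// to be running was interrupted by the previous shutdown.
function recoverAfterRestart(jobs: Job[], repos: Repo[]): Job[] {
  for (const job of jobs) {
    if (job.status === "running") {
      job.status = "failed";
      job.error = "Interrupted by server restart"; // assumed message
    }
  }
  for (const repo of repos) {
    // Assumed reset target: the document only says the state is reset.
    if (repo.state === "indexing") repo.state = "pending";
  }
  // Jobs that were still queued are returned, in order, for the caller to drain.
  return jobs.filter((job) => job.status === "queued");
}
```

Keeping recovery free of I/O in this sketch makes the rule easy to test; a real version would presumably issue the equivalent UPDATE statements inside a transaction.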

src/lib/server/search

Implements keyword, vector, and hybrid retrieval. The keyword path uses SQLite FTS5 and BM25; the hybrid path blends FTS and vector search with reciprocal rank fusion.
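For readers unfamiliar with reciprocal rank fusion, the sketch below shows the general technique: each result list contributes 1 / (k + rank) to a document's score, with 1-based ranks. The function name and the default k = 60 (a value commonly used in the literature) are assumptions; the constants TrueRef actually uses are not stated in this document.

```typescript
// Fuse several ranked lists of snippet IDs into one ranking.
// score(id) = sum over lists of 1 / (k + rank), where rank starts at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed default, not taken from the TrueRef source
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because only ranks are used, the BM25 scores and vector similarities never need to be normalized onto a common scale, which is the usual reason for choosing this fusion method.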

src/lib/server/crawler and src/lib/server/parser

Convert GitHub repositories and local folders into normalized snippet records. Crawlers fetch repository contents, parsers split Markdown, code, config, HTML-like, and plain-text files into chunks, and downstream services persist searchable content.
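As an illustration of the chunking step for Markdown, the sketch below splits a file at ATX headings. The heading-based strategy, the `Chunk` shape, and the function name are assumptions; the real parsers may split on different boundaries and also attach token counts and metadata.

```typescript
// Hypothetical chunk shape for this example only.
interface Chunk {
  title: string;
  content: string;
}

// Split Markdown at ATX headings (#, ##, ...). Simplification: a line that
// starts with '#' inside a fenced code block is also treated as a heading.
function splitMarkdownByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let title = "";
  let lines: string[] = [];

  // Emit the chunk collected so far, unless it is completely empty.
  const flush = (): void => {
    const content = lines.join("\n").trim();
    if (title !== "" || content !== "") chunks.push({ title, content });
    lines = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      title = heading[1].trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```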

src/mcp

Provides a thin compatibility layer over the HTTP API. The MCP server exposes resolve-library-id and query-docs over stdio or HTTP and forwards work to local tool handlers.

Design Patterns

Automated semantic analysis did not flag any explicit design patterns. The code does, however, follow several consistent conventions:
Service classes such as RepositoryService, SearchService, and HybridSearchService hold the business logic.
Mapping and entity layers separate raw database rows from domain objects through mapper/entity pairs such as RepositoryMapper and RepositoryEntity.
Pipeline startup uses module-level singleton state for JobQueue and IndexingPipeline lifecycle management.
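The mapper/entity separation mentioned above might look roughly like the following. Only the names RepositoryMapper and RepositoryEntity come from this document; the fields, the column names, and the Unix-seconds timestamp encoding are hypothetical and chosen purely for illustration.

```typescript
// Hypothetical row shape: snake_case columns, timestamps as Unix seconds.
interface RepositoryRow {
  id: string;
  title: string;
  source_url: string;
  last_indexed_at: number | null;
}

// Hypothetical domain shape: camelCase fields, timestamps as Date objects.
interface RepositoryEntity {
  id: string;
  title: string;
  sourceUrl: string;
  lastIndexedAt: Date | null;
}

// The mapper is the only place that knows both shapes.
const RepositoryMapper = {
  toEntity(row: RepositoryRow): RepositoryEntity {
    return {
      id: row.id,
      title: row.title,
      sourceUrl: row.source_url,
      lastIndexedAt:
        row.last_indexed_at === null ? null : new Date(row.last_indexed_at * 1000),
    };
  },
  toRow(entity: RepositoryEntity): RepositoryRow {
    return {
      id: entity.id,
      title: entity.title,
      source_url: entity.sourceUrl,
      last_indexed_at:
        entity.lastIndexedAt === null
          ? null
          : Math.floor(entity.lastIndexedAt.getTime() / 1000),
    };
  },
};
```

Confining the translation to one module means a column rename touches the mapper only, not the services that consume entities.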

Key Components

SvelteKit server bootstrap

src/hooks.server.ts initializes the database, loads persisted embedding configuration, creates the optional EmbeddingService, starts the indexing pipeline, and applies CORS headers to all /api routes.
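The CORS step can be illustrated with a small helper that decides which headers a path receives. This is a sketch under assumptions: the header values, the helper name, and the exact path test are hypothetical, since the actual policy in src/hooks.server.ts is not shown in this document.

```typescript
// Assumed header values; the real policy may be stricter.
const CORS_HEADERS: Record<string, string> = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, PATCH, DELETE, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};

// Return the CORS headers for API paths and nothing for everything else.
// The trailing-slash check keeps unrelated paths such as "/apiary" out.
function corsHeadersFor(pathname: string): Record<string, string> {
  const isApiPath = pathname === "/api" || pathname.startsWith("/api/");
  return isApiPath ? { ...CORS_HEADERS } : {};
}
```

In a SvelteKit handle hook, a helper like this would typically be applied to the response returned by resolve(event), with OPTIONS preflight requests answered before resolution.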

Database layer

src/lib/server/db/schema.ts defines repositories, repository_versions, documents, snippets, embedding_profiles, snippet_embeddings, indexing_jobs, repository_configs, and settings. This schema models the indexed library catalog, retrieval corpus, embedding state, and job tracking.

Retrieval API

src/routes/api/v1/context/+server.ts validates input, resolves the repository and optional version IDs, and selects keyword, semantic, or hybrid retrieval. It then applies token budgeting that skips oversized snippets instead of stopping early, prepends repository rules, and formats JSON or text responses with repository and version metadata.
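The "skip instead of stop" budgeting rule is easiest to see in code. The sketch below is a minimal version under assumptions: the `RankedSnippet` shape and the function name are hypothetical, and the real handler may account for rules or formatting overhead as well.

```typescript
// Hypothetical minimal snippet shape for this example.
interface RankedSnippet {
  id: string;
  tokenCount: number;
}

// Walk snippets in ranked order and keep each one that still fits.
// A snippet that does not fit is skipped with `continue` rather than
// ending the loop, so smaller, lower-ranked snippets can still be used.
function applyTokenBudget<T extends RankedSnippet>(ranked: T[], maxTokens: number): T[] {
  const selected: T[] = [];
  let usedTokens = 0;
  for (const snippet of ranked) {
    if (usedTokens + snippet.tokenCount > maxTokens) continue;
    selected.push(snippet);
    usedTokens += snippet.tokenCount;
  }
  return selected;
}
```

With a budget of 800 tokens and ranked snippets of 400, 900, and 300 tokens, this keeps the first and third snippets; a loop that stopped at the first overflow would return only the first.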

Search engine

src/lib/server/search/search.service.ts preprocesses raw user input into FTS5-safe MATCH expressions before keyword search and repository lookup. src/lib/server/search/hybrid.search.service.ts supports explicit keyword, semantic, and hybrid modes, falls back to vector retrieval when FTS yields no candidates and an embedding provider is configured, and uses reciprocal rank fusion for blended ranking.
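One common way to make raw input safe for an FTS5 MATCH clause is to extract word-like tokens and wrap each in double quotes, so that punctuation and reserved words such as AND, OR, NOT, and NEAR are treated as plain terms. The sketch below shows that approach; the function name and the ASCII-only tokenizer are assumptions, and the real preprocessing in search.service.ts may differ.

```typescript
// Turn arbitrary user input into a conservative FTS5 MATCH expression.
// Simplification: only ASCII letters, digits, and underscores are kept.
function toFtsMatchExpression(rawQuery: string): string {
  const tokens = rawQuery.match(/[A-Za-z0-9_]+/g) ?? [];
  // Quoted strings are literal terms in FTS5; terms separated by
  // whitespace are combined with an implicit AND.
  return tokens.map((token) => '"' + token + '"').join(" ");
}
```

An empty result means the input held no searchable tokens; the caller should skip the keyword query in that case, because an empty MATCH expression is a syntax error in FTS5.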

Repository management

src/lib/server/services/repository.service.ts provides CRUD and statistics for indexed repositories, including canonical ID generation for GitHub and local sources.

MCP surface

src/mcp/index.ts creates the MCP server, registers the two supported tools, and exposes them over stdio or streamable HTTP.

Dependencies

Production

@modelcontextprotocol/sdk: MCP server transport and protocol types
@xenova/transformers: local embedding support
better-sqlite3: synchronous SQLite driver
zod: runtime input validation for MCP tools and server helpers

Development

@sveltejs/kit and @sveltejs/adapter-node: application framework and Node deployment target
drizzle-kit and drizzle-orm: schema management and typed database access
vite and @tailwindcss/vite: bundling and Tailwind integration
vitest and @vitest/browser-playwright: server and browser test execution
eslint, typescript-eslint, eslint-plugin-svelte, prettier, prettier-plugin-svelte, prettier-plugin-tailwindcss: linting and formatting
typescript and @types/node: type-checking and Node typings

Module Organization

The backend is organized by responsibility rather than by route. HTTP handlers in src/routes/api/v1 are intentionally thin and delegate to library modules in src/lib/server. Within src/lib/server, concerns are separated into:

models and mappers for entity translation
services for repository/version operations
search for retrieval strategies
crawler and parser for indexing input transformation
pipeline for orchestration and job execution
embeddings for provider abstraction and embedding generation
api and utils for response formatting, validation, and shared helpers

The frontend and backend share the same SvelteKit repository, but most non-UI behavior is implemented on the server side.

Data Flow

Indexing flow

Server startup runs initializeDatabase() and initializePipeline() from src/hooks.server.ts.
The pipeline recovers stale jobs, initializes crawler/parser infrastructure, and resumes queued work.
Crawlers ingest GitHub or local repository contents.
Parsers split files into document and snippet records with token counts and metadata.
Database modules persist repositories, documents, snippets, versions, configs, and job state.
If an embedding provider is configured, embedding services generate vectors for snippet search.

Retrieval flow

Clients call /api/v1/libs/search, /api/v1/context, or the MCP tools.
Route handlers validate input and load the SQLite client.
Keyword search uses FTS5 via SearchService; hybrid search optionally adds vector results via HybridSearchService.
Query preprocessing normalizes punctuation-heavy or code-like input before FTS search. Semantic mode bypasses FTS entirely, while auto and hybrid modes can fall back to vector retrieval when keyword search produces no candidates.
Token budgeting walks ranked snippets in order and skips individual over-budget snippets so later matches can still be returned.
Formatters emit repository and version metadata in JSON responses. Plain-text responses carry origin-aware output, or an explicit no-result message when nothing matches.
MCP handlers expose the same retrieval behavior over stdio or HTTP transports.
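The mode handling in the retrieval flow above can be summarized as a small dispatch function. This is a sketch under several assumptions: searchers are synchronous and injected as plain functions, auto is treated the same as hybrid, semantic mode without an embedding provider returns an empty list, and all names are hypothetical.

```typescript
type SearchMode = "auto" | "keyword" | "semantic" | "hybrid";
type Searcher = (query: string) => string[]; // returns ranked snippet IDs
type Fuser = (keywordIds: string[], vectorIds: string[]) => string[];

function retrieve(
  mode: SearchMode,
  query: string,
  keywordSearch: Searcher,
  vectorSearch: Searcher | null, // null when no embedding provider is configured
  fuse: Fuser,
): string[] {
  if (mode === "keyword") return keywordSearch(query);
  // Semantic mode bypasses FTS entirely.
  if (mode === "semantic") return vectorSearch ? vectorSearch(query) : [];

  // "auto" and "hybrid" are handled identically in this sketch.
  const keywordIds = keywordSearch(query);
  if (vectorSearch === null) return keywordIds;
  const vectorIds = vectorSearch(query);
  // Fallback: with no keyword candidates there is nothing to fuse.
  if (keywordIds.length === 0) return vectorIds;
  return fuse(keywordIds, vectorIds);
}
```

Injecting the searchers and the fusion step keeps the dispatch rule testable without a database or an embedding model.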

Build System

Build command: npm run build
Test command: npm run test
Primary local run command from package.json: npm run dev
MCP entry points: npm run mcp:start and npm run mcp:http
