- Move IndexingPipeline.run() into Worker Threads via WorkerPool
- Add dedicated embedding worker thread with single model instance
- Add stage/stageDetail columns to indexing_jobs schema
- Create ProgressBroadcaster for SSE channel management
- Add SSE endpoints: GET /api/v1/jobs/:id/stream, GET /api/v1/jobs/stream
- Replace UI polling with EventSource on repo detail and admin pages
- Add concurrency settings UI and API endpoint
- Build worker entries separately via esbuild
When a versioned query is made, getRules() now returns only the
version-specific repository_configs row. The NULL (HEAD/repo-wide)
row is no longer merged in, preventing v4 rules from bleeding into
v1/v2/v3 versioned context responses.
Tests updated to assert the isolation: versioned queries return only
their own rules row; a new test verifies that a version with no
config row returns an empty rules array even when a NULL row exists.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.
Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.
Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.
- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add migration 0003: recreate repository_configs with nullable version_id
column and two partial unique indexes (repo-wide: version_id IS NULL,
per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.
Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.
Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.
Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add POST /api/v1/libs/:id/versions/discover endpoint that calls
versionService.discoverTags() for local repos and returns empty tags
gracefully for GitHub repos or git failures
- Enhance POST /api/v1/libs/:id/index to also enqueue jobs for all
registered versions on default-branch re-index, returning versionJobs
in the response
- Replace read-only Indexed Versions section with interactive Versions
panel in the repo detail page: per-version state badges, Index/Remove
buttons, inline Add version form, and Discover tags flow for local repos
- Add unit tests for both new/changed backend endpoints (8 new test cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs prevented secondary versions from ever being indexed:
1. JobQueue.enqueue() and RepositoryService.createIndexingJob() deduplication
only checked repository_id, so a queued default-branch job blocked all
version-specific jobs for the same repo. Fix: include version_id in the
WHERE clause so only exact (repository_id, version_id) pairs are deduped.
2. POST /api/v1/libs/:id/versions used repoService.createIndexingJob() which
inserts a job record but never triggers queue processing. Fix: use
queue.enqueue() (same fallback pattern as the libs endpoint) so setImmediate
fires processNext() after the job is inserted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire local embedding provider as the default on startup when no profile is configured
- Refactor embedding settings into dedicated service, DTOs, mappers and models
- Rebuild settings page with profile management UI and live test feedback
- Expose index summary (indexed versions + embedding count) on repo endpoints
- Harden indexing pipeline and context search with additional test coverage
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend indexing_jobs schema to support 'paused' and 'cancelled' status
- Add JobQueue methods: pauseJob(), resumeJob(), cancelJob()
- Create POST /api/v1/jobs/[id]/{pause,resume,cancel} endpoints
- Implement /admin/jobs page with auto-refresh (3s polling)
- Add JobStatusBadge component with color-coded status display
- Action buttons appear contextually based on job status
- Optimistic UI updates with error handling
- All 477 existing tests pass, no regressions
- Fix schema.test.ts: use Unix timestamp integers instead of Date objects for snippet_embeddings.createdAt
- Fix embedding.service.test.ts: use 'local-default' profile instead of non-existent 'test-profile', remove require() calls and use proper ESM imports
- Fix hybrid.search.service.test.ts: update VectorSearch.vectorSearch() calls to use options object instead of positional parameters, remove manual FTS insert (triggers handle it automatically)
- Fix migration 0002: improve SQL formatting with line breaks after statement-breakpoint comments
All 459 tests now passing (18 skipped).
- Add embedding_profiles table with provider registry pattern
- Install @xenova/transformers as runtime dependency
- Update snippet_embeddings with composite PK (snippet_id, profile_id)
- Seed default local profile using Xenova/all-MiniLM-L6-v2
- Add provider registry (local-transformers, openai-compatible)
- Update EmbeddingService to persist and retrieve by profileId
- Add version-scoped VectorSearch with optional versionId filtering
- Add searchMode (auto|keyword|semantic|hybrid) to HybridSearchService
- Update API /context route to load active profile, support searchMode/alpha params
- Extend MCP query-docs tool with searchMode and alpha parameters
- Update settings API to work with embedding_profiles table
- Add comprehensive test coverage for profiles, registry, version scoping
Status: 445/451 tests passing, core feature complete
onMount is Svelte 4 idiom. In Svelte 5 runes mode $effect is the correct
primitive for side effects and it provides additional behaviour onMount
cannot:
- IndexingProgress: $effect re-runs when jobId prop changes, restarting
the polling loop for the new job. onMount would have missed prop changes.
- search/+page.svelte: $effect with untrack() reads page.url params once
on mount without tracking the URL as a reactive dependency, preventing
goto() calls inside searchDocs() from triggering an infinite re-run loop.
Restores the page store import from $app/state.
- settings/+page.svelte: $effect with no reactive reads in the body runs
exactly once on mount — equivalent to onMount but idiomatic Svelte 5.
All three verified with svelte-autofixer: no issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add *.db-shm and *.db-wal to cover SQLite WAL mode files (*.db alone
was not sufficient)
- Replace blanket .claude/ ignore with .claude/ + !.claude/rules/ so
project-level Claude Code rules remain tracked while local settings
(settings.local.json, projects/, etc.) stay ignored
Machine-specific files removed from tracking in the previous refactor
commit: .claude/settings.local.json, local.db-shm, local.db-wal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers two related concerns:
Part 1 — Git-native version indexing: resolve version tags to commit
hashes via git rev-parse, store commit_hash in the versions table, auto-
discover tags on repo registration, extract per-version file trees with
git archive to avoid disturbing the working directory, and support
explicit commitHash overrides in trueref.json.
Part 2 — Corporate deployment support: per-host HTTPS credential helpers
for Bitbucket Server and GitLab, SSH key mounting with Windows permission
fix, CA certificate handling (PEM/DER auto-detection), and the full
docker-compose and entrypoint reference configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the full Docker deployment workflow including docker compose
quickstart, environment variable reference, web-only and MCP-only run
modes, health check endpoints, IDE MCP config snippets (VS Code,
IntelliJ, Claude Code), and mounting local repositories as read-only
volumes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Multi-stage Dockerfile produces a lean image with the compiled SvelteKit
app (adapter-node) and the MCP server TypeScript source. A single image
supports two run modes selected via CMD: web (default) and mcp.
- docker-entrypoint.sh handles CA certificate install (PEM/DER auto-detected
via openssl), SSH key permission fix for Windows-mounted keys, per-host
HTTPS credential helpers for Bitbucket and GitLab, DB migrations, then
starts the requested service
- docker-compose.yml runs web on :3000 and the MCP HTTP server on :3001,
with the MCP container pointed at the web service via internal DNS
- .dockerignore excludes node_modules, build output, .env files, and *.db*
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
$effect runs during SSR and re-runs on every reactive dependency change,
causing polling loops and URL reads to fire at the wrong time. onMount
runs once on the client after first render, which is the correct lifecycle
for polling, URL param reads, and async data loads.
- IndexingProgress: polling loop now starts on mount, not on reactive trigger
- search/+page.svelte: URL param init moved to onMount; use window.location
directly instead of the page store to avoid reactive re-runs
- settings/+page.svelte: config load and local provider probe moved to onMount
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace ad-hoc inline row casting (snake_case → camelCase) spread across
services, routes, and the indexing pipeline with explicit model classes
(Repository, IndexingJob, RepositoryVersion, Snippet, SearchResult) and
dedicated mapper classes that own the DB → domain conversion.
- Add src/lib/server/models/ with typed model classes for all domain entities
- Add src/lib/server/mappers/ with mapper classes per entity
- Remove duplicated RawRow interfaces and inline map functions from
job-queue, repository.service, indexing.pipeline, and all API routes
- Add dtoJsonResponse helper to standardise JSON responses via SvelteKit json()
- Add api-contract.integration.test.ts as a regression baseline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
better-sqlite3 returns raw snake_case column names from SELECT *, so
trust_score, total_snippets etc. were not matching the camelCase keys
(trustScore, totalSnippets) that RepositoryCard.svelte reads. Added a
mapRepo() helper in the GET /api/v1/libs handler to normalise the shape
before JSON serialisation, fixing the trust score and snippet count
display on repository cards.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace onMount with $effect for initial config loading. Add auto-dismissing
green success banner (3 s) and persistent red error banner after save attempts.
Remove inline status text in favour of full-width banners with icons.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All three trigger-indexing routes were calling service.createIndexingJob()
directly which only inserts the DB record but never calls processNext().
Fixed to route through getQueue().enqueue() so the job queue actually
picks up and runs the job immediately.
Affected routes:
- POST /api/v1/libs (autoIndex on add)
- POST /api/v1/libs/:id/index
- POST /api/v1/libs/:id/versions/:tag/index
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix Svelte state_referenced_locally warning in +page.svelte and repos/[id]/+page.svelte
by initializing $state with empty defaults and syncing via $effect
- Add FolderPicker component with server-side filesystem browser
(single-click to navigate, double-click or "Select This Folder" to confirm)
- Git repos highlighted with orange folder icon and "git" badge
- Add GET /api/v1/fs/browse endpoint listing subdirectories
- Wire FolderPicker into AddRepositoryModal for local source type
- Auto-fills title from the selected folder name
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Full settings page replacing placeholder with embedding provider selector
- Provider presets: OpenAI, Ollama, Azure OpenAI
- Test Connection button via POST /api/v1/settings/embedding/test
- Warning banner for FTS5-only mode when provider=none
- Local model availability probe (@xenova/transformers)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- computeDiff classifies files into added/modified/deleted/unchanged buckets
- Only changed and new files are parsed and re-embedded on re-runs
- Deleted files removed atomically from DB
- Progress counts all files including unchanged for accurate reporting
- ~20x speedup for re-indexing large repositories with few changes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Repository list with state badges, stats, and action buttons
- Add repository modal for GitHub URLs and local paths
- Live indexing progress bar polling every 2s
- Confirm dialog for destructive actions
- Repository detail page with versions and recent jobs
- Settings page placeholder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VersionService with list, add, remove, getByTag, registerFromConfig
- GitHub tag discovery helper for validating tags before indexing
- Version ID format: /owner/repo/tag (e.g. /facebook/react/v18.3.0)
- GET/POST /api/v1/libs/:id/versions
- DELETE /api/v1/libs/:id/versions/:tag
- POST /api/v1/libs/:id/versions/:tag/index
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Lenient parser for trueref.json and context7.json (trueref.json takes precedence)
- Validates folders, excludeFolders, excludeFiles, rules, previousVersions
- Stores config in repository_configs table
- JSON Schema served at GET /api/v1/schema/trueref-config.json for IDE validation
- Rules injected at top of every query-docs response
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- resolve-library-id and query-docs tools with context7-identical schemas
- stdio transport for Claude Code, Cursor, and other MCP clients
- HTTP transport via StreamableHTTPServerTransport on configurable port
- /mcp endpoint with CORS and /ping health check
- mcp:start and mcp:http npm scripts
- Claude Code rule file at .claude/rules/trueref.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- SQLite-backed job queue with sequential processing and startup recovery
- Atomic snippet replacement in single transaction
- context7-compatible GET /api/v1/libs/search and GET /api/v1/context
- Token budget limiting and JSON/txt response format support
- CORS headers on all API routes via SvelteKit handle hook
- Library ID parser supporting /owner/repo and /owner/repo/version
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- BM25 ranking via SQLite FTS5 bm25() function
- Query preprocessor with wildcard expansion and special char escaping
- Library search with composite scoring (name match, trust score, snippet count)
- Trust score computation from stars, coverage, and source type
- Response formatters for library and snippet results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>