Commit Graph

37 Commits

Author SHA1 Message Date
Giancarmine Salucci
9525c58e9a feat(TRUEREF-0023): add sqlite-vec search pipeline 2026-04-01 14:09:19 +02:00
Giancarmine Salucci
ec6140e3bb TRUEREF-0022 fix, more tests 2026-03-30 19:11:09 +02:00
Giancarmine Salucci
6297edf109 chore(TRUEREF-0022): fix lint errors and update architecture docs
- Fix 15 ESLint errors across pipeline workers, SSE endpoints, and UI
- Replace explicit any with proper entity types in worker entries
- Remove unused imports and variables (basename, SSEEvent, getBroadcasterFn, seedRules)
- Use empty catch clauses instead of unused error variables
- Use SvelteSet for reactive Set state in repository page
- Fix operator precedence in nullish coalescing expression
- Replace $state+$effect with $derived for concurrency input
- Use resolve() directly in href for navigation lint rule
- Update ARCHITECTURE.md and FINDINGS.md for worker-thread architecture
2026-03-30 17:28:38 +02:00
Giancarmine Salucci
7630740403 feat(TRUEREF-0022): complete iteration 0 — worker-thread indexing, parallel jobs, SSE progress
- Move IndexingPipeline.run() into Worker Threads via WorkerPool
- Add dedicated embedding worker thread with single model instance
- Add stage/stageDetail columns to indexing_jobs schema
- Create ProgressBroadcaster for SSE channel management
- Add SSE endpoints: GET /api/v1/jobs/:id/stream, GET /api/v1/jobs/stream
- Replace UI polling with EventSource on repo detail and admin pages
- Add concurrency settings UI and API endpoint
- Build worker entries separately via esbuild
2026-03-30 17:08:23 +02:00
U811073
6f3f4db19b fix(TRUEREF-0021): reduce event loop blocking, add busy_timeout, and add TRUEREF-0022 PRD 2026-03-30 15:46:15 +02:00
U811073
f4fe8c6043 feat(TRUEREF-0021): implement differential tag indexing 2026-03-30 15:46:15 +02:00
Giancarmine Salucci
0bf01e3057 last fix 2026-03-29 12:44:06 +02:00
Giancarmine Salucci
bbc67f8064 fix(MULTIVERSION-0001): prevent version jobs from overwriting repo-wide NULL rules entry
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.

Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 01:15:58 +01:00
Giancarmine Salucci
cd4ea7112c fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 17:27:53 +01:00
Giancarmine Salucci
666ec7d55f feat(MULTIVERSION-0001): wire trueref.json into pipeline + per-version rules
- Add migration 0003: recreate repository_configs with nullable version_id
  column and two partial unique indexes (repo-wide: version_id IS NULL,
  per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
  partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
  excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
  version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
  glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
  with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
  per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:44:30 +01:00
Giancarmine Salucci
255838dcc0 fix(MULTIVERSION-0001): fix version isolation, 404 on unknown version, commit-hash lookup, and searchModeUsed
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.

Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.

Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.

Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:31:15 +01:00
Giancarmine Salucci
417c6fd072 fix(MULTIVERSION-0001): fix version indexing pipeline state and UI reactivity
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:03:44 +01:00
Giancarmine Salucci
1c5b634ea4 fix(MULTIVERSION-0001): fix multi-version indexing — jobs never created or triggered for secondary versions
Two bugs prevented secondary versions from ever being indexed:

1. JobQueue.enqueue() and RepositoryService.createIndexingJob() deduplication
   only checked repository_id, so a queued default-branch job blocked all
   version-specific jobs for the same repo. Fix: include version_id in the
   WHERE clause so only exact (repository_id, version_id) pairs are deduped.

2. POST /api/v1/libs/:id/versions used repoService.createIndexingJob() which
   inserts a job record but never triggers queue processing. Fix: use
   queue.enqueue() (same fallback pattern as the libs endpoint) so setImmediate
   fires processNext() after the job is inserted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:32:27 +01:00
Giancarmine Salucci
781d224adc feat(EMBEDDINGS-0001): enable local embedder by default and overhaul settings page
- Wire local embedding provider as the default on startup when no profile is configured
- Refactor embedding settings into dedicated service, DTOs, mappers and models
- Rebuild settings page with profile management UI and live test feedback
- Expose index summary (indexed versions + embedding count) on repo endpoints
- Harden indexing pipeline and context search with additional test coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:28:01 +01:00
Giancarmine Salucci
da661efc91 chore(LINT-0001) fix lint errors 2026-03-27 03:01:37 +01:00
Giancarmine Salucci
5a3c27224d chore(FEEDBACK-0001): linting 2026-03-27 02:23:01 +01:00
Giancarmine Salucci
16436bfab2 fix(FEEDBACK-0001): complete iteration 0 - harden context search 2026-03-27 01:25:46 +01:00
Giancarmine Salucci
e7a2a83cdb feat(TRUEREF-0020): add job status page with pause/resume/cancel controls
- Extend indexing_jobs schema to support 'paused' and 'cancelled' status
- Add JobQueue methods: pauseJob(), resumeJob(), cancelJob()
- Create POST /api/v1/jobs/[id]/{pause,resume,cancel} endpoints
- Implement /admin/jobs page with auto-refresh (3s polling)
- Add JobStatusBadge component with color-coded status display
- Action buttons appear contextually based on job status
- Optimistic UI updates with error handling
- All 477 existing tests pass, no regressions
2026-03-25 20:38:14 +01:00
Giancarmine Salucci
9519a66cef test(embeddings): fix 6 remaining test failures
- Fix schema.test.ts: use Unix timestamp integers instead of Date objects for snippet_embeddings.createdAt
- Fix embedding.service.test.ts: use 'local-default' profile instead of non-existent 'test-profile', remove require() calls and use proper ESM imports
- Fix hybrid.search.service.test.ts: update VectorSearch.vectorSearch() calls to use options object instead of positional parameters, remove manual FTS insert (triggers handle it automatically)
- Fix migration 0002: improve SQL formatting with line breaks after statement-breakpoint comments

All 459 tests now passing (18 skipped).
2026-03-25 19:41:24 +01:00
Giancarmine Salucci
169df4d984 feat(TRUEREF-0020): add embedding profiles, default local embeddings, and version-scoped semantic retrieval
- Add embedding_profiles table with provider registry pattern
- Install @xenova/transformers as runtime dependency
- Update snippet_embeddings with composite PK (snippet_id, profile_id)
- Seed default local profile using Xenova/all-MiniLM-L6-v2
- Add provider registry (local-transformers, openai-compatible)
- Update EmbeddingService to persist and retrieve by profileId
- Add version-scoped VectorSearch with optional versionId filtering
- Add searchMode (auto|keyword|semantic|hybrid) to HybridSearchService
- Update API /context route to load active profile, support searchMode/alpha params
- Extend MCP query-docs tool with searchMode and alpha parameters
- Update settings API to work with embedding_profiles table
- Add comprehensive test coverage for profiles, registry, version scoping

Status: 445/451 tests passing, core feature complete
2026-03-25 19:16:37 +01:00
Giancarmine Salucci
fef6f66930 wip(TRUEREF-0018): commit version-scoped indexing work 2026-03-25 19:03:22 +01:00
Giancarmine Salucci
59628dd408 feat(crawler): ignore .gitingore files and folders, fallback to common ignored deps 2026-03-25 15:10:44 +01:00
Giancarmine Salucci
215cadf070 refactor: introduce domain model classes and mapper layer
Replace ad-hoc inline row casting (snake_case → camelCase) spread across
services, routes, and the indexing pipeline with explicit model classes
(Repository, IndexingJob, RepositoryVersion, Snippet, SearchResult) and
dedicated mapper classes that own the DB → domain conversion.

- Add src/lib/server/models/ with typed model classes for all domain entities
- Add src/lib/server/mappers/ with mapper classes per entity
- Remove duplicated RawRow interfaces and inline map functions from
  job-queue, repository.service, indexing.pipeline, and all API routes
- Add dtoJsonResponse helper to standardise JSON responses via SvelteKit json()
- Add api-contract.integration.test.ts as a regression baseline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 14:29:49 +01:00
Giancarmine Salucci
7994254e23 fix(svelte) fix svelte 2026-03-24 18:41:28 +01:00
Giancarmine Salucci
9e3f62e329 feat(TRUEREF-0017): implement incremental re-indexing with checksum diff
- computeDiff classifies files into added/modified/deleted/unchanged buckets
- Only changed and new files are parsed and re-embedded on re-runs
- Deleted files removed atomically from DB
- Progress counts all files including unchanged for accurate reporting
- ~20x speedup for re-indexing large repositories with few changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:07:20 +01:00
Giancarmine Salucci
542f4ce66c feat(TRUEREF-0014): implement repository version management
- VersionService with list, add, remove, getByTag, registerFromConfig
- GitHub tag discovery helper for validating tags before indexing
- Version ID format: /owner/repo/tag (e.g. /facebook/react/v18.3.0)
- GET/POST /api/v1/libs/:id/versions
- DELETE /api/v1/libs/:id/versions/:tag
- POST /api/v1/libs/:id/versions/:tag/index

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:59 +01:00
Giancarmine Salucci
f31db2db2c feat(TRUEREF-0013): implement trueref.json config file support
- Lenient parser for trueref.json and context7.json (trueref.json takes precedence)
- Validates folders, excludeFolders, excludeFiles, rules, previousVersions
- Stores config in repository_configs table
- JSON Schema served at GET /api/v1/schema/trueref-config.json for IDE validation
- Rules injected at top of every query-docs response

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:50 +01:00
Giancarmine Salucci
21f6acbfa3 feat(TRUEREF-0009-0010): implement indexing pipeline job queue and public REST API
- SQLite-backed job queue with sequential processing and startup recovery
- Atomic snippet replacement in single transaction
- context7-compatible GET /api/v1/libs/search and GET /api/v1/context
- Token budget limiting and JSON/txt response format support
- CORS headers on all API routes via SvelteKit handle hook
- Library ID parser supporting /owner/repo and /owner/repo/version

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:35 +01:00
Giancarmine Salucci
d3d577a2e2 feat(TRUEREF-0008): implement hybrid semantic search with RRF
- Cosine similarity vector search over stored embeddings
- Reciprocal Rank Fusion (K=60) combining FTS5 + vector rankings
- Configurable alpha weight between keyword and semantic search
- Graceful degradation to FTS5-only when no embedding provider configured

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:25 +01:00
Giancarmine Salucci
33bdf30709 feat(TRUEREF-0006): implement SQLite FTS5 full-text search engine
- BM25 ranking via SQLite FTS5 bm25() function
- Query preprocessor with wildcard expansion and special char escaping
- Library search with composite scoring (name match, trust score, snippet count)
- Trust score computation from stars, coverage, and source type
- Response formatters for library and snippet results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:18 +01:00
Giancarmine Salucci
f6be3cfd47 feat(TRUEREF-0005): implement document parser and chunker
- Markdown parser with heading-based section splitting and code block extraction
- Code file parser with regex boundary detection for 10+ languages
- Sliding window chunker with configurable token limits and overlap
- Language detection from file extensions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:12 +01:00
Giancarmine Salucci
1c15d6c474 feat(TRUEREF-0003-0004): implement GitHub and local filesystem crawlers
- GitHub crawler with rate limiting, semaphore concurrency, retry logic
- File filtering by extension, size, and trueref.json rules
- Local filesystem crawler with SHA-256 checksums and progress callbacks
- Shared types and file filter logic between both crawlers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 09:06:07 +01:00
Giancarmine Salucci
956b2a3a62 feat(TRUEREF-0009): implement indexing pipeline and job queue
Implements the end-to-end indexing pipeline with a SQLite-backed job
queue, startup recovery, and REST API endpoints for job status.

- IndexingPipeline: orchestrates crawl → parse → atomic replace → embed
  → repo stats update with progress tracking at each stage
- JobQueue: sequential SQLite-backed queue (no external broker), deduplicates
  active jobs per repository, drains queued jobs on startup
- startup.ts: stale job recovery (running→failed), repo state reset, singleton
  initialization wired from hooks.server.ts
- GET /api/v1/jobs with repositoryId/status/limit filtering
- GET /api/v1/jobs/[id] single job lookup
- hooks.server.ts: initializes DB and pipeline on server start
- 18 unit tests covering queue, pipeline stages, recovery, and atomicity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 18:22:20 +01:00
Giancarmine Salucci
bf4caf5e3b feat(TRUEREF-0007): implement pluggable embedding generation and vector storage
Add EmbeddingProvider interface with OpenAI-compatible, local (optional
@xenova/transformers via dynamic import), and Noop (FTS5-only fallback)
implementations. EmbeddingService batches requests and persists Float32Array
blobs to snippet_embeddings. GET/PUT /api/v1/settings/embedding endpoints
read and write embedding config from the settings table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 18:07:26 +01:00
Giancarmine Salucci
3d1bef5003 feat(TRUEREF-0002): implement repository management service and REST API
Add RepositoryService with full CRUD, ID resolution helpers, input
validation, six SvelteKit API routes (GET/POST /api/v1/libs,
GET/PATCH/DELETE /api/v1/libs/:id, POST /api/v1/libs/:id/index), and
37 unit tests covering all service operations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 17:43:06 +01:00
Giancarmine Salucci
f57b622505 feat(TRUEREF-0001): implement complete database schema and core data models
Define all SQLite tables via Drizzle ORM (repositories, repository_versions,
documents, snippets, snippet_embeddings, indexing_jobs, repository_configs,
settings), generate the initial migration, create FTS5 virtual table and
sync triggers in fts.sql, add shared TypeScript types in src/lib/types.ts,
and write 21 unit tests covering insertions, cascade deletes, FK constraints,
blob storage, JSON fields, and FTS5 trigger behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 17:18:01 +01:00
Giancarmine Salucci
18437dfa7c chore: initial project scaffold 2026-03-22 17:08:15 +01:00