Files
trueref/docs/FINDINGS.md
Giancarmine Salucci 6297edf109 chore(TRUEREF-0022): fix lint errors and update architecture docs
- Fix 15 ESLint errors across pipeline workers, SSE endpoints, and UI
- Replace explicit any with proper entity types in worker entries
- Remove unused imports and variables (basename, SSEEvent, getBroadcasterFn, seedRules)
- Use empty catch clauses instead of unused error variables
- Use SvelteSet for reactive Set state in repository page
- Fix operator precedence in nullish coalescing expression
- Replace $state+$effect with $derived for concurrency input
- Use resolve() directly in href for navigation lint rule
- Update ARCHITECTURE.md and FINDINGS.md for worker-thread architecture
2026-03-30 17:28:38 +02:00

195 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Findings
Last Updated: 2026-03-30T00:00:00.000Z
## Initializer Summary
- JIRA: TRUEREF-0022
- Refresh mode: REFRESH_IF_REQUIRED
- Result: Refreshed ARCHITECTURE.md and FINDINGS.md. CODE_STYLE.md remained trusted — new worker thread code follows established conventions.
## Research Performed
- Discovered 141 TypeScript/JavaScript source files (up from 110), with new pipeline worker, broadcaster, and SSE endpoint files.
- Read worker-pool.ts, worker-entry.ts, embed-worker-entry.ts, worker-types.ts, progress-broadcaster.ts, startup.ts, job-queue.ts to understand the new worker thread architecture.
- Read SSE endpoints (jobs/stream, jobs/[id]/stream) and job control endpoints (pause, resume, cancel).
- Read indexing settings endpoint and hooks.server.ts to verify startup wiring changes.
- Read build-workers.mjs and package.json to verify build system and dependency changes.
- Compared trusted cache state with current codebase to identify ARCHITECTURE.md as stale.
- Confirmed CODE_STYLE.md conventions still match the codebase — new code uses PascalCase classes, camelCase functions, tab indentation, ESM imports, and TypeScript discriminated unions consistent with existing style.
## Open Questions For Planner
- Verify whether the retrieval response contract should document the new repository and version metadata fields formally in a public API reference beyond the architecture summary.
- Verify whether parser chunking should evolve further from file-level and declaration-level boundaries to member-level semantic chunks for class-heavy codebases.
- Verify whether the SSE streaming contract (event names, data shapes) should be documented in a dedicated API reference for external consumers.
- Assess whether the WorkerPool fallback mode (main-thread execution when worker scripts are missing) needs explicit test coverage or should be removed in favour of a hard build requirement.
## Planner Notes Template
Add subsequent research below this section.
### Entry Template
- Date:
- Task:
- Files inspected:
- Findings:
- Risks / follow-ups:
### 2026-03-27 — FEEDBACK-0001 initializer refresh audit
- Task: Refresh only stale documentation after changes to retrieval, formatters, token budgeting, and parser behavior.
- Files inspected:
- `docs/docs_cache_state.yaml`
- `docs/ARCHITECTURE.md`
- `docs/CODE_STYLE.md`
- `docs/FINDINGS.md`
- `package.json`
- `src/routes/api/v1/context/+server.ts`
- `src/lib/server/api/formatters.ts`
- `src/lib/server/api/token-budget.ts`
- `src/lib/server/search/query-preprocessor.ts`
- `src/lib/server/search/search.service.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/mappers/context-response.mapper.ts`
- `src/lib/server/models/context-response.ts`
- `src/lib/server/models/search-result.ts`
- `src/lib/server/parser/index.ts`
- `src/lib/server/parser/code.parser.ts`
- `src/lib/server/parser/markdown.parser.ts`
- Findings:
- The documentation cache was trusted, but the architecture summary no longer captured current retrieval behavior: query preprocessing now sanitizes punctuation-heavy input for FTS5, semantic mode can bypass FTS entirely, and auto or hybrid retrieval can fall back to vector search when keyword search returns no candidates.
- Plain-text and JSON context formatting now carry repository and version metadata, and the text formatter emits an explicit no-results section instead of an empty body.
- Token budgeting now skips individual over-budget snippets and continues evaluating lower-ranked candidates, which changes the response-selection behavior described at the architecture level.
- Parser coverage now explicitly includes Markdown, code, config, HTML-like, and plain-text inputs, so the architecture summary needed to reflect that broader file-type handling.
- The conventions documented in CODE_STYLE.md still match the current repository: strict TypeScript, tab indentation, ESM imports, Prettier and ESLint flat config, and pragmatic service-oriented server modules.
- Risks / follow-ups:
- Future cache invalidation should continue to distinguish between behavioral changes that affect architecture docs and localized implementation changes that do not affect the style guide.
- If the public API contract becomes externally versioned, the new context metadata fields likely deserve a dedicated API document instead of only architecture-level coverage.
### 2026-03-27 — FEEDBACK-0001 planning research
- Task: Plan the retrieval-fix iteration covering FTS query safety, hybrid fallback, empty-result behavior, result metadata, token budgeting, and parser chunking.
- Files inspected:
- `package.json`
- `src/routes/api/v1/context/+server.ts`
- `src/lib/server/search/query-preprocessor.ts`
- `src/lib/server/search/search.service.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/api/token-budget.ts`
- `src/lib/server/api/formatters.ts`
- `src/lib/server/mappers/context-response.mapper.ts`
- `src/lib/server/models/context-response.ts`
- `src/lib/server/models/search-result.ts`
- `src/lib/server/parser/code.parser.ts`
- `src/lib/server/search/search.service.test.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/api/formatters.test.ts`
- `src/lib/server/parser/code.parser.test.ts`
- `src/routes/api/v1/api-contract.integration.test.ts`
- `src/mcp/tools/query-docs.ts`
- `src/mcp/client.ts`
- Findings:
- `better-sqlite3` `^12.6.2` backs the affected search path; the code already uses bound parameters for `MATCH`, so the practical fix belongs in query normalization and fallback handling rather than SQL string construction.
- `query-preprocessor.ts` only strips parentheses and appends a trailing wildcard. Other code-like punctuation currently reaches the FTS execution path unsanitized.
- `search.service.ts` sends the preprocessed text directly to `snippets_fts MATCH ?` and already returns `[]` for blank processed queries.
- `hybrid.search.service.ts` always executes keyword search before semantic branching. In the current flow, an FTS parse failure can abort `auto`, `hybrid`, and `semantic` requests before vector retrieval runs.
- `vector.search.ts` already preserves `repositoryId`, `versionId`, and `profileId` filtering and does not need architectural changes for this iteration.
- `token-budget.ts` stops at the first over-budget snippet instead of skipping that item and continuing through later ranked results.
- `formatContextTxt([], [])` returns an empty string, so `/api/v1/context?type=txt` can emit an empty `200 OK` body today.
- `context-response.mapper.ts` and `context-response.ts` expose snippet content and breadcrumb/page title but do not identify local TrueRef origin, repository source metadata, or normalized snippet origin labels.
- `code.parser.ts` splits primarily at top-level declarations; class/object member functions remain in coarse chunks, which limits method-level recall for camelCase API queries.
- Existing relevant automated coverage is concentrated in the search, formatter, and parser unit tests; `/api/v1/context` contract coverage currently omits the context endpoint entirely.
- Risks / follow-ups:
- Response-shape changes must be additive because `src/mcp/client.ts`, `src/mcp/tools/query-docs.ts`, and UI consumers expect the current top-level keys to remain present.
- Parser improvements should stay inside `parseCodeFile()` and existing chunking helpers to avoid turning this fix iteration into a schema or pipeline redesign.
### 2026-03-27 — FEEDBACK-0001 SQLite FTS5 syntax research
- Task: Verify the FTS5 query-grammar constraints that affect punctuation-heavy local search queries.
- Files inspected:
- `package.json`
- `src/lib/server/search/query-preprocessor.ts`
- `src/lib/server/search/search.service.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- Findings:
- `better-sqlite3` is pinned at `^12.6.2` in `package.json`, and the application binds the `MATCH` string as a parameter instead of interpolating SQL directly.
- The canonical SQLite FTS5 docs state that barewords may contain letters, digits, underscore, non-ASCII characters, and the substitute character; strings containing other punctuation must be quoted or they become syntax errors in `MATCH` expressions.
- The same docs state that prefix search is expressed by placing `*` after the token or phrase, not inside quotes, which matches the current trailing-wildcard strategy in `query-preprocessor.ts`.
- SQLite documents that FTS5 is stricter than FTS3/4 about unrecognized punctuation in query strings, which confirms that code-like user input should be normalized before it reaches `snippets_fts MATCH ?`.
- Based on the current code path, the practical fix remains application-side sanitization and fallback behavior in `query-preprocessor.ts` and `hybrid.search.service.ts`, not SQL construction changes.
- Risks / follow-ups:
- Over-sanitizing punctuation-heavy inputs could erase useful identifiers, so the implementation should preserve searchable alphanumeric and underscore tokens while discarding grammar-breaking punctuation.
- Prefix expansion should remain on the final searchable token only so the fix preserves current query-cost expectations and test semantics.
### 2026-03-27 — LINT-0001 planning research
- Task: Plan the lint-fix iteration covering the reported ESLint and eslint-plugin-svelte violations across Svelte UI, SvelteKit routes, server modules, and Vitest suites.
- Files inspected:
- `package.json`
- `eslint.config.js`
- `docs/FINDINGS.md`
- `prompts/LINT-0001/prompt.yaml`
- `prompts/LINT-0001/progress.yaml`
- `src/lib/components/FolderPicker.svelte`
- `src/lib/components/RepositoryCard.svelte`
- `src/lib/components/search/SnippetCard.svelte`
- `src/lib/server/crawler/local.crawler.test.ts`
- `src/lib/server/embeddings/embedding.service.test.ts`
- `src/lib/server/embeddings/local.provider.ts`
- `src/lib/server/embeddings/provider.ts`
- `src/lib/server/embeddings/registry.ts`
- `src/lib/server/models/context-response.ts`
- `src/lib/server/parser/code.parser.ts`
- `src/lib/server/pipeline/indexing.pipeline.ts`
- `src/lib/server/search/hybrid.search.service.test.ts`
- `src/lib/server/search/query-preprocessor.ts`
- `src/lib/server/services/repository.service.test.ts`
- `src/lib/server/services/version.service.test.ts`
- `src/lib/server/services/version.service.ts`
- `src/routes/+layout.svelte`
- `src/routes/+page.svelte`
- `src/routes/api/v1/libs/search/+server.ts`
- `src/routes/api/v1/settings/embedding/+server.ts`
- `src/routes/repos/[id]/+page.svelte`
- `src/routes/search/+page.svelte`
- `src/routes/settings/+page.svelte`
- Findings:
- The project lint stack is ESLint `^9.39.2` with `typescript-eslint` recommended rules and `eslint-plugin-svelte` recommended plus SvelteKit-aware rules, running over Svelte `^5.51.0` and SvelteKit `^2.50.2`.
- Context7 documentation for `eslint-plugin-svelte` confirms `svelte/no-navigation-without-base` flags root-relative `<a href="/...">` links and `goto('/...')` calls in SvelteKit projects; compliant fixes must use `$app/paths` base-aware links or base-prefixed `goto` calls.
- Context7 documentation for Svelte 5 confirms event handlers are regular element properties such as `onclick`, while side effects belong in `$effect`; repo memory also records that client-only fetch bootstrap should not be moved indiscriminately into `$effect` when `onMount` or load is the correct lifecycle boundary.
- Concrete navigation violations already exist in `src/routes/+layout.svelte`, `src/routes/repos/[id]/+page.svelte`, `src/routes/search/+page.svelte`, and `src/lib/components/RepositoryCard.svelte`, each using hard-coded root-relative internal navigation.
- Static diagnostics currently expose at least one direct TypeScript lint error in `src/lib/server/embeddings/registry.ts`, where `_config` is defined but never used.
- `src/routes/api/v1/libs/search/+server.ts` imports `json` from `@sveltejs/kit` without using it, making that endpoint a concrete unused-import cleanup target.
- `src/lib/server/services/version.service.ts` still uses CommonJS `require(...)` to reach git utilities from TypeScript, which is inconsistent with the repository's ESM style and is a likely lint target under the current ESLint stack.
- The affected Svelte pages and settings UI already use Svelte 5 event-property syntax, so the lint work should preserve that syntax and focus on base-aware navigation, lifecycle correctness, and unused-symbol cleanup rather than regressing to legacy `on:` directives.
- Existing automated coverage for the lint-touching backend areas already lives in `src/lib/server/crawler/local.crawler.test.ts`, `src/lib/server/embeddings/embedding.service.test.ts`, `src/lib/server/search/hybrid.search.service.test.ts`, `src/lib/server/services/repository.service.test.ts`, and `src/lib/server/services/version.service.test.ts`; route and component changes rely on build and lint validation rather than dedicated browser tests in this iteration.
- Risks / follow-ups:
- Base-aware navigation fixes must preserve internal app routing semantics and should not replace intentional external navigation, because SvelteKit `goto(...)` no longer accepts external URLs.
- Settings and search page lifecycle changes must avoid reintroducing SSR-triggered fetches or self-triggered URL loops; client-only bootstrap logic should remain mounted once and URL-sync effects must stay idempotent.
### 2026-03-27 — ROUTING-0001 planning research
- Task: Plan the repository-detail routing fix for slash-bearing repository IDs causing homepage SSR failures and invalid `/repos/[id]` navigation.
- Files inspected:
- `package.json`
- `src/lib/components/RepositoryCard.svelte`
- `src/routes/+page.svelte`
- `src/routes/+page.server.ts`
- `src/routes/repos/[id]/+page.server.ts`
- `src/routes/repos/[id]/+page.svelte`
- `src/routes/api/v1/api-contract.integration.test.ts`
- `src/lib/types.ts`
- Findings:
- The app is on SvelteKit `^2.50.2` and uses `$app/paths.resolve(...)` for internal navigation, including `resolveRoute('/repos/[id]', { id: repo.id })` in `RepositoryCard.svelte`.
- SvelteKits `[id]` route is a single-segment dynamic parameter. Context7 routing docs show slash-containing values belong to rest parameters like `[...param]`, so raw repository IDs containing `/` are invalid inputs for `resolveRoute('/repos/[id]', ...)`.
- The repository model intentionally stores slash-bearing IDs such as `/facebook/react`, and the existing API surface consistently treats those IDs as percent-encoded path segments. The integration contract already passes `params.id = encodeURIComponent('/facebook/react')` for `/api/v1/libs/[id]` handlers, which then call `decodeURIComponent(params.id)`.
- The homepage SSR failure is therefore rooted in UI link generation, not repository listing fetches: rendering `RepositoryCard.svelte` with a raw slash-bearing `repo.id` can throw before page load completes, which explains repeated `500` responses on `/`.
- The repo detail page currently forwards `params.id` directly into `encodeURIComponent(...)` for downstream API requests. Once detail links are generated as encoded single segments, the page loader and client-side refresh/delete/reindex flows need one normalization step so API calls continue targeting the stored repository ID instead of a doubly encoded value.
- No existing browser-facing test covers homepage card navigation or `/repos/[id]` loader behavior; the closest current evidence is the API contract test file, which already exercises encoded repository IDs on HTTP endpoints and provides reusable fixtures for slash-bearing IDs.
- Risks / follow-ups:
- The fix should preserve the existing `/repos/[id]` route shape instead of redesigning it to a rest route unless a broader navigation contract change is explicitly requested.
- Any normalization helper introduced for the repo detail page should be reused consistently across server load and client event handlers to avoid mixed encoded and decoded repository IDs during navigation and fetches.