# Findings
Last Updated: 2026-03-27T00:24:13.000Z
## Initializer Summary
- JIRA: FEEDBACK-0001
- Refresh mode: REFRESH_IF_REQUIRED
- Result: refreshed affected documentation only. ARCHITECTURE.md and FINDINGS.md were updated from current repository analysis; CODE_STYLE.md remained trusted and unchanged because the documented conventions still match the codebase.
## Research Performed
- Discovered source-language distribution, dependency manifest, import patterns, and project structure.
- Read the retrieval, formatter, token-budget, parser, mapper, and response-model modules affected by the latest implementation changes.
- Compared the trusted cache state with current behavior to identify which documentation files were actually stale.
- Confirmed package scripts for build and test.
- Confirmed Linux-native md5sum availability for documentation trust metadata.
## Open Questions For Planner
- Verify whether the retrieval response contract should document the new repository and version metadata fields formally in a public API reference beyond the architecture summary.
- Verify whether parser chunking should evolve further from file-level and declaration-level boundaries to member-level semantic chunks for class-heavy codebases.
## Planner Notes Template
Add subsequent research below this section.
### Entry Template
- Date:
- Task:
- Files inspected:
- Findings:
- Risks / follow-ups:
### 2026-03-27 — FEEDBACK-0001 initializer refresh audit
- Task: Refresh only stale documentation after changes to retrieval, formatters, token budgeting, and parser behavior.
- Files inspected:
  - `docs/docs_cache_state.yaml`
  - `docs/ARCHITECTURE.md`
  - `docs/CODE_STYLE.md`
  - `docs/FINDINGS.md`
  - `package.json`
  - `src/routes/api/v1/context/+server.ts`
  - `src/lib/server/api/formatters.ts`
  - `src/lib/server/api/token-budget.ts`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
  - `src/lib/server/mappers/context-response.mapper.ts`
  - `src/lib/server/models/context-response.ts`
  - `src/lib/server/models/search-result.ts`
  - `src/lib/server/parser/index.ts`
  - `src/lib/server/parser/code.parser.ts`
  - `src/lib/server/parser/markdown.parser.ts`
- Findings:
  - The documentation cache was trusted, but the architecture summary no longer captured current retrieval behavior: query preprocessing now sanitizes punctuation-heavy input for FTS5, semantic mode can bypass FTS entirely, and auto or hybrid retrieval can fall back to vector search when keyword search returns no candidates.
  - Plain-text and JSON context formatting now carry repository and version metadata, and the text formatter emits an explicit no-results section instead of an empty body.
  - Token budgeting now skips individual over-budget snippets and continues evaluating lower-ranked candidates, which changes the response-selection behavior described at the architecture level.
  - Parser coverage now explicitly includes Markdown, code, config, HTML-like, and plain-text inputs, so the architecture summary needed to reflect that broader file-type handling.
  - The conventions documented in CODE_STYLE.md still match the current repository: strict TypeScript, tab indentation, ESM imports, Prettier and ESLint flat config, and pragmatic service-oriented server modules.
- Risks / follow-ups:
  - Future cache invalidation should continue to distinguish between behavioral changes that affect architecture docs and localized implementation changes that do not affect the style guide.
  - If the public API contract becomes externally versioned, the new context metadata fields likely deserve a dedicated API document instead of only architecture-level coverage.
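
The skip-and-continue budgeting behavior noted in the findings above can be sketched as follows. This is a minimal illustration, not the code in `token-budget.ts`: the `RankedSnippet` shape, the precomputed `tokens` field, and the `selectWithinBudget` name are assumptions made for the example, and "over-budget" is read here as "does not fit in the remaining budget".

```typescript
// Hypothetical snippet shape; the real model in the repository may differ.
interface RankedSnippet {
	id: string;
	tokens: number; // assumed to be a precomputed token count
}

/**
 * Walk ranked snippets in order. A snippet that does not fit in the remaining
 * budget is skipped, and evaluation continues with lower-ranked candidates.
 */
function selectWithinBudget(ranked: RankedSnippet[], maxTokens: number): RankedSnippet[] {
	const selected: RankedSnippet[] = [];
	let used = 0;
	for (const snippet of ranked) {
		if (used + snippet.tokens > maxTokens) {
			continue; // skip instead of stopping: a smaller later snippet may still fit
		}
		selected.push(snippet);
		used += snippet.tokens;
	}
	return selected;
}

const picked = selectWithinBudget(
	[
		{ id: 'a', tokens: 400 },
		{ id: 'b', tokens: 900 },
		{ id: 'c', tokens: 300 }
	],
	1000
);
console.log(picked.map((s) => s.id)); // [ 'a', 'c' ]
```

With a stop-at-first-overflow loop the same input would yield only `a`, which is the difference the architecture summary has to describe.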
### 2026-03-27 — FEEDBACK-0001 planning research
- Task: Plan the retrieval-fix iteration covering FTS query safety, hybrid fallback, empty-result behavior, result metadata, token budgeting, and parser chunking.
- Files inspected:
  - `package.json`
  - `src/routes/api/v1/context/+server.ts`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
  - `src/lib/server/search/vector.search.ts`
  - `src/lib/server/api/token-budget.ts`
  - `src/lib/server/api/formatters.ts`
  - `src/lib/server/mappers/context-response.mapper.ts`
  - `src/lib/server/models/context-response.ts`
  - `src/lib/server/models/search-result.ts`
  - `src/lib/server/parser/code.parser.ts`
  - `src/lib/server/search/search.service.test.ts`
  - `src/lib/server/search/hybrid.search.service.test.ts`
  - `src/lib/server/api/formatters.test.ts`
  - `src/lib/server/parser/code.parser.test.ts`
  - `src/routes/api/v1/api-contract.integration.test.ts`
  - `src/mcp/tools/query-docs.ts`
  - `src/mcp/client.ts`
- Findings:
  - `better-sqlite3` `^12.6.2` backs the affected search path; the code already uses bound parameters for `MATCH`, so the practical fix belongs in query normalization and fallback handling rather than SQL string construction.
  - `query-preprocessor.ts` only strips parentheses and appends a trailing wildcard. Other code-like punctuation currently reaches the FTS execution path unsanitized.
  - `search.service.ts` sends the preprocessed text directly to `snippets_fts MATCH ?` and already returns `[]` for blank processed queries.
  - `hybrid.search.service.ts` always executes keyword search before semantic branching. In the current flow, an FTS parse failure can abort `auto`, `hybrid`, and `semantic` requests before vector retrieval runs.
  - `vector.search.ts` already preserves `repositoryId`, `versionId`, and `profileId` filtering and does not need architectural changes for this iteration.
  - `token-budget.ts` stops at the first over-budget snippet instead of skipping that item and continuing through later ranked results.
  - `formatContextTxt([], [])` returns an empty string, so `/api/v1/context?type=txt` can emit an empty `200 OK` body today.
  - `context-response.mapper.ts` and `context-response.ts` expose snippet content and breadcrumb/page title but do not identify local TrueRef origin, repository source metadata, or normalized snippet origin labels.
  - `code.parser.ts` splits primarily at top-level declarations; class/object member functions remain in coarse chunks, which limits method-level recall for camelCase API queries.
  - Existing relevant automated coverage is concentrated in the search, formatter, and parser unit tests; the API contract integration test (`src/routes/api/v1/api-contract.integration.test.ts`) currently omits the `/api/v1/context` endpoint entirely.
- Risks / follow-ups:
  - Response-shape changes must be additive because `src/mcp/client.ts`, `src/mcp/tools/query-docs.ts`, and UI consumers expect the current top-level keys to remain present.
  - Parser improvements should stay inside `parseCodeFile()` and existing chunking helpers to avoid turning this fix iteration into a schema or pipeline redesign.
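
The ordering problem described for `hybrid.search.service.ts` (keyword search running first and aborting the request on an FTS parse failure) could be addressed along these lines. This is a sketch under stated assumptions, not the planned implementation: `collectCandidates`, `SearchFn`, and the `Candidates` shape are hypothetical, both searches are modeled as synchronous functions returning snippet IDs, and result fusion is deliberately left out.

```typescript
type SearchMode = 'auto' | 'hybrid' | 'semantic';
type SearchFn = (query: string) => string[]; // hypothetical: returns snippet IDs

interface Candidates {
	keywordHits: string[];
	vectorHits: string[];
}

function collectCandidates(
	mode: SearchMode,
	query: string,
	keywordSearch: SearchFn,
	vectorSearch: SearchFn
): Candidates {
	let keywordHits: string[] = [];
	if (mode !== 'semantic') {
		// Semantic mode bypasses FTS entirely; other modes guard the FTS call.
		try {
			keywordHits = keywordSearch(query);
		} catch {
			// Assumption: any keyword failure is treated as "no keyword candidates".
			// A real implementation would likely narrow this to FTS syntax errors.
			keywordHits = [];
		}
	}
	// Vector retrieval always gets a chance to run, whatever happened above.
	const vectorHits = vectorSearch(query);
	return { keywordHits, vectorHits };
}

const demo = collectCandidates(
	'auto',
	'foo.bar(',
	() => {
		throw new Error('fts5: syntax error');
	},
	() => ['snippet-7', 'snippet-2']
);
console.log(demo.keywordHits.length, demo.vectorHits); // keyword side is empty, vector side survives
```

Passing the two searches in as parameters keeps the guard testable without a database, which fits the existing unit-test focus on the search services.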
### 2026-03-27 — FEEDBACK-0001 SQLite FTS5 syntax research
- Task: Verify the FTS5 query-grammar constraints that affect punctuation-heavy local search queries.
- Files inspected:
  - `package.json`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
- Findings:
  - `better-sqlite3` is declared at `^12.6.2` in `package.json` (a caret range, not an exact pin), and the application binds the `MATCH` string as a parameter instead of interpolating SQL directly.
  - The canonical SQLite FTS5 docs state that barewords may contain ASCII letters, digits, underscore, non-ASCII characters, and the substitute character; strings containing any other punctuation must be double-quoted, otherwise they cause syntax errors in `MATCH` expressions.
  - The same docs state that prefix search is expressed by placing `*` after the token or phrase, not inside quotes, which matches the current trailing-wildcard strategy in `query-preprocessor.ts`.
  - SQLite documents that FTS5 is stricter than FTS3/4 about unrecognized punctuation in query strings, which confirms that code-like user input should be normalized before it reaches `snippets_fts MATCH ?`.
  - Based on the current code path, the practical fix remains application-side sanitization and fallback behavior in `query-preprocessor.ts` and `hybrid.search.service.ts`, not SQL construction changes.
- Risks / follow-ups:
  - Over-sanitizing punctuation-heavy inputs could erase useful identifiers, so the implementation should preserve searchable alphanumeric and underscore tokens while discarding grammar-breaking punctuation.
  - Prefix expansion should remain on the final searchable token only so the fix preserves current query-cost expectations and test semantics.
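
The sanitization constraints above can be illustrated with a small sketch. It is not the code in `query-preprocessor.ts`; the `toFtsQuery` name is hypothetical, and two choices go beyond what the findings state: every token is double-quoted (so the FTS5 keywords `AND`, `OR`, and `NOT` are treated as plain text), and non-ASCII characters are dropped for simplicity even though FTS5 barewords allow them.

```typescript
/**
 * Turn raw, possibly code-like user input into an FTS5 MATCH string.
 * Keeps runs of ASCII letters, digits, and underscore; every other character
 * acts as a separator. Only the final token receives a prefix wildcard.
 */
function toFtsQuery(raw: string): string {
	const tokens = raw.match(/[A-Za-z0-9_]+/g) ?? [];
	if (tokens.length === 0) {
		return ''; // caller can treat a blank query as "no keyword search"
	}
	return tokens
		.map((token, index) => {
			const quoted = `"${token}"`;
			// FTS5 expects the star after the closing quote, not inside it.
			return index === tokens.length - 1 ? `${quoted}*` : quoted;
		})
		.join(' ');
}

console.log(toFtsQuery('foo.bar(baz)')); // "foo" "bar" "baz"*
console.log(toFtsQuery('user_id')); // "user_id"*
console.log(toFtsQuery('()') === ''); // true
```

The result is still passed as a bound parameter to `snippets_fts MATCH ?`; the sketch only changes what string is bound.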
### 2026-03-27 — LINT-0001 planning research
- Task: Plan the lint-fix iteration covering the reported ESLint and eslint-plugin-svelte violations across Svelte UI, SvelteKit routes, server modules, and Vitest suites.
- Files inspected:
  - `package.json`
  - `eslint.config.js`
  - `docs/FINDINGS.md`
  - `prompts/LINT-0001/prompt.yaml`
  - `prompts/LINT-0001/progress.yaml`
  - `src/lib/components/FolderPicker.svelte`
  - `src/lib/components/RepositoryCard.svelte`
  - `src/lib/components/search/SnippetCard.svelte`
  - `src/lib/server/crawler/local.crawler.test.ts`
  - `src/lib/server/embeddings/embedding.service.test.ts`
  - `src/lib/server/embeddings/local.provider.ts`
  - `src/lib/server/embeddings/provider.ts`
  - `src/lib/server/embeddings/registry.ts`
  - `src/lib/server/models/context-response.ts`
  - `src/lib/server/parser/code.parser.ts`
  - `src/lib/server/pipeline/indexing.pipeline.ts`
  - `src/lib/server/search/hybrid.search.service.test.ts`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/services/repository.service.test.ts`
  - `src/lib/server/services/version.service.test.ts`
  - `src/lib/server/services/version.service.ts`
  - `src/routes/+layout.svelte`
  - `src/routes/+page.svelte`
  - `src/routes/api/v1/libs/search/+server.ts`
  - `src/routes/api/v1/settings/embedding/+server.ts`
  - `src/routes/repos/[id]/+page.svelte`
  - `src/routes/search/+page.svelte`
  - `src/routes/settings/+page.svelte`
- Findings:
  - The project lint stack is ESLint `^9.39.2` with `typescript-eslint` recommended rules and `eslint-plugin-svelte` recommended plus SvelteKit-aware rules, running over Svelte `^5.51.0` and SvelteKit `^2.50.2`.
  - Context7 documentation for `eslint-plugin-svelte` confirms `svelte/no-navigation-without-base` flags root-relative `<a href="/...">` links and `goto('/...')` calls in SvelteKit projects; compliant fixes must use `$app/paths` base-aware links or base-prefixed `goto` calls.
  - Context7 documentation for Svelte 5 confirms event handlers are regular element properties such as `onclick`, while side effects belong in `$effect`; repo memory also records that client-only fetch bootstrap should not be moved indiscriminately into `$effect` when `onMount` or load is the correct lifecycle boundary.
  - Concrete navigation violations already exist in `src/routes/+layout.svelte`, `src/routes/repos/[id]/+page.svelte`, `src/routes/search/+page.svelte`, and `src/lib/components/RepositoryCard.svelte`, each using hard-coded root-relative internal navigation.
  - Static diagnostics currently expose at least one direct TypeScript lint error in `src/lib/server/embeddings/registry.ts`, where `_config` is defined but never used.
  - `src/routes/api/v1/libs/search/+server.ts` imports `json` from `@sveltejs/kit` without using it, making that endpoint a concrete unused-import cleanup target.
  - `src/lib/server/services/version.service.ts` still uses CommonJS `require(...)` to reach git utilities from TypeScript, which is inconsistent with the repository's ESM style and is a likely lint target under the current ESLint stack.
  - The affected Svelte pages and settings UI already use Svelte 5 event-property syntax, so the lint work should preserve that syntax and focus on base-aware navigation, lifecycle correctness, and unused-symbol cleanup rather than regressing to legacy `on:` directives.
  - Existing automated coverage for the lint-touching backend areas already lives in `src/lib/server/crawler/local.crawler.test.ts`, `src/lib/server/embeddings/embedding.service.test.ts`, `src/lib/server/search/hybrid.search.service.test.ts`, `src/lib/server/services/repository.service.test.ts`, and `src/lib/server/services/version.service.test.ts`; route and component changes rely on build and lint validation rather than dedicated browser tests in this iteration.
- Risks / follow-ups:
  - Base-aware navigation fixes must preserve internal app routing semantics and should not replace intentional external navigation, because SvelteKit `goto(...)` no longer accepts external URLs.
  - Settings and search page lifecycle changes must avoid reintroducing SSR-triggered fetches or self-triggered URL loops; client-only bootstrap logic should remain mounted once and URL-sync effects must stay idempotent.
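
The base-prefixing semantics behind the navigation findings can be sketched as a pure function. `withBase` is a hypothetical helper written for illustration: in the application the `base` value would come from `$app/paths`, and because the lint rule inspects call sites statically, a wrapper like this may not satisfy `svelte/no-navigation-without-base` on its own. The sketch only shows which paths get prefixed and which are left alone.

```typescript
/**
 * Prefix an internal, root-relative path with the deployment base path.
 * `base` is assumed to be either '' or a path such as '/docs' with no trailing slash.
 * Absolute and protocol-relative URLs are returned unchanged so that external
 * links are never rewritten into internal routes.
 */
function withBase(base: string, path: string): string {
	if (/^(?:[a-z][a-z0-9+.-]*:|\/\/)/i.test(path)) {
		return path; // external target: leave it for a plain <a href>, not goto()
	}
	const normalized = path.startsWith('/') ? path : `/${path}`;
	return `${base}${normalized}`;
}

console.log(withBase('/docs', '/search')); // /docs/search
console.log(withBase('', '/search')); // /search
console.log(withBase('/docs', 'https://example.com/x')); // https://example.com/x
```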