trueref/docs/FINDINGS.md
2026-03-27 02:23:01 +01:00

Findings

Last Updated: 2026-03-27T00:24:13.000Z

Initializer Summary

  • JIRA: FEEDBACK-0001
  • Refresh mode: REFRESH_IF_REQUIRED
  • Result: refreshed affected documentation only. ARCHITECTURE.md and FINDINGS.md were updated from current repository analysis; CODE_STYLE.md remained trusted and unchanged because the documented conventions still match the codebase.

Research Performed

  • Discovered source-language distribution, dependency manifest, import patterns, and project structure.
  • Read the retrieval, formatter, token-budget, parser, mapper, and response-model modules affected by the latest implementation changes.
  • Compared the trusted cache state with current behavior to identify which documentation files were actually stale.
  • Confirmed package scripts for build and test.
  • Confirmed Linux-native md5sum availability for documentation trust metadata.

Open Questions For Planner

  • Decide whether the new repository and version metadata fields in the retrieval response should be documented formally in a public API reference, beyond the architecture summary.
  • Decide whether parser chunking should evolve further, from file-level and declaration-level boundaries to member-level semantic chunks, for class-heavy codebases.

Planner Notes Template

Add subsequent research below this section.

Entry Template

  • Date:
  • Task:
  • Files inspected:
  • Findings:
  • Risks / follow-ups:

2026-03-27 — FEEDBACK-0001 initializer refresh audit

  • Task: Refresh only stale documentation after changes to retrieval, formatters, token budgeting, and parser behavior.
  • Files inspected:
    • docs/docs_cache_state.yaml
    • docs/ARCHITECTURE.md
    • docs/CODE_STYLE.md
    • docs/FINDINGS.md
    • package.json
    • src/routes/api/v1/context/+server.ts
    • src/lib/server/api/formatters.ts
    • src/lib/server/api/token-budget.ts
    • src/lib/server/search/query-preprocessor.ts
    • src/lib/server/search/search.service.ts
    • src/lib/server/search/hybrid.search.service.ts
    • src/lib/server/mappers/context-response.mapper.ts
    • src/lib/server/models/context-response.ts
    • src/lib/server/models/search-result.ts
    • src/lib/server/parser/index.ts
    • src/lib/server/parser/code.parser.ts
    • src/lib/server/parser/markdown.parser.ts
  • Findings:
    • The documentation cache was trusted, but the architecture summary no longer captured current retrieval behavior: query preprocessing now sanitizes punctuation-heavy input for FTS5, semantic mode can bypass FTS entirely, and auto or hybrid retrieval can fall back to vector search when keyword search returns no candidates.
    • Plain-text and JSON context formatting now carry repository and version metadata, and the text formatter emits an explicit no-results section instead of an empty body.
    • Token budgeting now skips individual over-budget snippets and continues evaluating lower-ranked candidates, which changes the response-selection behavior described at the architecture level.
    • Parser coverage now explicitly includes Markdown, code, config, HTML-like, and plain-text inputs, so the architecture summary needed to reflect that broader file-type handling.
    • The conventions documented in CODE_STYLE.md still match the current repository: strict TypeScript, tab indentation, ESM imports, Prettier and ESLint flat config, and pragmatic service-oriented server modules.
  • Risks / follow-ups:
    • Future cache invalidation should continue to distinguish between behavioral changes that affect architecture docs and localized implementation changes that do not affect the style guide.
    • If the public API contract becomes externally versioned, the new context metadata fields likely deserve a dedicated API document instead of only architecture-level coverage.
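
The token-budgeting finding in this entry (skip an over-budget snippet, then keep evaluating lower-ranked candidates) can be illustrated with a minimal sketch. The `RankedSnippet` shape and the `selectWithinBudget` name are hypothetical and are not taken from token-budget.ts; the real module may count tokens and represent snippets differently.

```typescript
// Hypothetical snippet shape; the real model in search-result.ts may differ.
interface RankedSnippet {
	id: string;
	tokenCount: number;
}

/**
 * Select snippets in rank order without exceeding maxTokens.
 * An over-budget snippet is skipped instead of ending the loop,
 * so smaller lower-ranked snippets can still be included.
 */
function selectWithinBudget(snippets: RankedSnippet[], maxTokens: number): RankedSnippet[] {
	const selected: RankedSnippet[] = [];
	let used = 0;
	for (const snippet of snippets) {
		if (used + snippet.tokenCount > maxTokens) {
			continue; // skip this item, keep evaluating later candidates
		}
		selected.push(snippet);
		used += snippet.tokenCount;
	}
	return selected;
}

const ranked: RankedSnippet[] = [
	{ id: 'a', tokenCount: 400 },
	{ id: 'b', tokenCount: 900 },
	{ id: 'c', tokenCount: 300 }
];
const picked = selectWithinBudget(ranked, 1000).map((s) => s.id);
console.log(picked); // [ 'a', 'c' ]
```

With a 1000-token budget, the 900-token snippet is skipped and the 300-token one still fits. A loop that stopped at the first over-budget item would have returned only `a`.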

2026-03-27 — FEEDBACK-0001 planning research

  • Task: Plan the retrieval-fix iteration covering FTS query safety, hybrid fallback, empty-result behavior, result metadata, token budgeting, and parser chunking.
  • Files inspected:
    • package.json
    • src/routes/api/v1/context/+server.ts
    • src/lib/server/search/query-preprocessor.ts
    • src/lib/server/search/search.service.ts
    • src/lib/server/search/hybrid.search.service.ts
    • src/lib/server/search/vector.search.ts
    • src/lib/server/api/token-budget.ts
    • src/lib/server/api/formatters.ts
    • src/lib/server/mappers/context-response.mapper.ts
    • src/lib/server/models/context-response.ts
    • src/lib/server/models/search-result.ts
    • src/lib/server/parser/code.parser.ts
    • src/lib/server/search/search.service.test.ts
    • src/lib/server/search/hybrid.search.service.test.ts
    • src/lib/server/api/formatters.test.ts
    • src/lib/server/parser/code.parser.test.ts
    • src/routes/api/v1/api-contract.integration.test.ts
    • src/mcp/tools/query-docs.ts
    • src/mcp/client.ts
  • Findings:
    • better-sqlite3 ^12.6.2 backs the affected search path; the code already uses bound parameters for MATCH, so the practical fix belongs in query normalization and fallback handling rather than SQL string construction.
    • query-preprocessor.ts only strips parentheses and appends a trailing wildcard. Other code-like punctuation currently reaches the FTS execution path unsanitized.
    • search.service.ts sends the preprocessed text directly to snippets_fts MATCH ? and already returns [] for blank processed queries.
    • hybrid.search.service.ts always executes keyword search before semantic branching. In the current flow, an FTS parse failure can abort auto, hybrid, and semantic requests before vector retrieval runs.
    • vector.search.ts already preserves repositoryId, versionId, and profileId filtering and does not need architectural changes for this iteration.
    • token-budget.ts stops at the first over-budget snippet instead of skipping that item and continuing through later ranked results.
    • formatContextTxt([], []) returns an empty string, so /api/v1/context?type=txt can emit an empty 200 OK body today.
    • context-response.mapper.ts and context-response.ts expose snippet content and breadcrumb/page title but do not identify local TrueRef origin, repository source metadata, or normalized snippet origin labels.
    • code.parser.ts splits primarily at top-level declarations; class/object member functions remain in coarse chunks, which limits method-level recall for camelCase API queries.
    • Existing relevant automated coverage is concentrated in the search, formatter, and parser unit tests; the API contract integration test (src/routes/api/v1/api-contract.integration.test.ts) currently omits the /api/v1/context endpoint entirely.
  • Risks / follow-ups:
    • Response-shape changes must be additive because src/mcp/client.ts, src/mcp/tools/query-docs.ts, and UI consumers expect the current top-level keys to remain present.
    • Parser improvements should stay inside parseCodeFile() and existing chunking helpers to avoid turning this fix iteration into a schema or pipeline redesign.
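
The ordering problem noted for hybrid.search.service.ts in this entry suggests one possible control flow for the fix, sketched below. All names here (`SearchBackends`, `retrieve`, the stub backends) are hypothetical. The backends are synchronous only to keep the sketch short, and catching a keyword-search error and treating it as "no candidates" is an assumption, not confirmed behavior of the service.

```typescript
type SearchMode = 'auto' | 'hybrid' | 'semantic';

interface Hit {
	snippetId: string;
	score: number;
}

// Hypothetical search backends, injected so the sketch stays self-contained.
interface SearchBackends {
	keyword: (query: string) => Hit[]; // may throw on an FTS5 parse error
	vector: (query: string) => Hit[];
}

function retrieve(query: string, mode: SearchMode, backends: SearchBackends): Hit[] {
	// Semantic mode never touches FTS, so a parse error cannot abort it.
	if (mode === 'semantic') {
		return backends.vector(query);
	}

	let keywordHits: Hit[] = [];
	try {
		keywordHits = backends.keyword(query);
	} catch {
		// Assumption: a failed keyword query counts as "no candidates".
		keywordHits = [];
	}

	// auto and hybrid fall back to vector search when keyword search is empty.
	if (keywordHits.length === 0) {
		return backends.vector(query);
	}

	// Rank fusion for hybrid mode is out of scope for this sketch.
	return keywordHits;
}

// Stub backends: the keyword stub throws on any punctuation, like a strict parser.
const backends: SearchBackends = {
	keyword: (query) => {
		if (/[^\w\s]/.test(query)) throw new Error('fts5: syntax error');
		return query.trim() === '' ? [] : [{ snippetId: 'kw-1', score: 1 }];
	},
	vector: () => [{ snippetId: 'vec-1', score: 0.8 }]
};

console.log(retrieve('useState()', 'auto', backends).map((h) => h.snippetId)); // [ 'vec-1' ]
console.log(retrieve('useState', 'hybrid', backends).map((h) => h.snippetId)); // [ 'kw-1' ]
```

Keeping the fallback decision in one function makes the three cases (semantic bypass, keyword failure, empty keyword result) straightforward to cover in hybrid.search.service.test.ts.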

2026-03-27 — FEEDBACK-0001 SQLite FTS5 syntax research

  • Task: Verify the FTS5 query-grammar constraints that affect punctuation-heavy local search queries.
  • Files inspected:
    • package.json
    • src/lib/server/search/query-preprocessor.ts
    • src/lib/server/search/search.service.ts
    • src/lib/server/search/hybrid.search.service.ts
  • Findings:
    • better-sqlite3 is declared with the caret range ^12.6.2 in package.json (a compatible-version range, not an exact pin), and the application binds the MATCH string as a parameter instead of interpolating SQL directly.
    • The canonical SQLite FTS5 docs state that barewords may contain letters, digits, underscore, non-ASCII characters, and the substitute character; strings containing other punctuation must be quoted or they become syntax errors in MATCH expressions.
    • The same docs state that prefix search is expressed by placing * after the token or phrase, not inside quotes, which matches the current trailing-wildcard strategy in query-preprocessor.ts.
    • SQLite documents that FTS5 is stricter than FTS3/4 about unrecognized punctuation in query strings, which confirms that code-like user input should be normalized before it reaches snippets_fts MATCH ?.
    • Based on the current code path, the practical fix remains application-side sanitization and fallback behavior in query-preprocessor.ts and hybrid.search.service.ts, not SQL construction changes.
  • Risks / follow-ups:
    • Over-sanitizing punctuation-heavy inputs could erase useful identifiers, so the implementation should preserve searchable alphanumeric and underscore tokens while discarding grammar-breaking punctuation.
    • Prefix expansion should remain on the final searchable token only so the fix preserves current query-cost expectations and test semantics.
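
The sanitization constraints in this entry can be sketched as a small normalizer. This is an illustration under stated assumptions, not the contents of query-preprocessor.ts: the `toFtsQuery` name is hypothetical, and quoting every token (so that the FTS5 keywords AND, OR, NOT, and NEAR are treated as ordinary terms) is a design choice of this sketch, not something the findings prescribe.

```typescript
/**
 * Reduce free-form input to an FTS5-safe MATCH string.
 * Characters allowed in FTS5 barewords (ASCII letters, digits, underscore,
 * and non-ASCII characters) are kept; everything else acts as a separator.
 */
function toFtsQuery(raw: string): string {
	const tokens = raw
		.split(/[^A-Za-z0-9_\u0080-\uFFFF]+/)
		.filter((token) => token.length > 0);

	if (tokens.length === 0) {
		return ''; // caller is expected to short-circuit on a blank query
	}

	// Assumption: quote each token so operator keywords are matched literally.
	const quoted = tokens.map((token) => `"${token}"`);

	// Prefix expansion on the final token only; the * goes after the closing quote.
	quoted[quoted.length - 1] += '*';

	return quoted.join(' ');
}

console.log(toFtsQuery('foo.bar(baz)')); // "foo" "bar" "baz"*
console.log(toFtsQuery('snake_case NOT')); // "snake_case" "NOT"*
console.log(JSON.stringify(toFtsQuery('()'))); // ""
```

The returned string is still meant to be passed as a bound parameter to `snippets_fts MATCH ?`. An empty result lines up with the existing behavior of returning `[]` for blank processed queries.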