chore(FEEDBACK-0001): linting

Giancarmine Salucci
2026-03-27 02:23:01 +01:00
parent 16436bfab2
commit 5a3c27224d
102 changed files with 5108 additions and 4976 deletions


@@ -38,13 +38,13 @@ TrueRef is under active development. The current codebase already includes:
TrueRef is organized into four main layers:
1. Web UI
SvelteKit application for adding repositories, monitoring indexing, searching content, and configuring embeddings.
2. REST API
Endpoints under `/api/v1/*` for repository management, search, schema discovery, job status, and settings.
3. Indexing pipeline
Crawlers, parsers, chunking logic, snippet storage, and optional embedding generation.
4. MCP server
A thin compatibility layer that forwards `resolve-library-id` and `query-docs` requests to the TrueRef REST API.
At runtime, the app uses SQLite via `better-sqlite3` and Drizzle, plus optional embedding providers for semantic retrieval.
@@ -367,9 +367,9 @@ The tool names and argument shapes intentionally mirror context7 so existing wor
The MCP server uses:
- `TRUEREF_API_URL`
Base URL of the TrueRef web app. Default: `http://localhost:5173`
- `PORT`
Used only for HTTP transport. Default: `3001`
### Start MCP over stdio
@@ -602,6 +602,7 @@ alwaysApply: true
---
When answering questions about indexed libraries, always use the TrueRef MCP tools:
1. Call `resolve-library-id` with the library name and the user's question to get the library ID.
2. Call `query-docs` with the library ID and question to retrieve relevant documentation.
3. Use the returned documentation to answer accurately.
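The two tool calls above map onto the REST endpoints the MCP server forwards to (`/api/v1/libs/search` and `/api/v1/context`, documented in this repository's PRD). A minimal sketch of the flow, with the base URL and response shapes as assumptions:

```typescript
// Sketch of the two-step retrieval flow expressed against the REST API the
// MCP server forwards to. Endpoint paths and parameter names come from the
// documented API surface; response handling below is illustrative only.
const BASE = "http://localhost:3000"; // TRUEREF_API_URL in a real deployment

// Step 1: resolve-library-id — find candidate libraries for a name + question.
function resolveLibraryUrl(libraryName: string, query: string): string {
	const params = new URLSearchParams({ libraryName, query });
	return `${BASE}/api/v1/libs/search?${params}`;
}

// Step 2: query-docs — fetch documentation for the resolved library ID.
function queryDocsUrl(libraryId: string, query: string, type: "json" | "txt" = "json"): string {
	const params = new URLSearchParams({ libraryId, query, type });
	return `${BASE}/api/v1/context?${params}`;
}

// Usage (network calls commented out; result field names are assumptions):
// const libs = await fetch(resolveLibraryUrl("drizzle", "how do I run migrations?")).then((r) => r.json());
// const docs = await fetch(queryDocsUrl(libs[0].id, "how do I run migrations?", "txt")).then((r) => r.text());
```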
@@ -614,9 +615,9 @@ Never rely on training data alone for library APIs that may have changed.
Whether you are using VS Code, IntelliJ, or Claude Code, the expected retrieval flow is:
1. `resolve-library-id`
Find the correct repository or version identifier.
2. `query-docs`
Retrieve the actual documentation and code snippets for the user question.
Example:
@@ -638,10 +639,10 @@ docker compose up --build
This builds the image and starts two services:
| Service | Default port | Purpose                       |
| ------- | ------------ | ----------------------------- |
| `web`   | `3000`       | SvelteKit web UI and REST API |
| `mcp`   | `3001`       | MCP HTTP server               |
The SQLite database is stored in a named Docker volume (`trueref-data`) and persists across restarts.
@@ -687,10 +688,10 @@ services:
      - ${USERPROFILE:-$HOME}/.gitconfig:/root/.gitconfig:ro
      - ${CORP_CA_CERT}:/certs/corp-ca.crt:ro
    environment:
      BITBUCKET_HOST: '${BITBUCKET_HOST}'
      GITLAB_HOST: '${GITLAB_HOST}'
      GIT_TOKEN_BITBUCKET: '${GIT_TOKEN_BITBUCKET}'
      GIT_TOKEN_GITLAB: '${GIT_TOKEN_GITLAB}'
```
5. **Start the services**:
@@ -708,6 +709,7 @@ The Docker entrypoint script (`docker-entrypoint.sh`) runs these steps in order:
3. **Configure git credentials**: Sets up per-host credential helpers that provide the correct username and token for each remote.
This setup works for:
- HTTPS cloning with personal access tokens
- SSH cloning with mounted SSH keys
- On-premise servers with custom CA certificates
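To make the credential-helper step concrete, here is a sketch of the mechanism git uses (this is not the literal `docker-entrypoint.sh` code; the helper and variable names are illustrative, mirroring the compose file above):

```shell
# Git invokes a credential helper and reads username/password key-value pairs
# from its stdout. A per-host helper like this lets each remote use its own token.
bitbucket_cred_helper() {
	printf 'username=%s\npassword=%s\n' "${GIT_USER:-trueref}" "${GIT_TOKEN_BITBUCKET}"
}

# Hypothetical registration for a single host (so GitLab can use a different token):
# git config --global credential.https://bitbucket.corp.example.com.helper \
#   '!f() { bitbucket_cred_helper; }; f'

GIT_TOKEN_BITBUCKET=example-token
bitbucket_cred_helper
# prints "username=trueref" and "password=example-token"
```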
@@ -718,6 +720,7 @@ This setup works for:
For long-lived deployments, SSH authentication is recommended:
1. Generate an SSH key pair if you don't have one:
```sh
ssh-keygen -t ed25519 -C "trueref@your-company.com"
```
@@ -727,6 +730,7 @@ For long-lived deployments, SSH authentication is recommended:
- GitLab: User Settings → SSH Keys
3. Ensure your `~/.ssh/config` has the correct host entries:
```
Host bitbucket.corp.example.com
  IdentityFile ~/.ssh/id_ed25519
@@ -737,13 +741,13 @@ For long-lived deployments, SSH authentication is recommended:
### Environment variables
| Variable          | Default                 | Description                                        |
| ----------------- | ----------------------- | -------------------------------------------------- |
| `DATABASE_URL`    | `/data/trueref.db`      | Path to the SQLite database inside the container   |
| `PORT`            | `3000`                  | Port the web app listens on                        |
| `HOST`            | `0.0.0.0`               | Bind address for the web app                       |
| `TRUEREF_API_URL` | `http://localhost:3000` | Base URL the MCP server uses to reach the REST API |
| `MCP_PORT`        | `3001`                  | Port the MCP HTTP server listens on                |
Override them in `docker-compose.yml` or pass them with `-e` flags.
@@ -770,12 +774,12 @@ Once both containers are running, point VS Code at the MCP HTTP endpoint:
```json
{
  "servers": {
    "trueref": {
      "type": "http",
      "url": "http://localhost:3001/mcp"
    }
  }
}
```
@@ -783,12 +787,12 @@ Once both containers are running, point VS Code at the MCP HTTP endpoint:
```json
{
  "mcpServers": {
    "trueref": {
      "type": "http",
      "url": "http://localhost:3001/mcp"
    }
  }
}
```
@@ -806,10 +810,10 @@ Verify the connection inside Claude Code:
### Health checks
| Endpoint                            | Expected response               |
| ----------------------------------- | ------------------------------- |
| `http://localhost:3000/api/v1/libs` | JSON array of indexed libraries |
| `http://localhost:3001/ping`        | `{"ok":true}`                   |
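A minimal probe matching the table above can be sketched as follows. The expected payloads come from the table; everything else (timeouts, retries, error reporting) is omitted, and the commented `fetch` flow is illustrative:

```typescript
// Validators for the two documented health-check responses.
function isPingHealthy(body: string): boolean {
	try {
		const parsed = JSON.parse(body) as { ok?: boolean };
		return parsed.ok === true; // matches the documented {"ok":true} payload
	} catch {
		return false;
	}
}

function isLibsHealthy(body: string): boolean {
	try {
		return Array.isArray(JSON.parse(body)); // JSON array of indexed libraries
	} catch {
		return false;
	}
}

// async function probe(): Promise<boolean> {
// 	const ping = await fetch("http://localhost:3001/ping").then((r) => r.text());
// 	const libs = await fetch("http://localhost:3000/api/v1/libs").then((r) => r.text());
// 	return isPingHealthy(ping) && isLibsHealthy(libs);
// }
```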
### Mounting a local repository


@@ -2,7 +2,7 @@ services:
  web:
    build: .
    ports:
      - '3000:3000'
    volumes:
      - trueref-data:/data
    # Corporate deployment support (TRUEREF-0019)
@@ -24,10 +24,10 @@ services:
    build: .
    command: mcp
    ports:
      - '3001:3001'
    environment:
      TRUEREF_API_URL: http://web:3000
      MCP_PORT: '3001'
    depends_on:
      - web
    restart: unless-stopped


@@ -37,85 +37,85 @@ Add subsequent research below this section.
- Task: Refresh only stale documentation after changes to retrieval, formatters, token budgeting, and parser behavior.
- Files inspected:
  - `docs/docs_cache_state.yaml`
  - `docs/ARCHITECTURE.md`
  - `docs/CODE_STYLE.md`
  - `docs/FINDINGS.md`
  - `package.json`
  - `src/routes/api/v1/context/+server.ts`
  - `src/lib/server/api/formatters.ts`
  - `src/lib/server/api/token-budget.ts`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
  - `src/lib/server/mappers/context-response.mapper.ts`
  - `src/lib/server/models/context-response.ts`
  - `src/lib/server/models/search-result.ts`
  - `src/lib/server/parser/index.ts`
  - `src/lib/server/parser/code.parser.ts`
  - `src/lib/server/parser/markdown.parser.ts`
- Findings:
  - The documentation cache was trusted, but the architecture summary no longer captured current retrieval behavior: query preprocessing now sanitizes punctuation-heavy input for FTS5, semantic mode can bypass FTS entirely, and auto or hybrid retrieval can fall back to vector search when keyword search returns no candidates.
  - Plain-text and JSON context formatting now carry repository and version metadata, and the text formatter emits an explicit no-results section instead of an empty body.
  - Token budgeting now skips individual over-budget snippets and continues evaluating lower-ranked candidates, which changes the response-selection behavior described at the architecture level.
  - Parser coverage now explicitly includes Markdown, code, config, HTML-like, and plain-text inputs, so the architecture summary needed to reflect that broader file-type handling.
  - The conventions documented in CODE_STYLE.md still match the current repository: strict TypeScript, tab indentation, ESM imports, Prettier and ESLint flat config, and pragmatic service-oriented server modules.
- Risks / follow-ups:
  - Future cache invalidation should continue to distinguish between behavioral changes that affect architecture docs and localized implementation changes that do not affect the style guide.
  - If the public API contract becomes externally versioned, the new context metadata fields likely deserve a dedicated API document instead of only architecture-level coverage.
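The no-results formatter behavior noted in the findings above can be sketched as follows. The real `formatContextTxt` lives in `src/lib/server/api/formatters.ts` and is not reproduced here; the snippet fields and section wording are illustrative assumptions:

```typescript
// Sketch: an explicit no-results section instead of an empty 200 OK body.
interface TxtSnippet {
	title: string;
	content: string;
}

function formatContextTxt(snippets: TxtSnippet[], infos: TxtSnippet[]): string {
	const all = [...snippets, ...infos];
	if (all.length === 0) {
		// The pre-fix behavior returned "" here, producing an empty response body.
		return "## No results\nNo indexed documentation matched this query.\n";
	}
	return all.map((s) => `## ${s.title}\n${s.content}`).join("\n\n");
}
```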
### 2026-03-27 — FEEDBACK-0001 planning research
- Task: Plan the retrieval-fix iteration covering FTS query safety, hybrid fallback, empty-result behavior, result metadata, token budgeting, and parser chunking.
- Files inspected:
  - `package.json`
  - `src/routes/api/v1/context/+server.ts`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
  - `src/lib/server/search/vector.search.ts`
  - `src/lib/server/api/token-budget.ts`
  - `src/lib/server/api/formatters.ts`
  - `src/lib/server/mappers/context-response.mapper.ts`
  - `src/lib/server/models/context-response.ts`
  - `src/lib/server/models/search-result.ts`
  - `src/lib/server/parser/code.parser.ts`
  - `src/lib/server/search/search.service.test.ts`
  - `src/lib/server/search/hybrid.search.service.test.ts`
  - `src/lib/server/api/formatters.test.ts`
  - `src/lib/server/parser/code.parser.test.ts`
  - `src/routes/api/v1/api-contract.integration.test.ts`
  - `src/mcp/tools/query-docs.ts`
  - `src/mcp/client.ts`
- Findings:
  - `better-sqlite3` `^12.6.2` backs the affected search path; the code already uses bound parameters for `MATCH`, so the practical fix belongs in query normalization and fallback handling rather than SQL string construction.
  - `query-preprocessor.ts` only strips parentheses and appends a trailing wildcard. Other code-like punctuation currently reaches the FTS execution path unsanitized.
  - `search.service.ts` sends the preprocessed text directly to `snippets_fts MATCH ?` and already returns `[]` for blank processed queries.
  - `hybrid.search.service.ts` always executes keyword search before semantic branching. In the current flow, an FTS parse failure can abort `auto`, `hybrid`, and `semantic` requests before vector retrieval runs.
  - `vector.search.ts` already preserves `repositoryId`, `versionId`, and `profileId` filtering and does not need architectural changes for this iteration.
  - `token-budget.ts` stops at the first over-budget snippet instead of skipping that item and continuing through later ranked results.
  - `formatContextTxt([], [])` returns an empty string, so `/api/v1/context?type=txt` can emit an empty `200 OK` body today.
  - `context-response.mapper.ts` and `context-response.ts` expose snippet content and breadcrumb/page title but do not identify local TrueRef origin, repository source metadata, or normalized snippet origin labels.
  - `code.parser.ts` splits primarily at top-level declarations; class/object member functions remain in coarse chunks, which limits method-level recall for camelCase API queries.
  - Existing relevant automated coverage is concentrated in the search, formatter, and parser unit tests; `/api/v1/context` contract coverage currently omits the context endpoint entirely.
- Risks / follow-ups:
  - Response-shape changes must be additive because `src/mcp/client.ts`, `src/mcp/tools/query-docs.ts`, and UI consumers expect the current top-level keys to remain present.
  - Parser improvements should stay inside `parseCodeFile()` and existing chunking helpers to avoid turning this fix iteration into a schema or pipeline redesign.
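The planned token-budget change (skip an over-budget snippet and keep evaluating lower-ranked candidates) can be sketched as below. This is an illustration of the selection policy, not the actual `token-budget.ts` API; field names are assumptions:

```typescript
// Skip-and-continue selection: an over-budget snippet is skipped rather than
// terminating the loop, so later (cheaper) ranked candidates still get a chance.
interface RankedSnippet {
	id: string;
	tokens: number;
}

function selectWithinBudget(ranked: RankedSnippet[], budget: number): RankedSnippet[] {
	const selected: RankedSnippet[] = [];
	let used = 0;
	for (const snippet of ranked) {
		if (used + snippet.tokens > budget) continue; // skip, don't stop (the old behavior stopped here)
		selected.push(snippet);
		used += snippet.tokens;
	}
	return selected;
}
```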
### 2026-03-27 — FEEDBACK-0001 SQLite FTS5 syntax research
- Task: Verify the FTS5 query-grammar constraints that affect punctuation-heavy local search queries.
- Files inspected:
  - `package.json`
  - `src/lib/server/search/query-preprocessor.ts`
  - `src/lib/server/search/search.service.ts`
  - `src/lib/server/search/hybrid.search.service.ts`
- Findings:
  - `better-sqlite3` is pinned at `^12.6.2` in `package.json`, and the application binds the `MATCH` string as a parameter instead of interpolating SQL directly.
  - The canonical SQLite FTS5 docs state that barewords may contain letters, digits, underscore, non-ASCII characters, and the substitute character; strings containing other punctuation must be quoted or they become syntax errors in `MATCH` expressions.
  - The same docs state that prefix search is expressed by placing `*` after the token or phrase, not inside quotes, which matches the current trailing-wildcard strategy in `query-preprocessor.ts`.
  - SQLite documents that FTS5 is stricter than FTS3/4 about unrecognized punctuation in query strings, which confirms that code-like user input should be normalized before it reaches `snippets_fts MATCH ?`.
  - Based on the current code path, the practical fix remains application-side sanitization and fallback behavior in `query-preprocessor.ts` and `hybrid.search.service.ts`, not SQL construction changes.
- Risks / follow-ups:
  - Over-sanitizing punctuation-heavy inputs could erase useful identifiers, so the implementation should preserve searchable alphanumeric and underscore tokens while discarding grammar-breaking punctuation.
  - Prefix expansion should remain on the final searchable token only so the fix preserves current query-cost expectations and test semantics.
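A normalization consistent with the constraints above can be sketched as follows: keep ASCII alphanumeric/underscore tokens (a conservative subset of FTS5's bareword-safe characters), drop grammar-breaking punctuation, quote each token, and place the prefix `*` after the final quoted token only. This is an illustration of the approach, not the actual `query-preprocessor.ts` implementation:

```typescript
// Convert arbitrary (possibly code-like) user input into a valid FTS5 MATCH string.
function toFtsQuery(raw: string): string {
	// Keep only bareword-safe identifier runs; punctuation like (){}=>. is discarded.
	const tokens = raw.match(/[A-Za-z0-9_]+/g) ?? [];
	if (tokens.length === 0) return ""; // caller already returns [] for blank queries
	return tokens
		.map((t, i) => (i === tokens.length - 1 ? `"${t}"*` : `"${t}"`)) // prefix * on last token only
		.join(" ");
}
```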


@@ -17,6 +17,7 @@ The core use case is enabling AI coding assistants (Claude Code, Cursor, Zed, et
## 2. Problem Statement
### 2.1 Context7's Limitations
- The indexing and crawling backend is entirely private and closed-source.
- Only public libraries already in the context7.com catalog are available.
- Private, internal, or niche repositories cannot be added.
@@ -24,6 +25,7 @@ The core use case is enabling AI coding assistants (Claude Code, Cursor, Zed, et
- No way to self-host for air-gapped or compliance-constrained environments.
### 2.2 The Gap
Teams with internal SDKs, private libraries, proprietary documentation, or a need for data sovereignty have no tooling that provides context7-equivalent LLM documentation retrieval.
---
@@ -31,6 +33,7 @@ Teams with internal SDKs, private libraries, proprietary documentation, or a nee
## 3. Goals & Non-Goals
### Goals
- Replicate all context7 capabilities: library search, documentation retrieval, MCP tools (`resolve-library-id`, `query-docs`).
- Support both GitHub-hosted and local filesystem repositories.
- Provide a full indexing pipeline: crawl → parse → chunk → embed → store → query.
@@ -42,6 +45,7 @@ Teams with internal SDKs, private libraries, proprietary documentation, or a nee
- Self-hostable with minimal dependencies (SQLite-first, no external vector DB required).
### Non-Goals (v1)
- Authentication & authorization (deferred to a future version).
- Skill generation (context7 CLI skill feature).
- Multi-tenant SaaS mode.
@@ -54,9 +58,11 @@ Teams with internal SDKs, private libraries, proprietary documentation, or a nee
## 4. Users & Personas
### Primary: The Developer / Tech Lead
Configures TrueRef, adds repositories, integrates the MCP server with their AI coding assistant. Technical, comfortable with CLI and config files.
### Secondary: The AI Coding Assistant
The "user" at query time. Calls `resolve-library-id` and `query-docs` via MCP to retrieve documentation snippets for code generation.
---
@@ -100,25 +106,27 @@ The "user" at query time. Calls `resolve-library-id` and `query-docs` via MCP to
```
### Technology Stack
| Layer            | Technology                                                         |
| ---------------- | ------------------------------------------------------------------ |
| Framework        | SvelteKit (Node adapter)                                           |
| Language         | TypeScript                                                         |
| Database         | SQLite via better-sqlite3 + drizzle-orm                            |
| Full-Text Search | SQLite FTS5                                                        |
| Vector Search    | SQLite `sqlite-vec` extension (cosine similarity)                  |
| Embeddings       | Pluggable: local (transformers.js / ONNX) or OpenAI-compatible API |
| MCP Protocol     | `@modelcontextprotocol/sdk`                                        |
| HTTP             | SvelteKit API routes + optional standalone MCP HTTP server         |
| CSS              | TailwindCSS v4                                                     |
| Testing          | Vitest                                                             |
| Linting          | ESLint + Prettier                                                  |
---
## 6. Data Model
### 6.1 Repositories
A `Repository` is the top-level entity. It maps to a GitHub repo or local directory.
```
@@ -141,6 +149,7 @@ Repository {
```
### 6.2 Repository Versions
```
RepositoryVersion {
  id TEXT PRIMARY KEY
@@ -153,6 +162,7 @@ RepositoryVersion {
```
### 6.3 Documents (parsed files)
```
Document {
  id TEXT PRIMARY KEY
@@ -169,6 +179,7 @@ Document {
```
### 6.4 Snippets (indexed chunks)
```
Snippet {
  id TEXT PRIMARY KEY
@@ -186,6 +197,7 @@ Snippet {
```
### 6.5 Indexing Jobs
```
IndexingJob {
  id TEXT PRIMARY KEY
@@ -203,6 +215,7 @@ IndexingJob {
```
### 6.6 Repository Configuration (`trueref.json`)
```
RepositoryConfig {
  repositoryId TEXT FK → Repository
@@ -221,15 +234,19 @@ RepositoryConfig {
## 7. Core Features
### F1: Repository Management
Add, remove, update, and list repositories. Support GitHub (public/private via token) and local filesystem sources. Trigger indexing on demand or on schedule.
### F2: GitHub Crawler
Fetch repository file trees via GitHub Trees API. Download file contents. Respect `trueref.json` include/exclude rules. Support rate limiting and incremental re-indexing (checksum-based).
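The checksum-based incremental re-indexing mentioned for F2 can be sketched as: hash each file's content and re-parse only when the hash differs from the stored one. The in-memory `Map` stands in for the real persistence layer (an assumption; the actual crawler's storage is not shown here):

```typescript
import { createHash } from "node:crypto";

// Stored checksums keyed by file path (stand-in for the database).
const storedChecksums = new Map<string, string>();

function needsReindex(path: string, content: string): boolean {
	const checksum = createHash("sha256").update(content).digest("hex");
	if (storedChecksums.get(path) === checksum) return false; // unchanged: skip re-parsing
	storedChecksums.set(path, checksum);
	return true;
}
```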
### F3: Local Filesystem Crawler
Walk directory trees. Apply include/exclude rules from `trueref.json`. Watch for file changes (optional).
### F4: Document Parser & Chunker
- Parse Markdown files into sections (heading-based splitting).
- Extract code blocks from Markdown.
- Parse standalone code files into function/class-level chunks.
@@ -237,16 +254,19 @@ Walk directory trees. Apply include/exclude rules from `trueref.json`. Watch for
- Produce structured `Snippet` records (type: "code" or "info").
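The heading-based splitting in F4 can be sketched as below: each heading starts a new section that becomes a `Snippet` candidate. This is a simplified illustration (the real `markdown.parser.ts` also extracts code blocks, and this sketch ignores any preamble before the first heading):

```typescript
interface Section {
	heading: string;
	body: string;
}

// Split Markdown into sections at ATX headings (#, ##, ... ######).
function splitByHeadings(markdown: string): Section[] {
	const sections: Section[] = [];
	let current: Section | null = null;
	for (const line of markdown.split("\n")) {
		if (/^#{1,6}\s/.test(line)) {
			if (current) sections.push(current); // close the previous section
			current = { heading: line.replace(/^#+\s*/, ""), body: "" };
		} else if (current) {
			current.body += line + "\n";
		}
	}
	if (current) sections.push(current);
	return sections;
}
```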
### F5: Embedding & Vector Storage
- Generate embeddings for each snippet using a pluggable embeddings backend.
- Store embeddings as binary blobs in SQLite (sqlite-vec).
- Support fallback to FTS5-only search when no embedding provider is configured.
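Storing an embedding as a binary blob (F5, second bullet) can be sketched as packing the vector into a `Float32Array` before binding it as a BLOB parameter; sqlite-vec accepts little-endian float32 vectors in this form. The surrounding `INSERT` statement is assumed, not shown:

```typescript
// Pack a number[] embedding into a BLOB-compatible Buffer.
function embeddingToBlob(embedding: number[]): Buffer {
	return Buffer.from(new Float32Array(embedding).buffer);
}

// Unpack, accounting for the fact that a Buffer may be a view into a larger pool.
function blobToEmbedding(blob: Buffer): number[] {
	return Array.from(new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4));
}
```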
### F6: Semantic Search Engine

- Hybrid search: vector similarity + FTS5 keyword matching (BM25) with reciprocal rank fusion.
- Query-time retrieval: given `libraryId + query`, return ranked snippets.
- Library search: given `libraryName + query`, return matching repositories.
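The fusion step above can be sketched as follows; this is the standard reciprocal rank fusion formula with the common default `k = 60`, shown as a simplified stand-in for what the hybrid engine might do.

```typescript
// Merge several ranked result lists (e.g. vector ranks and BM25 ranks)
// with reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// Documents appearing high in multiple lists float to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```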
### F7: REST API (`/api/v1/*`)

- `GET /api/v1/libs/search?query=&libraryName=` — search libraries (context7-compatible)
- `GET /api/v1/context?query=&libraryId=&type=json|txt` — fetch documentation
- `GET /api/v1/libs` — list all indexed libraries
- `GET /api/v1/jobs/:id` — get indexing job status

### F8: MCP Server

- Tool: `resolve-library-id` — search for libraries by name
- Tool: `query-docs` — fetch documentation by libraryId + query
- Transport: stdio (primary), HTTP (optional)
- Compatible with Claude Code, Cursor, and other MCP-aware tools
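Because the MCP server is a thin proxy, a `query-docs` call reduces to one REST request against the endpoints listed under F7. A sketch of the URL construction; `buildContextUrl` is a hypothetical helper, and the `type=txt` choice is an assumption.

```typescript
// Build the REST request that the MCP `query-docs` tool forwards to.
// The base URL comes from TRUEREF_API_URL (default http://localhost:5173).
function buildContextUrl(baseUrl: string, libraryId: string, query: string): string {
  const url = new URL('/api/v1/context', baseUrl);
  url.searchParams.set('libraryId', libraryId);
  url.searchParams.set('query', query);
  url.searchParams.set('type', 'txt');
  return url.toString();
}
```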
### F9: Web UI — Repository Dashboard

- List all repositories with status, snippet count, last indexed date
- Add/remove repositories (GitHub URL or local path)
- Trigger re-indexing
- View repository config (`trueref.json`)

### F10: Web UI — Search Explorer

- Interactive search interface (resolve library → query docs)
- Preview snippets with syntax highlighting
- View raw document content

### F11: `trueref.json` Config Support

- Parse `trueref.json` from the repo root (or `context7.json` for compatibility)
- Apply `folders`, `excludeFolders`, `excludeFiles` during crawling
- Inject `rules` into LLM context alongside snippets
- Support `previousVersions` for versioned documentation
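An illustrative `trueref.json` exercising the fields above (all values are made up; the field names match the `repository_configs` schema):

```json
{
  "projectTitle": "My SDK",
  "description": "Client SDK documentation",
  "folders": ["docs", "guides"],
  "excludeFolders": ["docs/archive"],
  "excludeFiles": ["CHANGELOG.md"],
  "rules": ["Always use the async client in examples"],
  "previousVersions": [{ "tag": "v1.2.3", "title": "1.x" }]
}
```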
### F12: Indexing Pipeline & Job Queue

- SQLite-backed job queue (no external message broker required)
- Sequential processing with progress tracking
- Error recovery and retry logic
- Incremental re-indexing using file checksums
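The checksum-based incremental step can be sketched as below, assuming SHA-256 content hashes as stored in `documents.checksum`; `diffByChecksum` is a hypothetical helper, and the GitHub crawler could substitute blob SHAs for the same purpose.

```typescript
import { createHash } from 'node:crypto';

function sha256(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Given freshly crawled files and the stored path -> checksum map,
// decide which files need (re)parsing and which stored documents are stale.
function diffByChecksum(
  crawled: { path: string; content: string }[],
  stored: Map<string, string>
): { changed: string[]; removed: string[] } {
  const seen = new Set<string>();
  const changed: string[] = [];
  for (const file of crawled) {
    seen.add(file.path);
    if (stored.get(file.path) !== sha256(file.content)) changed.push(file.path);
  }
  const removed = [...stored.keys()].filter((p) => !seen.has(p));
  return { changed, removed };
}
```

Unchanged files are skipped entirely, which is what keeps re-indexing cheap for large repositories.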
### F13: Version Support

- Index specific git tags/branches per repository
- Serve version-specific context when the libraryId includes a version (`/owner/repo/v1.2.3`)
- UI for managing available versions
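Serving version-specific context means splitting the optional version segment out of the library ID. A sketch, with `parseLibraryId` as a hypothetical helper that assumes the version, when present, is the third path segment:

```typescript
// "/facebook/react"         -> { repoId: "/facebook/react" }
// "/facebook/react/v18.3.0" -> { repoId: "/facebook/react", version: "v18.3.0" }
function parseLibraryId(libraryId: string): { repoId: string; version?: string } {
  const parts = libraryId.replace(/^\//, '').split('/');
  if (parts.length < 2) throw new Error(`Invalid library id: ${libraryId}`);
  const [owner, repo, version] = parts;
  return { repoId: `/${owner}/${repo}`, version };
}
```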
TrueRef's REST API mirrors context7's `/api/v2/*` interface to allow drop-in compatibility:

| context7 Endpoint         | TrueRef Endpoint          | Notes                                  |
| ------------------------- | ------------------------- | -------------------------------------- |
| `GET /api/v2/libs/search` | `GET /api/v1/libs/search` | Same query params                      |
| `GET /api/v2/context`     | `GET /api/v1/context`     | Same query params, same response shape |

The MCP tool names and input schemas are identical:

- `resolve-library-id` with `libraryName` + `query`
- `query-docs` with `libraryId` + `query`
Library IDs follow the same convention: `/owner/repo` or `/owner/repo/version`.

## 9. Non-Functional Requirements

### Performance

- Library search: < 200ms p99
- Documentation retrieval: < 500ms p99 for 20 snippets
- Indexing throughput: > 1,000 files/minute (GitHub API rate-limited)

### Reliability

- Failed indexing jobs must not corrupt existing indexed data
- Atomic snippet replacement during re-indexing

### Portability

- Single SQLite file for all data
- Runs on Linux, macOS, Windows (Node.js 20+)
- No required external services beyond an optional embedding API

### Scalability (v1 constraints)

- Designed for single-node deployment
- SQLite suitable for up to ~500 repositories, ~500k snippets
## 10. Milestones & Feature Order

| ID           | Feature                                  | Priority | Depends On                 |
| ------------ | ---------------------------------------- | -------- | -------------------------- |
| TRUEREF-0001 | Database schema & core data models       | P0       | —                          |
| TRUEREF-0002 | Repository management service & REST API | P0       | TRUEREF-0001               |
| TRUEREF-0003 | GitHub repository crawler                | P0       | TRUEREF-0001               |
| TRUEREF-0004 | Local filesystem crawler                 | P1       | TRUEREF-0001               |
| TRUEREF-0005 | Document parser & chunker                | P0       | TRUEREF-0001               |
| TRUEREF-0006 | SQLite FTS5 full-text search             | P0       | TRUEREF-0005               |
| TRUEREF-0007 | Embedding generation & vector storage    | P1       | TRUEREF-0005               |
| TRUEREF-0008 | Hybrid semantic search engine            | P1       | TRUEREF-0006, TRUEREF-0007 |
| TRUEREF-0009 | Indexing pipeline & job queue            | P0       | TRUEREF-0003, TRUEREF-0005 |
| TRUEREF-0010 | REST API (search + context endpoints)    | P0       | TRUEREF-0006, TRUEREF-0009 |
| TRUEREF-0011 | MCP server (stdio transport)             | P0       | TRUEREF-0010               |
| TRUEREF-0012 | MCP server (HTTP transport)              | P1       | TRUEREF-0011               |
| TRUEREF-0013 | `trueref.json` config file support       | P0       | TRUEREF-0003               |
| TRUEREF-0014 | Repository version management            | P1       | TRUEREF-0003               |
| TRUEREF-0015 | Web UI — repository dashboard            | P1       | TRUEREF-0002, TRUEREF-0009 |
| TRUEREF-0016 | Web UI — search explorer                 | P2       | TRUEREF-0010, TRUEREF-0015 |
| TRUEREF-0017 | Incremental re-indexing (checksum diff)  | P1       | TRUEREF-0009               |
| TRUEREF-0018 | Embedding provider configuration UI      | P2       | TRUEREF-0007, TRUEREF-0015 |

---
Represents an indexed library source (GitHub repo or local directory).

```typescript
export const repositories = sqliteTable('repositories', {
  id: text('id').primaryKey(), // e.g. "/facebook/react" or "/local/my-sdk"
  title: text('title').notNull(),
  description: text('description'),
  source: text('source', { enum: ['github', 'local'] }).notNull(),
  sourceUrl: text('source_url').notNull(), // GitHub URL or absolute local path
  branch: text('branch').default('main'),
  state: text('state', {
    enum: ['pending', 'indexing', 'indexed', 'error']
  })
    .notNull()
    .default('pending'),
  totalSnippets: integer('total_snippets').default(0),
  totalTokens: integer('total_tokens').default(0),
  trustScore: real('trust_score').default(0), // 0.0–10.0
  benchmarkScore: real('benchmark_score').default(0), // 0.0–100.0
  stars: integer('stars'),
  githubToken: text('github_token'), // encrypted PAT for private repos
  lastIndexedAt: integer('last_indexed_at', { mode: 'timestamp' }),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
  updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
```
Tracks indexed git tags/branches beyond the default branch.

```typescript
export const repositoryVersions = sqliteTable('repository_versions', {
  id: text('id').primaryKey(), // e.g. "/facebook/react/v18.3.0"
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  tag: text('tag').notNull(), // git tag or branch name
  title: text('title'),
  state: text('state', {
    enum: ['pending', 'indexing', 'indexed', 'error']
  })
    .notNull()
    .default('pending'),
  totalSnippets: integer('total_snippets').default(0),
  indexedAt: integer('indexed_at', { mode: 'timestamp' }),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
```
A parsed source file within a repository.

```typescript
export const documents = sqliteTable('documents', {
  id: text('id').primaryKey(), // UUID
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
  filePath: text('file_path').notNull(), // relative path within repo
  title: text('title'),
  language: text('language'), // e.g. "typescript", "markdown"
  tokenCount: integer('token_count').default(0),
  checksum: text('checksum').notNull(), // SHA-256 of file content
  indexedAt: integer('indexed_at', { mode: 'timestamp' }).notNull()
});
```
An indexed chunk of content, the atomic unit of search.

```typescript
export const snippets = sqliteTable('snippets', {
  id: text('id').primaryKey(), // UUID
  documentId: text('document_id')
    .notNull()
    .references(() => documents.id, { onDelete: 'cascade' }),
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id').references(() => repositoryVersions.id, { onDelete: 'cascade' }),
  type: text('type', { enum: ['code', 'info'] }).notNull(),
  title: text('title'),
  content: text('content').notNull(), // searchable text / code
  language: text('language'),
  breadcrumb: text('breadcrumb'), // e.g. "Installation > Getting Started"
  tokenCount: integer('token_count').default(0),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
```
Stores vector embeddings separately to keep the snippets table lean.

```typescript
export const snippetEmbeddings = sqliteTable('snippet_embeddings', {
  snippetId: text('snippet_id')
    .primaryKey()
    .references(() => snippets.id, { onDelete: 'cascade' }),
  model: text('model').notNull(), // embedding model identifier
  dimensions: integer('dimensions').notNull(),
  embedding: blob('embedding').notNull(), // Float32Array as binary blob
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
```
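Reading and writing the `embedding` blob column amounts to reinterpreting bytes. A sketch of the round trip, assuming raw little-endian float32 layout (the layout sqlite-vec accepts for vector blobs); the helper names are hypothetical.

```typescript
import { Buffer } from 'node:buffer';

// Serialize an embedding for the `embedding` blob column.
function embeddingToBlob(vector: Float32Array): Buffer {
  return Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength);
}

// Deserialize a blob read back from SQLite.
function blobToEmbedding(blob: Buffer): Float32Array {
  // Copy into a fresh ArrayBuffer: a Buffer slice from the driver may be
  // a misaligned view into a larger pool, which Float32Array rejects.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
}
```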
Tracks asynchronous indexing operations.

```typescript
export const indexingJobs = sqliteTable('indexing_jobs', {
  id: text('id').primaryKey(), // UUID
  repositoryId: text('repository_id')
    .notNull()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  versionId: text('version_id'),
  status: text('status', {
    enum: ['queued', 'running', 'done', 'failed']
  })
    .notNull()
    .default('queued'),
  progress: integer('progress').default(0), // 0–100
  totalFiles: integer('total_files').default(0),
  processedFiles: integer('processed_files').default(0),
  error: text('error'),
  startedAt: integer('started_at', { mode: 'timestamp' }),
  completedAt: integer('completed_at', { mode: 'timestamp' }),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull()
});
```
Stores parsed `trueref.json` / `context7.json` configuration.

```typescript
export const repositoryConfigs = sqliteTable('repository_configs', {
  repositoryId: text('repository_id')
    .primaryKey()
    .references(() => repositories.id, { onDelete: 'cascade' }),
  projectTitle: text('project_title'),
  description: text('description'),
  folders: text('folders', { mode: 'json' }).$type<string[]>(),
  excludeFolders: text('exclude_folders', { mode: 'json' }).$type<string[]>(),
  excludeFiles: text('exclude_files', { mode: 'json' }).$type<string[]>(),
  rules: text('rules', { mode: 'json' }).$type<string[]>(),
  previousVersions: text('previous_versions', { mode: 'json' }).$type<
    { tag: string; title: string }[]
  >(),
  updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
```
Key-value store for global application settings.

```typescript
export const settings = sqliteTable('settings', {
  key: text('key').primaryKey(),
  value: text('value', { mode: 'json' }),
  updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
```
---
Implement the core `RepositoryService` that handles CRUD operations for repositories.

## Repository ID Generation

GitHub repositories:

- Input URL: `https://github.com/facebook/react` or `github.com/facebook/react`
- Generated ID: `/facebook/react`

Local repositories:

- Input path: `/home/user/projects/my-sdk`
- Generated ID: `/local/my-sdk` (basename of path, slugified)
- Collision resolution: append `-2`, `-3`, etc.
Version-specific IDs: `/facebook/react/v18.3.0`

```typescript
// src/lib/server/services/repository.service.ts
export interface AddRepositoryInput {
  source: 'github' | 'local';
  sourceUrl: string; // GitHub URL or absolute local path
  title?: string; // override auto-detected title
  description?: string;
  branch?: string; // GitHub: default branch; Local: n/a
  githubToken?: string; // for private GitHub repos
}

export interface UpdateRepositoryInput {
  title?: string;
  description?: string;
  branch?: string;
  githubToken?: string;
}

export class RepositoryService {
  constructor(private db: BetterSQLite3.Database) {}

  async list(options?: {
    state?: Repository['state'];
    limit?: number;
    offset?: number;
  }): Promise<Repository[]>;
  async get(id: string): Promise<Repository | null>;
  async add(input: AddRepositoryInput): Promise<Repository>;
  async update(id: string, input: UpdateRepositoryInput): Promise<Repository>;
  async remove(id: string): Promise<void>;
  async getStats(id: string): Promise<{
    totalSnippets: number;
    totalTokens: number;
    totalDocuments: number;
    lastIndexedAt: Date | null;
  }>;
}
```
### `GET /api/v1/libs`

Query parameters:

- `state` (optional): filter by state (`pending`, `indexed`, `error`, etc.)
- `limit` (optional, default 50): max results
- `offset` (optional, default 0): pagination offset

Response `200`:

```json
{
  "libraries": [
    {
      "id": "/facebook/react",
      "title": "React",
      "description": "...",
      "source": "github",
      "state": "indexed",
      "totalSnippets": 1234,
      "totalTokens": 98000,
      "trustScore": 8.5,
      "stars": 228000,
      "lastIndexedAt": "2026-03-22T10:00:00Z",
      "versions": ["v18.3.0", "v17.0.2"]
    }
  ],
  "total": 12,
  "limit": 50,
  "offset": 0
}
```
### `POST /api/v1/libs`

Request body:

```json
{
  "source": "github",
  "sourceUrl": "https://github.com/facebook/react",
  "branch": "main",
  "githubToken": "ghp_...",
  "autoIndex": true
}
```

Response `201`:

```json
{
  "library": { ...Repository }
}
```

`autoIndex: true` (the default) immediately queues an indexing job.

Response `409` if the repository already exists:

```json
{ "error": "Repository /facebook/react already exists" }
```
Response `404`: not found.

Triggers a new indexing job. If a job is already running for this repo, returns the existing job.

Request body (optional):

```json
{ "version": "v18.3.0" }
```

Response `202`:

```json
{
  "job": {
    "id": "uuid",
    "repositoryId": "/facebook/react",
    "status": "queued",
    "progress": 0,
    "createdAt": "2026-03-22T10:00:00Z"
  }
}
```
## Error Response Shape

All error responses follow:

```json
{
  "error": "Human-readable message",
  "code": "MACHINE_READABLE_CODE",
  "details": {}
}
```

Error codes:

- `NOT_FOUND`
- `ALREADY_EXISTS`
- `INVALID_INPUT`
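A tiny helper can keep every endpoint on this shape. This is a sketch, not the actual implementation; the `ErrorCode` union covers only the codes listed above, and callers would serialize the result with the appropriate HTTP status.

```typescript
type ErrorCode = 'NOT_FOUND' | 'ALREADY_EXISTS' | 'INVALID_INPUT';

interface ApiError {
  error: string;
  code: ErrorCode;
  details?: Record<string, unknown>;
}

// Build the uniform error body; `details` is omitted entirely when absent
// rather than serialized as null.
function apiError(code: ErrorCode, message: string, details?: Record<string, unknown>): ApiError {
  return details ? { error: message, code, details } : { error: message, code };
}
```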
```typescript
function resolveGitHubId(url: string): string {
  // Parse owner/repo from URL variants:
  //   https://github.com/facebook/react
  //   https://github.com/facebook/react.git
  //   github.com/facebook/react
  const match = url.match(/github\.com\/([^/]+)\/([^/\s.]+)/);
  if (!match) throw new Error('Invalid GitHub URL');
  return `/${match[1]}/${match[2]}`;
}

function resolveLocalId(path: string, existingIds: string[]): string {
  const base = slugify(path.split('/').at(-1)!);
  let id = `/local/${base}`;
  let counter = 2;
  while (existingIds.includes(id)) {
    id = `/local/${base}-${counter++}`;
  }
  return id;
}
```
---
The crawler only downloads files with these extensions:

```typescript
const INDEXABLE_EXTENSIONS = new Set([
  // Documentation
  '.md', '.mdx', '.txt', '.rst',
  // Code
  '.ts', '.tsx', '.js', '.jsx',
  '.py', '.rb', '.go', '.rs', '.java', '.cs', '.cpp', '.c', '.h',
  '.swift', '.kt', '.php', '.scala', '.clj', '.ex', '.exs',
  '.sh', '.bash', '.zsh', '.fish',
  // Config / data
  '.json', '.yaml', '.yml', '.toml',
  // Web
  '.html', '.css', '.svelte', '.vue'
]);

const MAX_FILE_SIZE_BYTES = 500_000; // 500 KB — skip large generated files
```
```typescript
export interface CrawledFile {
  path: string; // relative path within repo, e.g. "src/index.ts"
  content: string; // UTF-8 file content
  size: number; // bytes
  sha: string; // GitHub blob SHA (used as checksum)
  language: string; // detected from extension
}

export interface CrawlResult {
  files: CrawledFile[];
  totalFiles: number; // files matching filters
  skippedFiles: number; // filtered out or too large
  branch: string; // branch/tag that was crawled
  commitSha: string; // HEAD commit SHA
}

export interface CrawlOptions {
  owner: string;
  repo: string;
  ref?: string; // branch, tag, or commit SHA; defaults to repo default branch
  token?: string; // GitHub PAT for private repos
  config?: RepoConfig; // parsed trueref.json
  onProgress?: (processed: number, total: number) => void;
}
```
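The `language` field on `CrawledFile` is derived from the extension. One way to sketch that mapping (the table is illustrative and deliberately incomplete; `detectLanguage` is a hypothetical helper name):

```typescript
const EXTENSION_LANGUAGES: Record<string, string> = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.js': 'javascript',
  '.jsx': 'javascript',
  '.py': 'python',
  '.rs': 'rust',
  '.go': 'go',
  '.md': 'markdown',
  '.mdx': 'markdown'
};

// Lowercase the extension so ".JS" and ".js" map identically;
// fall back to "plaintext" for anything unknown or extensionless.
function detectLanguage(filePath: string): string {
  const dot = filePath.lastIndexOf('.');
  if (dot === -1) return 'plaintext';
  const ext = filePath.slice(dot).toLowerCase();
  return EXTENSION_LANGUAGES[ext] ?? 'plaintext';
}
```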
## GitHub API Usage

### Step 1: Get default branch (if ref not specified)

```
GET https://api.github.com/repos/{owner}/{repo}
→ { default_branch: "main", stargazers_count: 12345 }
```

### Step 2: Fetch file tree (recursive)

```
GET https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1
→ { tree: [{ path, type, size, sha }, ...], truncated: false }
```

If `truncated: true`, the tree exceeds GitHub's limits (~100k entries). Fall back to non-recursive tree calls per directory, or filter top-level directories first.

### Step 3: Download file contents (parallel)

```
GET https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}
→ { content: "<base64>", encoding: "base64", size: 1234, sha: "abc123" }
```
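The `content` field comes back base64-encoded (GitHub inserts newlines into it); decoding in Node is a one-liner, since `Buffer.from` tolerates the embedded whitespace. The helper name is ours, not part of any API:

```typescript
import { Buffer } from 'node:buffer';

// Decode the base64 `content` field of a GitHub contents-API response
// into the UTF-8 file text.
function decodeGitHubContent(base64Content: string): string {
  return Buffer.from(base64Content, 'base64').toString('utf8');
}
```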
Alternative for large repos: use raw content URL: Alternative for large repos: use raw content URL:
``` ```
GET https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path} GET https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}
``` ```
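When building the raw URL, path segments containing spaces or non-ASCII characters must be percent-encoded individually so the `/` separators survive. A small sketch — the `rawContentUrl` name is illustrative, not part of the spec:

```typescript
// Build a raw.githubusercontent.com URL, encoding each path segment so that
// spaces and unicode survive while '/' separators are preserved.
function rawContentUrl(owner: string, repo: string, ref: string, filePath: string): string {
  const encodedPath = filePath.split('/').map(encodeURIComponent).join('/');
  return `https://raw.githubusercontent.com/${owner}/${repo}/${encodeURIComponent(ref)}/${encodedPath}`;
}
```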
@@ -124,48 +157,47 @@ GET https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}

## Filtering Logic

```typescript
function shouldIndexFile(filePath: string, fileSize: number, config?: RepoConfig): boolean {
  const ext = path.extname(filePath).toLowerCase();
  const base = path.basename(filePath);

  // 1. Must have indexable extension
  if (!INDEXABLE_EXTENSIONS.has(ext)) return false;

  // 2. Must not exceed size limit
  if (fileSize > MAX_FILE_SIZE_BYTES) return false;

  // 3. Exclude lockfiles and other non-source artifacts
  if (IGNORED_FILE_NAMES.has(base)) return false;

  // 4. Exclude minified and bundled assets
  if (base.includes('.min.') || base.endsWith('.bundle.js') || base.endsWith('.bundle.css')) {
    return false;
  }

  // 5. Apply config excludeFiles (exact filename match)
  if (config?.excludeFiles?.includes(base)) return false;

  // 6. Exclude common dependency/build/cache directories at any depth
  if (isInIgnoredDirectory(filePath)) return false;

  // 7. Apply config excludeFolders (regex or prefix match)
  if (
    config?.excludeFolders?.some(
      (folder) => filePath.startsWith(folder) || new RegExp(folder).test(filePath)
    )
  )
    return false;

  // 8. Apply config folders allowlist (if specified, only index those paths)
  if (config?.folders?.length) {
    const inAllowedFolder = config.folders.some(
      (folder) => filePath.startsWith(folder) || new RegExp(folder).test(filePath)
    );
    if (!inAllowedFolder) return false;
  }

  return true;
}
```
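The predicate above leans on shared constants defined elsewhere in the crawler module. A minimal sketch of plausible values together with a trimmed version of steps 1–4 — the exact sets below are assumptions for illustration, not the spec's canonical lists:

```typescript
import path from 'node:path';

// Assumed values for the shared constants referenced by shouldIndexFile.
const INDEXABLE_EXTENSIONS = new Set(['.md', '.mdx', '.txt', '.rst', '.ts', '.js', '.py', '.go']);
const MAX_FILE_SIZE_BYTES = 1 * 1024 * 1024; // assumed 1 MiB cap per file
const IGNORED_FILE_NAMES = new Set(['package-lock.json', 'yarn.lock', 'pnpm-lock.yaml']);

// Steps 1-4 of the predicate, without the config-driven rules.
function passesBaseFilters(filePath: string, fileSize: number): boolean {
  const ext = path.extname(filePath).toLowerCase();
  const base = path.basename(filePath);
  if (!INDEXABLE_EXTENSIONS.has(ext)) return false; // 1. indexable extension
  if (fileSize > MAX_FILE_SIZE_BYTES) return false; // 2. size limit
  if (IGNORED_FILE_NAMES.has(base)) return false; // 3. lockfiles
  if (base.includes('.min.') || base.endsWith('.bundle.js')) return false; // 4. minified assets
  return true;
}
```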
@@ -177,20 +209,20 @@ The shared ignored-directory list is intentionally broader than the original bas

```typescript
class GitHubRateLimiter {
  private remaining = 5000;
  private resetAt = Date.now();

  updateFromHeaders(headers: Headers): void {
    this.remaining = parseInt(headers.get('X-RateLimit-Remaining') ?? '5000');
    this.resetAt = parseInt(headers.get('X-RateLimit-Reset') ?? '0') * 1000;
  }

  async waitIfNeeded(): Promise<void> {
    if (this.remaining <= 10) {
      const waitMs = Math.max(0, this.resetAt - Date.now()) + 1000;
      await sleep(waitMs);
    }
  }
}
```
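In the download loop the limiter is consulted before each request and updated from every response. A testable variant of the same class — the private fields are surfaced through a `pendingWaitMs` helper (an illustrative addition, not spec API), and `sleep` is the usual timer wrapper; requires a runtime with the WHATWG `Headers` global (Node 18+):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class GitHubRateLimiter {
  remaining = 5000;
  resetAt = Date.now();

  updateFromHeaders(headers: Headers): void {
    this.remaining = parseInt(headers.get('X-RateLimit-Remaining') ?? '5000');
    this.resetAt = parseInt(headers.get('X-RateLimit-Reset') ?? '0') * 1000;
  }

  // Milliseconds to pause before the next request, or 0 when under budget.
  pendingWaitMs(now = Date.now()): number {
    return this.remaining <= 10 ? Math.max(0, this.resetAt - now) + 1000 : 0;
  }

  async waitIfNeeded(): Promise<void> {
    const ms = this.pendingWaitMs();
    if (ms > 0) await sleep(ms);
  }
}
```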
@@ -200,14 +232,14 @@ Requests are made with a concurrency limit of 10 parallel downloads using a sema

## Error Handling

| Scenario                  | Behavior                                                                    |
| ------------------------- | --------------------------------------------------------------------------- |
| 404 Not Found             | Throw `RepositoryNotFoundError`                                             |
| 401 Unauthorized          | Throw `AuthenticationError` (invalid or missing token)                      |
| 403 Forbidden             | If `X-RateLimit-Remaining: 0`, wait and retry; else throw `PermissionError` |
| 422 Unprocessable         | Tree too large; switch to directory-by-directory traversal                  |
| Network error             | Retry up to 3 times with exponential backoff                                |
| File content decode error | Skip file, log warning                                                      |
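For the network-error row, the retry schedule can be driven by a small pure helper. A sketch under assumed base and cap values (`backoffDelayMs` and `withRetries` are illustrative names, not spec API):

```typescript
// Exponential backoff with a cap: 500ms, 1s, 2s for attempts 0..2.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry an async operation up to maxAttempts times, sleeping between failures.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
      }
    }
  }
  throw lastError;
}
```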
---
@@ -38,9 +38,9 @@ Reuses `CrawledFile` and `CrawlResult` from TRUEREF-0003 crawler types:

```typescript
export interface LocalCrawlOptions {
  rootPath: string; // absolute path to repository root
  config?: RepoConfig; // parsed trueref.json
  onProgress?: (processed: number, total: number) => void;
}
```
@@ -50,75 +50,73 @@ export interface LocalCrawlOptions {

```typescript
export class LocalCrawler {
  async crawl(options: LocalCrawlOptions): Promise<CrawlResult> {
    // 1. Load root .gitignore if present
    const gitignore = await this.loadGitignore(options.rootPath);

    // 2. Enumerate files recursively, pruning ignored directories early
    const allFiles = await this.walkDirectory(options.rootPath, '', gitignore);

    // 3. Look for trueref.json / context7.json first
    const configFile = allFiles.find((f) => f === 'trueref.json' || f === 'context7.json');
    let config = options.config;
    if (configFile && !config) {
      config = await this.parseConfigFile(path.join(options.rootPath, configFile));
    }

    // 4. Filter files
    const filteredFiles = allFiles.filter((relPath) => {
      const stat = statCache.get(relPath);
      return shouldIndexFile(relPath, stat.size, config);
    });

    // 5. Read and return file contents
    const crawledFiles: CrawledFile[] = [];
    for (const [i, relPath] of filteredFiles.entries()) {
      const absPath = path.join(options.rootPath, relPath);
      const content = await fs.readFile(absPath, 'utf-8');
      const sha = computeSHA256(content);
      crawledFiles.push({
        path: relPath,
        content,
        size: Buffer.byteLength(content, 'utf-8'),
        sha,
        language: detectLanguage(relPath)
      });
      options.onProgress?.(i + 1, filteredFiles.length);
    }

    return {
      files: crawledFiles,
      totalFiles: filteredFiles.length,
      skippedFiles: allFiles.length - filteredFiles.length,
      branch: 'local',
      commitSha: computeSHA256(crawledFiles.map((f) => f.sha).join(''))
    };
  }

  private async walkDirectory(
    dir: string,
    rel = '',
    gitignore?: GitignoreFilter
  ): Promise<string[]> {
    const entries = await fs.readdir(dir, { withFileTypes: true });
    const files: string[] = [];
    for (const entry of entries) {
      if (!entry.isFile() && !entry.isDirectory()) continue; // skip symlinks, devices
      const relPath = rel ? `${rel}/${entry.name}` : entry.name;
      if (entry.isDirectory()) {
        if (shouldPruneDirectory(relPath) || gitignore?.isIgnored(relPath, true)) {
          continue;
        }
        files.push(...(await this.walkDirectory(path.join(dir, entry.name), relPath, gitignore)));
      } else {
        if (gitignore?.isIgnored(relPath, false)) continue;
        files.push(relPath);
      }
    }
    return files;
  }
}
```
@@ -142,7 +140,7 @@ Directory pruning should happen during the walk so large dependency trees are ne

```typescript
import { createHash } from 'crypto';

function computeSHA256(content: string): string {
  return createHash('sha256').update(content, 'utf-8').digest('hex');
}
```
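The digest doubles as a change-detection key on re-crawl: a file whose hash matches the stored one can skip re-parsing. A runnable sketch — `changedFiles` is an illustrative helper, checked against a standard SHA-256 test vector:

```typescript
import { createHash } from 'node:crypto';

function computeSHA256(content: string): string {
  return createHash('sha256').update(content, 'utf-8').digest('hex');
}

// Re-index only files whose content hash changed since the last crawl.
function changedFiles(current: Map<string, string>, previous: Map<string, string>): string[] {
  return [...current.entries()]
    .filter(([filePath, sha]) => previous.get(filePath) !== sha)
    .map(([filePath]) => filePath);
}
```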
@@ -30,19 +30,19 @@ Implement the document parsing and chunking pipeline that transforms raw file co

## Supported File Types

| Extension                         | Parser Strategy                                         |
| --------------------------------- | ------------------------------------------------------- |
| `.md`, `.mdx`                     | Heading-based section splitting + code block extraction |
| `.txt`, `.rst`                    | Paragraph-based splitting                               |
| `.ts`, `.tsx`, `.js`, `.jsx`      | AST-free: function/class boundary detection via regex   |
| `.py`                             | `def`/`class` boundary detection                        |
| `.go`                             | `func`/`type` boundary detection                        |
| `.rs`                             | `fn`/`impl`/`struct` boundary detection                 |
| `.java`, `.cs`, `.kt`, `.swift`   | Class/method boundary detection                         |
| `.rb`                             | `def`/`class` boundary detection                        |
| `.json`, `.yaml`, `.yml`, `.toml` | Structural chunking (top-level keys)                    |
| `.html`, `.svelte`, `.vue`        | Text content extraction + script block splitting        |
| Other code                        | Line-count-based sliding window (200 lines per chunk)   |

---
@@ -52,9 +52,9 @@ Use a simple character-based approximation (no tokenizer library needed for v1):

```typescript
function estimateTokens(text: string): number {
  // Empirically: ~4 chars per token for English prose
  // ~3 chars per token for code (more symbols)
  return Math.ceil(text.length / 3.5);
}
```
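The divisor of 3.5 splits the difference between the prose and code ratios. Repeated here with concrete numbers for intuition:

```typescript
function estimateTokens(text: string): number {
  // ceil(length / 3.5): a 35-char string estimates to 10 tokens, a 4-char word to 2.
  return Math.ceil(text.length / 3.5);
}
```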
@@ -74,49 +74,49 @@ The Markdown parser is the most important parser as most documentation is Markdo

```typescript
interface MarkdownSection {
  headings: string[]; // heading stack at this point
  content: string; // text content (sans code blocks)
  codeBlocks: { language: string; code: string }[];
}

function parseMarkdown(content: string, filePath: string): Snippet[] {
  const sections = splitIntoSections(content);
  const snippets: Snippet[] = [];

  for (const section of sections) {
    const breadcrumb = section.headings.join(' > ');
    const title = section.headings.at(-1) ?? path.basename(filePath);

    // Emit info snippet for text content
    if (section.content.trim().length >= 20) {
      const chunks = chunkText(section.content, MAX_TOKENS, OVERLAP_TOKENS);
      for (const chunk of chunks) {
        snippets.push({
          type: 'info',
          title,
          content: chunk,
          breadcrumb,
          tokenCount: estimateTokens(chunk)
        });
      }
    }

    // Emit code snippets for each code block
    for (const block of section.codeBlocks) {
      if (block.code.trim().length >= 20) {
        snippets.push({
          type: 'code',
          title,
          content: block.code,
          language: block.language || detectLanguage('.' + block.language),
          breadcrumb,
          tokenCount: estimateTokens(block.code)
        });
      }
    }
  }

  return snippets;
}
```
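`splitIntoSections` is assumed by the parser above. One possible minimal implementation tracks a heading stack and separates fenced code from prose; this sketch ignores real-Markdown edge cases (setext headings, indented fences, nested fences) and is illustrative only:

```typescript
interface MarkdownSection {
  headings: string[];
  content: string;
  codeBlocks: { language: string; code: string }[];
}

function splitIntoSections(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  let stack: string[] = [];
  let current: MarkdownSection = { headings: [], content: '', codeBlocks: [] };
  let inFence = false;
  let fenceLang = '';
  let fenceLines: string[] = [];

  // Only keep sections that actually carry text or code.
  const push = () => {
    if (current.content.trim() || current.codeBlocks.length) sections.push(current);
  };

  for (const line of markdown.split('\n')) {
    const fence = line.match(/^```(\w*)/);
    if (fence) {
      if (!inFence) {
        inFence = true;
        fenceLang = fence[1];
        fenceLines = [];
      } else {
        inFence = false;
        current.codeBlocks.push({ language: fenceLang, code: fenceLines.join('\n') });
      }
      continue;
    }
    if (inFence) {
      fenceLines.push(line);
      continue;
    }
    const heading = line.match(/^(#{1,6})\s+(.*)/);
    if (heading) {
      push();
      const level = heading[1].length;
      // Truncate the stack to the parent level, then push this heading.
      stack = [...stack.slice(0, level - 1), heading[2].trim()];
      current = { headings: [...stack], content: '', codeBlocks: [] };
    } else {
      current.content += line + '\n';
    }
  }
  push();
  return sections;
}
```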
@@ -135,43 +135,41 @@ For non-Markdown code files, use regex-based function/class boundary detection.

```typescript
const BOUNDARY_PATTERNS: Record<string, RegExp> = {
  typescript: /^(export\s+)?(async\s+)?(function|class|interface|type|const|let|var)\s+\w+/m,
  python: /^(async\s+)?(def|class)\s+\w+/m,
  go: /^(func|type|var|const)\s+\w+/m,
  rust: /^(pub\s+)?(fn|impl|struct|enum|trait)\s+\w+/m,
  java: /^(public|private|protected|static).*?(class|interface|enum|void|\w+)\s+\w+\s*[({]/m
};

function parseCodeFile(content: string, filePath: string, language: string): Snippet[] {
  const pattern = BOUNDARY_PATTERNS[language];
  const breadcrumb = filePath;
  const title = path.basename(filePath);

  if (!pattern) {
    // Fallback: sliding window
    return slidingWindowChunks(content, filePath, language);
  }

  const chunks = splitAtBoundaries(content, pattern);

  return chunks
    .filter((chunk) => chunk.trim().length >= 20)
    .flatMap((chunk) => {
      if (estimateTokens(chunk) <= MAX_TOKENS) {
        return [
          {
            type: 'code' as const,
            title,
            content: chunk,
            language,
            breadcrumb,
            tokenCount: estimateTokens(chunk)
          }
        ];
      }
      return slidingWindowChunks(chunk, filePath, language);
    });
}
```
@@ -188,27 +186,23 @@ const MIN_CONTENT_LENGTH = 20; // characters

### Sliding Window Chunker

```typescript
function chunkText(text: string, maxTokens: number, overlapTokens: number): string[] {
  const words = text.split(/\s+/);
  const wordsPerToken = 0.75; // ~0.75 words per token
  const maxWords = Math.floor(maxTokens * wordsPerToken);
  const overlapWords = Math.floor(overlapTokens * wordsPerToken);

  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + maxWords, words.length);
    chunks.push(words.slice(start, end).join(' '));
    if (end === words.length) break;
    start = end - overlapWords;
  }
  return chunks;
}
```
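With small numbers the window and overlap are easy to see. The function is repeated here for a runnable trace: ten words with `maxTokens = 8` and `overlapTokens = 4` give windows of 6 words sliding forward by 3.

```typescript
function chunkText(text: string, maxTokens: number, overlapTokens: number): string[] {
  const words = text.split(/\s+/);
  const wordsPerToken = 0.75;
  const maxWords = Math.floor(maxTokens * wordsPerToken); // 6 for maxTokens=8
  const overlapWords = Math.floor(overlapTokens * wordsPerToken); // 3 for overlapTokens=4

  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + maxWords, words.length);
    chunks.push(words.slice(start, end).join(' '));
    if (end === words.length) break;
    start = end - overlapWords; // each window re-includes the previous tail
  }
  return chunks;
}
```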
@@ -218,34 +212,42 @@ function chunkText(

```typescript
const LANGUAGE_MAP: Record<string, string> = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.js': 'javascript',
  '.jsx': 'javascript',
  '.py': 'python',
  '.rb': 'ruby',
  '.go': 'go',
  '.rs': 'rust',
  '.java': 'java',
  '.cs': 'csharp',
  '.cpp': 'cpp',
  '.c': 'c',
  '.h': 'c',
  '.swift': 'swift',
  '.kt': 'kotlin',
  '.php': 'php',
  '.scala': 'scala',
  '.sh': 'bash',
  '.bash': 'bash',
  '.zsh': 'bash',
  '.md': 'markdown',
  '.mdx': 'markdown',
  '.json': 'json',
  '.yaml': 'yaml',
  '.yml': 'yaml',
  '.toml': 'toml',
  '.html': 'html',
  '.css': 'css',
  '.svelte': 'svelte',
  '.vue': 'vue',
  '.sql': 'sql'
};

function detectLanguage(filePath: string): string {
  const ext = path.extname(filePath).toLowerCase();
  return LANGUAGE_MAP[ext] ?? 'text';
}
```
@@ -255,32 +257,32 @@ function detectLanguage(filePath: string): string {

```typescript
export interface ParseOptions {
  repositoryId: string;
  documentId: string;
  versionId?: string;
}

export function parseFile(file: CrawledFile, options: ParseOptions): NewSnippet[] {
  const language = detectLanguage(file.path);

  let rawSnippets: Omit<
    NewSnippet,
    'id' | 'repositoryId' | 'documentId' | 'versionId' | 'createdAt'
  >[];
  if (language === 'markdown') {
    rawSnippets = parseMarkdown(file.content, file.path);
  } else {
    rawSnippets = parseCodeFile(file.content, file.path, language);
  }

  return rawSnippets.map((s) => ({
    ...s,
    id: crypto.randomUUID(),
    repositoryId: options.repositoryId,
    documentId: options.documentId,
    versionId: options.versionId ?? null,
    createdAt: new Date()
  }));
}
```
@@ -33,42 +33,37 @@ Implement the full-text search engine using SQLite's built-in FTS5 extension. Th

```typescript
// src/lib/server/search/search.service.ts

export interface SnippetSearchOptions {
  repositoryId: string;
  versionId?: string;
  type?: 'code' | 'info';
  limit?: number; // default: 20
  offset?: number; // default: 0
}

export interface SnippetSearchResult {
  snippet: Snippet;
  score: number; // BM25 rank (negative, lower = better)
  repository: Pick<Repository, 'id' | 'title'>;
}

export interface LibrarySearchOptions {
  libraryName: string;
  query?: string; // semantic relevance hint
  limit?: number; // default: 10
}

export interface LibrarySearchResult {
  repository: Repository;
  versions: RepositoryVersion[];
  score: number; // composite relevance score
}

export class SearchService {
  constructor(private db: BetterSQLite3.Database) {}

  searchSnippets(query: string, options: SnippetSearchOptions): SnippetSearchResult[];

  searchRepositories(options: LibrarySearchOptions): LibrarySearchResult[];
}
```
@@ -101,21 +96,21 @@ The FTS5 MATCH query uses the porter stemmer and unicode61 tokenizer (configured

```typescript
function preprocessQuery(raw: string): string {
  // 1. Trim and normalize whitespace
  let q = raw.trim().replace(/\s+/g, ' ');

  // 2. Escape FTS5 special characters that aren't intended as operators
  //    Keep: * (prefix), " " (phrase), AND, OR, NOT
  q = q.replace(/[()]/g, ' ');

  // 3. Add prefix wildcard to last token for "typing as you go" feel
  const tokens = q.split(' ');
  const lastToken = tokens.at(-1) ?? '';
  if (lastToken.length >= 3 && !lastToken.endsWith('*')) {
    tokens[tokens.length - 1] = lastToken + '*';
  }
  return tokens.join(' ');
}
```
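A concrete trace of the preprocessing, repeated here as a runnable check: whitespace collapses, and the trailing token only gains a prefix wildcard when it is at least three characters long.

```typescript
function preprocessQuery(raw: string): string {
  // Normalize whitespace, strip parentheses, then star the last token if long enough.
  let q = raw.trim().replace(/\s+/g, ' ');
  q = q.replace(/[()]/g, ' ');
  const tokens = q.split(' ');
  const lastToken = tokens.at(-1) ?? '';
  if (lastToken.length >= 3 && !lastToken.endsWith('*')) {
    tokens[tokens.length - 1] = lastToken + '*';
  }
  return tokens.join(' ');
}
```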
@@ -174,56 +169,65 @@ searchRepositories(options: LibrarySearchOptions): LibrarySearchResult[] {

The search results must be formatted for the REST API and MCP tool responses:

### Library search response (for `resolve-library-id`):

```typescript
function formatLibraryResults(results: LibrarySearchResult[]): string {
  if (results.length === 0) {
    return 'No libraries found matching your search.';
  }

  return results
    .map((r, i) => {
      const repo = r.repository;
      const versions = r.versions.map((v) => v.tag).join(', ') || 'default branch';
      return [
        `${i + 1}. ${repo.title}`,
        `   Library ID: ${repo.id}`,
        `   Description: ${repo.description ?? 'No description'}`,
        `   Snippets: ${repo.totalSnippets} | Trust Score: ${repo.trustScore.toFixed(1)}/10`,
        `   Available Versions: ${versions}`
      ].join('\n');
    })
    .join('\n\n');
}
```
### Snippet search response (for `query-docs`):

```typescript
function formatSnippetResults(results: SnippetSearchResult[], rules?: string[]): string {
  const parts: string[] = [];

  // Prepend repository rules if present
  if (rules?.length) {
    parts.push('## Library Rules\n' + rules.map((r) => `- ${r}`).join('\n'));
  }

  for (const { snippet } of results) {
    if (snippet.type === 'code') {
      parts.push(
        [
          snippet.title ? `### ${snippet.title}` : '',
          snippet.breadcrumb ? `*${snippet.breadcrumb}*` : '',
          `\`\`\`${snippet.language ?? ''}\n${snippet.content}\n\`\`\``
        ]
          .filter(Boolean)
          .join('\n')
      );
    } else {
      parts.push(
        [
          snippet.title ? `### ${snippet.title}` : '',
          snippet.breadcrumb ? `*${snippet.breadcrumb}*` : '',
          snippet.content
        ]
          .filter(Boolean)
          .join('\n')
      );
    }
  }

  return parts.join('\n\n---\n\n');
}
```
@@ -235,26 +239,26 @@ Compute `trustScore` (0–10) when a repository is first indexed:

```typescript
function computeTrustScore(repo: Repository): number {
  let score = 0;

  // Stars (up to 4 points): log scale, 10k stars = 4 pts
  if (repo.stars) {
    score += Math.min(4, Math.log10(repo.stars + 1));
  }

  // Documentation coverage (up to 3 points)
  score += Math.min(3, repo.totalSnippets / 500);

  // Source type (1 point for GitHub, 0 for local)
  if (repo.source === 'github') score += 1;

  // Successful indexing (1 point)
  if (repo.state === 'indexed') score += 1;

  // Has description (1 point)
  if (repo.description) score += 1;

  return Math.min(10, parseFloat(score.toFixed(1)));
}
```
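A worked example makes the weighting concrete: 10k+ stars saturate the log term at 4 points, while snippet coverage scales linearly up to 1500 snippets. The scoring logic is repeated against a structural subset of `Repository` (the `RepoFacts` shape is illustrative):

```typescript
// Structural subset of Repository used by the scoring sketch.
interface RepoFacts {
  stars?: number;
  totalSnippets: number;
  source: 'github' | 'local';
  state: string;
  description?: string | null;
}

function computeTrustScore(repo: RepoFacts): number {
  let score = 0;
  if (repo.stars) score += Math.min(4, Math.log10(repo.stars + 1)); // capped at 4
  score += Math.min(3, repo.totalSnippets / 500); // capped at 3
  if (repo.source === 'github') score += 1;
  if (repo.state === 'indexed') score += 1;
  if (repo.description) score += 1;
  return Math.min(10, parseFloat(score.toFixed(1)));
}
```

For a GitHub repo with 12,345 stars (log term saturates at 4), 800 snippets (1.6), plus the three 1-point bonuses, the score is 8.6.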
@@ -34,18 +34,18 @@ Implement a pluggable embedding generation system that produces vector represent

```typescript
// src/lib/server/embeddings/provider.ts

export interface EmbeddingVector {
  values: Float32Array;
  dimensions: number;
  model: string;
}

export interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  readonly model: string;

  embed(texts: string[]): Promise<EmbeddingVector[]>;
  isAvailable(): Promise<boolean>;
}
```
@@ -55,51 +55,51 @@ export interface EmbeddingProvider {
```typescript ```typescript
export interface OpenAIProviderConfig { export interface OpenAIProviderConfig {
baseUrl: string; // e.g. "https://api.openai.com/v1" or "http://localhost:11434/v1" baseUrl: string; // e.g. "https://api.openai.com/v1" or "http://localhost:11434/v1"
apiKey: string; apiKey: string;
  model: string; // e.g. "text-embedding-3-small", "nomic-embed-text"
  dimensions?: number; // override for models that support it (e.g. text-embedding-3-small)
  maxBatchSize?: number; // default: 100
}

export class OpenAIEmbeddingProvider implements EmbeddingProvider {
  constructor(private config: OpenAIProviderConfig) {}

  async embed(texts: string[]): Promise<EmbeddingVector[]> {
    // Batch into groups of maxBatchSize
    const batches = chunk(texts, this.config.maxBatchSize ?? 100);
    const allEmbeddings: EmbeddingVector[] = [];
    for (const batch of batches) {
      const response = await fetch(`${this.config.baseUrl}/embeddings`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${this.config.apiKey}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          model: this.config.model,
          input: batch,
          dimensions: this.config.dimensions
        })
      });
      if (!response.ok) {
        throw new EmbeddingError(`API error: ${response.status}`);
      }
      const data = await response.json();
      for (const item of data.data) {
        allEmbeddings.push({
          values: new Float32Array(item.embedding),
          dimensions: item.embedding.length,
          model: this.config.model
        });
      }
    }
    return allEmbeddings;
  }
}
```
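The `embed` method above relies on a `chunk` helper that the spec never shows. A minimal sketch of what it is assumed to do (the name and signature are inferred, not prescribed):

```typescript
// Split an array into consecutive groups of at most `size` elements.
// The last group holds the remainder.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const batches = chunk(['a', 'b', 'c', 'd', 'e'], 2);
// three batches: ['a','b'], ['c','d'], ['e']
```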
```typescript
// Uses @xenova/transformers — only loaded if installed
export class LocalEmbeddingProvider implements EmbeddingProvider {
  private pipeline: unknown = null;
  readonly name = 'local';
  readonly model = 'Xenova/all-MiniLM-L6-v2'; // 384-dim, fast, small
  readonly dimensions = 384;

  async embed(texts: string[]): Promise<EmbeddingVector[]> {
    if (!this.pipeline) {
      const { pipeline } = await import('@xenova/transformers');
      this.pipeline = await pipeline('feature-extraction', this.model);
    }
    const results: EmbeddingVector[] = [];
    for (const text of texts) {
      const output = await (this.pipeline as Function)(text, {
        pooling: 'mean',
        normalize: true
      });
      results.push({
        values: new Float32Array(output.data),
        dimensions: this.dimensions,
        model: this.model
      });
    }
    return results;
  }

  async isAvailable(): Promise<boolean> {
    try {
      await import('@xenova/transformers');
      return true;
    } catch {
      return false;
    }
  }
}
```
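Because the provider requests `normalize: true`, every vector it returns has unit length, so cosine similarity between two embeddings reduces to a plain dot product. A standalone illustration (not part of the spec):

```typescript
// For unit-length vectors, cosine(a, b) = dot(a, b): the ||a||·||b|| denominator is 1.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Two normalized 2-d vectors, 45 degrees apart
const a = new Float32Array([1, 0]);
const b = new Float32Array([Math.SQRT1_2, Math.SQRT1_2]);
const similarity = dot(a, b); // ≈ 0.7071
```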
```typescript
export class EmbeddingService {
  constructor(
    private db: BetterSQLite3.Database,
    private provider: EmbeddingProvider
  ) {}

  async embedSnippets(
    snippetIds: string[],
    onProgress?: (done: number, total: number) => void
  ): Promise<void> {
    // Select title and breadcrumb too: they feed the embedded text below
    const snippets = this.db
      .prepare(
        `SELECT id, title, breadcrumb, content, type FROM snippets WHERE id IN (${snippetIds.map(() => '?').join(',')})`
      )
      .all(...snippetIds) as Snippet[];

    // Prepare text for embedding: combine title + content
    const texts = snippets.map((s) =>
      [s.title, s.breadcrumb, s.content].filter(Boolean).join('\n').slice(0, 2048)
    );

    const BATCH_SIZE = 50;
    const insert = this.db.prepare(`
      INSERT OR REPLACE INTO snippet_embeddings (snippet_id, model, dimensions, embedding, created_at)
      VALUES (?, ?, ?, ?, unixepoch())
    `);

    for (let i = 0; i < snippets.length; i += BATCH_SIZE) {
      const batch = snippets.slice(i, i + BATCH_SIZE);
      const batchTexts = texts.slice(i, i + BATCH_SIZE);
      const embeddings = await this.provider.embed(batchTexts);
      const insertMany = this.db.transaction(() => {
        for (let j = 0; j < batch.length; j++) {
          const snippet = batch[j];
          const embedding = embeddings[j];
          insert.run(
            snippet.id,
            embedding.model,
            embedding.dimensions,
            Buffer.from(embedding.values.buffer)
          );
        }
      });
      insertMany();
      onProgress?.(Math.min(i + BATCH_SIZE, snippets.length), snippets.length);
    }
  }
}
```
Stored in the `settings` table as JSON:
```typescript
export interface EmbeddingConfig {
  provider: 'openai' | 'local' | 'none';
  openai?: {
    baseUrl: string;
    apiKey: string;
    model: string;
    dimensions?: number;
  };
}

// Settings key: 'embedding_config'
```
### API Endpoints

`GET /api/v1/settings/embedding`

```json
{
  "provider": "openai",
  "openai": {
    "baseUrl": "https://api.openai.com/v1",
    "model": "text-embedding-3-small",
    "dimensions": 1536
  }
}
```
Embeddings are stored as raw `Float32Array` binary blobs:

```typescript
// Store
const buffer = Buffer.from(float32Array.buffer);

// Retrieve
const float32Array = new Float32Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / 4);
```
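The `byteOffset` and `byteLength / 4` arguments matter: Node may hand out a `Buffer` that is a view into a larger pooled `ArrayBuffer`, so reading from offset 0 of the backing buffer would return garbage. A roundtrip sketch, simulating the pooled case with plain typed arrays:

```typescript
// The embedding floats, as the blob column would hold them
const original = new Float32Array([0.1, 0.2, 0.3]);

// Simulate a buffer that is a view into a larger pooled allocation:
// the float bytes start at offset 8 of the backing ArrayBuffer, not 0
const pool = new ArrayBuffer(8 + original.byteLength);
const bytes = new Uint8Array(pool, 8, original.byteLength);
bytes.set(new Uint8Array(original.buffer));

// Wrong: new Float32Array(bytes.buffer) would read from offset 0 and the wrong length.
// Right: honor byteOffset and byteLength / 4, exactly as in the snippet above.
const restored = new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
```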
---
```typescript
function reciprocalRankFusion(
  ...rankings: Array<Array<{ id: string; score: number }>>
): Array<{ id: string; rrfScore: number }> {
  const K = 60; // RRF constant (standard value)
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach(({ id }, rank) => {
      const current = scores.get(id) ?? 0;
      scores.set(id, current + 1 / (K + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, rrfScore]) => ({ id, rrfScore }))
    .sort((a, b) => b.rrfScore - a.rrfScore);
}
```
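A worked example of the fusion (function re-declared verbatim so the sketch runs standalone): a document that appears in both rankings outscores one that tops a single list.

```typescript
function reciprocalRankFusion(
  ...rankings: Array<Array<{ id: string; score: number }>>
): Array<{ id: string; rrfScore: number }> {
  const K = 60;
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach(({ id }, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (K + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, rrfScore]) => ({ id, rrfScore }))
    .sort((a, b) => b.rrfScore - a.rrfScore);
}

// FTS5 ranks: a, b; vector ranks: b, c
const fused = reciprocalRankFusion(
  [{ id: 'a', score: 0 }, { id: 'b', score: 1 }],
  [{ id: 'b', score: 0 }, { id: 'c', score: 1 }]
);
// 'b' wins: 1/62 + 1/61 exceeds a's 1/61 and c's 1/62
```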
```typescript
export interface HybridSearchOptions {
  repositoryId: string;
  versionId?: string;
  type?: 'code' | 'info';
  limit?: number;
  alpha?: number; // 0.0 = FTS5 only, 1.0 = vector only, 0.5 = balanced
}

export class HybridSearchService {
  constructor(
    private db: BetterSQLite3.Database,
    private searchService: SearchService,
    private embeddingProvider: EmbeddingProvider | null
  ) {}

  async search(query: string, options: HybridSearchOptions): Promise<SnippetSearchResult[]> {
    const limit = options.limit ?? 20;
    const alpha = options.alpha ?? 0.5;

    // Always run FTS5 search
    const ftsResults = this.searchService.searchSnippets(query, {
      repositoryId: options.repositoryId,
      versionId: options.versionId,
      type: options.type,
      limit: limit * 3 // get more candidates for fusion
    });

    // If no embedding provider or alpha = 0, return FTS5 results directly
    if (!this.embeddingProvider || alpha === 0) {
      return ftsResults.slice(0, limit);
    }

    // Embed the query and run vector search
    const [queryEmbedding] = await this.embeddingProvider.embed([query]);
    const vectorResults = await this.vectorSearch(
      queryEmbedding.values,
      options.repositoryId,
      limit * 3
    );

    // Normalize result lists for RRF
    const ftsRanked = ftsResults.map((r, i) => ({
      id: r.snippet.id,
      score: i
    }));
    const vecRanked = vectorResults.map((r, i) => ({
      id: r.snippetId,
      score: i
    }));

    // Apply RRF
    const fused = reciprocalRankFusion(ftsRanked, vecRanked);

    // Fetch full snippet data for top results
    const topIds = fused.slice(0, limit).map((r) => r.id);
    return this.fetchSnippetsByIds(topIds, options.repositoryId);
  }
}
```
The hybrid search alpha value can be set per-request or globally via settings:
```typescript
// Default config stored in settings table under key 'search_config'
export interface SearchConfig {
  alpha: number; // 0.5 default
  maxResults: number; // 20 default
  enableHybrid: boolean; // true if embedding provider is configured
}
```
Implement the end-to-end indexing pipeline that orchestrates crawling, parsing, chunking, snippet storage, and optional embedding generation.
```typescript
// src/lib/server/pipeline/job-queue.ts
export class JobQueue {
  private isRunning = false;

  constructor(
    private db: BetterSQLite3.Database,
    private pipeline: IndexingPipeline // required by processNext() below
  ) {}

  enqueue(repositoryId: string, versionId?: string): IndexingJob {
    const job: NewIndexingJob = {
      id: crypto.randomUUID(),
      repositoryId,
      versionId: versionId ?? null,
      status: 'queued',
      progress: 0,
      totalFiles: 0,
      processedFiles: 0,
      error: null,
      startedAt: null,
      completedAt: null,
      createdAt: new Date()
    };
    this.db
      .prepare(`INSERT INTO indexing_jobs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`)
      .run(...Object.values(job));

    // Kick off processing if not already running
    if (!this.isRunning) {
      setImmediate(() => this.processNext());
    }
    return job;
  }

  private async processNext(): Promise<void> {
    if (this.isRunning) return;
    const job = this.db
      .prepare(
        `
        SELECT * FROM indexing_jobs
        WHERE status = 'queued'
        ORDER BY created_at ASC
        LIMIT 1
      `
      )
      .get() as IndexingJob | undefined;
    if (!job) return;

    this.isRunning = true;
    try {
      await this.pipeline.run(job);
    } finally {
      this.isRunning = false;
      // Check for next queued job
      const nextJob = this.db
        .prepare(`SELECT id FROM indexing_jobs WHERE status = 'queued' LIMIT 1`)
        .get();
      if (nextJob) setImmediate(() => this.processNext());
    }
  }

  getJob(id: string): IndexingJob | null {
    return this.db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(id) as IndexingJob | null;
  }

  listJobs(repositoryId?: string, limit = 20): IndexingJob[] {
    const query = repositoryId
      ? `SELECT * FROM indexing_jobs WHERE repository_id = ? ORDER BY created_at DESC LIMIT ?`
      : `SELECT * FROM indexing_jobs ORDER BY created_at DESC LIMIT ?`;
    const params = repositoryId ? [repositoryId, limit] : [limit];
    return this.db.prepare(query).all(...params) as IndexingJob[];
  }
}
```
```typescript
// src/lib/server/pipeline/indexing.pipeline.ts
export class IndexingPipeline {
  constructor(
    private db: BetterSQLite3.Database,
    private githubCrawler: GitHubCrawler,
    private localCrawler: LocalCrawler,
    private embeddingService: EmbeddingService | null
  ) {}

  async run(job: IndexingJob): Promise<void> {
    this.updateJob(job.id, { status: 'running', startedAt: new Date() });
    try {
      const repo = this.getRepository(job.repositoryId);
      if (!repo) throw new Error(`Repository ${job.repositoryId} not found`);

      // Update repo state
      this.updateRepo(repo.id, { state: 'indexing' });

      // Step 1: Crawl
      const crawlResult = await this.crawl(repo, job);

      // Step 2: Parse and diff
      const { newSnippets, changedDocIds, newDocuments } = await this.parseAndDiff(
        crawlResult,
        repo,
        job
      );

      // Step 3: Atomic replacement
      this.replaceSnippets(repo.id, changedDocIds, newDocuments, newSnippets);

      // Step 4: Embeddings (async, non-blocking for job completion)
      if (this.embeddingService && newSnippets.length > 0) {
        await this.embeddingService.embedSnippets(
          newSnippets.map((s) => s.id),
          (done, total) => {
            // Update job progress for embedding phase
          }
        );
      }

      // Step 5: Update repo stats
      const stats = this.computeStats(repo.id);
      this.updateRepo(repo.id, {
        state: 'indexed',
        totalSnippets: stats.totalSnippets,
        totalTokens: stats.totalTokens,
        trustScore: computeTrustScore({ ...repo, ...stats }),
        lastIndexedAt: new Date()
      });
      this.updateJob(job.id, {
        status: 'done',
        progress: 100,
        completedAt: new Date()
      });
    } catch (error) {
      this.updateJob(job.id, {
        status: 'failed',
        error: (error as Error).message,
        completedAt: new Date()
      });
      this.updateRepo(job.repositoryId, { state: 'error' });
      throw error;
    }
  }

  private replaceSnippets(
    repositoryId: string,
    changedDocIds: string[],
    newDocuments: NewDocument[],
    newSnippets: NewSnippet[]
  ): void {
    // Single transaction: delete old → insert new
    this.db.transaction(() => {
      if (changedDocIds.length > 0) {
        // Cascade deletes snippets via FK constraint
        this.db
          .prepare(`DELETE FROM documents WHERE id IN (${changedDocIds.map(() => '?').join(',')})`)
          .run(...changedDocIds);
      }
      for (const doc of newDocuments) {
        this.insertDocument(doc);
      }
      for (const snippet of newSnippets) {
        this.insertSnippet(snippet);
      }
    })();
  }
}
```
```typescript
function calculateProgress(
  processedFiles: number,
  totalFiles: number,
  embeddingsDone: number,
  embeddingsTotal: number,
  hasEmbeddings: boolean
): number {
  if (totalFiles === 0) return 0;
  if (!hasEmbeddings) {
    // Crawl + parse = 100%
    return Math.round((processedFiles / totalFiles) * 100);
  }
  // Crawl+parse = 80%, embeddings = 20%
  const parseProgress = (processedFiles / totalFiles) * 80;
  const embedProgress = embeddingsTotal > 0 ? (embeddingsDone / embeddingsTotal) * 20 : 0;
  return Math.round(parseProgress + embedProgress);
}
```
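Sanity-checking the weighting (function re-declared verbatim so the sketch runs standalone): with embeddings enabled, finishing the crawl/parse phase alone caps progress at 80%.

```typescript
function calculateProgress(
  processedFiles: number,
  totalFiles: number,
  embeddingsDone: number,
  embeddingsTotal: number,
  hasEmbeddings: boolean
): number {
  if (totalFiles === 0) return 0;
  if (!hasEmbeddings) {
    return Math.round((processedFiles / totalFiles) * 100);
  }
  const parseProgress = (processedFiles / totalFiles) * 80;
  const embedProgress = embeddingsTotal > 0 ? (embeddingsDone / embeddingsTotal) * 20 : 0;
  return Math.round(parseProgress + embedProgress);
}

const parsedOnly = calculateProgress(342, 342, 0, 1247, true); // 80: embeddings pending
const allDone = calculateProgress(342, 342, 1247, 1247, true); // 100
```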
### `GET /api/v1/jobs/:id`

Response `200`:

```json
{
  "job": {
    "id": "uuid",
    "repositoryId": "/facebook/react",
    "status": "running",
    "progress": 47,
    "totalFiles": 342,
    "processedFiles": 162,
    "error": null,
    "startedAt": "2026-03-22T10:00:00Z",
    "completedAt": null,
    "createdAt": "2026-03-22T09:59:55Z"
  }
}
```
Query params: `repositoryId` (optional), `status` (optional), `limit` (default 20).

Response `200`:

```json
{
  "jobs": [...]
}
```
On application start, mark any jobs in `running` state as `failed` (they were interrupted):
```typescript
function recoverStaleJobs(db: BetterSQLite3.Database): void {
  db.prepare(
    `
    UPDATE indexing_jobs
    SET status = 'failed',
        error = 'Server restarted while job was running',
        completed_at = unixepoch()
    WHERE status = 'running'
  `
  ).run();

  // Also reset any repositories stuck in 'indexing' state
  db.prepare(
    `
    UPDATE repositories
    SET state = 'error'
    WHERE state = 'indexing'
  `
  ).run();
}
```
Implement the public-facing REST API endpoints that replicate context7's `/api/v1` interface.
### Query Parameters

| Parameter     | Type    | Required | Description                           |
| ------------- | ------- | -------- | ------------------------------------- |
| `libraryName` | string  | Yes      | Library name to search for            |
| `query`       | string  | No       | User's question for relevance ranking |
| `limit`       | integer | No       | Max results (default: 10, max: 50)    |

### Response `200` (`type=json`, default):

```json
{
  "results": [
    {
      "id": "/facebook/react",
      "title": "React",
      "description": "A JavaScript library for building user interfaces",
      "branch": "main",
      "lastUpdateDate": "2026-03-22T10:00:00Z",
      "state": "finalized",
      "totalTokens": 142000,
      "totalSnippets": 1247,
      "stars": 228000,
      "trustScore": 9.2,
      "benchmarkScore": 87,
      "versions": ["v18.3.0", "v17.0.2"],
      "source": "https://github.com/facebook/react"
    }
  ]
}
```
Note: `state: "finalized"` maps from TrueRef's `state: "indexed"` for compatibility.
### State Mapping

| TrueRef state | context7 state |
| ------------- | -------------- |
| `pending`     | `initial`      |
| `indexing`    | `initial`      |
| `indexed`     | `finalized`    |
| `error`       | `error`        |
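The table above can be encoded as a small lookup; the type and function names below are assumptions for illustration, not names the spec prescribes:

```typescript
type TrueRefState = 'pending' | 'indexing' | 'indexed' | 'error';
type Context7State = 'initial' | 'finalized' | 'error';

// Direct transcription of the state-mapping table
const STATE_MAP: Record<TrueRefState, Context7State> = {
  pending: 'initial',
  indexing: 'initial',
  indexed: 'finalized',
  error: 'error'
};

function toContext7State(state: TrueRefState): Context7State {
  return STATE_MAP[state];
}
```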
---
### Query Parameters

| Parameter   | Type    | Required | Description                                                     |
| ----------- | ------- | -------- | --------------------------------------------------------------- |
| `libraryId` | string  | Yes      | Library ID, e.g. `/facebook/react` or `/facebook/react/v18.3.0` |
| `query`     | string  | Yes      | Specific question about the library                             |
| `type`      | string  | No       | `json` (default) or `txt` (plain text for LLM injection)        |
| `tokens`    | integer | No       | Approximate max token count for response (default: 10000)       |

### Response `200` (`type=json`):

```json
{
  "snippets": [
    {
      "type": "code",
      "title": "Basic Component",
      "description": "Getting Started > Components",
      "language": "tsx",
      "codeList": [
        {
          "language": "tsx",
          "code": "function MyComponent() {\n  return <div>Hello</div>;\n}"
        }
      ],
      "id": "uuid",
      "tokenCount": 45,
      "pageTitle": "Getting Started"
    },
    {
      "type": "info",
      "text": "React components let you split the UI into independent...",
      "breadcrumb": "Core Concepts > Components",
      "pageId": "uuid",
      "tokenCount": 120
    }
  ],
  "rules": ["Always use functional components", "..."],
  "totalTokens": 2840
}
```
Plain text formatted for direct LLM context injection:

````
## Library Rules
- Always use functional components
- Use hooks for state management

### Basic Component
_Getting Started > Components_

```tsx
function MyComponent() {
  return <div>Hello</div>;
}
```

---

### React components let you split the UI...
_Core Concepts > Components_

React components let you split the UI into independent, reusable pieces...
````
---
```typescript
function parseLibraryId(libraryId: string): {
  // …
    version: match[3],
  };
}
```
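Only the tail of `parseLibraryId` appears above. One plausible completion, assuming IDs of the form `/org/repo` with an optional `/version` suffix; the regex and the `org`/`repo` field names are inferred, not prescribed by the spec:

```typescript
// Parse "/facebook/react" or "/facebook/react/v18.3.0" into its parts.
function parseLibraryId(libraryId: string): {
  org: string;
  repo: string;
  version?: string;
} {
  const match = libraryId.match(/^\/([^/]+)\/([^/]+)(?:\/([^/]+))?$/);
  if (!match) throw new Error(`Invalid library ID: ${libraryId}`);
  return {
    org: match[1],
    repo: match[2],
    version: match[3]
  };
}
```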
---
The `tokens` parameter limits the total response size. Snippets are added greedily until the budget is exhausted:

```typescript
function selectSnippetsWithinBudget(snippets: Snippet[], maxTokens: number): Snippet[] {
  const selected: Snippet[] = [];
  let usedTokens = 0;
  for (const snippet of snippets) {
    if (usedTokens + (snippet.tokenCount ?? 0) > maxTokens) break;
    selected.push(snippet);
    usedTokens += snippet.tokenCount ?? 0;
  }
  return selected;
}
```
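Because selection is greedy and stops at the first snippet that would overflow, one large snippet early in the ranking ends the selection even if smaller snippets further down would still fit. A worked example (function re-declared, with a minimal `Snippet` shape assumed):

```typescript
interface Snippet {
  id: string;
  tokenCount?: number;
}

function selectSnippetsWithinBudget(snippets: Snippet[], maxTokens: number): Snippet[] {
  const selected: Snippet[] = [];
  let usedTokens = 0;
  for (const snippet of snippets) {
    if (usedTokens + (snippet.tokenCount ?? 0) > maxTokens) break;
    selected.push(snippet);
    usedTokens += snippet.tokenCount ?? 0;
  }
  return selected;
}

const picked = selectSnippetsWithinBudget(
  [
    { id: 'a', tokenCount: 45 },
    { id: 'b', tokenCount: 9000 }, // overflows the 100-token budget; loop stops here
    { id: 'c', tokenCount: 10 } // never reached, even though it would fit
  ],
  100
);
// only 'a' is selected
```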
Default token budget: 10,000 tokens (~7,500 words) — enough for ~20 medium snippets.
## CORS Configuration

All API routes include:

```
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, DELETE, OPTIONS
```
Implement a Model Context Protocol (MCP) server that exposes `resolve-library-id` and `query-docs`.
```json ```json
{ {
"@modelcontextprotocol/sdk": "^1.25.1", "@modelcontextprotocol/sdk": "^1.25.1",
"zod": "^4.3.4" "zod": "^4.3.4"
} }
``` ```
```typescript
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { z } from 'zod';

const API_BASE = process.env.TRUEREF_API_URL ?? 'http://localhost:5173';

const server = new Server(
  {
    name: 'io.github.trueref/trueref',
    version: '1.0.0'
  },
  {
    capabilities: { tools: {} }
  }
);

// Tool schemas — identical to context7 for drop-in compatibility
const ResolveLibraryIdSchema = z.object({
  libraryName: z
    .string()
    .describe('Library name to search for and resolve to a TrueRef library ID'),
  query: z.string().describe("The user's question or context to help rank results")
});

const QueryDocsSchema = z.object({
  libraryId: z
    .string()
    .describe('The TrueRef library ID obtained from resolve-library-id, e.g. /facebook/react'),
  query: z
    .string()
    .describe('Specific question about the library to retrieve relevant documentation'),
  tokens: z.number().optional().describe('Maximum token budget for the response (default: 10000)')
});

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'resolve-library-id',
      description: [
        'Searches TrueRef to find a library matching the given name.',
        'Returns a list of matching libraries with their IDs.',
        'ALWAYS call this tool before query-docs to get the correct library ID.',
        'Call at most 3 times per user question.'
      ].join(' '),
      inputSchema: {
        type: 'object',
        properties: {
          libraryName: {
            type: 'string',
            description: 'Library name to search for'
          },
          query: {
            type: 'string',
            description: "User's question for relevance ranking"
          }
        },
        required: ['libraryName', 'query']
      }
    },
    {
      name: 'query-docs',
      description: [
        'Fetches documentation and code examples from TrueRef for a specific library.',
        'Requires a library ID obtained from resolve-library-id.',
        'Returns relevant snippets formatted for LLM consumption.',
        'Call at most 3 times per user question.'
      ].join(' '),
      inputSchema: {
        type: 'object',
        properties: {
          libraryId: {
            type: 'string',
            description: 'TrueRef library ID, e.g. /facebook/react'
          },
          query: {
            type: 'string',
            description: 'Specific question about the library'
          },
          tokens: {
            type: 'number',
            description: 'Max token budget (default: 10000)'
}, }
}, },
required: ['libraryId', 'query'], required: ['libraryId', 'query']
}, }
}, }
], ]
})); }));
server.setRequestHandler(CallToolRequestSchema, async (request) => { server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params; const { name, arguments: args } = request.params;
if (name === 'resolve-library-id') { if (name === 'resolve-library-id') {
const { libraryName, query } = ResolveLibraryIdSchema.parse(args); const { libraryName, query } = ResolveLibraryIdSchema.parse(args);
const url = new URL(`${API_BASE}/api/v1/libs/search`); const url = new URL(`${API_BASE}/api/v1/libs/search`);
url.searchParams.set('libraryName', libraryName); url.searchParams.set('libraryName', libraryName);
url.searchParams.set('query', query); url.searchParams.set('query', query);
url.searchParams.set('type', 'txt'); url.searchParams.set('type', 'txt');
const response = await fetch(url.toString()); const response = await fetch(url.toString());
if (!response.ok) { if (!response.ok) {
return { return {
content: [{ content: [
type: 'text', {
text: `Error searching libraries: ${response.status} ${response.statusText}`, type: 'text',
}], text: `Error searching libraries: ${response.status} ${response.statusText}`
isError: true, }
}; ],
} isError: true
};
}
const text = await response.text(); const text = await response.text();
return { return {
content: [{ type: 'text', text }], content: [{ type: 'text', text }]
}; };
} }
if (name === 'query-docs') { if (name === 'query-docs') {
const { libraryId, query, tokens } = QueryDocsSchema.parse(args); const { libraryId, query, tokens } = QueryDocsSchema.parse(args);
const url = new URL(`${API_BASE}/api/v1/context`); const url = new URL(`${API_BASE}/api/v1/context`);
url.searchParams.set('libraryId', libraryId); url.searchParams.set('libraryId', libraryId);
url.searchParams.set('query', query); url.searchParams.set('query', query);
url.searchParams.set('type', 'txt'); url.searchParams.set('type', 'txt');
if (tokens) url.searchParams.set('tokens', String(tokens)); if (tokens) url.searchParams.set('tokens', String(tokens));
const response = await fetch(url.toString()); const response = await fetch(url.toString());
if (!response.ok) { if (!response.ok) {
const status = response.status; const status = response.status;
if (status === 404) { if (status === 404) {
return { return {
content: [{ content: [
type: 'text', {
text: `Library "${libraryId}" not found. Please run resolve-library-id first.`, type: 'text',
}], text: `Library "${libraryId}" not found. Please run resolve-library-id first.`
isError: true, }
}; ],
} isError: true
if (status === 503) { };
return { }
content: [{ if (status === 503) {
type: 'text', return {
text: `Library "${libraryId}" is currently being indexed. Please try again in a moment.`, content: [
}], {
isError: true, type: 'text',
}; text: `Library "${libraryId}" is currently being indexed. Please try again in a moment.`
} }
return { ],
content: [{ isError: true
type: 'text', };
text: `Error fetching documentation: ${response.status} ${response.statusText}`, }
}], return {
isError: true, content: [
}; {
} type: 'text',
text: `Error fetching documentation: ${response.status} ${response.statusText}`
}
],
isError: true
};
}
const text = await response.text(); const text = await response.text();
return { return {
content: [{ type: 'text', text }], content: [{ type: 'text', text }]
}; };
} }
return { return {
content: [{ type: 'text', text: `Unknown tool: ${name}` }], content: [{ type: 'text', text: `Unknown tool: ${name}` }],
isError: true, isError: true
}; };
}); });
async function main() { async function main() {
const transport = new StdioServerTransport(); const transport = new StdioServerTransport();
await server.connect(transport); await server.connect(transport);
// Server runs until process exits // Server runs until process exits
} }
main().catch((err) => { main().catch((err) => {
process.stderr.write(`MCP server error: ${err.message}\n`); process.stderr.write(`MCP server error: ${err.message}\n`);
process.exit(1); process.exit(1);
}); });
``` ```
@@ -238,18 +239,19 @@ main().catch((err) => {
```json
{
  "scripts": {
    "mcp:start": "node --experimental-strip-types src/mcp/index.ts"
  }
}
```
Or with `tsx` for TypeScript-direct execution:
```json
{
  "scripts": {
    "mcp:start": "tsx src/mcp/index.ts"
  }
}
```
@@ -261,30 +263,31 @@ Users add to `.mcp.json`:
```json
{
  "mcpServers": {
    "trueref": {
      "command": "node",
      "args": ["/path/to/trueref/dist/mcp/index.js"],
      "env": {
        "TRUEREF_API_URL": "http://localhost:5173"
      }
    }
  }
}
```
Or with `tsx` for development:
```json
{
  "mcpServers": {
    "trueref": {
      "command": "npx",
      "args": ["tsx", "/path/to/trueref/src/mcp/index.ts"],
      "env": {
        "TRUEREF_API_URL": "http://localhost:5173"
      }
    }
  }
}
```
@@ -295,13 +298,15 @@ Or with tsx for development:
The MCP server can optionally expose a `resources` list item, or the library responses themselves can prepend rules. Additionally, users should add a Claude rule file:
```markdown
<!-- .claude/rules/trueref.md -->
---
description: Use TrueRef to retrieve documentation for indexed libraries
alwaysApply: true
---

When answering questions about indexed libraries, always use the TrueRef MCP tools:

1. Call `resolve-library-id` with the library name and the user's question to get the library ID
2. Call `query-docs` with the library ID and question to retrieve relevant documentation
3. Use the returned documentation to answer the question accurately
```
@@ -50,64 +50,64 @@ import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/
```typescript
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';

const { values: args } = parseArgs({
  options: {
    transport: { type: 'string', default: 'stdio' },
    port: { type: 'string', default: process.env.PORT ?? '3001' },
  },
});

async function startHttp(server: Server, port: number): Promise<void> {
  const httpServer = createServer(async (req, res) => {
    const url = new URL(req.url!, `http://localhost:${port}`);

    // Health check
    if (url.pathname === '/ping') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ ok: true }));
      return;
    }

    // MCP endpoint
    if (url.pathname === '/mcp') {
      // CORS preflight
      res.setHeader('Access-Control-Allow-Origin', '*');
      res.setHeader('Access-Control-Allow-Methods', 'POST, GET, OPTIONS');
      res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Accept');
      if (req.method === 'OPTIONS') {
        res.writeHead(204);
        res.end();
        return;
      }

      const transport = new StreamableHTTPServerTransport({
        sessionIdGenerator: () => crypto.randomUUID(),
      });
      await server.connect(transport);
      await transport.handleRequest(req, res);
      return;
    }

    res.writeHead(404);
    res.end('Not Found');
  });

  httpServer.listen(port, () => {
    process.stderr.write(`TrueRef MCP server listening on http://localhost:${port}/mcp\n`);
  });
}

async function main() {
  const mcpServer = createMcpServer(); // shared server creation
  if (args.transport === 'http') {
    const port = parseInt(args.port!, 10);
    await startHttp(mcpServer, port);
  } else {
    const transport = new StdioServerTransport();
    await mcpServer.connect(transport);
  }
}
```
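The CLI flag handling above relies on `parseArgs` from `node:util` (Node 18.3+). A self-contained sketch of how those options resolve; the `argv` values here are invented for illustration:

```typescript
// Standalone demo of the parseArgs options used above.
import { parseArgs } from 'node:util';

const { values } = parseArgs({
  args: ['--transport', 'http', '--port', '4000'], // illustrative argv
  options: {
    transport: { type: 'string', default: 'stdio' },
    port: { type: 'string', default: '3001' },
  },
});

console.log(values.transport); // 'http'
console.log(values.port);      // '4000'
```

Omitting both flags would fall back to the defaults, i.e. stdio transport on port `3001`.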
@@ -117,10 +117,10 @@ async function main() {
```json
{
  "scripts": {
    "mcp:start": "tsx src/mcp/index.ts",
    "mcp:http": "tsx src/mcp/index.ts --transport http --port 3001"
  }
}
```
@@ -132,12 +132,12 @@ For HTTP transport, users configure Claude Code with the remote URL:
```json
{
  "mcpServers": {
    "trueref": {
      "type": "http",
      "url": "http://localhost:3001/mcp"
    }
  }
}
```
@@ -32,53 +32,53 @@ Support `trueref.json` configuration files placed in the root of a repository. T
```typescript
// src/lib/server/config/trueref-config.schema.ts

export interface TrueRefConfig {
  /**
   * Override the display name for this library.
   * 1–100 characters.
   */
  projectTitle?: string;

  /**
   * Description of the library for search ranking.
   * 10–500 characters.
   */
  description?: string;

  /**
   * Folders to include in indexing (allowlist).
   * Each entry is a path prefix or regex string.
   * If empty/absent, all folders are included.
   * Examples: ["src/", "docs/", "^packages/core"]
   */
  folders?: string[];

  /**
   * Folders to exclude from indexing.
   * Applied after `folders` allowlist.
   * Examples: ["test/", "fixtures/", "__mocks__"]
   */
  excludeFolders?: string[];

  /**
   * Exact filenames to exclude (no path, no regex).
   * Examples: ["README.md", "CHANGELOG.md", "jest.config.ts"]
   */
  excludeFiles?: string[];

  /**
   * Best practices / rules to inject at the top of every query-docs response.
   * Each rule: 5–500 characters.
   * Maximum 20 rules.
   */
  rules?: string[];

  /**
   * Previously released versions to make available for versioned queries.
   */
  previousVersions?: Array<{
    tag: string;   // git tag (e.g. "v1.2.3")
    title: string; // human-readable (e.g. "Version 1.2.3")
  }>;
}
```
@@ -88,14 +88,14 @@ export interface TrueRefConfig {
```typescript
const CONFIG_CONSTRAINTS = {
  projectTitle: { minLength: 1, maxLength: 100 },
  description: { minLength: 10, maxLength: 500 },
  folders: { maxItems: 50, maxLength: 200 }, // per entry
  excludeFolders: { maxItems: 50, maxLength: 200 },
  excludeFiles: { maxItems: 100, maxLength: 200 },
  rules: { maxItems: 20, minLength: 5, maxLength: 500 },
  previousVersions: { maxItems: 50 },
  versionTag: { pattern: /^v?\d+\.\d+(\.\d+)?(-.*)?$/ },
};
```
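Taken together with the schema above, a repository might ship a `trueref.json` like the following. Every value here is invented for illustration; it simply stays within the documented constraints:

```json
{
  "projectTitle": "My SDK",
  "description": "TypeScript SDK for the Example service, covering auth, pagination, and webhooks.",
  "folders": ["src/", "docs/"],
  "excludeFolders": ["test/", "fixtures/"],
  "excludeFiles": ["CHANGELOG.md"],
  "rules": ["Prefer the async client over the legacy sync API"],
  "previousVersions": [{ "tag": "v1.2.3", "title": "Version 1.2.3" }]
}
```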
@@ -107,94 +107,96 @@ const CONFIG_CONSTRAINTS = {
```typescript
// src/lib/server/config/config-parser.ts

export interface ParsedConfig {
  config: TrueRefConfig;
  source: 'trueref.json' | 'context7.json';
  warnings: string[];
}

export function parseConfigFile(content: string, filename: string): ParsedConfig {
  let raw: unknown;
  try {
    raw = JSON.parse(content);
  } catch (e) {
    throw new ConfigParseError(`${filename} is not valid JSON: ${(e as Error).message}`);
  }
  if (typeof raw !== 'object' || raw === null) {
    throw new ConfigParseError(`${filename} must be a JSON object`);
  }

  const config = raw as Record<string, unknown>;
  const validated: TrueRefConfig = {};
  const warnings: string[] = [];

  // projectTitle
  if (config.projectTitle !== undefined) {
    if (typeof config.projectTitle !== 'string') {
      warnings.push('projectTitle must be a string, ignoring');
    } else if (config.projectTitle.length > 100) {
      validated.projectTitle = config.projectTitle.slice(0, 100);
      warnings.push('projectTitle truncated to 100 characters');
    } else {
      validated.projectTitle = config.projectTitle;
    }
  }

  // description
  if (config.description !== undefined) {
    if (typeof config.description === 'string') {
      validated.description = config.description.slice(0, 500);
    }
  }

  // folders / excludeFolders / excludeFiles — validated as string arrays
  for (const field of ['folders', 'excludeFolders', 'excludeFiles'] as const) {
    if (config[field] !== undefined) {
      if (!Array.isArray(config[field])) {
        warnings.push(`${field} must be an array, ignoring`);
      } else {
        validated[field] = (config[field] as unknown[])
          .filter((item): item is string => {
            if (typeof item !== 'string') {
              warnings.push(`${field} entry must be a string, skipping: ${item}`);
              return false;
            }
            return true;
          })
          .slice(0, field === 'excludeFiles' ? 100 : 50);
      }
    }
  }

  // rules
  if (config.rules !== undefined) {
    if (Array.isArray(config.rules)) {
      validated.rules = (config.rules as unknown[])
        .filter((r): r is string => typeof r === 'string' && r.length >= 5)
        .map(r => r.slice(0, 500))
        .slice(0, 20);
    }
  }

  // previousVersions
  if (config.previousVersions !== undefined) {
    if (Array.isArray(config.previousVersions)) {
      validated.previousVersions = (config.previousVersions as unknown[])
        .filter((v): v is { tag: string; title: string } =>
          typeof v === 'object' && v !== null &&
          typeof (v as Record<string, unknown>).tag === 'string' &&
          typeof (v as Record<string, unknown>).title === 'string'
        )
        .slice(0, 50);
    }
  }

  return {
    config: validated,
    source: filename.startsWith('trueref') ? 'trueref.json' : 'context7.json',
    warnings,
  };
}
```
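The clamping applied to `rules` in `parseConfigFile` can be hard to read inline. This standalone snippet replays just that handling so the behavior is visible; the input values are made up:

```typescript
// Replays the `rules` clamping from parseConfigFile in isolation:
// non-strings and entries shorter than 5 chars are dropped, each entry
// is capped at 500 chars, and at most 20 rules are kept.
const rawRules: unknown[] = ['Prefer the documented public API', 'x', 'a'.repeat(600), 42];

const cleaned = rawRules
  .filter((r): r is string => typeof r === 'string' && r.length >= 5)
  .map(r => r.slice(0, 500))
  .slice(0, 20);

console.log(cleaned.length);    // 2 ('x' and 42 are dropped)
console.log(cleaned[1].length); // 500 (the 600-char entry was truncated)
```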
@@ -219,21 +221,15 @@ When `query-docs` returns results, `rules` from `repository_configs` are prepend
```typescript
// In formatters.ts
function buildContextResponse(
  snippets: Snippet[],
  config: RepositoryConfig | null
): string {
  const parts: string[] = [];

  if (config?.rules?.length) {
    parts.push(
      '## Library Best Practices\n' +
      config.rules.map(r => `- ${r}`).join('\n')
    );
  }

  // ... append snippet content
  return parts.join('\n\n---\n\n');
}
```
@@ -45,34 +45,37 @@ Examples:
### `GET /api/v1/libs/:id/versions`

Response `200`:
```json
{
  "versions": [
    {
      "id": "/facebook/react/v18.3.0",
      "repositoryId": "/facebook/react",
      "tag": "v18.3.0",
      "title": "React v18.3.0",
      "state": "indexed",
      "totalSnippets": 892,
      "indexedAt": "2026-03-22T10:00:00Z"
    }
  ]
}
```
### `POST /api/v1/libs/:id/versions`

Request body:
```json
{
  "tag": "v18.3.0",
  "title": "React v18.3.0",
  "autoIndex": true
}
```
Response `201`:

```json
{
  "version": { ...RepositoryVersion }
}
```
@@ -96,23 +99,22 @@ Response `202` with job details.
```typescript
async function listGitHubTags(
  owner: string,
  repo: string,
  token?: string
): Promise<Array<{ name: string; commit: { sha: string } }>> {
  const headers: Record<string, string> = {
    'Accept': 'application/vnd.github.v3+json',
    'User-Agent': 'TrueRef/1.0',
  };
  if (token) headers['Authorization'] = `Bearer ${token}`;

  const response = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/tags?per_page=100`,
    { headers }
  );
  if (!response.ok) throw new GitHubApiError(response.status);
  return response.json();
}
```
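Note that `per_page=100` fetches only the first page; repositories with more than 100 tags would need pagination. GitHub advertises further pages via the standard `Link` response header, which a small helper could walk. This is a sketch, not part of the design above:

```typescript
// Extracts the rel="next" URL from an RFC 8288 Link header, if present.
function nextPageUrl(linkHeader: string | null): string | null {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(',')) {
    const match = part.match(/<([^>]+)>;\s*rel="next"/);
    if (match) return match[1];
  }
  return null;
}

const link =
  '<https://api.github.com/repos/o/r/tags?page=2>; rel="next", ' +
  '<https://api.github.com/repos/o/r/tags?page=5>; rel="last"';
console.log(nextPageUrl(link)); // 'https://api.github.com/repos/o/r/tags?page=2'
```

A caller would pass `response.headers.get('Link')` and keep fetching until this returns `null`.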
@@ -124,28 +126,26 @@ In the search/context endpoints, the `libraryId` is parsed to extract the option
```typescript
function resolveSearchTarget(libraryId: string): {
  repositoryId: string;
  versionId?: string;
} {
  const { repositoryId, version } = parseLibraryId(libraryId);
  if (!version) {
    // Query default branch: versionId = NULL
    return { repositoryId };
  }

  // Look up versionId from tag
  const versionRecord = db.prepare(
    `SELECT id FROM repository_versions WHERE repository_id = ? AND tag = ?`
  ).get(repositoryId, version) as { id: string } | undefined;

  if (!versionRecord) {
    throw new NotFoundError(
      `Version "${version}" not found for library "${repositoryId}"`
    );
  }

  return { repositoryId, versionId: versionRecord.id };
}
```
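`parseLibraryId` is referenced but not shown here. One plausible sketch, assuming IDs of the form `/owner/repo` with an optional trailing version tag matching the `versionTag` pattern from the config constraints; the real TrueRef implementation may differ:

```typescript
// Hypothetical parseLibraryId: splits "/owner/repo[/tag]" into its parts.
// Recognizes tags using the versionTag pattern from CONFIG_CONSTRAINTS.
function parseLibraryId(libraryId: string): { repositoryId: string; version?: string } {
  const parts = libraryId.replace(/^\//, '').split('/');
  if (parts.length >= 3 && /^v?\d+\.\d+(\.\d+)?(-.*)?$/.test(parts[2])) {
    return { repositoryId: `/${parts[0]}/${parts[1]}`, version: parts[2] };
  }
  return { repositoryId: libraryId };
}

console.log(parseLibraryId('/facebook/react/v18.3.0')); // { repositoryId: '/facebook/react', version: 'v18.3.0' }
console.log(parseLibraryId('/facebook/react'));         // { repositoryId: '/facebook/react' }
```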
@@ -157,20 +157,20 @@ Snippets with `version_id IS NULL` belong to the default branch; snippets with a
```typescript
export class VersionService {
  constructor(private db: BetterSQLite3.Database) {}

  list(repositoryId: string): RepositoryVersion[];
  add(repositoryId: string, tag: string, title?: string): RepositoryVersion;
  remove(repositoryId: string, tag: string): void;
  getByTag(repositoryId: string, tag: string): RepositoryVersion | null;
  registerFromConfig(
    repositoryId: string,
    previousVersions: { tag: string; title: string }[]
  ): RepositoryVersion[];
}
```
@@ -49,79 +49,79 @@ Implement the main web interface for managing repositories. Built with SvelteKit
```svelte
<!-- src/lib/components/RepositoryCard.svelte -->
<script lang="ts">
  import type { Repository } from '$lib/types';

  let { repo, onReindex, onDelete } = $props<{
    repo: Repository;
    onReindex: (id: string) => void;
    onDelete: (id: string) => void;
  }>();

  const stateColors = {
    pending: 'bg-gray-100 text-gray-600',
    indexing: 'bg-blue-100 text-blue-700',
    indexed: 'bg-green-100 text-green-700',
    error: 'bg-red-100 text-red-700',
  };

  const stateLabels = {
    pending: 'Pending',
    indexing: 'Indexing...',
    indexed: 'Indexed',
    error: 'Error',
  };
</script>

<div class="rounded-xl border border-gray-200 bg-white p-5 shadow-sm">
  <div class="flex items-start justify-between">
    <div>
      <h3 class="font-semibold text-gray-900">{repo.title}</h3>
      <p class="mt-0.5 font-mono text-sm text-gray-500">{repo.id}</p>
    </div>
    <span class="rounded-full px-2.5 py-0.5 text-xs font-medium {stateColors[repo.state]}">
      {stateLabels[repo.state]}
    </span>
  </div>

  {#if repo.description}
    <p class="mt-2 line-clamp-2 text-sm text-gray-600">{repo.description}</p>
  {/if}

  <div class="mt-4 flex gap-4 text-sm text-gray-500">
    <span>{repo.totalSnippets.toLocaleString()} snippets</span>
    <span>·</span>
    <span>Trust: {repo.trustScore.toFixed(1)}/10</span>
    {#if repo.stars}
      <span>·</span>
      <span>{repo.stars.toLocaleString()}</span>
    {/if}
  </div>

  {#if repo.state === 'error'}
    <p class="mt-2 text-xs text-red-600">Indexing failed. Check jobs for details.</p>
  {/if}

  <div class="mt-4 flex gap-2">
    <button
      onclick={() => onReindex(repo.id)}
      class="rounded-lg bg-blue-600 px-3 py-1.5 text-sm text-white hover:bg-blue-700"
      disabled={repo.state === 'indexing'}
    >
      {repo.state === 'indexing' ? 'Indexing...' : 'Re-index'}
    </button>
    <a
      href="/repos/{encodeURIComponent(repo.id)}"
      class="rounded-lg border border-gray-200 px-3 py-1.5 text-sm text-gray-700 hover:bg-gray-50"
    >
      Details
    </a>
    <button
      onclick={() => onDelete(repo.id)}
      class="ml-auto rounded-lg px-3 py-1.5 text-sm text-red-600 hover:bg-red-50"
    >
      Delete
    </button>
  </div>
</div>
```
@@ -132,98 +132,104 @@ Implement the main web interface for managing repositories. Built with SvelteKit
```svelte
<!-- src/lib/components/AddRepositoryModal.svelte -->
<script lang="ts">
  let { onClose, onAdded } = $props<{
    onClose: () => void;
    onAdded: () => void;
  }>();

  let source = $state<'github' | 'local'>('github');
  let sourceUrl = $state('');
  let githubToken = $state('');
  let loading = $state(false);
  let error = $state<string | null>(null);

  async function handleSubmit() {
    loading = true;
    error = null;
    try {
      const res = await fetch('/api/v1/libs', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ source, sourceUrl, githubToken: githubToken || undefined })
      });
      if (!res.ok) {
        const data = await res.json();
        throw new Error(data.error ?? 'Failed to add repository');
      }
      onAdded();
      onClose();
    } catch (e) {
      error = (e as Error).message;
    } finally {
      loading = false;
    }
  }
</script>

<dialog class="modal" open>
  <div class="modal-box max-w-md">
    <h2 class="mb-4 text-lg font-semibold">Add Repository</h2>
    <div class="mb-4 flex gap-2">
      <button
        class="flex-1 rounded-lg py-2 text-sm {source === 'github'
          ? 'bg-blue-600 text-white'
          : 'border border-gray-200 text-gray-700'}"
        onclick={() => (source = 'github')}>GitHub</button
      >
      <button
        class="flex-1 rounded-lg py-2 text-sm {source === 'local'
          ? 'bg-blue-600 text-white'
          : 'border border-gray-200 text-gray-700'}"
        onclick={() => (source = 'local')}>Local Path</button
      >
    </div>
    <label class="block">
      <span class="text-sm font-medium text-gray-700">
        {source === 'github' ? 'GitHub URL' : 'Absolute Path'}
      </span>
      <input
        type="text"
        bind:value={sourceUrl}
        placeholder={source === 'github'
          ? 'https://github.com/facebook/react'
          : '/home/user/projects/my-sdk'}
        class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm"
      />
    </label>
    {#if source === 'github'}
      <label class="mt-3 block">
        <span class="text-sm font-medium text-gray-700"
          >GitHub Token (optional, for private repos)</span
        >
        <input
          type="password"
          bind:value={githubToken}
          placeholder="ghp_..."
          class="mt-1 w-full rounded-lg border border-gray-300 px-3 py-2 text-sm"
        />
      </label>
    {/if}
    {#if error}
      <p class="mt-3 text-sm text-red-600">{error}</p>
    {/if}
    <div class="mt-6 flex justify-end gap-3">
      <button onclick={onClose} class="rounded-lg border border-gray-200 px-4 py-2 text-sm">
        Cancel
      </button>
      <button
        onclick={handleSubmit}
        disabled={loading || !sourceUrl}
        class="rounded-lg bg-blue-600 px-4 py-2 text-sm text-white disabled:opacity-50"
      >
        {loading ? 'Adding...' : 'Add & Index'}
      </button>
    </div>
  </div>
</dialog>
```
@@ -234,48 +240,48 @@ Implement the main web interface for managing repositories. Built with SvelteKit
```svelte
<!-- src/lib/components/IndexingProgress.svelte -->
<script lang="ts">
  import { onMount, onDestroy } from 'svelte';
  import type { IndexingJob } from '$lib/types';

  let { jobId } = $props<{ jobId: string }>();

  let job = $state<IndexingJob | null>(null);
  let interval: ReturnType<typeof setInterval>;

  async function pollJob() {
    const res = await fetch(`/api/v1/jobs/${jobId}`);
    if (res.ok) {
      const data = await res.json();
      job = data.job;
      if (job?.status === 'done' || job?.status === 'failed') {
        clearInterval(interval);
      }
    }
  }

  onMount(() => {
    pollJob();
    interval = setInterval(pollJob, 2000);
  });

  onDestroy(() => clearInterval(interval));
</script>

{#if job}
  <div class="mt-2">
    <div class="flex justify-between text-xs text-gray-500">
      <span>{job.processedFiles} / {job.totalFiles} files</span>
      <span>{job.progress}%</span>
    </div>
    <div class="mt-1 h-1.5 w-full rounded-full bg-gray-200">
      <div
        class="h-1.5 rounded-full bg-blue-600 transition-all"
        style="width: {job.progress}%"
      ></div>
    </div>
    {#if job.status === 'failed'}
      <p class="mt-1 text-xs text-red-600">{job.error}</p>
    {/if}
  </div>
{/if}
```
@@ -288,9 +294,9 @@ Implement the main web interface for managing repositories. Built with SvelteKit
import type { PageServerLoad } from './$types';

export const load: PageServerLoad = async ({ fetch }) => {
  const res = await fetch('/api/v1/libs');
  const data = await res.json();
  return { repositories: data.libraries };
};
```
@@ -57,25 +57,31 @@ An interactive search interface within the web UI that lets users test the docum
```svelte
<!-- src/lib/components/search/LibraryResult.svelte -->
<script lang="ts">
  let { result, onSelect } = $props<{
    result: {
      id: string;
      title: string;
      description: string;
      totalSnippets: number;
      trustScore: number;
    };
    onSelect: (id: string) => void;
  }>();
</script>

<button
  onclick={() => onSelect(result.id)}
  class="w-full rounded-xl border border-gray-200 bg-white p-4 text-left shadow-sm transition-all hover:border-blue-300 hover:shadow-md"
>
  <div class="flex items-center justify-between">
    <span class="font-semibold text-gray-900">{result.title}</span>
    <span class="text-xs text-gray-400">Trust {result.trustScore.toFixed(1)}/10</span>
  </div>
  <p class="font-mono text-xs text-gray-400">{result.id}</p>
  {#if result.description}
    <p class="mt-1.5 line-clamp-2 text-sm text-gray-600">{result.description}</p>
  {/if}
  <p class="mt-2 text-xs text-gray-400">{result.totalSnippets.toLocaleString()} snippets</p>
</button>
```
@@ -86,37 +92,39 @@ An interactive search interface within the web UI that lets users test the docum
```svelte
<!-- src/lib/components/search/SnippetCard.svelte -->
<script lang="ts">
  import type { Snippet } from '$lib/types';
  let { snippet } = $props<{ snippet: Snippet }>();
</script>

<div class="overflow-hidden rounded-xl border border-gray-200 bg-white">
  <div class="flex items-center justify-between border-b border-gray-100 px-4 py-2.5">
    <div class="flex items-center gap-2">
      {#if snippet.type === 'code'}
        <span class="rounded bg-purple-100 px-1.5 py-0.5 text-xs text-purple-700">code</span>
      {:else}
        <span class="rounded bg-blue-100 px-1.5 py-0.5 text-xs text-blue-700">info</span>
      {/if}
      {#if snippet.title}
        <span class="text-sm font-medium text-gray-800">{snippet.title}</span>
      {/if}
    </div>
    <span class="text-xs text-gray-400">{snippet.tokenCount} tokens</span>
  </div>
  {#if snippet.breadcrumb}
    <p class="bg-gray-50 px-4 py-1.5 text-xs text-gray-500 italic">{snippet.breadcrumb}</p>
  {/if}
  <div class="p-4">
    {#if snippet.type === 'code'}
      <pre class="overflow-x-auto rounded bg-gray-950 p-4 text-sm text-gray-100"><code
          >{snippet.content}</code
        ></pre>
    {:else}
      <div class="prose prose-sm max-w-none text-gray-700">{snippet.content}</div>
    {/if}
  </div>
</div>
```
@@ -127,44 +135,44 @@ An interactive search interface within the web UI that lets users test the docum
```svelte
<!-- src/routes/search/+page.svelte -->
<script lang="ts">
  import { page } from '$app/stores';
  import { goto } from '$app/navigation';

  let libraryName = $state('');
  let selectedLibraryId = $state<string | null>(null);
  let query = $state('');
  let libraryResults = $state<LibrarySearchResult[]>([]);
  let snippets = $state<Snippet[]>([]);
  let loadingLibraries = $state(false);
  let loadingSnippets = $state(false);

  async function searchLibraries() {
    loadingLibraries = true;
    const res = await fetch(
      `/api/v1/libs/search?libraryName=${encodeURIComponent(libraryName)}&query=${encodeURIComponent(query)}`
    );
    const data = await res.json();
    libraryResults = data.results;
    loadingLibraries = false;
  }

  async function searchDocs() {
    if (!selectedLibraryId) return;
    loadingSnippets = true;
    const url = new URL('/api/v1/context', window.location.origin);
    url.searchParams.set('libraryId', selectedLibraryId);
    url.searchParams.set('query', query);
    const res = await fetch(url);
    const data = await res.json();
    snippets = data.snippets;
    loadingSnippets = false;
    // Update URL
    goto(`/search?lib=${encodeURIComponent(selectedLibraryId)}&q=${encodeURIComponent(query)}`, {
      replaceState: true,
      keepFocus: true
    });
  }
</script>
```
@@ -177,9 +185,9 @@ Use a minimal, zero-dependency approach for v1 — wrap code blocks in `<pre><co
```typescript
// Optional: lazy-load highlight.js only when code snippets are present
async function highlightCode(code: string, language: string): Promise<string> {
  const hljs = await import('highlight.js/lib/core');
  // Register only needed languages
  return hljs.highlight(code, { language }).value;
}
```
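For the minimal, zero-dependency path, the only transformation snippet content needs before landing in `<pre><code>` is HTML escaping. A sketch of such a helper (the name `escapeHtml` is illustrative, not an existing TrueRef function):

```typescript
// Minimal HTML escaping for the no-highlighter path: snippet content is
// rendered verbatim inside <pre><code>, so only these five characters matter.
function escapeHtml(code: string): string {
  return code
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

escapeHtml('<a href="x">&</a>');
// → '&lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;'
```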

View File

@@ -30,39 +30,39 @@ Optimize re-indexing by skipping files that haven't changed since the last index
```typescript
interface FileDiff {
  added: CrawledFile[]; // new files not in DB
  modified: CrawledFile[]; // files with changed checksum
  deleted: string[]; // file paths in DB but not in crawl
  unchanged: string[]; // file paths with matching checksum
}

function computeDiff(
  crawledFiles: CrawledFile[],
  existingDocs: Document[] // documents currently in DB for this repo
): FileDiff {
  const existingMap = new Map(existingDocs.map((d) => [d.filePath, d]));
  const crawledMap = new Map(crawledFiles.map((f) => [f.path, f]));

  const added: CrawledFile[] = [];
  const modified: CrawledFile[] = [];
  const unchanged: string[] = [];

  for (const file of crawledFiles) {
    const existing = existingMap.get(file.path);
    if (!existing) {
      added.push(file);
    } else if (existing.checksum !== file.sha) {
      modified.push(file);
    } else {
      unchanged.push(file.path);
    }
  }

  const deleted = existingDocs
    .filter((doc) => !crawledMap.has(doc.filePath))
    .map((doc) => doc.filePath);

  return { added, modified, deleted, unchanged };
}
```
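To illustrate the diff semantics on sample data, here is a self-contained version of the same logic (types are reduced to the fields the diff actually reads; `diffFiles` and the sample paths are stand-ins, not real exports):

```typescript
// Structural stand-ins for the real types, reduced to the fields the diff reads.
type CrawledFile = { path: string; sha: string };
type Doc = { id: string; filePath: string; checksum: string };

// Same classification logic as computeDiff above, inlined for a runnable example.
function diffFiles(crawled: CrawledFile[], existing: Doc[]) {
  const existingMap = new Map(existing.map((d) => [d.filePath, d]));
  const crawledMap = new Map(crawled.map((f) => [f.path, f]));
  const added = crawled.filter((f) => !existingMap.has(f.path));
  const modified = crawled.filter((f) => {
    const doc = existingMap.get(f.path);
    return doc !== undefined && doc.checksum !== f.sha;
  });
  const deleted = existing.filter((d) => !crawledMap.has(d.filePath)).map((d) => d.filePath);
  return { added, modified, deleted };
}

const result = diffFiles(
  [
    { path: 'docs/new.md', sha: 'aaa' }, // not in DB → added
    { path: 'docs/changed.md', sha: 'bbb' }, // checksum differs → modified
    { path: 'docs/same.md', sha: 'ccc' } // checksum matches → unchanged
  ],
  [
    { id: '1', filePath: 'docs/changed.md', checksum: 'old' },
    { id: '2', filePath: 'docs/same.md', checksum: 'ccc' },
    { id: '3', filePath: 'docs/gone.md', checksum: 'ddd' } // not crawled → deleted
  ]
);
// result.added   → docs/new.md
// result.modified → docs/changed.md
// result.deleted → ['docs/gone.md']
```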
@@ -78,7 +78,7 @@ const diff = computeDiff(crawledResult.files, existingDocs);
// Log diff summary
this.updateJob(job.id, {
  totalFiles: crawledResult.files.length
});

// Process only changed/new files
@@ -89,29 +89,29 @@ const docIdsToDelete: string[] = [];
// Map modified files to their existing document IDs for deletion
for (const file of diff.modified) {
  const existing = existingDocs.find((d) => d.filePath === file.path);
  if (existing) docIdsToDelete.push(existing.id);
}

// Map deleted file paths to document IDs
for (const filePath of diff.deleted) {
  const existing = existingDocs.find((d) => d.filePath === filePath);
  if (existing) docIdsToDelete.push(existing.id);
}

// Parse new/modified files
for (const [i, file] of filesToProcess.entries()) {
  const docId = crypto.randomUUID();
  newDocuments.push({ id: docId, ...buildDocument(file, repo.id, job.versionId) });
  newSnippets.push(...parseFile(file, { repositoryId: repo.id, documentId: docId }));

  // Count ALL files (including skipped) in progress
  const totalProcessed = diff.unchanged.length + i + 1;
  const progress = Math.round((totalProcessed / crawledResult.files.length) * 80);
  this.updateJob(job.id, {
    processedFiles: totalProcessed,
    progress
  });
}
// Atomic replacement of only changed documents
@@ -123,6 +123,7 @@ this.replaceSnippets(repo.id, docIdsToDelete, newDocuments, newSnippets);
## Performance Impact

For a typical repository with 1,000 files where 50 changed:

- **Without incremental**: 1,000 files parsed + 1,000 embed batches
- **With incremental**: 50 files parsed + 50 embed batches
- Estimated speedup: ~20x for re-indexing
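The ~20x figure is simply the ratio of files that still need parsing, under the rough assumption that per-file parse and embed cost dominates and fixed crawl/diff overhead is negligible:

```typescript
// Toy cost model (an assumption, not a benchmark): re-index work scales with
// the number of files parsed and embedded, so the speedup is the file ratio.
function estimatedSpeedup(totalFiles: number, changedFiles: number): number {
  return totalFiles / changedFiles;
}

estimatedSpeedup(1000, 50); // → 20
```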
@@ -32,24 +32,24 @@ A settings page within the web UI that allows users to configure the embedding p
```typescript
const PROVIDER_PRESETS = [
  {
    name: 'OpenAI',
    baseUrl: 'https://api.openai.com/v1',
    model: 'text-embedding-3-small',
    dimensions: 1536
  },
  {
    name: 'Ollama (local)',
    baseUrl: 'http://localhost:11434/v1',
    model: 'nomic-embed-text',
    dimensions: 768
  },
  {
    name: 'Azure OpenAI',
    baseUrl: 'https://{resource}.openai.azure.com/openai/deployments/{deployment}/v1',
    model: 'text-embedding-3-small',
    dimensions: 1536
  }
];
```
@@ -60,133 +60,157 @@ const PROVIDER_PRESETS = [
```svelte
<!-- src/routes/settings/+page.svelte -->
<script lang="ts">
  let provider = $state<'none' | 'openai' | 'local'>('none');
  let baseUrl = $state('https://api.openai.com/v1');
  let apiKey = $state('');
  let model = $state('text-embedding-3-small');
  let dimensions = $state<number | undefined>(1536);
  let testStatus = $state<'idle' | 'testing' | 'ok' | 'error'>('idle');
  let testError = $state<string | null>(null);
  let saving = $state(false);

  async function testConnection() {
    testStatus = 'testing';
    testError = null;
    try {
      const res = await fetch('/api/v1/settings/embedding/test', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ provider, openai: { baseUrl, apiKey, model, dimensions } })
      });
      if (res.ok) {
        testStatus = 'ok';
      } else {
        const data = await res.json();
        testStatus = 'error';
        testError = data.error;
      }
    } catch (e) {
      testStatus = 'error';
      testError = (e as Error).message;
    }
  }

  async function save() {
    saving = true;
    await fetch('/api/v1/settings/embedding', {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ provider, openai: { baseUrl, apiKey, model, dimensions } })
    });
    saving = false;
  }
</script>

<div class="mx-auto max-w-2xl py-8">
  <h1 class="mb-6 text-2xl font-bold text-gray-900">Settings</h1>
  <section class="rounded-xl border border-gray-200 bg-white p-6">
    <h2 class="mb-1 text-lg font-semibold">Embedding Provider</h2>
    <p class="mb-4 text-sm text-gray-500">
      Embeddings enable semantic search. Without them, only keyword search (FTS5) is used.
    </p>
    <div class="mb-4 flex gap-2">
      {#each ['none', 'openai', 'local'] as p}
        <button
          onclick={() => (provider = p)}
          class="rounded-lg px-4 py-2 text-sm {provider === p
            ? 'bg-blue-600 text-white'
            : 'border border-gray-200 text-gray-700 hover:bg-gray-50'}"
        >
          {p === 'none' ? 'None (FTS5 only)' : p === 'openai' ? 'OpenAI-compatible' : 'Local Model'}
        </button>
      {/each}
    </div>
    {#if provider === 'none'}
      <div class="rounded-lg border border-amber-200 bg-amber-50 p-3 text-sm text-amber-700">
        Search will use keyword matching only. Results may be less relevant for complex questions.
      </div>
    {/if}
    {#if provider === 'openai'}
      <div class="space-y-3">
        <!-- Preset buttons -->
        <div class="flex flex-wrap gap-2">
          {#each PROVIDER_PRESETS as preset}
            <button
              onclick={() => {
                baseUrl = preset.baseUrl;
                model = preset.model;
                dimensions = preset.dimensions;
              }}
              class="rounded border border-gray-200 px-2.5 py-1 text-xs text-gray-600 hover:bg-gray-50"
            >
              {preset.name}
            </button>
          {/each}
        </div>
        <label class="block">
          <span class="text-sm font-medium">Base URL</span>
          <input
            type="text"
            bind:value={baseUrl}
            class="mt-1 w-full rounded-lg border px-3 py-2 text-sm"
          />
        </label>
        <label class="block">
          <span class="text-sm font-medium">API Key</span>
          <input
            type="password"
            bind:value={apiKey}
            class="mt-1 w-full rounded-lg border px-3 py-2 text-sm"
            placeholder="sk-..."
          />
        </label>
        <label class="block">
          <span class="text-sm font-medium">Model</span>
          <input
            type="text"
            bind:value={model}
            class="mt-1 w-full rounded-lg border px-3 py-2 text-sm"
          />
        </label>
        <label class="block">
          <span class="text-sm font-medium">Dimensions (optional override)</span>
          <input
            type="number"
            bind:value={dimensions}
            class="mt-1 w-full rounded-lg border px-3 py-2 text-sm"
          />
        </label>
        <div class="flex items-center gap-3">
          <button
            onclick={testConnection}
            class="rounded-lg border border-gray-300 px-3 py-1.5 text-sm"
          >
            {testStatus === 'testing' ? 'Testing...' : 'Test Connection'}
          </button>
          {#if testStatus === 'ok'}
            <span class="text-sm text-green-600">✓ Connection successful</span>
          {:else if testStatus === 'error'}
            <span class="text-sm text-red-600">{testError}</span>
          {/if}
        </div>
      </div>
    {/if}
    <div class="mt-6 flex justify-end">
      <button
        onclick={save}
        disabled={saving}
        class="rounded-lg bg-blue-600 px-4 py-2 text-sm text-white disabled:opacity-50"
      >
        {saving ? 'Saving...' : 'Save Settings'}
      </button>
    </div>
  </section>
</div>
```
@@ -80,6 +80,7 @@ git -C /path/to/repo archive <commit-hash> | tar -x -C /tmp/trueref-idx/<repo>-<
```

Advantages over `git checkout` or worktrees:

- Working directory is completely untouched
- No `.git` directory in the output (cleaner for parsing)
- Temp directory deleted after indexing with no git state to clean up
@@ -102,26 +103,26 @@ Allow commit hashes to be pinned explicitly per version, overriding tag resoluti
```json
{
  "previousVersions": [
    {
      "tag": "v2.0.0",
      "title": "Version 2.0.0",
      "commitHash": "a3f9c12abc..."
    }
  ]
}
```
### Edge Cases

| Case                         | Handling                                                                           |
| ---------------------------- | ---------------------------------------------------------------------------------- |
| Annotated tags               | `rev-parse <tag>^{commit}` peels to commit automatically                           |
| Mutable tags (e.g. `latest`) | Re-resolve on re-index; warn in UI if hash has changed                             |
| Branch as version            | `rev-parse origin/<branch>^{commit}` gives tip; re-resolves on re-index            |
| Shallow clone                | Run `git fetch --unshallow` before `git archive` if commit is unavailable          |
| Submodules                   | `git archive --recurse-submodules` or document as a known limitation               |
| Git LFS                      | `git lfs pull` required after archive if LFS-tracked files are needed for indexing |
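The annotated-tag and branch rows both rely on the same peeling syntax. A sketch of the argument list a resolver might hand to git — the helper itself is hypothetical; only the `rev-parse <ref>^{commit}` form comes from the table above:

```typescript
// Build the git invocation that resolves any ref (annotated tag, lightweight
// tag, or remote branch) to the commit it ultimately points at.
// `^{commit}` tells git to peel tag objects down to the underlying commit.
function revParseArgs(repoPath: string, ref: string): string[] {
  return ['-C', repoPath, 'rev-parse', `${ref}^{commit}`];
}

revParseArgs('/path/to/repo', 'v2.0.0');
// → ['-C', '/path/to/repo', 'rev-parse', 'v2.0.0^{commit}']
```

The same builder covers branches by passing `origin/<branch>` as the ref.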
### Acceptance Criteria
@@ -169,12 +170,12 @@ fi
Username conventions by server type:

| Server                         | HTTPS username        | Password              |
| ------------------------------ | --------------------- | --------------------- |
| Bitbucket Server / Data Center | `x-token-auth`        | HTTP access token     |
| Bitbucket Cloud                | account username      | App password          |
| GitLab (self-hosted or cloud)  | `oauth2`              | Personal access token |
| GitLab deploy token            | `gitlab-deploy-token` | Deploy token secret   |

SSH authentication is also supported and preferred for long-lived deployments. The host SSH configuration (`~/.ssh/config`) handles per-host key selection and travels into the container via volume mount.
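These conventions can be folded into a small helper that assembles a tokenized HTTPS remote URL; the function names and shapes below are illustrative, not TrueRef's actual API:

```typescript
type GitServer = 'bitbucket-server' | 'bitbucket-cloud' | 'gitlab' | 'gitlab-deploy';

// HTTPS username convention per server type, per the table above.
function httpsUsername(server: GitServer, accountUser = ''): string {
  switch (server) {
    case 'bitbucket-server':
      return 'x-token-auth';
    case 'bitbucket-cloud':
      return accountUser; // Bitbucket Cloud expects the real account username
    case 'gitlab':
      return 'oauth2';
    case 'gitlab-deploy':
      return 'gitlab-deploy-token';
  }
}

// Build a tokenized HTTPS remote. encodeURIComponent guards against secrets
// containing URL-reserved characters such as '/' or '@'.
function tokenizedRemote(
  server: GitServer,
  host: string,
  repoPath: string,
  secret: string,
  accountUser = ''
): string {
  const user = httpsUsername(server, accountUser);
  return `https://${encodeURIComponent(user)}:${encodeURIComponent(secret)}@${host}/${repoPath}.git`;
}

tokenizedRemote('gitlab', 'gitlab.example.com', 'group/project', 'glpat-123');
// → 'https://oauth2:glpat-123@gitlab.example.com/group/project.git'
```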
@@ -220,7 +221,7 @@ services:
  web:
    build: .
    ports:
      - '3000:3000'
    volumes:
      - trueref-data:/data
      - ${USERPROFILE}/.ssh:/root/.ssh:ro
@@ -228,20 +229,20 @@ services:
      - ${CORP_CA_CERT}:/certs/corp-ca.crt:ro
    environment:
      DATABASE_URL: /data/trueref.db
      GIT_TOKEN_BITBUCKET: '${BITBUCKET_TOKEN}'
      GIT_TOKEN_GITLAB: '${GITLAB_TOKEN}'
      BITBUCKET_HOST: '${BITBUCKET_HOST}'
      GITLAB_HOST: '${GITLAB_HOST}'
    restart: unless-stopped

  mcp:
    build: .
    command: mcp
    ports:
      - '3001:3001'
    environment:
      TRUEREF_API_URL: http://web:3000
      MCP_PORT: '3001'
    depends_on:
      - web
    restart: unless-stopped

View File

@@ -107,16 +107,16 @@ An embedding profile is persisted configuration selecting one provider adapter p
```typescript
interface EmbeddingProfile {
	id: string;
	providerKind: string;
	title: string;
	enabled: boolean;
	isDefault: boolean;
	config: Record<string, unknown>;
	model: string;
	dimensions: number;
	createdAt: number;
	updatedAt: number;
}
```

View File

@@ -9,7 +9,11 @@
import { initializeDatabase } from '$lib/server/db/index.js';
import { getClient } from '$lib/server/db/client.js';
import { initializePipeline } from '$lib/server/pipeline/startup.js';
-import { EMBEDDING_CONFIG_KEY, createProviderFromConfig, defaultEmbeddingConfig } from '$lib/server/embeddings/factory.js';
+import {
+	EMBEDDING_CONFIG_KEY,
+	createProviderFromConfig,
+	defaultEmbeddingConfig
+} from '$lib/server/embeddings/factory.js';
import { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import type { EmbeddingConfig } from '$lib/server/embeddings/factory.js';
import type { Handle } from '@sveltejs/kit';

View File

@@ -115,7 +115,12 @@
		/>
	{:else}
		<div class="mt-1">
-			<FolderPicker bind:value={sourceUrl} onselect={(p) => { if (!title) title = p.split('/').at(-1) ?? ''; }} />
+			<FolderPicker
+				bind:value={sourceUrl}
+				onselect={(p) => {
+					if (!title) title = p.split('/').at(-1) ?? '';
+				}}
+			/>
		</div>
	{/if}
</div>
@@ -133,7 +138,8 @@
{#if source === 'github'}
	<label class="block">
		<span class="text-sm font-medium text-gray-700"
-			>GitHub Token <span class="font-normal text-gray-500">(optional, for private repos)</span
+			>GitHub Token <span class="font-normal text-gray-500"
+				>(optional, for private repos)</span
			></span
		>
		<input

View File

@@ -78,9 +78,7 @@
	title="Browse folders"
>
	<svg class="h-4 w-4" viewBox="0 0 20 20" fill="currentColor">
-		<path
-			d="M2 6a2 2 0 012-2h5l2 2h5a2 2 0 012 2v6a2 2 0 01-2 2H4a2 2 0 01-2-2V6z"
-		/>
+		<path d="M2 6a2 2 0 012-2h5l2 2h5a2 2 0 012 2v6a2 2 0 01-2 2H4a2 2 0 01-2-2V6z" />
	</svg>
	Browse
</button>
@@ -94,7 +92,10 @@
	class="fixed inset-0 z-[60] flex items-center justify-center bg-black/50 p-4"
	onclick={handleBackdropClick}
>
-	<div class="flex w-full max-w-lg flex-col rounded-xl bg-white shadow-xl" style="max-height: 70vh">
+	<div
+		class="flex w-full max-w-lg flex-col rounded-xl bg-white shadow-xl"
+		style="max-height: 70vh"
+	>
		<!-- Header -->
		<div class="flex items-center justify-between border-b border-gray-200 px-4 py-3">
			<h3 class="text-sm font-semibold text-gray-900">Select Folder</h3>
@@ -133,11 +134,7 @@
				{browsePath}
			</span>
			{#if loading}
-				<svg
-					class="h-4 w-4 animate-spin text-gray-400"
-					fill="none"
-					viewBox="0 0 24 24"
-				>
+				<svg class="h-4 w-4 animate-spin text-gray-400" fill="none" viewBox="0 0 24 24">
					<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"
					></circle>
					<path
@@ -166,7 +163,9 @@
				title="Click to navigate, double-click to select"
			>
				<svg
-					class="h-4 w-4 shrink-0 {entry.isGitRepo ? 'text-orange-400' : 'text-yellow-400'}"
+					class="h-4 w-4 shrink-0 {entry.isGitRepo
+						? 'text-orange-400'
+						: 'text-yellow-400'}"
					viewBox="0 0 20 20"
					fill="currentColor"
				>
@@ -176,7 +175,9 @@
				</svg>
				<span class="flex-1 truncate text-gray-800">{entry.name}</span>
				{#if entry.isGitRepo}
-					<span class="shrink-0 rounded bg-orange-100 px-1.5 py-0.5 text-xs text-orange-700">
+					<span
+						class="shrink-0 rounded bg-orange-100 px-1.5 py-0.5 text-xs text-orange-700"
+					>
						git
					</span>
				{/if}

View File

@@ -36,8 +36,9 @@
		<p class="mt-0.5 truncate font-mono text-sm text-gray-500">{repo.id}</p>
	</div>
	<span
-		class="ml-3 shrink-0 rounded-full px-2.5 py-0.5 text-xs font-medium {stateColors[repo.state] ??
-			'bg-gray-100 text-gray-600'}"
+		class="ml-3 shrink-0 rounded-full px-2.5 py-0.5 text-xs font-medium {stateColors[
+			repo.state
+		] ?? 'bg-gray-100 text-gray-600'}"
	>
		{stateLabels[repo.state] ?? repo.state}
	</span>

View File

@@ -17,7 +17,10 @@
	};
</script>

-<div class="flex flex-col items-center rounded-lg p-3 {variantClasses[variant] ?? variantClasses.default}">
+<div
+	class="flex flex-col items-center rounded-lg p-3 {variantClasses[variant] ??
+		variantClasses.default}"
+>
	<span class="text-lg font-bold">{value}</span>
	<span class="mt-0.5 text-xs">{label}</span>
</div>

View File

@@ -17,6 +17,8 @@
	const config = $derived(statusConfig[status]);
</script>

-<span class="inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-medium {config.bg} {config.text}">
+<span
+	class="inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-medium {config.bg} {config.text}"
+>
	{config.label}
</span>

View File

@@ -25,7 +25,7 @@
	{placeholder}
	onkeydown={handleKeydown}
	disabled={loading}
-	class="flex-1 rounded-lg border border-gray-200 bg-white px-4 py-2.5 text-sm text-gray-900 placeholder-gray-400 shadow-sm outline-none transition-all focus:border-blue-400 focus:ring-2 focus:ring-blue-100 disabled:cursor-not-allowed disabled:opacity-60"
+	class="flex-1 rounded-lg border border-gray-200 bg-white px-4 py-2.5 text-sm text-gray-900 placeholder-gray-400 shadow-sm transition-all outline-none focus:border-blue-400 focus:ring-2 focus:ring-blue-100 disabled:cursor-not-allowed disabled:opacity-60"
/>
<button
	onclick={onsubmit}

View File

@@ -5,21 +5,15 @@
	const isCode = $derived(snippet.type === 'code');

-	const title = $derived(
-		snippet.type === 'code' ? snippet.title : null
-	);
-	const breadcrumb = $derived(
-		snippet.type === 'code' ? snippet.description : snippet.breadcrumb
-	);
+	const title = $derived(snippet.type === 'code' ? snippet.title : null);
+	const breadcrumb = $derived(snippet.type === 'code' ? snippet.description : snippet.breadcrumb);
	const content = $derived(
-		snippet.type === 'code' ? snippet.codeList[0]?.code ?? '' : snippet.text
+		snippet.type === 'code' ? (snippet.codeList[0]?.code ?? '') : snippet.text
	);
-	const language = $derived(
-		snippet.type === 'code' ? (snippet.codeList[0]?.language ?? '') : null
-	);
+	const language = $derived(snippet.type === 'code' ? (snippet.codeList[0]?.language ?? '') : null);
	const tokenCount = $derived(snippet.tokenCount ?? 0);
</script>
@@ -40,14 +34,14 @@
	</div>
	{#if breadcrumb}
-		<p class="bg-gray-50 px-4 py-1.5 text-xs italic text-gray-500">{breadcrumb}</p>
+		<p class="bg-gray-50 px-4 py-1.5 text-xs text-gray-500 italic">{breadcrumb}</p>
	{/if}
	<div class="p-4">
		{#if isCode}
-			<pre
-				class="overflow-x-auto rounded bg-gray-950 p-4 text-sm text-gray-100"
-			><code>{content}</code></pre>
+			<pre class="overflow-x-auto rounded bg-gray-950 p-4 text-sm text-gray-100"><code
+					>{content}</code
+				></pre>
		{:else}
			<div class="prose prose-sm max-w-none whitespace-pre-wrap text-gray-700">{content}</div>
		{/if}

View File

@@ -83,9 +83,7 @@ function makeSnippetResult(snippet: Snippet): SnippetSearchResult {
	});
}

-function makeMetadata(
-	overrides: Partial<ContextResponseMetadata> = {}
-): ContextResponseMetadata {
+function makeMetadata(overrides: Partial<ContextResponseMetadata> = {}): ContextResponseMetadata {
	return {
		localSource: false,
		resultCount: 1,
@@ -160,7 +158,11 @@ describe('formatLibrarySearchJson', () => {
	it('maps non-indexed state to initial', () => {
		const results: LibrarySearchResult[] = [
-			new LibrarySearchResult({ repository: makeRepo({ state: 'pending' }), versions: [], score: 0 })
+			new LibrarySearchResult({
+				repository: makeRepo({ state: 'pending' }),
+				versions: [],
+				score: 0
+			})
		];
		const response = formatLibrarySearchJson(results);
		expect(response.results[0].state).toBe('initial');
@@ -168,7 +170,11 @@ describe('formatLibrarySearchJson', () => {
	it('handles null lastIndexedAt', () => {
		const results: LibrarySearchResult[] = [
-			new LibrarySearchResult({ repository: makeRepo({ lastIndexedAt: null }), versions: [], score: 0 })
+			new LibrarySearchResult({
+				repository: makeRepo({ lastIndexedAt: null }),
+				versions: [],
+				score: 0
+			})
		];
		const response = formatLibrarySearchJson(results);
		expect(response.results[0].lastUpdateDate).toBeNull();

View File

@@ -66,7 +66,9 @@ export const CORS_HEADERS = {
/**
 * Convert internal LibrarySearchResult[] to the context7-compatible JSON body.
 */
-export function formatLibrarySearchJson(results: LibrarySearchResult[]): LibrarySearchJsonResponseDto {
+export function formatLibrarySearchJson(
+	results: LibrarySearchResult[]
+): LibrarySearchJsonResponseDto {
	return ContextResponseMapper.toLibrarySearchJson(results);
}
@@ -80,7 +82,7 @@ export function formatContextJson(
	snippets: SnippetSearchResult[],
	rules: string[],
	metadata?: ContextResponseMetadata
): ContextJsonResponseDto {
	return ContextResponseMapper.toContextJson(snippets, rules, metadata);
}
@@ -94,7 +96,10 @@ export function formatContextJson(
 * @param snippets - Ranked snippet search results (already token-budget trimmed).
 * @param rules - Rules from `trueref.json` / `repository_configs`.
 */
-function formatOriginLine(result: SnippetSearchResult, metadata?: ContextResponseMetadata): string | null {
+function formatOriginLine(
+	result: SnippetSearchResult,
+	metadata?: ContextResponseMetadata
+): string | null {
	if (!metadata?.repository) return null;

	const parts = [

View File

@@ -115,10 +115,7 @@ describe('parseConfigFile — description', () => {
describe('parseConfigFile — array path fields', () => {
	it('accepts valid folders', () => {
-		const result = parseConfigFile(
-			JSON.stringify({ folders: ['src/', 'docs/'] }),
-			'trueref.json'
-		);
+		const result = parseConfigFile(JSON.stringify({ folders: ['src/', 'docs/'] }), 'trueref.json');
		expect(result.config.folders).toEqual(['src/', 'docs/']);
		expect(result.warnings).toHaveLength(0);
	});
@@ -130,10 +127,7 @@ describe('parseConfigFile — array path fields', () => {
	});

	it('skips non-string entries in folders with a warning', () => {
-		const result = parseConfigFile(
-			JSON.stringify({ folders: ['src/', 42, true] }),
-			'trueref.json'
-		);
+		const result = parseConfigFile(JSON.stringify({ folders: ['src/', 42, true] }), 'trueref.json');
		expect(result.config.folders).toEqual(['src/']);
		expect(result.warnings.length).toBeGreaterThan(0);
	});
@@ -174,7 +168,9 @@ describe('parseConfigFile — array path fields', () => {
describe('parseConfigFile — rules', () => {
	it('accepts valid rules', () => {
		const result = parseConfigFile(
-			JSON.stringify({ rules: ['Always use named imports.', 'Prefer async/await over callbacks.'] }),
+			JSON.stringify({
+				rules: ['Always use named imports.', 'Prefer async/await over callbacks.']
+			}),
			'trueref.json'
		);
		expect(result.config.rules).toHaveLength(2);
@@ -204,10 +200,7 @@ describe('parseConfigFile — rules', () => {
	});

	it('ignores non-array rules with a warning', () => {
-		const result = parseConfigFile(
-			JSON.stringify({ rules: 'use named imports' }),
-			'trueref.json'
-		);
+		const result = parseConfigFile(JSON.stringify({ rules: 'use named imports' }), 'trueref.json');
		expect(result.config.rules).toBeUndefined();
		expect(result.warnings.some((w) => /rules must be an array/.test(w))).toBe(true);
	});
@@ -243,10 +236,7 @@ describe('parseConfigFile — previousVersions', () => {
	it('skips entries missing tag', () => {
		const result = parseConfigFile(
			JSON.stringify({
-				previousVersions: [
-					{ title: 'No tag here' },
-					{ tag: 'v1.0.0', title: 'Valid' }
-				]
+				previousVersions: [{ title: 'No tag here' }, { tag: 'v1.0.0', title: 'Valid' }]
			}),
			'trueref.json'
		);
@@ -275,10 +265,7 @@ describe('parseConfigFile — previousVersions', () => {
	});

	it('ignores non-array previousVersions with a warning', () => {
-		const result = parseConfigFile(
-			JSON.stringify({ previousVersions: 'v1.0.0' }),
-			'trueref.json'
-		);
+		const result = parseConfigFile(JSON.stringify({ previousVersions: 'v1.0.0' }), 'trueref.json');
		expect(result.config.previousVersions).toBeUndefined();
		expect(result.warnings.some((w) => /previousVersions must be an array/.test(w))).toBe(true);
	});
@@ -294,9 +281,7 @@ describe('resolveConfig', () => {
	});

	it('returns null when no matching filenames', () => {
-		expect(
-			resolveConfig([{ filename: 'package.json', content: '{"name":"x"}' }])
-		).toBeNull();
+		expect(resolveConfig([{ filename: 'package.json', content: '{"name":"x"}' }])).toBeNull();
	});

	it('prefers trueref.json over context7.json', () => {

View File

@@ -65,7 +65,9 @@ export function parseConfigFile(content: string, filename: string): ParsedConfig
	// ---- 2. Root must be an object ------------------------------------------
	if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
-		throw new ConfigParseError(`${filename} must be a JSON object, got ${Array.isArray(raw) ? 'array' : typeof raw}`);
+		throw new ConfigParseError(
+			`${filename} must be a JSON object, got ${Array.isArray(raw) ? 'array' : typeof raw}`
+		);
	}

	const input = raw as Record<string, unknown>;
@@ -131,7 +133,9 @@ export function parseConfigFile(content: string, filename: string): ParsedConfig
		})
		.map((item) => {
			if (item.length > maxLength) {
-				warnings.push(`${field} entry truncated to ${maxLength} characters: "${item.slice(0, 40)}..."`);
+				warnings.push(
+					`${field} entry truncated to ${maxLength} characters: "${item.slice(0, 40)}..."`
+				);
				return item.slice(0, maxLength);
			}
			return item;
@@ -160,9 +164,7 @@ export function parseConfigFile(content: string, filename: string): ParsedConfig
			return false;
		}
		if (r.length < minLength) {
-			warnings.push(
-				`rules entry too short (< ${minLength} chars) — skipping: "${r}"`
-			);
+			warnings.push(`rules entry too short (< ${minLength} chars) — skipping: "${r}"`);
			return false;
		}
		return true;

View File

@@ -1,85 +1,85 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://trueref.dev/schema/trueref-config.json",
  "title": "TrueRef Repository Configuration",
  "description": "Configuration file for controlling how a repository is indexed and presented by TrueRef. Place as trueref.json (or context7.json for backward compatibility) at the root of your repository.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "projectTitle": {
      "type": "string",
      "minLength": 1,
      "maxLength": 100,
      "description": "Override the display name for this library. When set, this replaces the repository name in search results and UI."
    },
    "description": {
      "type": "string",
      "minLength": 10,
      "maxLength": 500,
      "description": "A short description of the library used for search ranking and display. Should accurately describe the library's purpose."
    },
    "folders": {
      "type": "array",
      "maxItems": 50,
      "description": "Allowlist of folder path prefixes or regex strings to include in indexing. If empty or absent, all folders are included. Examples: [\"src/\", \"docs/\", \"^packages/core\"]",
      "items": {
        "type": "string",
        "maxLength": 200,
        "description": "A path prefix or regex string. Paths are matched against the full relative file path within the repository."
      }
    },
    "excludeFolders": {
      "type": "array",
      "maxItems": 50,
      "description": "Folders to exclude from indexing. Applied after the 'folders' allowlist. Examples: [\"test/\", \"fixtures/\", \"__mocks__\"]",
      "items": {
        "type": "string",
        "maxLength": 200,
        "description": "A path prefix or regex string for folders to exclude."
      }
    },
    "excludeFiles": {
      "type": "array",
      "maxItems": 100,
      "description": "Exact filenames to exclude (no path, no regex). Examples: [\"README.md\", \"CHANGELOG.md\", \"jest.config.ts\"]",
      "items": {
        "type": "string",
        "maxLength": 200,
        "description": "An exact filename (not a path). Must not contain path separators."
      }
    },
    "rules": {
      "type": "array",
      "maxItems": 20,
      "description": "Best practices and rules to inject at the top of every query-docs response. These are shown to AI coding assistants to guide correct library usage.",
      "items": {
        "type": "string",
        "minLength": 5,
        "maxLength": 500,
        "description": "A single best-practice rule or guideline for using this library."
      }
    },
    "previousVersions": {
      "type": "array",
      "maxItems": 50,
      "description": "Previously released versions to make available for versioned documentation queries.",
      "items": {
        "type": "object",
        "required": ["tag", "title"],
        "additionalProperties": false,
        "properties": {
          "tag": {
            "type": "string",
            "pattern": "^v?\\d+\\.\\d+(\\.\\d+)?(-.*)?$",
            "description": "Git tag name for this version (e.g. \"v1.2.3\", \"2.0.0-beta.1\")."
          },
          "title": {
            "type": "string",
            "minLength": 1,
            "description": "Human-readable version label (e.g. \"Version 1.2.3\", \"v2 Legacy\")."
          }
        }
      }
    }
  }
}

View File

@@ -74,27 +74,46 @@ export const MAX_FILE_SIZE_BYTES = 500_000;
*/ */
export const IGNORED_DIR_NAMES = new Set([
	// ── Version control ────────────────────────────────────────────────────
-	'.git', '.hg', '.svn',
+	'.git',
+	'.hg',
+	'.svn',

	// ── JavaScript / TypeScript ─────────────────────────────────────────────
	'node_modules',
-	'.npm', '.yarn', '.pnpm-store', '.pnp',
+	'.npm',
+	'.yarn',
+	'.pnpm-store',
+	'.pnp',

	// Build outputs and framework caches
-	'dist', 'build', 'out',
-	'.next', '.nuxt', '.svelte-kit', '.vite',
-	'.turbo', '.parcel-cache', '.webpack',
+	'dist',
+	'build',
+	'out',
+	'.next',
+	'.nuxt',
+	'.svelte-kit',
+	'.vite',
+	'.turbo',
+	'.parcel-cache',
+	'.webpack',

	// ── Python ──────────────────────────────────────────────────────────────
	'__pycache__',
-	'.venv', 'venv', 'env',
-	'site-packages', '.eggs',
-	'.pytest_cache', '.mypy_cache', '.ruff_cache',
-	'.tox', '.nox',
+	'.venv',
+	'venv',
+	'env',
+	'site-packages',
+	'.eggs',
+	'.pytest_cache',
+	'.mypy_cache',
+	'.ruff_cache',
+	'.tox',
+	'.nox',
	'htmlcov',

	// ── Java / Kotlin / Scala ───────────────────────────────────────────────
	'target', // Maven + sbt
-	'.gradle', '.mvn',
+	'.gradle',
+	'.mvn',

	// ── Ruby ────────────────────────────────────────────────────────────────
	'.bundle',
@@ -103,19 +122,24 @@ export const IGNORED_DIR_NAMES = new Set([
	// 'vendor' below covers PHP Composer

	// ── .NET ────────────────────────────────────────────────────────────────
-	'bin', 'obj', 'packages',
+	'bin',
+	'obj',
+	'packages',

	// ── Haskell ─────────────────────────────────────────────────────────────
-	'.stack-work', 'dist-newstyle',
+	'.stack-work',
+	'dist-newstyle',

	// ── Dart / Flutter ──────────────────────────────────────────────────────
	'.dart_tool',

	// ── Swift / iOS ─────────────────────────────────────────────────────────
-	'Pods', 'DerivedData',
+	'Pods',
+	'DerivedData',

	// ── Elixir / Erlang ─────────────────────────────────────────────────────
-	'_build', 'deps',
+	'_build',
+	'deps',

	// ── Clojure ─────────────────────────────────────────────────────────────
	'.cpcache',
@@ -125,16 +149,25 @@ export const IGNORED_DIR_NAMES = new Set([
	'vendor',

	// ── Generic caches / temp ───────────────────────────────────────────────
-	'.cache', '.tmp', 'tmp', 'temp', '.temp', '.sass-cache',
+	'.cache',
+	'.tmp',
+	'tmp',
+	'temp',
+	'.temp',
+	'.sass-cache',

	// ── Test coverage ───────────────────────────────────────────────────────
-	'coverage', '.nyc_output',
+	'coverage',
+	'.nyc_output',

	// ── IDE / editor artefacts ──────────────────────────────────────────────
-	'.idea', '.vs',
+	'.idea',
+	'.vs',

	// ── Generated code ──────────────────────────────────────────────────────
-	'generated', '__generated__', '_generated',
+	'generated',
+	'__generated__',
+	'_generated',

	// ── Logs ────────────────────────────────────────────────────────────────
	'logs'
@@ -264,11 +297,7 @@ export function detectLanguage(filePath: string): string {
 * 7. Must not be under a config.excludeFolders path / regex.
 * 8. Must be under a config.folders allowlist path / regex (if specified).
 */
-export function shouldIndexFile(
-	filePath: string,
-	fileSize: number,
-	config?: RepoConfig
-): boolean {
+export function shouldIndexFile(filePath: string, fileSize: number, config?: RepoConfig): boolean {
	const ext = extname(filePath).toLowerCase();
	const base = basename(filePath);

View File

@@ -35,10 +35,9 @@ export async function listGitHubTags(
	};
	if (token) headers['Authorization'] = `Bearer ${token}`;

-	const response = await fetch(
-		`https://api.github.com/repos/${owner}/${repo}/tags?per_page=100`,
-		{ headers }
-	);
+	const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/tags?per_page=100`, {
+		headers
+	});

	if (!response.ok) throw new GitHubApiError(response.status);
	return response.json() as Promise<GitHubTag[]>;

View File

@@ -8,13 +8,14 @@
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { crawl } from './github.crawler.js';
-import { shouldIndexFile, detectLanguage, INDEXABLE_EXTENSIONS, MAX_FILE_SIZE_BYTES } from './file-filter.js';
-import { GitHubRateLimiter, Semaphore, withRetry } from './rate-limiter.js';
-import {
-	AuthenticationError,
-	PermissionError,
-	RepositoryNotFoundError
-} from './types.js';
+import {
+	shouldIndexFile,
+	detectLanguage,
+	INDEXABLE_EXTENSIONS,
+	MAX_FILE_SIZE_BYTES
+} from './file-filter.js';
+import { GitHubRateLimiter, Semaphore, withRetry } from './rate-limiter.js';
+import { AuthenticationError, PermissionError, RepositoryNotFoundError } from './types.js';

// ---------------------------------------------------------------------------
// Mock fetch helpers
@@ -112,7 +113,9 @@ describe('shouldIndexFile()', () => {
	});

	it('respects config.excludeFolders prefix', () => {
-		expect(shouldIndexFile('internal/config.ts', 100, { excludeFolders: ['internal/'] })).toBe(false);
+		expect(shouldIndexFile('internal/config.ts', 100, { excludeFolders: ['internal/'] })).toBe(
+			false
+		);
	});

	it('allows files outside of config.excludeFolders', () => {
@@ -169,8 +172,10 @@ describe('detectLanguage()', () => {
	it('detects markdown', () => expect(detectLanguage('README.md')).toBe('markdown'));
	it('detects svelte', () => expect(detectLanguage('App.svelte')).toBe('svelte'));
	it('detects yaml', () => expect(detectLanguage('config.yaml')).toBe('yaml'));
-	it('returns empty string for unknown extension', () => expect(detectLanguage('file.xyz')).toBe(''));
-	it('is case-insensitive for extensions', () => expect(detectLanguage('FILE.TS')).toBe('typescript'));
+	it('returns empty string for unknown extension', () =>
+		expect(detectLanguage('file.xyz')).toBe(''));
+	it('is case-insensitive for extensions', () =>
+		expect(detectLanguage('FILE.TS')).toBe('typescript'));
});

// ---------------------------------------------------------------------------
@@ -267,9 +272,9 @@ describe('withRetry()', () => {
 });

 it('throws after exhausting all attempts', async () => {
-  await expect(
-    withRetry(() => Promise.reject(new Error('always fails')), 3)
-  ).rejects.toThrow('always fails');
+  await expect(withRetry(() => Promise.reject(new Error('always fails')), 3)).rejects.toThrow(
+    'always fails'
+  );
 });
 });
@@ -425,14 +430,15 @@ describe('crawl()', () => {
 });

 it('throws AuthenticationError on 401', async () => {
-  stubFetch(() =>
-    new Response('Unauthorized', {
-      status: 401,
-      headers: {
-        'X-RateLimit-Remaining': '0',
-        'X-RateLimit-Reset': String(Math.floor(Date.now() / 1000) + 3600)
-      }
-    })
-  );
+  stubFetch(
+    () =>
+      new Response('Unauthorized', {
+        status: 401,
+        headers: {
+          'X-RateLimit-Remaining': '0',
+          'X-RateLimit-Reset': String(Math.floor(Date.now() / 1000) + 3600)
+        }
+      })
+  );

   await expect(crawl({ owner: 'owner', repo: 'repo', token: 'bad-token' })).rejects.toThrow(
@@ -441,14 +447,15 @@ describe('crawl()', () => {
 });

 it('throws PermissionError on 403 without rate-limit exhaustion', async () => {
-  stubFetch(() =>
-    new Response('Forbidden', {
-      status: 403,
-      headers: {
-        'X-RateLimit-Remaining': '100',
-        'X-RateLimit-Reset': String(Math.floor(Date.now() / 1000) + 3600)
-      }
-    })
-  );
+  stubFetch(
+    () =>
+      new Response('Forbidden', {
+        status: 403,
+        headers: {
+          'X-RateLimit-Remaining': '100',
+          'X-RateLimit-Reset': String(Math.floor(Date.now() / 1000) + 3600)
+        }
+      })
+  );

   await expect(crawl({ owner: 'owner', repo: 'repo' })).rejects.toThrow(PermissionError);

View File

@@ -106,9 +106,7 @@ async function throwForStatus(response: Response, rateLimiter: GitHubRateLimiter
     );
   }
   case 404:
-    throw new RepositoryNotFoundError(
-      `Repository not found or not accessible: ${response.url}`
-    );
+    throw new RepositoryNotFoundError(`Repository not found or not accessible: ${response.url}`);
   default: {
     const body = await response.text().catch(() => '');
     throw new Error(`GitHub API error ${response.status}: ${body}`);
@@ -129,18 +127,22 @@ async function fetchRepoInfo(
   token: string | undefined,
   rateLimiter: GitHubRateLimiter
 ): Promise<GitHubRepoResponse> {
-  return withRetry(async () => {
-    await rateLimiter.waitIfNeeded();
-
-    const response = await fetch(`${GITHUB_API}/repos/${owner}/${repo}`, {
-      headers: buildHeaders(token)
-    });
-    rateLimiter.updateFromHeaders(response.headers);
-    await throwForStatus(response, rateLimiter);
-
-    return (await response.json()) as GitHubRepoResponse;
-  }, 3, isRetryable);
+  return withRetry(
+    async () => {
+      await rateLimiter.waitIfNeeded();
+
+      const response = await fetch(`${GITHUB_API}/repos/${owner}/${repo}`, {
+        headers: buildHeaders(token)
+      });
+      rateLimiter.updateFromHeaders(response.headers);
+      await throwForStatus(response, rateLimiter);
+
+      return (await response.json()) as GitHubRepoResponse;
+    },
+    3,
+    isRetryable
+  );
 }

 /**
@@ -155,21 +157,25 @@ async function fetchTree(
   token: string | undefined,
   rateLimiter: GitHubRateLimiter
 ): Promise<GitHubTreeResponse | null> {
-  return withRetry(async () => {
-    await rateLimiter.waitIfNeeded();
-
-    const url = `${GITHUB_API}/repos/${owner}/${repo}/git/trees/${ref}?recursive=1`;
-    const response = await fetch(url, { headers: buildHeaders(token) });
-    rateLimiter.updateFromHeaders(response.headers);
-
-    // 422 means the tree is too large for a single recursive call.
-    if (response.status === 422) return null;
-    await throwForStatus(response, rateLimiter);
-
-    return (await response.json()) as GitHubTreeResponse;
-  }, 3, isRetryable);
+  return withRetry(
+    async () => {
+      await rateLimiter.waitIfNeeded();
+
+      const url = `${GITHUB_API}/repos/${owner}/${repo}/git/trees/${ref}?recursive=1`;
+      const response = await fetch(url, { headers: buildHeaders(token) });
+      rateLimiter.updateFromHeaders(response.headers);
+
+      // 422 means the tree is too large for a single recursive call.
+      if (response.status === 422) return null;
+      await throwForStatus(response, rateLimiter);
+
+      return (await response.json()) as GitHubTreeResponse;
+    },
+    3,
+    isRetryable
+  );
 }

 /**
@@ -184,17 +190,21 @@ async function fetchSubTree(
   token: string | undefined,
   rateLimiter: GitHubRateLimiter
 ): Promise<GitHubTreeResponse> {
-  return withRetry(async () => {
-    await rateLimiter.waitIfNeeded();
-
-    const url = `${GITHUB_API}/repos/${owner}/${repo}/git/trees/${treeSha}`;
-    const response = await fetch(url, { headers: buildHeaders(token) });
-    rateLimiter.updateFromHeaders(response.headers);
-    await throwForStatus(response, rateLimiter);
-
-    return (await response.json()) as GitHubTreeResponse;
-  }, 3, isRetryable);
+  return withRetry(
+    async () => {
+      await rateLimiter.waitIfNeeded();
+
+      const url = `${GITHUB_API}/repos/${owner}/${repo}/git/trees/${treeSha}`;
+      const response = await fetch(url, { headers: buildHeaders(token) });
+      rateLimiter.updateFromHeaders(response.headers);
+      await throwForStatus(response, rateLimiter);
+
+      return (await response.json()) as GitHubTreeResponse;
+    },
+    3,
+    isRetryable
+  );
 }

 /**
@@ -208,21 +218,25 @@ async function fetchCommitSha(
   token: string | undefined,
   rateLimiter: GitHubRateLimiter
 ): Promise<string> {
-  return withRetry(async () => {
-    await rateLimiter.waitIfNeeded();
-
-    const url = `${GITHUB_API}/repos/${owner}/${repo}/commits/${ref}`;
-    const response = await fetch(url, {
-      headers: { ...buildHeaders(token), Accept: 'application/vnd.github.sha' }
-    });
-    rateLimiter.updateFromHeaders(response.headers);
-    await throwForStatus(response, rateLimiter);
-
-    // When Accept is 'application/vnd.github.sha', the response body is the
-    // bare SHA string.
-    return (await response.text()).trim();
-  }, 3, isRetryable);
+  return withRetry(
+    async () => {
+      await rateLimiter.waitIfNeeded();
+
+      const url = `${GITHUB_API}/repos/${owner}/${repo}/commits/${ref}`;
+      const response = await fetch(url, {
+        headers: { ...buildHeaders(token), Accept: 'application/vnd.github.sha' }
+      });
+      rateLimiter.updateFromHeaders(response.headers);
+      await throwForStatus(response, rateLimiter);
+
+      // When Accept is 'application/vnd.github.sha', the response body is the
+      // bare SHA string.
+      return (await response.text()).trim();
+    },
+    3,
+    isRetryable
+  );
 }

 /**
@@ -347,14 +361,7 @@ async function fetchRepoConfig(
   const content =
     (await downloadRawFile(owner, repo, ref, configItem.path, token)) ??
-    (await downloadViaContentsApi(
-      owner,
-      repo,
-      ref,
-      configItem.path,
-      token,
-      rateLimiter
-    ));
+    (await downloadViaContentsApi(owner, repo, ref, configItem.path, token, rateLimiter));

   if (!content) return undefined;
@@ -435,14 +442,7 @@ export async function crawl(options: CrawlOptions): Promise<CrawlResult> {
   // Prefer raw download (cheaper on rate limit); fall back to API.
   const content =
     (await downloadRawFile(owner, repo, ref!, item.path, token)) ??
-    (await downloadViaContentsApi(
-      owner,
-      repo,
-      ref!,
-      item.path,
-      token,
-      rateLimiter
-    ));
+    (await downloadViaContentsApi(owner, repo, ref!, item.path, token, rateLimiter));

   if (content === null) {
     console.warn(`[GitHubCrawler] Could not download: ${item.path} — skipping.`);
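The hunks above repeatedly reflow the same call shape: every GitHub request goes through `withRetry(fn, attempts, isRetryable)` behind a shared rate limiter. A minimal standalone sketch of that shape, with names assumed from the diff (the real helper lives in `rate-limiter.ts` and may differ):

```typescript
// Hypothetical sketch of the withRetry(fn, attempts, isRetryable) helper the
// crawler code above calls; the actual implementation may differ.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number,
  isRetryable: (err: unknown) => boolean = () => true
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up immediately on non-retryable errors or on the final attempt.
      if (!isRetryable(err) || attempt === attempts - 1) break;
      // Simple exponential backoff between attempts: 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage: the wrapped function fails twice, then succeeds on the third attempt.
void (async () => {
  let calls = 0;
  const result = await withRetry(async () => {
    calls++;
    if (calls < 3) throw new Error('transient');
    return 'ok';
  }, 3);
  console.log(result, calls); // → ok 3
})();
```

Passing `3` and `isRetryable` as separate arguments (rather than an options object) is what forces Prettier onto the multi-argument layout seen in the reformatted hunks.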

View File

@@ -52,7 +52,9 @@ async function cleanupTempRepo(root: string): Promise<void> {
 let root: string = '';
 const crawler = new LocalCrawler();

-async function crawlRoot(opts: Partial<LocalCrawlOptions> = {}): Promise<ReturnType<LocalCrawler['crawl']>> {
+async function crawlRoot(
+  opts: Partial<LocalCrawlOptions> = {}
+): Promise<ReturnType<LocalCrawler['crawl']>> {
   return crawler.crawl({ rootPath: root, ...opts });
 }

View File

@@ -141,7 +141,12 @@ export class LocalCrawler {
   });

   // Crawl the worktree and stamp the result with the git-resolved metadata.
-  const result = await this.crawlDirectory(worktreePath, options.config, options.onProgress, ref);
+  const result = await this.crawlDirectory(
+    worktreePath,
+    options.config,
+    options.onProgress,
+    ref
+  );

   return { ...result, commitSha };
 } finally {

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -1,27 +1,27 @@
 {
   "version": "7",
   "dialect": "sqlite",
   "entries": [
     {
       "idx": 0,
       "version": "6",
       "when": 1774196053634,
       "tag": "0000_large_master_chief",
       "breakpoints": true
     },
     {
       "idx": 1,
       "version": "6",
       "when": 1774448049161,
       "tag": "0001_quick_nighthawk",
       "breakpoints": true
     },
     {
       "idx": 2,
       "version": "6",
       "when": 1774461897742,
       "tag": "0002_silky_stellaris",
       "breakpoints": true
     }
   ]
 }

View File

@@ -381,7 +381,12 @@ describe('EmbeddingService', () => {
     .all(snippetId, 'local-default');
   expect(rows).toHaveLength(1);
-  const row = rows[0] as { model: string; dimensions: number; embedding: Buffer; profile_id: string };
+  const row = rows[0] as {
+    model: string;
+    dimensions: number;
+    embedding: Buffer;
+    profile_id: string;
+  };
   expect(row.model).toBe('test-model');
   expect(row.dimensions).toBe(4);
   expect(row.profile_id).toBe('local-default');
@@ -494,9 +499,7 @@ describe('createProviderFromConfig', () => {
 });

 it('throws when openai provider is selected without config', () => {
-  expect(() =>
-    createProviderFromConfig({ provider: 'openai' } as EmbeddingConfig)
-  ).toThrow();
+  expect(() => createProviderFromConfig({ provider: 'openai' } as EmbeddingConfig)).toThrow();
 });

 it('defaultEmbeddingConfig returns provider=none', () => {

View File

@@ -41,18 +41,16 @@ export class EmbeddingService {
   const placeholders = snippetIds.map(() => '?').join(',');
   const snippets = this.db
-    .prepare<string[], SnippetRow>(
-      `SELECT id, title, breadcrumb, content FROM snippets WHERE id IN (${placeholders})`
-    )
+    .prepare<
+      string[],
+      SnippetRow
+    >(`SELECT id, title, breadcrumb, content FROM snippets WHERE id IN (${placeholders})`)
     .all(...snippetIds);

   if (snippets.length === 0) return;

   const texts = snippets.map((s) =>
-    [s.title, s.breadcrumb, s.content]
-      .filter(Boolean)
-      .join('\n')
-      .slice(0, TEXT_MAX_CHARS)
+    [s.title, s.breadcrumb, s.content].filter(Boolean).join('\n').slice(0, TEXT_MAX_CHARS)
   );

   const insert = this.db.prepare<[string, string, string, number, Buffer]>(`
@@ -94,9 +92,10 @@ export class EmbeddingService {
  */
 getEmbedding(snippetId: string, profileId: string = 'local-default'): Float32Array | null {
   const row = this.db
-    .prepare<[string, string], { embedding: Buffer; dimensions: number }>(
-      `SELECT embedding, dimensions FROM snippet_embeddings WHERE snippet_id = ? AND profile_id = ?`
-    )
+    .prepare<
+      [string, string],
+      { embedding: Buffer; dimensions: number }
+    >(`SELECT embedding, dimensions FROM snippet_embeddings WHERE snippet_id = ? AND profile_id = ?`)
     .get(snippetId, profileId);

   if (!row) return null;

View File

@@ -12,7 +12,11 @@ import { OpenAIEmbeddingProvider } from './openai.provider.js';
 import { LocalEmbeddingProvider } from './local.provider.js';

 // Re-export registry functions for new callers
-export { createProviderFromProfile, getDefaultLocalProfile, getRegisteredProviderKinds } from './registry.js';
+export {
+  createProviderFromProfile,
+  getDefaultLocalProfile,
+  getRegisteredProviderKinds
+} from './registry.js';

 export interface EmbeddingConfig {
   provider: 'openai' | 'local' | 'none';

View File

@@ -43,7 +43,12 @@ export class ContextResponseMapper {
   lastUpdateDate: repository.lastIndexedAt
     ? repository.lastIndexedAt.toISOString()
     : null,
-  state: repository.state === 'indexed' ? 'finalized' : repository.state === 'error' ? 'error' : 'initial',
+  state:
+    repository.state === 'indexed'
+      ? 'finalized'
+      : repository.state === 'error'
+        ? 'error'
+        : 'initial',
   totalTokens: repository.totalTokens ?? null,
   totalSnippets: repository.totalSnippets ?? null,
   stars: repository.stars ?? null,
@@ -64,14 +69,16 @@ export class ContextResponseMapper {
 const mapped: SnippetJsonDto[] = snippets.map(({ snippet }) => {
   const origin = metadata?.repository
     ? new SnippetOriginJsonDto({
         repositoryId: snippet.repositoryId,
         repositoryTitle: metadata.repository.title,
         source: metadata.repository.source,
         sourceUrl: metadata.repository.sourceUrl,
-        version: snippet.versionId ? metadata.snippetVersions[snippet.versionId] ?? null : null,
+        version: snippet.versionId
+          ? (metadata.snippetVersions[snippet.versionId] ?? null)
+          : null,
         versionId: snippet.versionId,
         isLocal: metadata.localSource
       })
     : null;

   if (snippet.type === 'code') {
@@ -108,20 +115,20 @@ export class ContextResponseMapper {
   localSource: metadata?.localSource ?? false,
   repository: metadata?.repository
     ? new ContextRepositoryJsonDto({
         id: metadata.repository.id,
         title: metadata.repository.title,
         source: metadata.repository.source,
         sourceUrl: metadata.repository.sourceUrl,
         branch: metadata.repository.branch,
         isLocal: metadata.localSource
       })
     : null,
   version: metadata?.version
     ? new ContextVersionJsonDto({
         requested: metadata.version.requested,
         resolved: metadata.version.resolved,
         id: metadata.version.id
       })
     : null,
   resultCount: metadata?.resultCount ?? snippets.length
 });

View File

@@ -12,8 +12,7 @@ export class IndexingJobMapper {
   processedFiles: entity.processed_files,
   error: entity.error,
   startedAt: entity.started_at != null ? new Date(entity.started_at * 1000) : null,
-  completedAt:
-    entity.completed_at != null ? new Date(entity.completed_at * 1000) : null,
+  completedAt: entity.completed_at != null ? new Date(entity.completed_at * 1000) : null,
   createdAt: new Date(entity.created_at * 1000)
 });
 }

View File

@@ -1,4 +1,8 @@
-import { LibrarySearchResult, SnippetRepositoryRef, SnippetSearchResult } from '$lib/server/models/search-result.js';
+import {
+  LibrarySearchResult,
+  SnippetRepositoryRef,
+  SnippetSearchResult
+} from '$lib/server/models/search-result.js';
 import { RepositoryMapper } from '$lib/server/mappers/repository.mapper.js';
 import { RepositoryVersionMapper } from '$lib/server/mappers/repository-version.mapper.js';
 import { SnippetMapper } from '$lib/server/mappers/snippet.mapper.js';
@@ -26,9 +30,7 @@ export class SearchResultMapper {
 ): LibrarySearchResult {
   return new LibrarySearchResult({
     repository: RepositoryMapper.fromEntity(repositoryEntity),
-    versions: versionEntities.map((version) =>
-      RepositoryVersionMapper.fromEntity(version)
-    ),
+    versions: versionEntities.map((version) => RepositoryVersionMapper.fromEntity(version)),
     score
   });
 }

View File

@@ -71,7 +71,7 @@ export class SnippetOriginJsonDto {
     this.versionId = props.versionId;
     this.isLocal = props.isLocal;
   }
 }

 export class LibrarySearchJsonResultDto {
   id: string;

View File

@@ -286,7 +286,8 @@ This is the second paragraph that also has enough content to be included here.
 });

 it('skips paragraphs shorter than 20 characters', () => {
-  const content = 'Short.\n\nThis is a much longer paragraph that definitely passes the minimum length filter.';
+  const content =
+    'Short.\n\nThis is a much longer paragraph that definitely passes the minimum length filter.';
   const snippets = parseCodeFile(content, 'notes.txt', 'text');
   expect(snippets.length).toBe(1);
 });
@@ -331,7 +332,10 @@ export function realFunction(): string {
 describe('parseCodeFile — token count', () => {
   it('all snippets have tokenCount within MAX_TOKENS', () => {
-    const lines = Array.from({ length: 300 }, (_, i) => `// comment line number ${i} here\nconst x${i} = ${i};`);
+    const lines = Array.from(
+      { length: 300 },
+      (_, i) => `// comment line number ${i} here\nconst x${i} = ${i};`
+    );
     const content = lines.join('\n');
     const snippets = parseCodeFile(content, 'large.ts', 'typescript');

View File

@@ -26,15 +26,19 @@ import {
  * The regex is tested line-by-line (multiline flag not needed).
  */
 export const BOUNDARY_PATTERNS: Record<string, RegExp> = {
-  typescript: /^(export\s+)?(declare\s+)?(async\s+)?(function|class|interface|type|enum|const|let|var)\s+\w+/,
+  typescript:
+    /^(export\s+)?(declare\s+)?(async\s+)?(function|class|interface|type|enum|const|let|var)\s+\w+/,
   javascript: /^(export\s+)?(async\s+)?(function|class|const|let|var)\s+\w+/,
   python: /^(async\s+)?(def|class)\s+\w+/,
   go: /^(func|type|var|const)\s+\w+/,
   rust: /^(pub(\s*\(crate\))?\s+)?(async\s+)?(fn|impl|struct|enum|trait|type|const|static)\s+\w+/,
   java: /^(\s*(public|private|protected|static|final|abstract|synchronized)\s+)+[\w<>\[\]]+\s+\w+\s*[({]/,
-  csharp: /^(\s*(public|private|protected|internal|static|override|virtual|abstract|sealed)\s+)+[\w<>\[\]]+\s+\w+\s*[({]/,
-  kotlin: /^(\s*(public|private|protected|internal|override|suspend|inline|open|abstract|sealed)\s+)*(fun|class|object|interface|data class|sealed class|enum class)\s+\w+/,
-  swift: /^(\s*(public|private|internal|fileprivate|open|override|static|final|class)\s+)*(func|class|struct|enum|protocol|extension)\s+\w+/,
+  csharp:
+    /^(\s*(public|private|protected|internal|static|override|virtual|abstract|sealed)\s+)+[\w<>\[\]]+\s+\w+\s*[({]/,
+  kotlin:
+    /^(\s*(public|private|protected|internal|override|suspend|inline|open|abstract|sealed)\s+)*(fun|class|object|interface|data class|sealed class|enum class)\s+\w+/,
+  swift:
+    /^(\s*(public|private|internal|fileprivate|open|override|static|final|class)\s+)*(func|class|struct|enum|protocol|extension)\s+\w+/,
   ruby: /^(def|class|module)\s+\w+/
 };
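These boundary regexes decide where the code parser may start a new chunk. A hedged sketch of line-by-line boundary splitting under that assumption (the real `chunkLines` also enforces token budgets and overlap, which this omits):

```typescript
// Sketch only: split TypeScript source at declaration boundaries using the
// same kind of pattern as BOUNDARY_PATTERNS.typescript above.
const tsBoundary =
  /^(export\s+)?(declare\s+)?(async\s+)?(function|class|interface|type|enum|const|let|var)\s+\w+/;

function splitAtBoundaries(source: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of source.split('\n')) {
    // A boundary line starts a new chunk, unless we are at the very top.
    if (tsBoundary.test(line) && current.length > 0) {
      chunks.push(current.join('\n'));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join('\n'));
  return chunks;
}

const sample = ['export function a() {}', 'const x = 1;', 'class B {}'].join('\n');
// Each declaration starts a new chunk → 3 chunks.
console.log(splitAtBoundaries(sample).length);
```

Because each regex is anchored with `^` and tested per line, indented (nested) declarations never open a new chunk, which keeps class bodies together.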
@@ -42,7 +46,10 @@ export const BOUNDARY_PATTERNS: Record<string, RegExp> = {
 // Internal types
 // ---------------------------------------------------------------------------

-type RawSnippet = Omit<NewSnippet, 'id' | 'repositoryId' | 'documentId' | 'versionId' | 'createdAt'>;
+type RawSnippet = Omit<
+  NewSnippet,
+  'id' | 'repositoryId' | 'documentId' | 'versionId' | 'createdAt'
+>;

 // ---------------------------------------------------------------------------
 // Helpers
@@ -161,7 +168,10 @@ function parseHtmlLikeFile(content: string, filePath: string, language: string):
 while ((match = scriptPattern.exec(content)) !== null) {
   // Strip the outer tags, keep just the code
-  const inner = match[0].replace(/^<script[^>]*>/, '').replace(/<\/script>$/, '').trim();
+  const inner = match[0]
+    .replace(/^<script[^>]*>/, '')
+    .replace(/<\/script>$/, '')
+    .trim();
   if (inner.length >= MIN_CONTENT_LENGTH) {
     scriptBlocks.push(inner);
   }

View File

@@ -48,6 +48,13 @@ export function parseFile(file: CrawledFile, options: ParseOptions): NewSnippet[
 // Re-export helpers for consumers that need them individually
 export { detectLanguage } from './language.js';
-export { estimateTokens, chunkText, chunkLines, MAX_TOKENS, OVERLAP_TOKENS, MIN_CONTENT_LENGTH } from './chunker.js';
+export {
+  estimateTokens,
+  chunkText,
+  chunkLines,
+  MAX_TOKENS,
+  OVERLAP_TOKENS,
+  MIN_CONTENT_LENGTH
+} from './chunker.js';
 export { parseMarkdown } from './markdown.parser.js';
 export { parseCodeFile, BOUNDARY_PATTERNS } from './code.parser.js';

View File

@@ -99,7 +99,10 @@ describe('parseMarkdown — section splitting', () => {
 describe('parseMarkdown — code block extraction', () => {
   it('extracts a fenced code block as a code snippet', () => {
-    const codeBlock = fence('typescript', 'function hello(name: string): string {\n  return `Hello, ${name}!`;\n}');
+    const codeBlock = fence(
+      'typescript',
+      'function hello(name: string): string {\n  return `Hello, ${name}!`;\n}'
+    );
     const source = [
       '# Example',
       '',
@@ -232,7 +235,10 @@ describe('parseMarkdown — large content chunking', () => {
 describe('parseMarkdown — real-world sample', () => {
   it('correctly parses a realistic README excerpt', () => {
     const bashInstall = fence('bash', 'npm install my-library');
-    const tsUsage = fence('typescript', "import { doTheThing } from 'my-library';\n\ndoTheThing({ verbose: true });");
+    const tsUsage = fence(
+      'typescript',
+      "import { doTheThing } from 'my-library';\n\ndoTheThing({ verbose: true });"
+    );
     const source = [
       '# My Library',

View File

@@ -7,7 +7,13 @@
 import { basename } from 'node:path';
 import type { NewSnippet } from '$lib/server/db/schema.js';
-import { estimateTokens, chunkText, MAX_TOKENS, OVERLAP_TOKENS, MIN_CONTENT_LENGTH } from './chunker.js';
+import {
+  estimateTokens,
+  chunkText,
+  MAX_TOKENS,
+  OVERLAP_TOKENS,
+  MIN_CONTENT_LENGTH
+} from './chunker.js';

 // ---------------------------------------------------------------------------
 // Internal types
@@ -121,7 +127,10 @@ function splitIntoSections(source: string): MarkdownSection[] {
 // Public parser
 // ---------------------------------------------------------------------------

-type RawSnippet = Omit<NewSnippet, 'id' | 'repositoryId' | 'documentId' | 'versionId' | 'createdAt'>;
+type RawSnippet = Omit<
+  NewSnippet,
+  'id' | 'repositoryId' | 'documentId' | 'versionId' | 'createdAt'
+>;

 /**
  * Parse a Markdown/MDX file into raw snippets (before IDs and DB fields are

View File

@@ -86,16 +86,16 @@ describe('computeDiff', () => {
 it('handles a mixed scenario: added, modified, deleted, and unchanged', () => {
   const crawledFiles = [
     makeCrawledFile('unchanged.md', 'sha-same'), // unchanged
     makeCrawledFile('modified.md', 'sha-new'), // modified (different sha)
     makeCrawledFile('added.md', 'sha-added') // added (not in DB)
     // 'deleted.md' is absent from crawl → deleted
   ];
   const existingDocs = [
     makeDocument('unchanged.md', 'sha-same'), // unchanged
     makeDocument('modified.md', 'sha-old'), // modified
     makeDocument('deleted.md', 'sha-deleted') // deleted
   ];
   const diff = computeDiff(crawledFiles, existingDocs);
@@ -114,9 +114,9 @@ describe('computeDiff', () => {
 ];
 const existingDocs = [
   makeDocument('a.md', 'sha-a'), // unchanged
   makeDocument('b.md', 'sha-b-old'), // modified
   makeDocument('d.md', 'sha-d') // deleted
   // 'c.md' is not in DB → added
 ];
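The scenarios above pin down sha-comparison semantics: same path with a different sha is modified, a path only in the crawl is added, a path only in the database is deleted. A minimal sketch consistent with these tests (the real `computeDiff` signature and types are assumptions here):

```typescript
// Hypothetical sketch of a computeDiff-style comparison keyed on path + sha;
// the real implementation's types and return shape may differ.
interface FileRef {
  path: string;
  sha: string;
}

function computeDiffSketch(crawled: FileRef[], existing: FileRef[]) {
  const existingByPath = new Map(existing.map((d) => [d.path, d.sha]));
  const crawledPaths = new Set(crawled.map((f) => f.path));

  // Added: crawled but not yet in the database.
  const added = crawled.filter((f) => !existingByPath.has(f.path));
  // Modified: same path, different content sha.
  const modified = crawled.filter(
    (f) => existingByPath.has(f.path) && existingByPath.get(f.path) !== f.sha
  );
  // Deleted: in the database but absent from the crawl.
  const deleted = existing.filter((d) => !crawledPaths.has(d.path));

  return { added, modified, deleted };
}

const diff = computeDiffSketch(
  [
    { path: 'unchanged.md', sha: 'sha-same' },
    { path: 'modified.md', sha: 'sha-new' },
    { path: 'added.md', sha: 'sha-added' }
  ],
  [
    { path: 'unchanged.md', sha: 'sha-same' },
    { path: 'modified.md', sha: 'sha-old' },
    { path: 'deleted.md', sha: 'sha-deleted' }
  ]
);
console.log(diff.added.length, diff.modified.length, diff.deleted.length); // → 1 1 1
```

Unchanged files fall out implicitly: they match on both path and sha, so they appear in none of the three buckets.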

View File

@@ -22,10 +22,7 @@ function createTestDb(): Database.Database {
 client.pragma('foreign_keys = ON');

 const migrationsFolder = join(import.meta.dirname, '../db/migrations');
-const migrationSql = readFileSync(
-  join(migrationsFolder, '0000_large_master_chief.sql'),
-  'utf-8'
-);
+const migrationSql = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');

 const statements = migrationSql
   .split('--> statement-breakpoint')
@@ -45,10 +42,7 @@ function createTestDb(): Database.Database {
 const now = Math.floor(Date.now() / 1000);

-function insertRepo(
-  db: Database.Database,
-  overrides: Partial<Record<string, unknown>> = {}
-): void {
+function insertRepo(db: Database.Database, overrides: Partial<Record<string, unknown>> = {}): void {
   db.prepare(
     `INSERT INTO repositories
       (id, title, source, source_url, branch, state,
@@ -62,7 +56,15 @@ function insertRepo(
     overrides.source_url ?? '/tmp/test-repo',
     overrides.branch ?? 'main',
     overrides.state ?? 'pending',
-    0, 0, 0, 0, null, null, null, now, now
+    0,
+    0,
+    0,
+    0,
+    null,
+    null,
+    null,
+    now,
+    now
   );
 }
@@ -108,9 +110,10 @@ describe('recoverStaleJobs', () => {
   insertJob(db, { status: 'running' });

   recoverStaleJobs(db);

-  const row = db
-    .prepare(`SELECT status, error FROM indexing_jobs LIMIT 1`)
-    .get() as { status: string; error: string };
+  const row = db.prepare(`SELECT status, error FROM indexing_jobs LIMIT 1`).get() as {
+    status: string;
+    error: string;
+  };
   expect(row.status).toBe('failed');
   expect(row.error).toMatch(/restarted/i);
 });
@@ -119,9 +122,9 @@ describe('recoverStaleJobs', () => {
   db.prepare(`UPDATE repositories SET state = 'indexing' WHERE id = '/test/repo'`).run();

   recoverStaleJobs(db);

-  const row = db
-    .prepare(`SELECT state FROM repositories WHERE id = '/test/repo'`)
-    .get() as { state: string };
+  const row = db.prepare(`SELECT state FROM repositories WHERE id = '/test/repo'`).get() as {
+    state: string;
+  };
   expect(row.state).toBe('error');
 });
@@ -164,9 +167,7 @@ describe('JobQueue', () => {
   const job2 = queue.enqueue('/test/repo');
   expect(job1.id).toBe(job2.id);

-  const count = (
-    db.prepare(`SELECT COUNT(*) as n FROM indexing_jobs`).get() as { n: number }
-  ).n;
+  const count = (db.prepare(`SELECT COUNT(*) as n FROM indexing_jobs`).get() as { n: number }).n;
   expect(count).toBe(1);
 });
@@ -255,19 +256,19 @@ describe('IndexingPipeline', () => {
     })
   };

-  return new IndexingPipeline(
-    db,
-    mockGithubCrawl as never,
-    mockLocalCrawler as never,
-    null
-  );
+  return new IndexingPipeline(db, mockGithubCrawl as never, mockLocalCrawler as never, null);
 }

 function makeJob(repositoryId = '/test/repo') {
   const jobId = insertJob(db, { repository_id: repositoryId, status: 'queued' });
-  return db
-    .prepare(`SELECT * FROM indexing_jobs WHERE id = ?`)
-    .get(jobId) as { id: string; repositoryId?: string; repository_id?: string; status: string; versionId?: string; version_id?: string };
+  return db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as {
+    id: string;
+    repositoryId?: string;
+    repository_id?: string;
+    status: string;
+    versionId?: string;
+    version_id?: string;
+  };
 }

 it('marks job as done when there are no files to index', async () => {
@@ -289,9 +290,9 @@ describe('IndexingPipeline', () => {
await pipeline.run(job as never); await pipeline.run(job as never);
const updated = db const updated = db.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`).get(job.id) as {
.prepare(`SELECT status FROM indexing_jobs WHERE id = ?`) status: string;
.get(job.id) as { status: string }; };
// The job should end in 'done' — the running→done transition is covered // The job should end in 'done' — the running→done transition is covered
// by the pipeline's internal updateJob calls. // by the pipeline's internal updateJob calls.
expect(updated.status).toBe('done'); expect(updated.status).toBe('done');
@@ -363,27 +364,24 @@ describe('IndexingPipeline', () => {
const job1 = makeJob(); const job1 = makeJob();
await pipeline.run(job1 as never); await pipeline.run(job1 as never);
const firstDocCount = ( const firstDocCount = (db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number })
db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number } .n;
).n; const firstSnippetIds = (db.prepare(`SELECT id FROM snippets`).all() as { id: string }[]).map(
const firstSnippetIds = ( (r) => r.id
db.prepare(`SELECT id FROM snippets`).all() as { id: string }[] );
).map((r) => r.id);
// Second run with identical files. // Second run with identical files.
const job2Id = insertJob(db, { repository_id: '/test/repo', status: 'queued' }); const job2Id = insertJob(db, { repository_id: '/test/repo', status: 'queued' });
const job2 = db const job2 = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(job2Id) as never;
.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`)
.get(job2Id) as never;
await pipeline.run(job2); await pipeline.run(job2);
const secondDocCount = ( const secondDocCount = (
db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number } db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number }
).n; ).n;
const secondSnippetIds = ( const secondSnippetIds = (db.prepare(`SELECT id FROM snippets`).all() as { id: string }[]).map(
db.prepare(`SELECT id FROM snippets`).all() as { id: string }[] (r) => r.id
).map((r) => r.id); );
// Document count stays the same and snippet IDs are unchanged. // Document count stays the same and snippet IDs are unchanged.
expect(secondDocCount).toBe(firstDocCount); expect(secondDocCount).toBe(firstDocCount);
@@ -395,7 +393,8 @@ describe('IndexingPipeline', () => {
files: [ files: [
{ {
path: 'README.md', path: 'README.md',
content: '# Original\n\nThis is the original version of the documentation with sufficient content.', content:
'# Original\n\nThis is the original version of the documentation with sufficient content.',
sha: 'sha-v1', sha: 'sha-v1',
language: 'markdown' language: 'markdown'
} }
@@ -415,7 +414,8 @@ describe('IndexingPipeline', () => {
files: [ files: [
{ {
path: 'README.md', path: 'README.md',
content: '# Updated\n\nThis is a completely different version of the documentation with new content.', content:
'# Updated\n\nThis is a completely different version of the documentation with new content.',
sha: 'sha-v2', sha: 'sha-v2',
language: 'markdown' language: 'markdown'
} }
@@ -423,14 +423,11 @@ describe('IndexingPipeline', () => {
totalFiles: 1 totalFiles: 1
}); });
const job2Id = insertJob(db, { repository_id: '/test/repo', status: 'queued' }); const job2Id = insertJob(db, { repository_id: '/test/repo', status: 'queued' });
const job2 = db const job2 = db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(job2Id) as never;
.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`)
.get(job2Id) as never;
await pipeline2.run(job2); await pipeline2.run(job2);
const finalDocCount = ( const finalDocCount = (db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number })
db.prepare(`SELECT COUNT(*) as n FROM documents`).get() as { n: number } .n;
).n;
// Only one document should exist (the updated one). // Only one document should exist (the updated one).
expect(finalDocCount).toBe(1); expect(finalDocCount).toBe(1);
@@ -452,9 +449,9 @@ describe('IndexingPipeline', () => {
const job = makeJob(); const job = makeJob();
await pipeline.run(job as never); await pipeline.run(job as never);
const updated = db const updated = db.prepare(`SELECT progress FROM indexing_jobs WHERE id = ?`).get(job.id) as {
.prepare(`SELECT progress FROM indexing_jobs WHERE id = ?`) progress: number;
.get(job.id) as { progress: number }; };
expect(updated.progress).toBe(100); expect(updated.progress).toBe(100);
}); });
@@ -467,12 +464,7 @@ describe('IndexingPipeline', () => {
commitSha: 'abc' commitSha: 'abc'
}); });
const pipeline = new IndexingPipeline( const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
db,
vi.fn() as never,
{ crawl } as never,
null
);
const job = makeJob(); const job = makeJob();
await pipeline.run(job as never); await pipeline.run(job as never);
@@ -511,7 +503,10 @@ describe('IndexingPipeline', () => {
await pipeline1.run(job1 as never); await pipeline1.run(job1 as never);
const afterFirstRun = { const afterFirstRun = {
docs: db.prepare(`SELECT file_path, checksum FROM documents ORDER BY file_path`).all() as { file_path: string; checksum: string }[], docs: db.prepare(`SELECT file_path, checksum FROM documents ORDER BY file_path`).all() as {
file_path: string;
checksum: string;
}[],
snippetCount: (db.prepare(`SELECT COUNT(*) as n FROM snippets`).get() as { n: number }).n snippetCount: (db.prepare(`SELECT COUNT(*) as n FROM snippets`).get() as { n: number }).n
}; };
expect(afterFirstRun.docs).toHaveLength(3); expect(afterFirstRun.docs).toHaveLength(3);


@@ -250,7 +250,10 @@ export class IndexingPipeline {
   private async crawl(
     repo: Repository,
     job: IndexingJob
-  ): Promise<{ files: Array<{ path: string; content: string; sha: string; size: number; language: string }>; totalFiles: number }> {
+  ): Promise<{
+    files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
+    totalFiles: number;
+  }> {
     if (repo.source === 'github') {
       // Parse owner/repo from the canonical ID: "/owner/repo"
       const parts = repo.id.replace(/^\//, '').split('/');


@@ -133,9 +133,7 @@ export class JobQueue {
     // Check whether another job was queued while this one ran.
     const next = this.db
-      .prepare<[], { id: string }>(
-        `SELECT id FROM indexing_jobs WHERE status = 'queued' LIMIT 1`
-      )
+      .prepare<[], { id: string }>(`SELECT id FROM indexing_jobs WHERE status = 'queued' LIMIT 1`)
       .get();
     if (next) {
       setImmediate(() => this.processNext());
@@ -147,9 +145,7 @@ export class JobQueue {
    * Retrieve a single job by ID.
    */
   getJob(id: string): IndexingJob | null {
-    const raw = this.db
-      .prepare<[string], IndexingJobEntity>(`${JOB_SELECT} WHERE id = ?`)
-      .get(id);
+    const raw = this.db.prepare<[string], IndexingJobEntity>(`${JOB_SELECT} WHERE id = ?`).get(id);
     return raw ? IndexingJobMapper.fromEntity(new IndexingJobEntity(raw)) : null;
   }
@@ -178,9 +174,9 @@ export class JobQueue {
     const sql = `${JOB_SELECT} ${where} ORDER BY created_at DESC LIMIT ?`;
     params.push(limit);
-    return (this.db.prepare<unknown[], IndexingJobEntity>(sql).all(...params) as IndexingJobEntity[]).map(
-      (row) => IndexingJobMapper.fromEntity(new IndexingJobEntity(row))
-    );
+    return (
+      this.db.prepare<unknown[], IndexingJobEntity>(sql).all(...params) as IndexingJobEntity[]
+    ).map((row) => IndexingJobMapper.fromEntity(new IndexingJobEntity(row)));
   }
   /**
@@ -228,9 +224,7 @@ export class JobQueue {
       return false;
     }
-    this.db
-      .prepare(`UPDATE indexing_jobs SET status = 'paused' WHERE id = ?`)
-      .run(id);
+    this.db.prepare(`UPDATE indexing_jobs SET status = 'paused' WHERE id = ?`).run(id);
     return true;
   }
@@ -249,9 +243,7 @@ export class JobQueue {
       return false;
     }
-    this.db
-      .prepare(`UPDATE indexing_jobs SET status = 'queued' WHERE id = ?`)
-      .run(id);
+    this.db.prepare(`UPDATE indexing_jobs SET status = 'queued' WHERE id = ?`).run(id);
     // Trigger queue processing in case the queue was idle
     this.drainQueued();


@@ -27,7 +27,11 @@ function createTestDb(): Database.Database {
   const migrationsFolder = join(import.meta.dirname, '../db/migrations');
   // Run all migrations in order
-  const migrations = ['0000_large_master_chief.sql', '0001_quick_nighthawk.sql', '0002_silky_stellaris.sql'];
+  const migrations = [
+    '0000_large_master_chief.sql',
+    '0001_quick_nighthawk.sql',
+    '0002_silky_stellaris.sql'
+  ];
   for (const migrationFile of migrations) {
     const migrationSql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
     const statements = migrationSql
@@ -123,9 +127,7 @@ function seedEmbedding(
 // Mock EmbeddingProvider
 // ---------------------------------------------------------------------------
-function makeMockProvider(
-  returnValues: number[][] = [[1, 0, 0, 0]]
-): EmbeddingProvider {
+function makeMockProvider(returnValues: number[][] = [[1, 0, 0, 0]]): EmbeddingProvider {
   return {
     name: 'mock',
     dimensions: returnValues[0]?.length ?? 4,
@@ -254,9 +256,18 @@ describe('reciprocalRankFusion', () => {
   });
   it('handles three lists correctly', () => {
-    const r1 = [{ id: 'a', score: 1 }, { id: 'b', score: 0 }];
-    const r2 = [{ id: 'b', score: 1 }, { id: 'c', score: 0 }];
-    const r3 = [{ id: 'a', score: 1 }, { id: 'c', score: 0 }];
+    const r1 = [
+      { id: 'a', score: 1 },
+      { id: 'b', score: 0 }
+    ];
+    const r2 = [
+      { id: 'b', score: 1 },
+      { id: 'c', score: 0 }
+    ];
+    const r3 = [
+      { id: 'a', score: 1 },
+      { id: 'c', score: 0 }
+    ];
     const result = reciprocalRankFusion(r1, r2, r3);
     // 'a' appears first in r1 and r3 → higher combined score than 'b' or 'c'.
     expect(result[0].id).toBe('a');


@@ -103,10 +103,7 @@ export class HybridSearchService {
    * @param options - Search parameters including repositoryId and alpha blend.
    * @returns Ranked array of SnippetSearchResult, deduplicated by snippet ID.
    */
-  async search(
-    query: string,
-    options: HybridSearchOptions
-  ): Promise<SnippetSearchResult[]> {
+  async search(query: string, options: HybridSearchOptions): Promise<SnippetSearchResult[]> {
     const limit = options.limit ?? 20;
     const mode = options.searchMode ?? 'auto';


@@ -20,7 +20,7 @@
  */
 export function preprocessQuery(raw: string): string {
   // 1. Trim and collapse whitespace.
-  let q = raw.trim().replace(/\s+/g, ' ');
+  const q = raw.trim().replace(/\s+/g, ' ');
   if (!q) return q;
@@ -91,7 +91,10 @@ export function preprocessQuery(raw: string): string {
   }
   // Remove trailing operators
-  while (finalTokens.length > 0 && ['AND', 'OR', 'NOT'].includes(finalTokens[finalTokens.length - 1])) {
+  while (
+    finalTokens.length > 0 &&
+    ['AND', 'OR', 'NOT'].includes(finalTokens[finalTokens.length - 1])
+  ) {
     finalTokens.pop();
   }
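The trailing-operator cleanup above can be sketched as a standalone helper. This is an illustration, not the project's code: `stripTrailingOperators` is a hypothetical name, and the motivation (a dangling `AND`/`OR`/`NOT` would leave the FTS query expression incomplete) is inferred from the surrounding function, not stated in the diff.

```typescript
// Hypothetical sketch of the trailing-operator cleanup in preprocessQuery:
// a query ending in a boolean operator has nothing for that operator to
// bind to, so trailing operators are popped off before the query is used.
function stripTrailingOperators(tokens: string[]): string[] {
  const finalTokens = [...tokens];
  while (
    finalTokens.length > 0 &&
    ['AND', 'OR', 'NOT'].includes(finalTokens[finalTokens.length - 1])
  ) {
    finalTokens.pop();
  }
  return finalTokens;
}

stripTrailingOperators(['sqlite', 'AND', 'fts', 'OR']); // ['sqlite', 'AND', 'fts']
stripTrailingOperators(['AND', 'OR']); // []
```

Note that only *trailing* operators are removed; an operator between two terms (like the `AND` above) is left intact.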


@@ -32,9 +32,7 @@ export interface FusedItem {
  * descending relevance (index 0 = most relevant).
  * @returns Fused array sorted by descending rrfScore, deduplicated by id.
  */
-export function reciprocalRankFusion(
-  ...rankings: Array<Array<RankedItem>>
-): Array<FusedItem> {
+export function reciprocalRankFusion(...rankings: Array<Array<RankedItem>>): Array<FusedItem> {
   const K = 60; // Standard RRF constant.
   const scores = new Map<string, number>();
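The fusion rule behind this function can be shown in a few lines. This is a minimal sketch, not the file's full implementation: only the signature, `K = 60`, and the `scores` map are visible in the hunk, so the loop body (including whether ranks are 0- or 1-based) is an assumption.

```typescript
// Minimal sketch of reciprocal rank fusion: each ranked list contributes
// 1 / (K + rank) per item, duplicates accumulate, and the fused list is
// sorted by the combined score. K = 60 matches the constant in the hunk.
interface RankedItem { id: string; score: number }
interface FusedItem { id: string; rrfScore: number }

function rrf(...rankings: RankedItem[][]): FusedItem[] {
  const K = 60;
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((item, rank) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (K + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, rrfScore]) => ({ id, rrfScore }))
    .sort((a, b) => b.rrfScore - a.rrfScore);
}

const fused = rrf(
  [{ id: 'a', score: 1 }, { id: 'b', score: 0 }],
  [{ id: 'a', score: 1 }, { id: 'c', score: 0 }]
);
// 'a' leads both input lists, so it tops the fused order.
```

The raw per-list scores never enter the formula; only positions do, which is what makes RRF robust when BM25 scores and cosine similarities live on different scales.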


@@ -674,7 +674,9 @@ describe('formatLibraryResults', () => {
       id: '/facebook/react/v18',
       repositoryId: '/facebook/react',
       tag: 'v18',
-      title: 'React 18', commitHash: null, state: 'indexed',
+      title: 'React 18',
+      commitHash: null,
+      state: 'indexed',
       totalSnippets: 1000,
       indexedAt: null,
       createdAt: now
@@ -731,7 +733,9 @@
 describe('formatSnippetResults', () => {
   const now = new Date();
-  function makeSnippetResult(overrides: Partial<Parameters<typeof formatSnippetResults>[0][number]> = {}): Parameters<typeof formatSnippetResults>[0][number] {
+  function makeSnippetResult(
+    overrides: Partial<Parameters<typeof formatSnippetResults>[0][number]> = {}
+  ): Parameters<typeof formatSnippetResults>[0][number] {
     return {
       snippet: {
         id: crypto.randomUUID(),


@@ -87,10 +87,7 @@ export class SearchService {
     if (!processedQuery) return [];
     // Build the WHERE clause dynamically based on optional filters.
-    const conditions: string[] = [
-      'snippets_fts MATCH ?',
-      's.repository_id = ?'
-    ];
+    const conditions: string[] = ['snippets_fts MATCH ?', 's.repository_id = ?'];
     const params: unknown[] = [processedQuery, repositoryId];
     if (versionId !== undefined) {
@@ -132,10 +129,14 @@ export class SearchService {
     const rows = this.db.prepare(sql).all(...params) as RawSnippetRow[];
     return rows.map((row) =>
-      SearchResultMapper.snippetFromEntity(new SnippetEntity(row), {
-        id: row.repo_id,
-        title: row.repo_title
-      }, row.score)
+      SearchResultMapper.snippetFromEntity(
+        new SnippetEntity(row),
+        {
+          id: row.repo_id,
+          title: row.repo_title
+        },
+        row.score
+      )
     );
   }
@@ -188,7 +189,11 @@ export class SearchService {
     return rows.map((row) => {
       const compositeScore =
-        row.exact_match + row.prefix_match + row.desc_match + row.snippet_score + row.trust_component;
+        row.exact_match +
+        row.prefix_match +
+        row.desc_match +
+        row.snippet_score +
+        row.trust_component;
       return SearchResultMapper.libraryFromEntity(
         new RepositoryEntity(row),
         this.getVersionEntities(row.id),
@@ -203,9 +208,7 @@ export class SearchService {
   private getVersionEntities(repositoryId: string): RepositoryVersionEntity[] {
     return this.db
-      .prepare(
-        `SELECT * FROM repository_versions WHERE repository_id = ? ORDER BY created_at DESC`
-      )
+      .prepare(`SELECT * FROM repository_versions WHERE repository_id = ? ORDER BY created_at DESC`)
       .all(repositoryId) as RawVersionRow[];
   }
 }


@@ -46,9 +46,7 @@ interface RawEmbeddingRow {
  */
 export function cosineSimilarity(a: Float32Array, b: Float32Array): number {
   if (a.length !== b.length) {
-    throw new Error(
-      `Embedding dimension mismatch: ${a.length} vs ${b.length}`
-    );
+    throw new Error(`Embedding dimension mismatch: ${a.length} vs ${b.length}`);
   }
   let dot = 0;
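The hunk cuts off after `let dot = 0;`, so here is a hedged completion: the guard and the first accumulator come straight from the diff, while the rest is the textbook cosine formula and may differ in detail (e.g. zero-vector handling) from the real function.

```typescript
// Hypothetical completion of cosineSimilarity. Only the dimension guard and
// `let dot = 0` appear in the diff; the accumulation loop and normalization
// below are the standard formula: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`Embedding dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero (NaN).
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal vectors score 0, which is what makes the value usable directly as a semantic-retrieval ranking score.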


@@ -27,10 +27,7 @@ function createTestDb(): Database.Database {
   client.pragma('foreign_keys = ON');
   const migrationsFolder = join(import.meta.dirname, '../db/migrations');
-  const migrationSql = readFileSync(
-    join(migrationsFolder, '0000_large_master_chief.sql'),
-    'utf-8'
-  );
+  const migrationSql = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
   // Drizzle migration files use `--> statement-breakpoint` as separator.
   const statements = migrationSql
@@ -261,9 +258,7 @@ describe('RepositoryService.add()', () => {
   });
   it('throws InvalidInputError when sourceUrl is empty', () => {
-    expect(() =>
-      service.add({ source: 'github', sourceUrl: '' })
-    ).toThrow(InvalidInputError);
+    expect(() => service.add({ source: 'github', sourceUrl: '' })).toThrow(InvalidInputError);
   });
   it('stores description and branch when provided', () => {
@@ -321,9 +316,7 @@ describe('RepositoryService.update()', () => {
   });
   it('throws NotFoundError for a non-existent repository', () => {
-    expect(() =>
-      service.update('/not/found', { title: 'New Title' })
-    ).toThrow(NotFoundError);
+    expect(() => service.update('/not/found', { title: 'New Title' })).toThrow(NotFoundError);
   });
 });


@@ -74,9 +74,7 @@ export class RepositoryService {
       .get(state) as { n: number };
     return row.n;
   }
-  const row = this.db
-    .prepare(`SELECT COUNT(*) as n FROM repositories`)
-    .get() as { n: number };
+  const row = this.db.prepare(`SELECT COUNT(*) as n FROM repositories`).get() as { n: number };
   return row.n;
 }
@@ -115,13 +113,13 @@ export class RepositoryService {
     }
     // Default title from owner/repo
     const parts = id.split('/').filter(Boolean);
-    title = input.title ?? (parts[1] ?? id);
+    title = input.title ?? parts[1] ?? id;
   } else {
     // local
     const existing = this.list({ limit: 9999 }).map((r) => r.id);
     id = resolveLocalId(input.sourceUrl, existing);
     const parts = input.sourceUrl.split('/');
-    title = input.title ?? (parts.at(-1) ?? 'local-repo');
+    title = input.title ?? parts.at(-1) ?? 'local-repo';
   }
   // Check for collision


@@ -25,14 +25,8 @@ function createTestDb(): Database.Database {
   const migrationsFolder = join(import.meta.dirname, '../db/migrations');
   // Apply all migration files in order
-  const migration0 = readFileSync(
-    join(migrationsFolder, '0000_large_master_chief.sql'),
-    'utf-8'
-  );
-  const migration1 = readFileSync(
-    join(migrationsFolder, '0001_quick_nighthawk.sql'),
-    'utf-8'
-  );
+  const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
+  const migration1 = readFileSync(join(migrationsFolder, '0001_quick_nighthawk.sql'), 'utf-8');
   // Apply first migration
   const statements0 = migration0
@@ -201,9 +195,7 @@ describe('VersionService.remove()', () => {
     versionService.remove('/facebook/react', 'v18.3.0');
-    const doc = client
-      .prepare(`SELECT id FROM documents WHERE id = ?`)
-      .get(docId);
+    const doc = client.prepare(`SELECT id FROM documents WHERE id = ?`).get(docId);
     expect(doc).toBeUndefined();
   });
 });


@@ -40,12 +40,7 @@ export class VersionService {
    * @throws NotFoundError when the parent repository does not exist
    * @throws AlreadyExistsError when the tag is already registered
    */
-  add(
-    repositoryId: string,
-    tag: string,
-    title?: string,
-    commitHash?: string
-  ): RepositoryVersion {
+  add(repositoryId: string, tag: string, title?: string, commitHash?: string): RepositoryVersion {
     // Verify parent repository exists.
     const repo = this.db
       .prepare(`SELECT id, source, source_url FROM repositories WHERE id = ?`)
@@ -115,9 +110,7 @@ export class VersionService {
    */
   getByTag(repositoryId: string, tag: string): RepositoryVersion | null {
     const row = this.db
-      .prepare(
-        `SELECT * FROM repository_versions WHERE repository_id = ? AND tag = ?`
-      )
+      .prepare(`SELECT * FROM repository_versions WHERE repository_id = ? AND tag = ?`)
       .get(repositoryId, tag) as RepositoryVersionEntity | undefined;
     return row ? RepositoryVersionMapper.fromEntity(new RepositoryVersionEntity(row)) : null;
   }
@@ -137,9 +130,9 @@ export class VersionService {
     previousVersions: { tag: string; title: string; commitHash?: string }[]
   ): RepositoryVersion[] {
     // Verify parent repository exists.
-    const repo = this.db
-      .prepare(`SELECT id FROM repositories WHERE id = ?`)
-      .get(repositoryId) as { id: string } | undefined;
+    const repo = this.db.prepare(`SELECT id FROM repositories WHERE id = ?`).get(repositoryId) as
+      | { id: string }
+      | undefined;
     if (!repo) {
       throw new NotFoundError(`Repository ${repositoryId} not found`);


@@ -65,13 +65,10 @@ export function discoverVersionTags(options: DiscoverTagsOptions): string[] {
   try {
     // List all tags, sorted by commit date (newest first)
-    const output = execSync(
-      `git -C "${repoPath}" tag -l --sort=-creatordate`,
-      {
-        encoding: 'utf-8',
-        stdio: ['ignore', 'pipe', 'pipe']
-      }
-    ).trim();
+    const output = execSync(`git -C "${repoPath}" tag -l --sort=-creatordate`, {
+      encoding: 'utf-8',
+      stdio: ['ignore', 'pipe', 'pipe']
+    }).trim();
     if (!output) return [];


@@ -33,10 +33,11 @@ export function resolveLocalId(path: string, existingIds: string[]): string {
  * Slugify a string to be safe for use in IDs.
  */
 function slugify(str: string): string {
-  return str
-    .toLowerCase()
-    .replace(/[^a-z0-9-_]/g, '-')
-    .replace(/-+/g, '-')
-    .replace(/^-|-$/g, '')
-    || 'repo';
+  return (
+    str
+      .toLowerCase()
+      .replace(/[^a-z0-9-_]/g, '-')
+      .replace(/-+/g, '-')
+      .replace(/^-|-$/g, '') || 'repo'
+  );
 }
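Since the whole function body is visible in this hunk, its behavior can be shown directly. The sample inputs below are illustrative, not taken from the codebase:

```typescript
// The slugify from the hunk above: lowercase, replace disallowed characters
// with '-', collapse runs of '-', trim leading/trailing '-', and fall back
// to 'repo' when nothing survives.
function slugify(str: string): string {
  return (
    str
      .toLowerCase()
      .replace(/[^a-z0-9-_]/g, '-')
      .replace(/-+/g, '-')
      .replace(/^-|-$/g, '') || 'repo'
  );
}

slugify('My Docs (v2)'); // 'my-docs-v2'
slugify('///');          // 'repo' (everything was stripped, so the fallback fires)
```

The `|| 'repo'` fallback is what the lint reformat wraps in parentheses; the behavior is unchanged.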


@@ -59,13 +59,10 @@ export function errorResponse(
   status: number,
   details?: Record<string, unknown>
 ): Response {
-  return new Response(
-    JSON.stringify({ error, code, ...(details ? { details } : {}) }),
-    {
-      status,
-      headers: { 'Content-Type': 'application/json' }
-    }
-  );
+  return new Response(JSON.stringify({ error, code, ...(details ? { details } : {}) }), {
+    status,
+    headers: { 'Content-Type': 'application/json' }
+  });
 }
 /**
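A self-contained version of this helper makes the shape of the response concrete. The `error` and `code` parameter declarations are assumptions (the hunk starts mid-signature; only `status` and `details` are visible), and the example relies on the standard `Response` global (Node 18+ / web runtimes):

```typescript
// Hedged reconstruction of errorResponse: the first two parameters are
// inferred from the JSON.stringify body, not shown in the hunk.
function errorResponse(
  error: string,
  code: string,
  status: number,
  details?: Record<string, unknown>
): Response {
  // `details` is only included in the body when the caller provides it.
  return new Response(JSON.stringify({ error, code, ...(details ? { details } : {}) }), {
    status,
    headers: { 'Content-Type': 'application/json' }
  });
}

// errorResponse('Repository not found', 'NOT_FOUND', 404)
// produces a 404 response with an application/json body.
```

Spreading `...(details ? { details } : {})` keeps the `details` key out of the JSON entirely when it is undefined, rather than serializing `"details": null`.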


@@ -20,15 +20,9 @@ import { createServer } from 'node:http';
 import { Server } from '@modelcontextprotocol/sdk/server/index.js';
 import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
 import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
-import {
-  CallToolRequestSchema,
-  ListToolsRequestSchema
-} from '@modelcontextprotocol/sdk/types.js';
-import {
-  RESOLVE_LIBRARY_ID_TOOL,
-  handleResolveLibraryId
-} from './tools/resolve-library-id.js';
+import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';
+import { RESOLVE_LIBRARY_ID_TOOL, handleResolveLibraryId } from './tools/resolve-library-id.js';
 import { QUERY_DOCS_TOOL, handleQueryDocs } from './tools/query-docs.js';
 // ---------------------------------------------------------------------------


@@ -9,7 +9,9 @@ import { z } from 'zod';
 import { searchLibraries } from '../client.js';
 export const ResolveLibraryIdSchema = z.object({
-  libraryName: z.string().describe('Library name to search for and resolve to a TrueRef library ID'),
+  libraryName: z
+    .string()
+    .describe('Library name to search for and resolve to a TrueRef library ID'),
   query: z.string().describe("The user's question or context to help rank results")
 });


@@ -150,7 +150,9 @@
 {#if loading && jobs.length === 0}
   <div class="flex items-center justify-center py-12">
     <div class="text-center">
-      <div class="inline-block h-8 w-8 animate-spin rounded-full border-4 border-solid border-blue-600 border-r-transparent"></div>
+      <div
+        class="inline-block h-8 w-8 animate-spin rounded-full border-4 border-solid border-blue-600 border-r-transparent"
+      ></div>
       <p class="mt-2 text-gray-600">Loading jobs...</p>
     </div>
   </div>
@@ -160,26 +162,38 @@
   </div>
 {:else if jobs.length === 0}
   <div class="rounded-md bg-gray-50 p-8 text-center">
-    <p class="text-gray-600">No jobs found. Jobs will appear here when repositories are indexed.</p>
+    <p class="text-gray-600">
+      No jobs found. Jobs will appear here when repositories are indexed.
+    </p>
   </div>
 {:else}
   <div class="overflow-x-auto rounded-lg border border-gray-200 bg-white shadow">
     <table class="min-w-full divide-y divide-gray-200">
       <thead class="bg-gray-50">
         <tr>
-          <th class="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider text-gray-500">
+          <th
+            class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
+          >
             Repository
           </th>
-          <th class="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider text-gray-500">
+          <th
+            class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
+          >
             Status
          </th>
-          <th class="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider text-gray-500">
+          <th
+            class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
+          >
            Progress
          </th>
-          <th class="px-6 py-3 text-left text-xs font-medium uppercase tracking-wider text-gray-500">
+          <th
+            class="px-6 py-3 text-left text-xs font-medium tracking-wider text-gray-500 uppercase"
+          >
            Created
          </th>
-          <th class="px-6 py-3 text-right text-xs font-medium uppercase tracking-wider text-gray-500">
+          <th
+            class="px-6 py-3 text-right text-xs font-medium tracking-wider text-gray-500 uppercase"
+          >
            Actions
          </th>
        </tr>
@@ -187,16 +201,16 @@
       <tbody class="divide-y divide-gray-200 bg-white">
         {#each jobs as job (job.id)}
           <tr class="hover:bg-gray-50">
-            <td class="whitespace-nowrap px-6 py-4 text-sm font-medium text-gray-900">
+            <td class="px-6 py-4 text-sm font-medium whitespace-nowrap text-gray-900">
               {job.repositoryId}
               {#if job.versionId}
                 <span class="ml-1 text-xs text-gray-500">@{job.versionId}</span>
               {/if}
             </td>
-            <td class="whitespace-nowrap px-6 py-4 text-sm text-gray-500">
+            <td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
               <JobStatusBadge status={job.status} />
             </td>
-            <td class="whitespace-nowrap px-6 py-4 text-sm text-gray-500">
+            <td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
               <div class="flex items-center">
                 <span class="mr-2">{job.progress}%</span>
                 <div class="h-2 w-32 rounded-full bg-gray-200">
@@ -212,10 +226,10 @@
                 {/if}
               </div>
             </td>
-            <td class="whitespace-nowrap px-6 py-4 text-sm text-gray-500">
+            <td class="px-6 py-4 text-sm whitespace-nowrap text-gray-500">
               {formatDate(job.createdAt)}
             </td>
-            <td class="whitespace-nowrap px-6 py-4 text-right text-sm font-medium">
+            <td class="px-6 py-4 text-right text-sm font-medium whitespace-nowrap">
               <div class="flex justify-end gap-2">
                 {#if canPause(job.status)}
                   <button
@@ -256,9 +270,7 @@
     </div>
     {#if loading}
-      <div class="mt-4 text-center text-sm text-gray-500">
-        Refreshing...
-      </div>
+      <div class="mt-4 text-center text-sm text-gray-500">Refreshing...</div>
     {/if}
   {/if}
 </div>


@@ -20,11 +20,7 @@ import { createProviderFromProfile } from '$lib/server/embeddings/registry';
 import type { EmbeddingProfile } from '$lib/server/db/schema';
 import { parseLibraryId } from '$lib/server/api/library-id';
 import { selectSnippetsWithinBudget, DEFAULT_TOKEN_BUDGET } from '$lib/server/api/token-budget';
-import {
-  formatContextJson,
-  formatContextTxt,
-  CORS_HEADERS
-} from '$lib/server/api/formatters';
+import { formatContextJson, formatContextTxt, CORS_HEADERS } from '$lib/server/api/formatters';
 import type { ContextResponseMetadata } from '$lib/server/mappers/context-response.mapper';
 // ---------------------------------------------------------------------------
@@ -36,9 +32,10 @@ function getServices(db: ReturnType<typeof getClient>) {
   // Load the active embedding profile from the database
   const profileRow = db
-    .prepare<[], EmbeddingProfile>(
-      'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
-    )
+    .prepare<
+      [],
+      EmbeddingProfile
+    >('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
     .get();
   const provider = profileRow ? createProviderFromProfile(profileRow) : null;
@@ -53,7 +50,10 @@ interface RawRepoConfig {
 function getRules(db: ReturnType<typeof getClient>, repositoryId: string): string[] {
   const row = db
-    .prepare<[string], RawRepoConfig>(`SELECT rules FROM repository_configs WHERE repository_id = ?`)
+    .prepare<
+      [string],
+      RawRepoConfig
+    >(`SELECT rules FROM repository_configs WHERE repository_id = ?`)
     .get(repositoryId);
   if (!row?.rules) return [];
@@ -88,9 +88,10 @@ function getSnippetVersionTags(
   const placeholders = versionIds.map(() => '?').join(', ');
   const rows = db
-    .prepare<string[], RawVersionRow>(
-      `SELECT id, tag FROM repository_versions WHERE id IN (${placeholders})`
-    )
+    .prepare<
+      string[],
+      RawVersionRow
+    >(`SELECT id, tag FROM repository_versions WHERE id IN (${placeholders})`)
     .all(...versionIds);
   return Object.fromEntries(rows.map((row) => [row.id, row.tag]));
@@ -116,13 +117,10 @@ export const GET: RequestHandler = async ({ url }) => {
   const query = url.searchParams.get('query');
   if (!query || !query.trim()) {
-    return new Response(
-      JSON.stringify({ error: 'query is required', code: 'MISSING_PARAMETER' }),
-      {
-        status: 400,
-        headers: { 'Content-Type': 'application/json', ...CORS_HEADERS }
-      }
-    );
+    return new Response(JSON.stringify({ error: 'query is required', code: 'MISSING_PARAMETER' }), {
+      status: 400,
+      headers: { 'Content-Type': 'application/json', ...CORS_HEADERS }
+    });
   }
   const responseType = url.searchParams.get('type') ?? 'json';
@@ -157,9 +155,10 @@ export const GET: RequestHandler = async ({ url }) => {
   // Verify the repository exists and check its state.
   const repo = db
-    .prepare<[string], RawRepoState>(
-      `SELECT id, state, title, source, source_url, branch FROM repositories WHERE id = ?`
-    )
+    .prepare<
+      [string],
+      RawRepoState
+    >(`SELECT id, state, title, source, source_url, branch FROM repositories WHERE id = ?`)
     .get(parsed.repositoryId);
   if (!repo) {
@@ -193,9 +192,10 @@ export const GET: RequestHandler = async ({ url }) => {
   let resolvedVersion: RawVersionRow | undefined;
   if (parsed.version) {
     resolvedVersion = db
-      .prepare<[string, string], RawVersionRow>(
-        `SELECT id, tag FROM repository_versions WHERE repository_id = ? AND tag = ?`
-      )
+      .prepare<
+        [string, string],
+        RawVersionRow
+      >(`SELECT id, tag FROM repository_versions WHERE repository_id = ? AND tag = ?`)
       .get(parsed.repositoryId, parsed.version);
     // Version not found is not fatal — fall back to default branch.
@@ -240,13 +240,14 @@
       sourceUrl: repo.source_url,
       branch: repo.branch
     },
-    version: parsed.version || resolvedVersion
-      ? {
-          requested: parsed.version ?? null,
-          resolved: resolvedVersion?.tag ?? null,
-          id: resolvedVersion?.id ?? null
-        }
-      : null,
+    version:
+      parsed.version || resolvedVersion
+        ? {
+            requested: parsed.version ?? null,
+            resolved: resolvedVersion?.tag ?? null,
+            id: resolvedVersion?.id ?? null
+          }
+        : null,
     snippetVersions
   };


@@ -10,7 +10,7 @@ export const GET: RequestHandler = ({ url }) => {
   let entries: { name: string; path: string; isGitRepo: boolean }[] = [];
   let error: string | null = null;
-  let resolved = target;
+  const resolved = target;
   try {
     const items = fs.readdirSync(target, { withFileTypes: true });


@@ -7,7 +7,11 @@ import type { RequestHandler } from './$types';
 import { getClient } from '$lib/server/db/client.js';
 import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
 import { JobQueue } from '$lib/server/pipeline/job-queue.js';
-import { handleServiceError, NotFoundError, InvalidInputError } from '$lib/server/utils/validation.js';
+import {
+  handleServiceError,
+  NotFoundError,
+  InvalidInputError
+} from '$lib/server/utils/validation.js';
 export const POST: RequestHandler = ({ params }) => {
   try {
@@ -19,9 +23,7 @@ export const POST: RequestHandler = ({ params }) => {
     const success = queue.cancelJob(params.id);
     if (!success) {
-      throw new InvalidInputError(
-        `Cannot cancel job ${params.id} - job is already done or failed`
-      );
+      throw new InvalidInputError(`Cannot cancel job ${params.id} - job is already done or failed`);
     }
     // Fetch updated job


@@ -7,7 +7,11 @@ import type { RequestHandler } from './$types';
 import { getClient } from '$lib/server/db/client.js';
 import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
 import { JobQueue } from '$lib/server/pipeline/job-queue.js';
-import { handleServiceError, NotFoundError, InvalidInputError } from '$lib/server/utils/validation.js';
+import {
+  handleServiceError,
+  NotFoundError,
+  InvalidInputError
+} from '$lib/server/utils/validation.js';
 export const POST: RequestHandler = ({ params }) => {
   try {


@@ -7,7 +7,11 @@ import type { RequestHandler } from './$types';
 import { getClient } from '$lib/server/db/client.js';
 import { IndexingJobMapper } from '$lib/server/mappers/indexing-job.mapper.js';
 import { JobQueue } from '$lib/server/pipeline/job-queue.js';
-import { handleServiceError, NotFoundError, InvalidInputError } from '$lib/server/utils/validation.js';
+import {
+  handleServiceError,
+  NotFoundError,
+  InvalidInputError
+} from '$lib/server/utils/validation.js';
 export const POST: RequestHandler = ({ params }) => {
   try {
@@ -19,7 +23,9 @@ export const POST: RequestHandler = ({ params }) => {
     const success = queue.resumeJob(params.id);
     if (!success) {
-      throw new InvalidInputError(`Cannot resume job ${params.id} - only paused jobs can be resumed`);
+      throw new InvalidInputError(
+        `Cannot resume job ${params.id} - only paused jobs can be resumed`
+      );
     }
     // Fetch updated job


@@ -58,9 +58,7 @@ export const POST: RequestHandler = async ({ request }) => {
   let jobResponse: ReturnType<typeof IndexingJobMapper.toDto> | null = null;
   if (body.autoIndex !== false) {
     const queue = getQueue();
-    const job = queue
-      ? queue.enqueue(repo.id)
-      : service.createIndexingJob(repo.id);
+    const job = queue ? queue.enqueue(repo.id) : service.createIndexingJob(repo.id);
     jobResponse = IndexingJobMapper.toDto(job);
   }


@@ -28,9 +28,7 @@ export const POST: RequestHandler = async ({ params, request }) => {
   // Use the queue so processNext() is triggered immediately.
   // Falls back to direct DB insert if the queue isn't initialised yet.
   const queue = getQueue();
-  const job = queue
-    ? queue.enqueue(id, versionId)
-    : service.createIndexingJob(id, versionId);
+  const job = queue ? queue.enqueue(id, versionId) : service.createIndexingJob(id, versionId);
   return json({ job: IndexingJobMapper.toDto(job) }, { status: 202 });
 } catch (err) {


@@ -146,4 +146,3 @@ function sanitizeProfile(profile: EmbeddingProfile): EmbeddingProfile {
   }
   return profile;
 }
-


@@ -16,9 +16,10 @@ export const GET: RequestHandler = async () => {
   try {
     const db = getClient();
     const profile = db
-      .prepare<[], EmbeddingProfile>(
-        'SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1'
-      )
+      .prepare<
+        [],
+        EmbeddingProfile
+      >('SELECT * FROM embedding_profiles WHERE is_default = 1 AND enabled = 1 LIMIT 1')
       .get();
     if (!profile) {
@@ -42,7 +43,6 @@
   }
 };
-
 export const POST: RequestHandler = async ({ request }) => {
   try {
     const body = await request.json();


@@ -8,7 +8,9 @@
   let { data }: { data: PageData } = $props();
   // Initialized empty; $effect syncs from data prop on every navigation/reload.
-  let repo = $state<Repository & { versions?: RepositoryVersion[] }>({} as Repository & { versions?: RepositoryVersion[] });
+  let repo = $state<Repository & { versions?: RepositoryVersion[] }>(
+    {} as Repository & { versions?: RepositoryVersion[] }
+  );
   let recentJobs = $state<IndexingJob[]>([]);
   $effect(() => {
     if (data.repo) repo = data.repo;
@@ -189,7 +191,7 @@
 <dl class="grid grid-cols-1 gap-y-2 text-sm sm:grid-cols-2">
   <div class="flex gap-2">
     <dt class="text-gray-500">Source</dt>
-    <dd class="font-medium capitalize text-gray-900">{repo.source}</dd>
+    <dd class="font-medium text-gray-900 capitalize">{repo.source}</dd>
   </div>
   <div class="flex gap-2">
     <dt class="text-gray-500">Branch</dt>


@@ -227,7 +227,9 @@
 <div class="rounded-xl border border-gray-200 bg-white p-6 shadow-sm">
   <div class="mb-4 flex items-center gap-3">
     <div class="flex min-w-0 flex-1 items-center gap-2">
-      <span class="shrink-0 rounded bg-green-100 px-2 py-0.5 text-xs font-medium text-green-700">
+      <span
+        class="shrink-0 rounded bg-green-100 px-2 py-0.5 text-xs font-medium text-green-700"
+      >
         Selected
       </span>
       <span class="truncate font-mono text-sm text-gray-700">{selectedLibraryTitle}</span>
@@ -285,7 +287,9 @@
 {:else if query && !loadingSnippets && snippets.length === 0 && !snippetError}
   <div class="flex flex-col items-center py-16 text-center">
     <p class="text-sm text-gray-500">No snippets found for that query.</p>
-    <p class="mt-1 text-xs text-gray-400">Try a different question or select another library.</p>
+    <p class="mt-1 text-xs text-gray-400">
+      Try a different question or select another library.
+    </p>
   </div>
 {/if}
 {/if}


@@ -203,7 +203,11 @@
       : 'border border-gray-200 text-gray-700 hover:bg-gray-50'
   ].join(' ')}
 >
-  {p === 'none' ? 'None (FTS5 only)' : p === 'openai' ? 'OpenAI-compatible' : 'Local Model'}
+  {p === 'none'
+    ? 'None (FTS5 only)'
+    : p === 'openai'
+      ? 'OpenAI-compatible'
+      : 'Local Model'}
 </button>
 {/each}
 </div>