Compare commits


10 Commits

Author SHA1 Message Date
Giancarmine Salucci
e63279fcf6 improve readme, untrack agents 2026-03-29 18:35:47 +02:00
Giancarmine Salucci
a426f4305c Merge branch 'fix/MULTIVERSION-0001-trueref-config-crawl-result' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
23ea8f2b4b Merge branch 'fix/MULTIVERSION-0001-multi-version-indexing' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
0bf01e3057 last fix 2026-03-29 12:44:06 +02:00
Giancarmine Salucci
09c6f9f7c1 fix(MULTIVERSION-0001): eliminate NULL-row contamination in getRules
When a versioned query is made, getRules() now returns only the
version-specific repository_configs row. The NULL (HEAD/repo-wide)
row is no longer merged in, preventing v4 rules from bleeding into
v1/v2/v3 versioned context responses.

Tests updated to assert the isolation: versioned queries return only
their own rules row; a new test verifies that a version with no
config row returns an empty rules array even when a NULL row exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 11:47:31 +02:00
Giancarmine Salucci
bbc67f8064 fix(MULTIVERSION-0001): prevent version jobs from overwriting repo-wide NULL rules entry
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.

Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 01:15:58 +01:00
Giancarmine Salucci
cd4ea7112c fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 17:27:53 +01:00
Giancarmine Salucci
666ec7d55f feat(MULTIVERSION-0001): wire trueref.json into pipeline + per-version rules
- Add migration 0003: recreate repository_configs with nullable version_id
  column and two partial unique indexes (repo-wide: version_id IS NULL,
  per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
  partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
  excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
  version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
  glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
  with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
  per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic
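The three pattern forms named above (plain filename, glob prefix, exact path) could look like the following sketch. The helper name comes from the commit message; the exact matching semantics are assumptions, not the committed code:

```typescript
// Hypothetical sketch of matchesExcludePattern. Supports the three forms
// described in the commit: glob prefix ("docs/legacy*"), exact relative
// path ("docs/old/api.md"), and plain filename ("CHANGELOG.md").
function matchesExcludePattern(filePath: string, pattern: string): boolean {
  // Glob prefix: trailing "*" matches any path starting with the stem.
  if (pattern.endsWith('*')) {
    return filePath.startsWith(pattern.slice(0, -1));
  }
  // Exact path: a pattern containing "/" compares the full relative path.
  if (pattern.includes('/')) {
    return filePath === pattern;
  }
  // Plain filename: match the basename anywhere in the tree.
  const base = filePath.split('/').pop() ?? filePath;
  return base === pattern;
}
```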

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:44:30 +01:00
Giancarmine Salucci
255838dcc0 fix(MULTIVERSION-0001): fix version isolation, 404 on unknown version, commit-hash lookup, and searchModeUsed
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.

Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.

Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.

Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.
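The Bug 2/Bug 4 behaviour can be sketched as a pure lookup: exact tag match first, then a commit-hash prefix match of at least 7 characters, else no match (and the caller responds 404). The version-row shape and function name here are hypothetical:

```typescript
interface VersionRow {
  tag: string;
  commitHash: string;
}

// Hypothetical resolver mirroring the commit message: tags match exactly;
// commit hashes match by prefix with a 7-character minimum.
function resolveVersion(ref: string, versions: VersionRow[]): VersionRow | undefined {
  const byTag = versions.find((v) => v.tag === ref);
  if (byTag) return byTag;
  // Only treat the ref as a SHA prefix when it is 7-40 hex characters.
  if (/^[0-9a-f]{7,40}$/i.test(ref)) {
    return versions.find((v) => v.commitHash.startsWith(ref.toLowerCase()));
  }
  return undefined; // caller responds 404 with code VERSION_NOT_FOUND
}
```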

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:31:15 +01:00
Giancarmine Salucci
417c6fd072 fix(MULTIVERSION-0001): fix version indexing pipeline state and UI reactivity
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:03:44 +01:00
26 changed files with 2094 additions and 142 deletions

.github/agents vendored (1 line changed)

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/agents

.github/schemas vendored (1 line changed)

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/schemas

.github/skills vendored (1 line changed)

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/skills

.gitignore vendored (5 lines changed)

@@ -36,3 +36,8 @@ docs/docs_cache_state.yaml
# Claude Code — ignore local/machine-specific settings, keep project rules
.claude/
!.claude/rules/
# Github Copilot
.github/agents
.github/schemas
.github/skills

README.md (170 lines changed)

@@ -16,9 +16,12 @@ The goal is straightforward: give your assistants accurate, current, version-awa
 - Stores metadata in SQLite.
 - Supports keyword search out of the box with SQLite FTS5.
 - Supports semantic and hybrid search when an embedding provider is configured.
-- Exposes REST endpoints for library discovery and documentation retrieval.
+- Supports multi-version indexing: index specific git tags independently, query a version by appending it to the library ID.
+- Discovers available git tags from local repositories automatically.
+- Stores per-version rules from `trueref.json` and prepends them to every `query-docs` response.
+- Exposes REST endpoints for library discovery, documentation retrieval, and version management.
 - Exposes an MCP server over stdio and HTTP for AI clients.
-- Provides a SvelteKit web UI for repository management, search, indexing jobs, and embedding settings.
+- Provides a SvelteKit web UI for repository management, version management, search, indexing jobs, and embedding settings.
 - Supports repository-level configuration through `trueref.json` or `context7.json`.
 ## Project status
@@ -28,10 +31,12 @@ TrueRef is under active development. The current codebase already includes:
 - repository management
 - indexing jobs and recovery on restart
 - local and GitHub crawling
-- version registration support
+- multi-version indexing with git tag isolation
+- automatic tag discovery for local git repositories
+- per-version rules from `trueref.json` prepended to context responses
 - context7-compatible API endpoints
 - MCP stdio and HTTP transports
-- configurable embedding providers
+- configurable embedding providers (none / OpenAI-compatible / local ONNX)
 ## Architecture
@@ -66,7 +71,15 @@ Each indexed repository becomes a library with an ID such as `/facebook/react`.
 ### Versions
-Libraries can register version tags. Queries can target a specific version by using a library ID such as `/facebook/react/v18.3.0`.
+Libraries can register version tags. Each version is indexed independently so snippets from different releases never mix.
+Query a specific version by appending the tag to the library ID:
+```
+/facebook/react/v18.3.0
+```
+For local repositories, TrueRef can discover all available git tags automatically via the versions/discover endpoint. Tags can be added through the UI on the repository detail page or via the REST API.
 ### Snippets
@@ -76,6 +89,8 @@ Documents are split into code and info snippets. These snippets are what search
Repository rules defined in `trueref.json` are prepended to `query-docs` responses so assistants get usage constraints along with the retrieved content.
Rules are stored per version when a version-specific config is found during indexing, so different releases can carry different usage guidance.
## Requirements
- Node.js 20+
@@ -153,6 +168,12 @@ Use the main page to:
- delete an indexed repository
- monitor active indexing jobs
Open a repository's detail page to:
- view registered version tags
- discover available git tags (local repositories)
- trigger version-specific indexing jobs
### Search
Use the Search page to:
@@ -175,21 +196,40 @@ If no embedding provider is configured, TrueRef still works with FTS5-only searc
 ## Repository configuration
-TrueRef supports a repository-local config file named `trueref.json`.
-For compatibility with existing context7-style repositories, `context7.json` is also supported.
-### What the config controls
-- project display title
-- project description
-- included folders
-- excluded folders
-- excluded file names
-- assistant-facing usage rules
-- previously released versions
-### Example `trueref.json`
+You can place a `trueref.json` file at the **root** of any repository you index. TrueRef reads it during every indexing run to control what gets indexed and what gets shown to AI assistants.
+For backward compatibility with repositories that already have a `context7.json`, that file is also supported. When both files are present, `trueref.json` takes precedence.
+### Where to place it
+```
+my-library/
+├── trueref.json   ← here, at the repository root
+├── src/
+├── docs/
+└── ...
+```
+For GitHub repositories, TrueRef fetches the file from the default branch root. For local repositories, it reads it from the filesystem root of the indexed folder.
+### Fields
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `$schema` | string | No | URL to the live JSON Schema for editor validation |
+| `projectTitle` | string | No | Display name override (max 100 chars) |
+| `description` | string | No | Library description used for search ranking (10–500 chars) |
+| `folders` | string[] | No | Path prefixes or regex strings to **include** (max 50 items). If absent, all folders are included |
+| `excludeFolders` | string[] | No | Path prefixes or regex strings to **exclude** after the `folders` allowlist (max 50 items) |
+| `excludeFiles` | string[] | No | Exact filenames to skip — no path, no glob (max 100 items) |
+| `rules` | string[] | No | Best-practice rules prepended to every `query-docs` response (max 20 rules, 5–500 chars each) |
+| `previousVersions` | object[] | No | Version tags to register when the repository is indexed (max 50 entries) |
+`previousVersions` entries each require a `tag` (e.g. `"v1.2.3"`) and a `title` (e.g. `"Version 1.2.3"`).
+The parser is intentionally lenient: unknown keys are silently ignored, mistyped values are skipped with a warning, and oversized strings or arrays are truncated. Only invalid JSON or a non-object root is a hard error.
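As an illustration of that leniency, here is a minimal sketch of how a single field such as `rules` might be validated (hypothetical helper, not the actual TrueRef parser):

```typescript
// Lenient parsing sketch: mistyped values are skipped, oversized strings
// and arrays are truncated, and nothing here throws.
function parseRules(raw: unknown): string[] {
  if (!Array.isArray(raw)) return []; // mistyped value: skipped, not fatal
  return raw
    .filter((r): r is string => typeof r === 'string') // drop non-string entries
    .map((r) => (r.length > 500 ? r.slice(0, 500) : r)) // truncate oversized strings
    .slice(0, 20); // truncate oversized arrays (max 20 rules)
}
```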
+### Full example
 ```json
 {
@@ -197,30 +237,76 @@ For compatibility with existing context7-style repositories, `context7.json` is
"projectTitle": "My Internal SDK", "projectTitle": "My Internal SDK",
"description": "Internal SDK for billing, auth, and event ingestion.", "description": "Internal SDK for billing, auth, and event ingestion.",
"folders": ["src/", "docs/"], "folders": ["src/", "docs/"],
"excludeFolders": ["tests/", "fixtures/", "node_modules/"], "excludeFolders": ["tests/", "fixtures/", "node_modules/", "__mocks__/"],
"excludeFiles": ["CHANGELOG.md"], "excludeFiles": ["CHANGELOG.md", "jest.config.ts"],
"rules": [ "rules": [
"Prefer named imports over wildcard imports.", "Prefer named imports over wildcard imports.",
"Use the async client API for all network calls." "Use the async client API for all network calls.",
"Never import from internal sub-paths — use the package root only."
], ],
"previousVersions": [ "previousVersions": [
{ { "tag": "v2.0.0", "title": "Version 2.0.0" },
"tag": "v1.2.3", { "tag": "v1.2.3", "title": "Version 1.2.3 (legacy)" }
"title": "Version 1.2.3"
}
] ]
} }
``` ```
### JSON schema ### How `folders` and `excludeFolders` are matched
You can point your editor to the live schema served by TrueRef: Both fields accept strings that are matched against the full relative file path within the repository. A string is treated as a path prefix unless it starts with `^`, in which case it is compiled as a regex:
```text ```json
{
"folders": ["src/", "docs/", "^packages/core"],
"excludeFolders": ["src/internal/", "__tests__"]
}
```
- `"src/"` — includes any file whose path starts with `src/`
- `"^packages/core"` — regex, includes only `packages/core` not `packages/core-utils`
`excludeFolders` is applied **after** the `folders` allowlist, so you can narrow a broad include with a targeted exclude.
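A minimal sketch of this two-stage filter under the stated prefix-vs-regex rule (helper names are illustrative, not the actual `shouldIndexFile()` implementation):

```typescript
// A string starting with "^" is compiled as a regex; anything else is a
// path-prefix match against the full relative path, per the rule above.
function matchesAny(path: string, patterns: string[]): boolean {
  return patterns.some((p) =>
    p.startsWith('^') ? new RegExp(p).test(path) : path.startsWith(p)
  );
}

// Stage 1: folders allowlist (if present). Stage 2: excludeFolders narrows it.
function shouldInclude(
  path: string,
  folders?: string[],
  excludeFolders?: string[]
): boolean {
  if (folders && folders.length > 0 && !matchesAny(path, folders)) return false;
  if (excludeFolders && matchesAny(path, excludeFolders)) return false;
  return true;
}
```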
+### How `rules` are used
+Rules are stored in the database at index time and automatically prepended to every `query-docs` response for that library (and version). This means AI assistants receive them alongside the retrieved snippets without any extra configuration.
+When a version is indexed, the rules from the config found at that version's checkout are stored separately. Different version tags can therefore carry different rules.
+Example context response with rules prepended:
+```
+RULES:
+- Prefer named imports over wildcard imports.
+- Use the async client API for all network calls.
+LIBRARY DOCUMENTATION:
+...
+```
+### How `previousVersions` works
+When TrueRef indexes a repository and finds `previousVersions`, it registers those tags in the versions table. The tags are then available for version-specific indexing and queries without any further manual registration.
+This is useful when you want all historical releases available from a fresh TrueRef setup without manually triggering one indexing job per version.
+### JSON Schema for editor support
+TrueRef serves a live JSON Schema at:
+```
 http://localhost:5173/api/v1/schema/trueref-config.json
 ```
-That enables validation and autocomplete in editors that support JSON Schema references.
+Add it to your `trueref.json` via the `$schema` field to get inline validation and autocomplete in VS Code, IntelliJ, and any other editor that supports JSON Schema Draft 07:
+```json
+{
+"$schema": "http://localhost:5173/api/v1/schema/trueref-config.json"
+}
+```
+If you are running TrueRef on a server, replace `localhost:5173` with your actual host and port. The schema endpoint always reflects the version of TrueRef you are running.
 ## REST API
@@ -299,6 +385,36 @@ curl "http://localhost:5173/api/v1/jobs"
curl "http://localhost:5173/api/v1/jobs/<job-id>"
```
### Version management
List registered versions for a library:
```sh
curl "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions"
```
Index a specific version tag:
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions/v18.3.0/index"
```
Discover available git tags (local repositories only):
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Fpath%2Fto%2Fmy-lib/versions/discover"
```
Returns `{ "tags": [{ "tag": "v1.0.0", "commitHash": "abc123" }, ...] }`. Returns an empty array for GitHub repositories.
### Version-targeted context retrieval
Append the version tag to the library ID to retrieve snippets from a specific indexed version:
```sh
curl "http://localhost:5173/api/v1/context?libraryId=/facebook/react/v18.3.0&query=how%20to%20use%20useEffect&type=txt"
```
### Response formats
The two search endpoints support:


@@ -24,7 +24,12 @@ import type { Handle } from '@sveltejs/kit';
try {
initializeDatabase();
} catch (err) {
console.error('[hooks.server] FATAL: database initialisation failed:', err);
process.exit(1);
}
try {
const db = getClient();
const activeProfileRow = db
.prepare<[], EmbeddingProfileEntityProps>(
@@ -46,7 +51,8 @@ try {
 console.log('[hooks.server] Indexing pipeline initialised.');
 } catch (err) {
 console.error(
-`[hooks.server] Failed to initialise server: ${err instanceof Error ? err.message : String(err)}`
+'[hooks.server] Failed to initialise pipeline:',
+err instanceof Error ? err.message : String(err)
 );
 }


@@ -1,13 +1,14 @@
 <script lang="ts">
 import type { IndexingJob } from '$lib/types';
-let { jobId }: { jobId: string } = $props();
+let { jobId, oncomplete }: { jobId: string; oncomplete?: () => void } = $props();
 let job = $state<IndexingJob | null>(null);
 $effect(() => {
 job = null;
 let stopped = false;
+let completeFired = false;
 async function poll() {
 if (stopped) return;
@@ -16,6 +17,10 @@
if (res.ok) {
const data = await res.json();
job = data.job;
if (!completeFired && (job?.status === 'done' || job?.status === 'failed')) {
completeFired = true;
oncomplete?.();
}
}
} catch {
// ignore transient errors


@@ -5,7 +5,7 @@ import RepositoryCard from './RepositoryCard.svelte';
 describe('RepositoryCard.svelte', () => {
 it('encodes slash-bearing repository ids in the details href', async () => {
-render(RepositoryCard, {
+const { container } = await render(RepositoryCard, {
 repo: {
 id: '/facebook/react',
 title: 'React',
@@ -26,7 +26,8 @@ describe('RepositoryCard.svelte', () => {
 .element(page.getByRole('link', { name: 'Details' }))
 .toHaveAttribute('href', '/repos/%2Ffacebook%2Freact');
-await expect.element(page.getByText('1,200 embeddings')).toBeInTheDocument();
-await expect.element(page.getByText('Indexed: main, v18.3.0')).toBeInTheDocument();
+const text = container.textContent ?? '';
+expect(text).toMatch(/1[,.\u00a0\u202f]?200 embeddings/);
+expect(text).toContain('Indexed: main, v18.3.0');
 });
 });


@@ -143,6 +143,9 @@ export function formatContextTxt(
}
noResults.push(`Result count: ${metadata?.resultCount ?? 0}`);
if (metadata?.searchModeUsed) {
noResults.push(`Search mode: ${metadata.searchModeUsed}`);
}
parts.push(noResults.join('\n'));
return parts.join('\n\n');


@@ -413,6 +413,59 @@ describe('LocalCrawler.crawl() — config file detection', () => {
const result = await crawlRoot();
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
});
it('populates CrawlResult.config with the parsed trueref.json even when folders allowlist excludes the root', async () => {
// Regression test for MULTIVERSION-0001:
// When folders: ["src/"] is set, trueref.json at the root is excluded from
// files[] by shouldIndexFile(). The config must still be returned in
// CrawlResult.config so the indexing pipeline can persist rules.
root = await makeTempRepo({
'trueref.json': JSON.stringify({
folders: ['src/'],
rules: ['Always document public APIs.']
}),
'src/index.ts': 'export {};',
'docs/guide.md': '# Guide'
});
const result = await crawlRoot();
// trueref.json must NOT appear in files (excluded by folders allowlist).
expect(result.files.some((f) => f.path === 'trueref.json')).toBe(false);
// docs/guide.md must NOT appear (outside src/).
expect(result.files.some((f) => f.path === 'docs/guide.md')).toBe(false);
// src/index.ts must appear (inside src/).
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
// CrawlResult.config must carry the parsed config.
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Always document public APIs.']);
});
it('populates CrawlResult.config with the parsed context7.json', async () => {
root = await makeTempRepo({
'context7.json': JSON.stringify({ rules: ['Rule from context7.'] }),
'src/index.ts': 'export {};'
});
const result = await crawlRoot();
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Rule from context7.']);
});
it('CrawlResult.config is undefined when no config file is present', async () => {
root = await makeTempRepo({ 'src/index.ts': 'export {};' });
const result = await crawlRoot();
expect(result.config).toBeUndefined();
});
it('CrawlResult.config is undefined when caller supplies config (caller-provided takes precedence, no auto-detect)', async () => {
root = await makeTempRepo({
'trueref.json': JSON.stringify({ rules: ['From file.'] }),
'src/index.ts': 'export {};'
});
// Caller-supplied config prevents auto-detection; CrawlResult.config
// should carry the caller config (not the file content).
const result = await crawlRoot({ config: { rules: ['From caller.'] } });
expect(result.config?.rules).toEqual(['From caller.']);
});
});
// --------------------------------------------------------------------------- // ---------------------------------------------------------------------------


@@ -230,7 +230,11 @@ export class LocalCrawler {
 totalFiles: filteredPaths.length,
 skippedFiles: allRelPaths.length - filteredPaths.length,
 branch,
-commitSha
+commitSha,
+// Surface the pre-parsed config so the indexing pipeline can read rules
+// without needing to find trueref.json inside crawledFiles (which fails
+// when a `folders` allowlist excludes the repo root).
+config: config ?? undefined
 };
 }


@@ -35,6 +35,13 @@ export interface CrawlResult {
branch: string;
/** HEAD commit SHA */
commitSha: string;
/**
* Pre-parsed trueref.json / context7.json configuration found at the repo
* root during crawling. Carried here so the indexing pipeline can consume it
* directly without having to locate the config file in `files` — which fails
* when a `folders` allowlist excludes the repo root.
*/
config?: RepoConfig;
}
export interface CrawlOptions {


@@ -30,6 +30,7 @@ const __dirname = dirname(fileURLToPath(import.meta.url));
*/
export function initializeDatabase(): void {
const migrationsFolder = join(__dirname, 'migrations');
console.log(`[db] Running migrations from ${migrationsFolder}...`);
migrate(db, { migrationsFolder });
// Apply FTS5 virtual table and trigger DDL (not expressible via Drizzle).


@@ -0,0 +1,30 @@
PRAGMA foreign_keys=OFF;
--> statement-breakpoint
CREATE TABLE `__new_repository_configs` (
`repository_id` text NOT NULL,
`version_id` text,
`project_title` text,
`description` text,
`folders` text,
`exclude_folders` text,
`exclude_files` text,
`rules` text,
`previous_versions` text,
`updated_at` integer NOT NULL,
FOREIGN KEY (`repository_id`) REFERENCES `repositories`(`id`) ON UPDATE no action ON DELETE cascade
);
--> statement-breakpoint
INSERT INTO `__new_repository_configs`
(repository_id, version_id, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at)
SELECT repository_id, NULL, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at
FROM `repository_configs`;
--> statement-breakpoint
DROP TABLE `repository_configs`;
--> statement-breakpoint
ALTER TABLE `__new_repository_configs` RENAME TO `repository_configs`;
--> statement-breakpoint
PRAGMA foreign_keys=ON;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_base` ON `repository_configs` (`repository_id`) WHERE `version_id` IS NULL;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_version` ON `repository_configs` (`repository_id`, `version_id`) WHERE `version_id` IS NOT NULL;
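Given the two partial unique indexes above, the repo-wide row and each version-specific row can be upserted independently by targeting the matching index. An illustrative sketch (the SQL below is not taken from the codebase; SQLite upserts against a partial index require repeating its `WHERE` clause in the conflict target):

```sql
-- Repo-wide row (version_id IS NULL): conflicts resolve via uniq_repo_config_base.
INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES ('/facebook/react', NULL, '["..."]', 1700000000)
ON CONFLICT (repository_id) WHERE version_id IS NULL
DO UPDATE SET rules = excluded.rules, updated_at = excluded.updated_at;

-- Version-specific row: conflicts resolve via uniq_repo_config_version.
INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES ('/facebook/react', 'ver_v18', '["..."]', 1700000000)
ON CONFLICT (repository_id, version_id) WHERE version_id IS NOT NULL
DO UPDATE SET rules = excluded.rules, updated_at = excluded.updated_at;
```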


@@ -0,0 +1,835 @@
{
"version": "6",
"dialect": "sqlite",
"id": "a7c2e4f8-3b1d-4e9a-8f0c-6d5e2a1b9c7f",
"prevId": "31531dab-a199-4fc5-a889-1884940039cd",
"tables": {
"documents": {
"name": "documents",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"file_path": {
"name": "file_path",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"checksum": {
"name": "checksum",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"documents_repository_id_repositories_id_fk": {
"name": "documents_repository_id_repositories_id_fk",
"tableFrom": "documents",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"documents_version_id_repository_versions_id_fk": {
"name": "documents_version_id_repository_versions_id_fk",
"tableFrom": "documents",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"embedding_profiles": {
"name": "embedding_profiles",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"provider_kind": {
"name": "provider_kind",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"enabled": {
"name": "enabled",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": true
},
"is_default": {
"name": "is_default",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"config": {
"name": "config",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"indexing_jobs": {
"name": "indexing_jobs",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"progress": {
"name": "progress",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_files": {
"name": "total_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"processed_files": {
"name": "processed_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"error": {
"name": "error",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"started_at": {
"name": "started_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"completed_at": {
"name": "completed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"indexing_jobs_repository_id_repositories_id_fk": {
"name": "indexing_jobs_repository_id_repositories_id_fk",
"tableFrom": "indexing_jobs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repositories": {
"name": "repositories",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"source": {
"name": "source",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"source_url": {
"name": "source_url",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"branch": {
"name": "branch",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": "'main'"
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_tokens": {
"name": "total_tokens",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"trust_score": {
"name": "trust_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"benchmark_score": {
"name": "benchmark_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stars": {
"name": "stars",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"github_token": {
"name": "github_token",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"last_indexed_at": {
"name": "last_indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_configs": {
"name": "repository_configs",
"columns": {
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"project_title": {
"name": "project_title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"folders": {
"name": "folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_folders": {
"name": "exclude_folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_files": {
"name": "exclude_files",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"rules": {
"name": "rules",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"previous_versions": {
"name": "previous_versions",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"uniq_repo_config_base": {
"name": "uniq_repo_config_base",
"columns": ["repository_id"],
"isUnique": true,
"where": "`version_id` IS NULL"
},
"uniq_repo_config_version": {
"name": "uniq_repo_config_version",
"columns": ["repository_id", "version_id"],
"isUnique": true,
"where": "`version_id` IS NOT NULL"
}
},
"foreignKeys": {
"repository_configs_repository_id_repositories_id_fk": {
"name": "repository_configs_repository_id_repositories_id_fk",
"tableFrom": "repository_configs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_versions": {
"name": "repository_versions",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"tag": {
"name": "tag",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"commit_hash": {
"name": "commit_hash",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"repository_versions_repository_id_repositories_id_fk": {
"name": "repository_versions_repository_id_repositories_id_fk",
"tableFrom": "repository_versions",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"settings": {
"name": "settings",
"columns": {
"key": {
"name": "key",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"value": {
"name": "value",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippet_embeddings": {
"name": "snippet_embeddings",
"columns": {
"snippet_id": {
"name": "snippet_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"profile_id": {
"name": "profile_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"embedding": {
"name": "embedding",
"type": "blob",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippet_embeddings_snippet_id_snippets_id_fk": {
"name": "snippet_embeddings_snippet_id_snippets_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "snippets",
"columnsFrom": ["snippet_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippet_embeddings_profile_id_embedding_profiles_id_fk": {
"name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "embedding_profiles",
"columnsFrom": ["profile_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {
"snippet_embeddings_snippet_id_profile_id_pk": {
"columns": ["snippet_id", "profile_id"],
"name": "snippet_embeddings_snippet_id_profile_id_pk"
}
},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippets": {
"name": "snippets",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"document_id": {
"name": "document_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"type": {
"name": "type",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"content": {
"name": "content",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"breadcrumb": {
"name": "breadcrumb",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippets_document_id_documents_id_fk": {
"name": "snippets_document_id_documents_id_fk",
"tableFrom": "snippets",
"tableTo": "documents",
"columnsFrom": ["document_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_repository_id_repositories_id_fk": {
"name": "snippets_repository_id_repositories_id_fk",
"tableFrom": "snippets",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_version_id_repository_versions_id_fk": {
"name": "snippets_version_id_repository_versions_id_fk",
"tableFrom": "snippets",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
}
},
"views": {},
"enums": {},
"schemas": {},
"sequences": {},
"_meta": {
"schemas": {},
"tables": {},
"columns": {}
},
"internal": {
"indexes": {}
}
}


@@ -22,6 +22,13 @@
      "when": 1774461897742,
      "tag": "0002_silky_stellaris",
      "breakpoints": true
    },
    {
      "idx": 3,
      "version": "6",
      "when": 1743155877000,
      "tag": "0003_multiversion_config",
      "breakpoints": true
    }
  ]
}


@@ -1,4 +1,13 @@
import { sql } from 'drizzle-orm';
import {
  blob,
  integer,
  primaryKey,
  real,
  sqliteTable,
  text,
  uniqueIndex
} from 'drizzle-orm/sqlite-core';

// ---------------------------------------------------------------------------
// repositories
@@ -148,21 +157,33 @@ export const indexingJobs = sqliteTable('indexing_jobs', {
// ---------------------------------------------------------------------------
// repository_configs
// ---------------------------------------------------------------------------
export const repositoryConfigs = sqliteTable(
  'repository_configs',
  {
    repositoryId: text('repository_id')
      .notNull()
      .references(() => repositories.id, { onDelete: 'cascade' }),
    versionId: text('version_id'),
    projectTitle: text('project_title'),
    description: text('description'),
    folders: text('folders', { mode: 'json' }).$type<string[]>(),
    excludeFolders: text('exclude_folders', { mode: 'json' }).$type<string[]>(),
    excludeFiles: text('exclude_files', { mode: 'json' }).$type<string[]>(),
    rules: text('rules', { mode: 'json' }).$type<string[]>(),
    previousVersions: text('previous_versions', { mode: 'json' }).$type<
      { tag: string; title: string; commitHash?: string }[]
    >(),
    updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
  },
  (table) => [
    uniqueIndex('uniq_repo_config_base')
      .on(table.repositoryId)
      .where(sql`${table.versionId} IS NULL`),
    uniqueIndex('uniq_repo_config_version')
      .on(table.repositoryId, table.versionId)
      .where(sql`${table.versionId} IS NOT NULL`)
  ]
);

// ---------------------------------------------------------------------------
// settings
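The two partial unique indexes declared above let one repo-wide row (`version_id IS NULL`) coexist with one row per version for the same `repository_id`. A standalone, in-memory sketch of the keying they enforce — the `configKey` helper is illustrative, not part of the codebase:

```typescript
// In-memory emulation of the uniqueness the two partial indexes guarantee:
// uniq_repo_config_base    -> at most one row per repository with versionId === null
// uniq_repo_config_version -> at most one row per (repository, versionId) pair
type ConfigRow = { repositoryId: string; versionId: string | null; rules: string[] };

function configKey(repositoryId: string, versionId: string | null): string {
  // A NULL version collapses to a distinct "base" key, exactly one per repository.
  return versionId === null ? `${repositoryId}::<base>` : `${repositoryId}::${versionId}`;
}

const store = new Map<string, ConfigRow>();

function upsertConfig(row: ConfigRow): void {
  // Replacing the row with the same key mirrors ON CONFLICT DO UPDATE.
  store.set(configKey(row.repositoryId, row.versionId), row);
}

upsertConfig({ repositoryId: '/repo', versionId: null, rules: ['main rules'] });
upsertConfig({ repositoryId: '/repo', versionId: 'v1', rules: ['v1 rules'] });
upsertConfig({ repositoryId: '/repo', versionId: 'v1', rules: ['v1 rules, updated'] });

// Base and version rows coexist; the duplicate v1 upsert replaced, not added.
console.log(store.size); // 2
```

This is the property the MULTIVERSION-0001 fixes rely on: a version job's upsert can only ever hit its own `(repository, version)` slot.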


@@ -15,6 +15,7 @@ import { LibrarySearchResult, SnippetSearchResult } from '$lib/server/models/sea
export interface ContextResponseMetadata {
  localSource: boolean;
  resultCount: number;
  searchModeUsed: string;
  repository: {
    id: string;
    title: string;
@@ -130,7 +131,8 @@ export class ContextResponseMapper {
            id: metadata.version.id
          })
        : null,
      resultCount: metadata?.resultCount ?? snippets.length,
      searchModeUsed: metadata?.searchModeUsed ?? 'keyword'
    });
  }
}
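The mapper change above defaults `searchModeUsed` to `'keyword'` whenever metadata omits it. The fallback can be sketched in isolation (types simplified; only the two relevant fields of the real metadata shape appear here):

```typescript
// Simplified sketch of the ContextResponseMapper fallback shown above.
interface PartialMetadata {
  resultCount?: number;
  searchModeUsed?: string;
}

function buildResponseFields(metadata: PartialMetadata | null, snippetCount: number) {
  return {
    // Prefer the explicit count; fall back to the number of snippets returned.
    resultCount: metadata?.resultCount ?? snippetCount,
    // Callers that never set a search mode get the safe 'keyword' default.
    searchModeUsed: metadata?.searchModeUsed ?? 'keyword'
  };
}

console.log(buildResponseFields(null, 3)); // { resultCount: 3, searchModeUsed: 'keyword' }
```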


@@ -173,6 +173,7 @@ export class ContextJsonResponseDto {
  repository: ContextRepositoryJsonDto | null;
  version: ContextVersionJsonDto | null;
  resultCount: number;
  searchModeUsed: string;

  constructor(props: ContextJsonResponseDto) {
    this.snippets = props.snippets;
@@ -182,5 +183,6 @@
    this.repository = props.repository;
    this.version = props.version;
    this.resultCount = props.resultCount;
    this.searchModeUsed = props.searchModeUsed;
  }
}


@@ -26,7 +26,8 @@ function createTestDb(): Database.Database {
for (const migrationFile of [
  '0000_large_master_chief.sql',
  '0001_quick_nighthawk.sql',
  '0002_silky_stellaris.sql',
  '0003_multiversion_config.sql'
]) {
  const migrationSql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
@@ -75,6 +76,28 @@ function insertRepo(db: Database.Database, overrides: Partial<Record<string, unk
  );
}
function insertVersion(
db: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
): string {
const id = crypto.randomUUID();
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, title, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)`
).run(
overrides.id ?? id,
overrides.repository_id ?? '/test/repo',
overrides.tag ?? 'v1.0.0',
overrides.title ?? null,
overrides.state ?? 'pending',
overrides.total_snippets ?? 0,
overrides.indexed_at ?? null,
overrides.created_at ?? now
);
return (overrides.id as string) ?? id;
}
function insertJob(
  db: Database.Database,
  overrides: Partial<Record<string, unknown>> = {}
@@ -245,6 +268,8 @@ describe('IndexingPipeline', () => {
    crawlResult: {
      files: Array<{ path: string; content: string; sha: string; language: string }>;
      totalFiles: number;
/** Optional pre-parsed config — simulates LocalCrawler returning CrawlResult.config. */
config?: Record<string, unknown>;
    } = { files: [], totalFiles: 0 },
    embeddingService: EmbeddingService | null = null
  ) {
@@ -272,8 +297,12 @@ describe('IndexingPipeline', () => {
    );
  }

  function makeJob(repositoryId = '/test/repo', versionId?: string) {
    const jobId = insertJob(db, {
      repository_id: repositoryId,
      version_id: versionId ?? null,
      status: 'queued'
    });
    return db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as {
      id: string;
      repositoryId?: string;
@@ -644,4 +673,349 @@ describe('IndexingPipeline', () => {
    expect(finalJob.status).toBe('done');
    expect(finalJob.progress).toBe(100);
  });
it('updates repository_versions state to indexing then indexed when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const files = [
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: 1 });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state, total_snippets, indexed_at FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string; total_snippets: number; indexed_at: number | null };
expect(version.state).toBe('indexed');
expect(version.total_snippets).toBeGreaterThan(0);
expect(version.indexed_at).not.toBeNull();
});
it('updates repository_versions state to error when pipeline throws and job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const errorCrawl = vi.fn().mockRejectedValue(new Error('crawl failed'));
const pipeline = new IndexingPipeline(
db,
errorCrawl as never,
{ crawl: errorCrawl } as never,
null
);
const job = makeJob('/test/repo', versionId);
await expect(pipeline.run(job as never)).rejects.toThrow('crawl failed');
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
expect(version.state).toBe('error');
});
it('does not touch repository_versions when job has no versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const pipeline = makePipeline({ files: [], totalFiles: 0 });
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
// State should remain 'pending' — pipeline with no versionId must not touch it
expect(version.state).toBe('pending');
});
it('calls LocalCrawler with ref=v1.2.0 when job has a versionId with tag v1.2.0', async () => {
const versionId = insertVersion(db, { tag: 'v1.2.0', state: 'pending' });
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: 'v1.2.0'
});
});
it('calls LocalCrawler with ref=undefined when job has no versionId (main-branch)', async () => {
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: undefined
});
});
it('excludes files matching excludeFiles patterns from trueref.json', async () => {
const truerefConfig = JSON.stringify({
excludeFiles: ['migration-guide.md', 'docs/legacy*']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
},
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
},
{
path: 'migration-guide.md',
content: '# Migration Guide\n\nThis should be excluded.',
sha: 'sha-migration',
language: 'markdown'
},
{
path: 'docs/legacy-api.md',
content: '# Legacy API\n\nShould be excluded by glob prefix.',
sha: 'sha-legacy',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const docs = db
.prepare(`SELECT file_path FROM documents ORDER BY file_path`)
.all() as { file_path: string }[];
const filePaths = docs.map((d) => d.file_path);
// migration-guide.md and docs/legacy-api.md must be absent.
expect(filePaths).not.toContain('migration-guide.md');
expect(filePaths).not.toContain('docs/legacy-api.md');
// README.md must still be indexed.
expect(filePaths).toContain('README.md');
});
it('persists repo-wide rules from trueref.json to repository_configs after indexing', async () => {
const truerefConfig = JSON.stringify({
rules: ['Always use TypeScript strict mode', 'Prefer async/await over callbacks']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual(['Always use TypeScript strict mode', 'Prefer async/await over callbacks']);
});
it('persists version-specific rules under (repositoryId, versionId) when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v2.0.0', state: 'pending' });
const truerefConfig = JSON.stringify({
rules: ['This is v2. Use the new Builder API.']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
// Repo-wide row (version_id IS NULL) must NOT be written by a version job —
// writing it here would contaminate the NULL entry with version-specific rules
// (Bug 5b regression guard).
const repoRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(repoRow).toBeUndefined();
// Version-specific row must exist with the correct rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['This is v2. Use the new Builder API.']);
});
it('regression(Bug5b): version job does not overwrite the repo-wide NULL rules entry', async () => {
// Arrange: index the main branch first to establish a repo-wide rules entry.
const mainBranchRules = ['Always use TypeScript strict mode.'];
const mainPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: mainBranchRules }),
sha: 'sha-main-config',
language: 'json'
}
],
totalFiles: 1
});
const mainJob = makeJob('/test/repo'); // no versionId → main-branch job
await mainPipeline.run(mainJob as never);
// Confirm the repo-wide entry was written.
const afterMain = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterMain).toBeDefined();
expect(JSON.parse(afterMain!.rules)).toEqual(mainBranchRules);
// Act: index a version with different rules.
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const versionRules = ['v3 only: use the streaming API.'];
const versionPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: versionRules }),
sha: 'sha-v3-config',
language: 'json'
}
],
totalFiles: 1
});
const versionJob = makeJob('/test/repo', versionId);
await versionPipeline.run(versionJob as never);
// Assert: the repo-wide NULL entry must still contain the main-branch rules,
// not the version-specific ones.
const afterVersion = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterVersion).toBeDefined();
expect(JSON.parse(afterVersion!.rules)).toEqual(mainBranchRules);
// And the version-specific row must contain the version rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
expect(JSON.parse(versionRow!.rules)).toEqual(versionRules);
});
it('persists rules from CrawlResult.config even when trueref.json is absent from files (folders allowlist bug)', async () => {
// Regression test for MULTIVERSION-0001:
// When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
// shouldIndexFile() excludes trueref.json itself because it lives at the
// repo root. The LocalCrawler now carries the pre-parsed config in
// CrawlResult.config so the pipeline no longer needs to find the file in
// crawlResult.files[].
const pipeline = makePipeline({
// trueref.json is NOT in files — simulates it being excluded by folders allowlist.
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
// The pre-parsed config is carried here instead (set by LocalCrawler).
config: { rules: ['Use strict TypeScript.', 'Avoid any.'] }
});
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual(['Use strict TypeScript.', 'Avoid any.']);
});
it('persists version-specific rules from CrawlResult.config when trueref.json is excluded by folders allowlist', async () => {
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const pipeline = makePipeline({
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
config: { rules: ['v3: use the streaming API.'] }
});
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['v3: use the streaming API.']);
});
});
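The exclusion tests above assert two pattern behaviors: an exact path (`migration-guide.md`) and a trailing-`*` prefix glob (`docs/legacy*`). The pipeline's `matchesExcludePattern` helper is referenced but not shown in this compare view; here is a hypothetical minimal sketch consistent with what the tests assert:

```typescript
// Hypothetical stand-in for IndexingPipeline.matchesExcludePattern; the real
// implementation is not part of this diff. Supports the two cases the tests
// exercise: an exact path and a trailing-star prefix glob.
function matchesExcludePattern(filePath: string, pattern: string): boolean {
  if (pattern.endsWith('*')) {
    // 'docs/legacy*' matches any path starting with 'docs/legacy'.
    return filePath.startsWith(pattern.slice(0, -1));
  }
  return filePath === pattern;
}

const excludeFiles = ['migration-guide.md', 'docs/legacy*'];
const files = ['README.md', 'migration-guide.md', 'docs/legacy-api.md'];
const kept = files.filter(
  (f) => !excludeFiles.some((p) => matchesExcludePattern(f, p))
);
console.log(kept); // [ 'README.md' ]
```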


@@ -15,13 +15,14 @@
import { createHash, randomUUID } from 'node:crypto';
import type Database from 'better-sqlite3';
import type { Document, NewDocument, NewSnippet, TrueRefConfig } from '$lib/types';
import type { crawl as GithubCrawlFn } from '$lib/server/crawler/github.crawler.js';
import type { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import type { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { RepositoryMapper } from '$lib/server/mappers/repository.mapper.js';
import { IndexingJob } from '$lib/server/models/indexing-job.js';
import { Repository, RepositoryEntity } from '$lib/server/models/repository.js';
import { resolveConfig, type ParsedConfig } from '$lib/server/config/config-parser.js';
import { parseFile } from '$lib/server/parser/index.js';
import { computeTrustScore } from '$lib/server/search/trust-score.js';
import { computeDiff } from './diff.js';
@@ -90,18 +91,53 @@ export class IndexingPipeline {
      // Mark repo as actively indexing.
      this.updateRepo(repo.id, { state: 'indexing' });
if (normJob.versionId) {
this.updateVersion(normJob.versionId, { state: 'indexing' });
}
      // ---- Stage 1: Crawl -------------------------------------------------
      const versionTag = normJob.versionId
        ? this.getVersionTag(normJob.versionId)
        : undefined;
      const crawlResult = await this.crawl(repo, versionTag);
// Resolve trueref.json / context7.json configuration.
// Prefer the pre-parsed config carried in the CrawlResult (set by
// LocalCrawler so it is available even when a `folders` allowlist
// excludes the repo root and trueref.json never appears in files[]).
// Fall back to locating the file in crawlResult.files for GitHub crawls
// which do not yet populate CrawlResult.config.
let parsedConfig: ReturnType<typeof resolveConfig> | null = null;
if (crawlResult.config) {
// Config was pre-parsed by the crawler — wrap it in a ParsedConfig
// shell so the rest of the pipeline can use it uniformly.
parsedConfig = { config: crawlResult.config, source: 'trueref.json', warnings: [] } satisfies ParsedConfig;
} else {
const configFile = crawlResult.files.find(
(f) => f.path === 'trueref.json' || f.path === 'context7.json'
);
parsedConfig = configFile
? resolveConfig([{ filename: configFile.path, content: configFile.content }])
: null;
}
const excludeFiles: string[] = parsedConfig?.config.excludeFiles ?? [];
// Filter out excluded files before diff computation.
const filteredFiles =
excludeFiles.length > 0
? crawlResult.files.filter(
(f) => !excludeFiles.some((pattern) => IndexingPipeline.matchesExcludePattern(f.path, pattern))
)
: crawlResult.files;
const totalFiles = filteredFiles.length;
      this.updateJob(job.id, { totalFiles });

      // ---- Stage 2: Parse & diff ------------------------------------------
      // Load all existing documents for this repo so computeDiff can
      // classify every crawled file and detect deletions.
      const existingDocs = this.getExistingDocuments(repo.id, normJob.versionId);
      const diff = computeDiff(filteredFiles, existingDocs);

      // Accumulate new documents/snippets; skip unchanged files.
      const newDocuments: NewDocument[] = [];
@@ -229,6 +265,28 @@ export class IndexingPipeline {
lastIndexedAt: Math.floor(Date.now() / 1000)
});
if (normJob.versionId) {
const versionStats = this.computeVersionStats(normJob.versionId);
this.updateVersion(normJob.versionId, {
state: 'indexed',
totalSnippets: versionStats.totalSnippets,
indexedAt: Math.floor(Date.now() / 1000)
});
}
// ---- Stage 6: Persist rules from config ----------------------------
if (parsedConfig?.config.rules?.length) {
if (!normJob.versionId) {
// Main-branch job: write the repo-wide entry only.
this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
} else {
// Version job: write only the version-specific entry.
// Writing to the NULL row here would overwrite repo-wide rules
// with whatever the last-indexed version happened to carry.
this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
}
}
this.updateJob(job.id, {
status: 'done',
progress: 100,
@@ -246,6 +304,9 @@ export class IndexingPipeline {
// Restore repo to error state but preserve any existing indexed data.
this.updateRepo(repositoryId, { state: 'error' });
if (normJob.versionId) {
this.updateVersion(normJob.versionId, { state: 'error' });
}
throw error;
}
@@ -255,9 +316,11 @@ export class IndexingPipeline {
// Private — crawl
// -------------------------------------------------------------------------
private async crawl(repo: Repository, ref?: string): Promise<{
files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
totalFiles: number;
/** Pre-parsed trueref.json / context7.json, or undefined when absent. */
config?: TrueRefConfig;
}> {
if (repo.source === 'github') {
// Parse owner/repo from the canonical ID: "/owner/repo"
@@ -272,7 +335,7 @@ export class IndexingPipeline {
const result = await this.githubCrawl({
owner,
repo: repoName,
ref: ref ?? repo.branch ?? undefined,
token: repo.githubToken ?? undefined
});
@@ -281,13 +344,20 @@ export class IndexingPipeline {
// Local filesystem crawl.
const result = await this.localCrawler.crawl({
rootPath: repo.sourceUrl,
ref: ref ?? (repo.branch !== 'main' ? (repo.branch ?? undefined) : undefined)
});
return { files: result.files, totalFiles: result.totalFiles, config: result.config };
}
}
private getVersionTag(versionId: string): string | undefined {
const row = this.db
.prepare<[string], { tag: string }>(`SELECT tag FROM repository_versions WHERE id = ?`)
.get(versionId);
return row?.tag;
}
// -------------------------------------------------------------------------
// Private — atomic snippet replacement
// -------------------------------------------------------------------------
@@ -384,6 +454,16 @@ export class IndexingPipeline {
};
}
private computeVersionStats(versionId: string): { totalSnippets: number } {
const row = this.db
.prepare<[string], { total_snippets: number }>(
`SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`
)
.get(versionId);
return { totalSnippets: row?.total_snippets ?? 0 };
}
// -------------------------------------------------------------------------
// Private — DB helpers
// -------------------------------------------------------------------------
@@ -433,6 +513,73 @@ export class IndexingPipeline {
const values = [...Object.values(allFields), id];
this.db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
}
private updateVersion(id: string, fields: Record<string, unknown>): void {
const sets = Object.keys(fields)
.map((k) => `${toSnake(k)} = ?`)
.join(', ');
const values = [...Object.values(fields), id];
this.db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
}
private upsertRepoConfig(
repositoryId: string,
versionId: string | null,
rules: string[]
): void {
const now = Math.floor(Date.now() / 1000);
// Use DELETE + INSERT because ON CONFLICT … DO UPDATE doesn't work reliably
// with partial unique indexes in all SQLite versions.
if (versionId === null) {
this.db
.prepare(
`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
)
.run(repositoryId);
} else {
this.db
.prepare(
`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`
)
.run(repositoryId, versionId);
}
this.db
.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
)
.run(repositoryId, versionId, JSON.stringify(rules), now);
}
// -------------------------------------------------------------------------
// Private — static helpers
// -------------------------------------------------------------------------
/**
* Returns true when `filePath` matches the given exclude `pattern`.
*
* Supported patterns:
* - Plain filename: `migration-guide.md` matches any path ending in `/migration-guide.md`
* or equal to `migration-guide.md`.
* - Glob prefix with wildcard: `docs/migration*` matches paths that start with `docs/migration`.
* - Exact path: `src/legacy/old-api.ts` matches exactly that path.
*/
private static matchesExcludePattern(filePath: string, pattern: string): boolean {
if (pattern.includes('*')) {
// Glob-style: treat everything before the '*' as a required prefix.
const prefix = pattern.slice(0, pattern.indexOf('*'));
return filePath.startsWith(prefix);
}
// No wildcard — treat as plain name or exact path.
if (!pattern.includes('/')) {
// Plain filename: match basename (path ends with /<pattern> or equals pattern).
return filePath === pattern || filePath.endsWith('/' + pattern);
}
// Contains a slash — exact path match.
return filePath === pattern;
}
}
// ---------------------------------------------------------------------------
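The three pattern forms documented for `matchesExcludePattern` can be checked in isolation. This standalone sketch re-implements the same branches outside the class (the free function is ours, for illustration only):

```typescript
// Standalone re-implementation of the exclude-pattern rules described above:
// glob prefix before '*', plain basename match, or exact-path match.
function matchesExcludePattern(filePath: string, pattern: string): boolean {
  if (pattern.includes('*')) {
    // Glob-style: everything before the first '*' is a required prefix.
    const prefix = pattern.slice(0, pattern.indexOf('*'));
    return filePath.startsWith(prefix);
  }
  if (!pattern.includes('/')) {
    // Plain filename: match the basename anywhere in the tree.
    return filePath === pattern || filePath.endsWith('/' + pattern);
  }
  // Contains a slash and no wildcard: exact path match.
  return filePath === pattern;
}

console.log(matchesExcludePattern('docs/migration-guide.md', 'migration-guide.md')); // true
console.log(matchesExcludePattern('docs/migration-v2.md', 'docs/migration*'));       // true
console.log(matchesExcludePattern('src/legacy/old-api.ts', 'src/legacy/old-api.ts')); // true
console.log(matchesExcludePattern('src/old-api.ts', 'src/legacy/old-api.ts'));        // false
```

Note that a bare `*` pattern degenerates to an empty prefix and therefore matches every path, which is consistent with the prefix rule above.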


@@ -395,7 +395,7 @@ describe('HybridSearchService', () => {
seedSnippet(client, { repositoryId: repoId, documentId: docId, content: 'hello world' });
const svc = new HybridSearchService(client, searchService, null);
const { results } = await svc.search('hello', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
expect(results[0].snippet.content).toBe('hello world');
@@ -406,14 +406,14 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('alpha zero', { repositoryId: repoId, alpha: 0 });
expect(results.length).toBeGreaterThan(0);
});
it('returns empty array when FTS5 query is blank and no provider', async () => {
const svc = new HybridSearchService(client, searchService, null);
const { results } = await svc.search(' ', { repositoryId: repoId });
expect(results).toHaveLength(0);
});
@@ -425,7 +425,7 @@ describe('HybridSearchService', () => {
});
const svc = new HybridSearchService(client, searchService, makeNoopProvider());
const { results } = await svc.search('noop fallback', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
});
@@ -445,7 +445,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0, 0, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('hybrid search', {
repositoryId: repoId,
alpha: 0.5
});
@@ -464,7 +464,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('deduplicate snippet', {
repositoryId: repoId,
alpha: 0.5
});
@@ -487,7 +487,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('pagination test', {
repositoryId: repoId,
limit: 3,
alpha: 0.5
@@ -519,7 +519,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('anything', {
repositoryId: repoId,
alpha: 1
});
@@ -543,7 +543,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('metadata check', {
repositoryId: repoId,
alpha: 0.5
});
@@ -580,7 +580,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results } = await svc.search('repository keyword', {
repositoryId: repoId,
alpha: 0.5
});
@@ -607,7 +607,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const { results: codeResults } = await svc.search('function example', {
repositoryId: repoId,
type: 'code',
alpha: 0.5
@@ -632,7 +632,7 @@ describe('HybridSearchService', () => {
const svc = new HybridSearchService(client, searchService, provider);
// Should not throw and should return results.
const { results } = await svc.search('default alpha hybrid', { repositoryId: repoId });
expect(Array.isArray(results)).toBe(true);
});
@@ -761,7 +761,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const { results } = await hybridService.search('keyword', {
repositoryId: repoId,
searchMode: 'keyword'
});
@@ -820,7 +820,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const { results } = await hybridService.search('semantic', {
repositoryId: repoId,
searchMode: 'semantic',
profileId: 'test-profile'
@@ -848,7 +848,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -867,7 +867,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const { results } = await hybridService.search(' ', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -885,7 +885,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, noopProvider);
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -951,7 +951,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query with heavy punctuation that preprocesses to nothing.
const { results } = await hybridService.search('!!!@@@###', {
repositoryId: repoId,
searchMode: 'auto',
profileId: 'test-profile'
@@ -978,7 +978,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const { results } = await hybridService.search('hello', {
repositoryId: repoId,
searchMode: 'auto'
});
@@ -1038,7 +1038,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query that won't match through FTS after punctuation normalization.
const { results } = await hybridService.search('%%%vector%%%', {
repositoryId: repoId,
searchMode: 'hybrid',
alpha: 0.5,
@@ -1064,7 +1064,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const { results } = await hybridService.search('!!!@@@###$$$', {
repositoryId: repoId
});


@@ -101,9 +101,12 @@ export class HybridSearchService {
*
* @param query - Raw search string (preprocessing handled by SearchService).
* @param options - Search parameters including repositoryId and alpha blend.
* @returns Object with ranked results array and the search mode actually used.
*/
async search(
query: string,
options: HybridSearchOptions
): Promise<{ results: SnippetSearchResult[]; searchModeUsed: string }> {
const limit = options.limit ?? 20;
const mode = options.searchMode ?? 'auto';
@@ -127,12 +130,12 @@ export class HybridSearchService {
// Semantic mode: skip FTS entirely and use vector search only.
if (mode === 'semantic') {
if (!this.embeddingProvider || !query.trim()) {
return { results: [], searchModeUsed: 'semantic' };
}
const embeddings = await this.embeddingProvider.embed([query]);
if (embeddings.length === 0) {
return { results: [], searchModeUsed: 'semantic' };
}
const queryEmbedding = embeddings[0].values;
@@ -144,7 +147,10 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'semantic'
};
}
// FTS5 mode (keyword) or hybrid/auto modes: try FTS first.
@@ -157,7 +163,7 @@ export class HybridSearchService {
// Degenerate cases: no provider or pure FTS5 mode.
if (!this.embeddingProvider || alpha === 0) {
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
// For auto/hybrid modes: if FTS yielded results, use them; otherwise try vector.
@@ -168,14 +174,14 @@ export class HybridSearchService {
// No FTS results: try vector search as a fallback in auto/hybrid modes.
if (!query.trim()) {
// Query is empty; no point embedding it.
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const embeddings = await this.embeddingProvider.embed([query]);
// If provider fails (Noop returns empty array), we're done.
if (embeddings.length === 0) {
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const queryEmbedding = embeddings[0].values;
@@ -187,7 +193,10 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'keyword_fallback'
};
}
// FTS has results: use RRF to blend with vector search (if alpha < 1).
@@ -195,7 +204,7 @@ export class HybridSearchService {
// Provider may be a Noop (returns empty array) — fall back to FTS gracefully.
if (embeddings.length === 0) {
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
const queryEmbedding = embeddings[0].values;
@@ -210,7 +219,10 @@ export class HybridSearchService {
// Pure vector mode: skip RRF and return vector results directly.
if (alpha === 1) {
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'semantic'
};
}
// Build ranked lists for RRF. Score field is unused by RRF — only
@@ -221,7 +233,10 @@ export class HybridSearchService {
const fused = reciprocalRankFusion(ftsRanked, vecRanked);
const topIds = fused.slice(0, limit).map((r) => r.id);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'hybrid'
};
}
// -------------------------------------------------------------------------
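The `reciprocalRankFusion` call above blends the FTS and vector rankings. A minimal sketch of the standard RRF formula, assuming the conventional k = 60 constant (the pipeline's actual helper signature and constant may differ):

```typescript
interface Ranked { id: string }

// Standard reciprocal rank fusion: each list contributes 1 / (k + rank)
// for every id it contains; ids are sorted by their summed contributions.
function reciprocalRankFusion(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      // rank is 0-based here, so the top item scores 1 / (k + 1).
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fts = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const vec = [{ id: 'b' }, { id: 'c' }, { id: 'd' }];
const fused = reciprocalRankFusion([fts, vec]);
// 'b' appears near the top of both lists, so it wins the fused ranking.
console.log(fused.map((r) => r.id)); // ['b', 'c', 'a', 'd']
```

Because RRF only consumes rank positions, the FTS and vector score scales never need to be normalized against each other, which is why the diff notes that the score field is unused.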


@@ -55,6 +55,7 @@ function createTestDb(): Database.Database {
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
const migration1 = readFileSync(join(migrationsFolder, '0001_quick_nighthawk.sql'), 'utf-8');
const migration2 = readFileSync(join(migrationsFolder, '0002_silky_stellaris.sql'), 'utf-8');
const migration3 = readFileSync(join(migrationsFolder, '0003_multiversion_config.sql'), 'utf-8');
// Apply first migration
const statements0 = migration0
@@ -85,6 +86,15 @@ function createTestDb(): Database.Database {
client.exec(statement);
}
const statements3 = migration3
.split('--> statement-breakpoint')
.map((statement) => statement.trim())
.filter(Boolean);
for (const statement of statements3) {
client.exec(statement);
}
client.exec(readFileSync(ftsFile, 'utf-8'));
return client;
@@ -436,7 +446,11 @@ describe('API contract integration', () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v18.3.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Insert version-specific rules (versioned queries no longer inherit the NULL row).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Prefer hooks over classes']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
@@ -486,4 +500,198 @@ describe('API contract integration', () => {
isLocal: false
});
});
it('GET /api/v1/context returns only version-specific rules for versioned queries (no NULL row contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v2.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Insert repo-wide rules (version_id IS NULL) — these must NOT appear in versioned queries.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule']), NOW_S);
// Insert version-specific rules.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Version-specific rule']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
versionId,
content: 'some versioned content'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v2.0.0`)}&query=${encodeURIComponent('versioned content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Only the version-specific rule should appear — NULL row must not contaminate.
expect(body.rules).toEqual(['Version-specific rule']);
});
it('GET /api/v1/context returns only repo-wide rules when no version is requested', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
// Insert repo-wide rules (version_id IS NULL).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule only']), NOW_S);
seedSnippet(db, { documentId, repositoryId, content: 'some content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('some content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.rules).toEqual(['Repo-wide rule only']);
});
it('GET /api/v1/context versioned query returns only the version-specific rules row', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v3.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
const sharedRule = 'Use TypeScript strict mode';
// Insert repo-wide NULL row — must NOT bleed into versioned query results.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify([sharedRule]), NOW_S);
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify([sharedRule, 'Version-only rule']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'dedup test content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v3.0.0`)}&query=${encodeURIComponent('dedup test')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Returns only the version-specific row as stored — no NULL row merge.
expect(body.rules).toEqual([sharedRule, 'Version-only rule']);
});
it('GET /api/v1/context versioned query returns empty rules when only NULL row exists (no NULL contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v1.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Only a repo-wide NULL row exists — no version-specific config.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['HEAD rules that must not contaminate v1']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'v1 content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v1.0.0`)}&query=${encodeURIComponent('v1 content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// No version-specific config row → empty rules. NULL row must not bleed in.
expect(body.rules).toEqual([]);
});
it('GET /api/v1/context returns 404 with VERSION_NOT_FOUND when version does not exist', async () => {
const repositoryId = seedRepo(db);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v99.0.0`)}&query=${encodeURIComponent('foo')}`
)
} as never);
expect(response.status).toBe(404);
const body = await response.json();
expect(body.code).toBe('VERSION_NOT_FOUND');
});
it('GET /api/v1/context resolves a version by full commit SHA', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'a'.repeat(40);
// Insert version with a commit_hash
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v2.0.0`, repositoryId, 'v2.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${fullSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v2.0.0');
});
it('GET /api/v1/context resolves a version by short SHA prefix (8 chars)', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0';
const shortSha = fullSha.slice(0, 8);
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v3.0.0`, repositoryId, 'v3.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${shortSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v3.0.0');
});
it('GET /api/v1/context includes searchModeUsed in JSON response', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
seedSnippet(db, {
documentId,
repositoryId,
content: 'search mode used test snippet'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('search mode used')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.searchModeUsed).toBeDefined();
expect(['keyword', 'semantic', 'hybrid', 'keyword_fallback']).toContain(body.searchModeUsed);
});
});
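Taken together, the assertions above pin down the context response surface. As a reference, here is a minimal sketch of that shape with a structural check. The field names come from the test expectations; the optionality and everything else is assumed, and the route's actual `ContextResponseMetadata` type is authoritative:

```typescript
// Shape implied by the assertions in the tests above (an assumption for
// illustration, not the real exported type): rules list, search mode,
// resolved version, and error code.
interface ContextResponseSketch {
  rules?: string[];
  searchModeUsed?: 'keyword' | 'semantic' | 'hybrid' | 'keyword_fallback';
  version?: { resolved?: string };
  code?: string; // e.g. 'VERSION_NOT_FOUND' on a 404
}

// Structural check mirroring what the tests assert field by field.
function isValidSketch(body: ContextResponseSketch): boolean {
  if (body.rules !== undefined && !Array.isArray(body.rules)) return false;
  if (
    body.searchModeUsed !== undefined &&
    !['keyword', 'semantic', 'hybrid', 'keyword_fallback'].includes(body.searchModeUsed)
  ) {
    return false;
  }
  return true;
}

console.log(isValidSketch({ rules: [], searchModeUsed: 'hybrid' })); // → true
```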


@@ -54,24 +54,42 @@ interface RawRepoConfig {
   rules: string | null;
 }
-function getRules(db: ReturnType<typeof getClient>, repositoryId: string): string[] {
-  const row = db
-    .prepare<
-      [string],
-      RawRepoConfig
-    >(`SELECT rules FROM repository_configs WHERE repository_id = ?`)
-    .get(repositoryId);
-  if (!row?.rules) return [];
+function parseRulesJson(raw: string | null | undefined): string[] {
+  if (!raw) return [];
   try {
-    const parsed = JSON.parse(row.rules);
+    const parsed = JSON.parse(raw);
     return Array.isArray(parsed) ? (parsed as string[]) : [];
   } catch {
     return [];
   }
 }
+function getRules(
+  db: ReturnType<typeof getClient>,
+  repositoryId: string,
+  versionId?: string
+): string[] {
+  if (!versionId) {
+    // Unversioned query: return repo-wide (HEAD) rules only.
+    const row = db
+      .prepare<
+        [string],
+        RawRepoConfig
+      >(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
+      .get(repositoryId);
+    return parseRulesJson(row?.rules);
+  }
+  // Versioned query: return only version-specific rules (no NULL row merge).
+  const row = db
+    .prepare<
+      [string, string],
+      RawRepoConfig
+    >(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
+    .get(repositoryId, versionId);
+  return parseRulesJson(row?.rules);
+}
 interface RawRepoState {
   state: 'pending' | 'indexing' | 'indexed' | 'error';
   id: string;
@@ -198,6 +216,7 @@ export const GET: RequestHandler = async ({ url }) => {
   let versionId: string | undefined;
   let resolvedVersion: RawVersionRow | undefined;
   if (parsed.version) {
+    // Try exact tag match first.
     resolvedVersion = db
       .prepare<
         [string, string],
@@ -205,12 +224,33 @@ export const GET: RequestHandler = async ({ url }) => {
       >(`SELECT id, tag FROM repository_versions WHERE repository_id = ? AND tag = ?`)
       .get(parsed.repositoryId, parsed.version);
-    // Version not found is not fatal — fall back to default branch.
-    versionId = resolvedVersion?.id;
+    // Fall back to commit hash prefix match (min 7 chars).
+    if (!resolvedVersion && parsed.version.length >= 7) {
+      resolvedVersion = db
+        .prepare<
+          [string, string],
+          RawVersionRow
+        >(
+          `SELECT id, tag FROM repository_versions
+           WHERE repository_id = ? AND commit_hash LIKE ?`
+        )
+        .get(parsed.repositoryId, `${parsed.version}%`);
+    }
+    if (!resolvedVersion) {
+      return new Response(
+        JSON.stringify({
+          error: `Version ${parsed.version} not found for library ${parsed.repositoryId}`,
+          code: 'VERSION_NOT_FOUND'
+        }),
+        { status: 404, headers: { 'Content-Type': 'application/json', ...CORS_HEADERS } }
+      );
+    }
+    versionId = resolvedVersion.id;
   }
   // Execute hybrid search (falls back to FTS5 when no embedding provider is set).
-  const searchResults = await hybridService.search(query, {
+  const { results: searchResults, searchModeUsed } = await hybridService.search(query, {
     repositoryId: parsed.repositoryId,
     versionId,
     limit: 50, // fetch more than needed; token budget will trim
@@ -242,6 +282,7 @@ export const GET: RequestHandler = async ({ url }) => {
   const metadata: ContextResponseMetadata = {
     localSource: repo.source === 'local',
     resultCount: selectedResults.length,
+    searchModeUsed,
     repository: {
       id: repo.id,
       title: repo.title,
@@ -260,8 +301,8 @@ export const GET: RequestHandler = async ({ url }) => {
     snippetVersions
   };
-  // Load rules from repository_configs.
-  const rules = getRules(db, parsed.repositoryId);
+  // Load rules from repository_configs (version-specific row only for versioned queries).
+  const rules = getRules(db, parsed.repositoryId, versionId);
   if (responseType === 'txt') {
     const text = formatContextTxt(selectedResults, rules, metadata);
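The `parseRulesJson` helper introduced in this change is pure and easy to exercise in isolation. A standalone sketch (behavior copied from the diff above; the surrounding DB plumbing is omitted):

```typescript
// Parse a JSON-encoded rules column defensively: anything that is not a
// JSON array yields an empty rules list instead of throwing.
function parseRulesJson(raw: string | null | undefined): string[] {
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as string[]) : [];
  } catch {
    return [];
  }
}

console.log(parseRulesJson('["Use pnpm", "No default exports"]')); // → [ 'Use pnpm', 'No default exports' ]
console.log(parseRulesJson('{"not": "an array"}'));                // → []
console.log(parseRulesJson(null));                                 // → []
console.log(parseRulesJson('not valid json'));                     // → []
```

Because every failure mode collapses to `[]`, a missing or malformed config row can never break a context response, which is also why the versioned path can safely return the parse of a possibly absent row.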

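The tag-then-SHA resolution order added to the route can be sketched without SQL. This is an illustrative in-memory version under stated assumptions: the real code issues the two prepared statements shown above, and `VersionRow` / `resolveVersion` here are hypothetical stand-ins, not names from the codebase:

```typescript
interface VersionRow {
  id: string;
  tag: string;
  commitHash: string;
}

// Resolve a version reference: exact tag match first, then a commit-hash
// prefix match mirroring the `commit_hash LIKE ?` query. Prefixes shorter
// than 7 characters are rejected, matching the route's guard.
function resolveVersion(rows: VersionRow[], ref: string): VersionRow | undefined {
  const byTag = rows.find((v) => v.tag === ref);
  if (byTag) return byTag;
  if (ref.length < 7) return undefined;
  return rows.find((v) => v.commitHash.startsWith(ref));
}

const rows: VersionRow[] = [
  { id: 'lib/v2.0.0', tag: 'v2.0.0', commitHash: 'a'.repeat(40) },
  { id: 'lib/v3.0.0', tag: 'v3.0.0', commitHash: 'b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0' }
];

console.log(resolveVersion(rows, 'v2.0.0')?.tag);   // → v2.0.0
console.log(resolveVersion(rows, 'b1c2d3e4')?.tag); // → v3.0.0
console.log(resolveVersion(rows, 'b1c2'));          // → undefined (prefix too short)
```

Note that a bare `LIKE prefix%` lookup returns an arbitrary row when two commits share a prefix; a stricter variant could count matches and refuse ambiguous prefixes.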

@@ -52,6 +52,9 @@
   let showDiscoverPanel = $state(false);
   let registerBusy = $state(false);
+  // Active version indexing jobs: tag -> jobId
+  let activeVersionJobs = $state<Record<string, string | undefined>>({});
   // Remove confirm
   let removeTag = $state<string | null>(null);
@@ -115,6 +118,16 @@
       activeJobId = d.job.id;
     }
     const versionCount = d.versionJobs?.length ?? 0;
+    if (versionCount > 0) {
+      let next = { ...activeVersionJobs };
+      for (const vj of d.versionJobs) {
+        const matched = versions.find((v) => v.id === vj.versionId);
+        if (matched) {
+          next = { ...next, [matched.tag]: vj.id };
+        }
+      }
+      activeVersionJobs = next;
+    }
     successMessage =
       versionCount > 0
         ? `Re-indexing started. Also queued ${versionCount} version job${versionCount === 1 ? '' : 's'}.`
@@ -157,6 +170,10 @@
         const d = await res.json();
         throw new Error(d.error ?? 'Failed to add version');
       }
+      const d = await res.json();
+      if (d.job?.id) {
+        activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
+      }
       addVersionTag = '';
       await loadVersions();
     } catch (e) {
@@ -177,7 +194,10 @@
         const d = await res.json();
         throw new Error(d.error ?? 'Failed to queue version indexing');
       }
-      await loadVersions();
+      const d = await res.json();
+      if (d.job?.id) {
+        activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
+      }
     } catch (e) {
       errorMessage = (e as Error).message;
     }
@@ -244,8 +264,9 @@
     registerBusy = true;
     errorMessage = null;
     try {
-      await Promise.all(
-        [...selectedDiscoveredTags].map((tag) =>
+      const tags = [...selectedDiscoveredTags];
+      const responses = await Promise.all(
+        tags.map((tag) =>
           fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions`, {
             method: 'POST',
             headers: { 'Content-Type': 'application/json' },
@@ -253,6 +274,15 @@
           })
         )
       );
+      const results = await Promise.all(responses.map((r) => (r.ok ? r.json() : null)));
+      let next = { ...activeVersionJobs };
+      for (let i = 0; i < tags.length; i++) {
+        const result = results[i];
+        if (result?.job?.id) {
+          next = { ...next, [tags[i]]: result.job.id };
+        }
+      }
+      activeVersionJobs = next;
       showDiscoverPanel = false;
       discoveredTags = [];
       selectedDiscoveredTags = new Set();
@@ -346,7 +376,13 @@
   {#if activeJobId}
     <div class="mt-4 rounded-xl border border-blue-100 bg-blue-50 p-4">
       <p class="mb-2 text-sm font-medium text-blue-700">Indexing in progress</p>
-      <IndexingProgress jobId={activeJobId} />
+      <IndexingProgress
+        jobId={activeJobId}
+        oncomplete={() => {
+          activeJobId = null;
+          refreshRepo();
+        }}
+      />
     </div>
   {:else if repo.state === 'error'}
     <div class="mt-4 rounded-xl border border-red-100 bg-red-50 p-4">
@@ -461,31 +497,67 @@
   {:else}
     <div class="divide-y divide-gray-100">
       {#each versions as version (version.id)}
-        <div class="flex items-center justify-between py-2.5">
-          <div class="flex items-center gap-3">
-            <span class="font-mono text-sm font-medium text-gray-900">{version.tag}</span>
-            <span
-              class="rounded-full px-2 py-0.5 text-xs font-medium {stateColors[version.state] ??
-                'bg-gray-100 text-gray-600'}"
-            >
-              {stateLabels[version.state] ?? version.state}
-            </span>
-          </div>
-          <div class="flex items-center gap-2">
-            <button
-              onclick={() => handleIndexVersion(version.tag)}
-              disabled={version.state === 'indexing'}
-              class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
-            >
-              {version.state === 'indexing' ? 'Indexing...' : 'Index'}
-            </button>
-            <button
-              onclick={() => (removeTag = version.tag)}
-              class="rounded-lg border border-red-100 px-3 py-1 text-xs font-medium text-red-500 hover:bg-red-50"
-            >
-              Remove
-            </button>
-          </div>
-        </div>
+        <div class="py-2.5">
+          <div class="flex items-center justify-between">
+            <div class="flex items-center gap-3">
+              <span class="font-mono text-sm font-medium text-gray-900">{version.tag}</span>
+              <span
+                class="rounded-full px-2 py-0.5 text-xs font-medium {stateColors[version.state] ??
+                  'bg-gray-100 text-gray-600'}"
+              >
+                {stateLabels[version.state] ?? version.state}
+              </span>
+            </div>
+            <div class="flex items-center gap-2">
+              <button
+                onclick={() => handleIndexVersion(version.tag)}
+                disabled={version.state === 'indexing' || !!activeVersionJobs[version.tag]}
+                class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
+              >
+                {version.state === 'indexing' || !!activeVersionJobs[version.tag] ? 'Indexing...' : 'Index'}
+              </button>
+              <button
+                onclick={() => (removeTag = version.tag)}
+                class="rounded-lg border border-red-100 px-3 py-1 text-xs font-medium text-red-500 hover:bg-red-50"
+              >
+                Remove
+              </button>
+            </div>
+          </div>
+          {#if version.totalSnippets > 0 || version.commitHash || version.indexedAt}
+            {@const metaParts = (
+              [
+                version.totalSnippets > 0
+                  ? { text: `${version.totalSnippets} snippets`, mono: false }
+                  : null,
+                version.commitHash
+                  ? { text: version.commitHash.slice(0, 8), mono: true }
+                  : null,
+                version.indexedAt
+                  ? { text: formatDate(version.indexedAt), mono: false }
+                  : null
+              ] as Array<{ text: string; mono: boolean } | null>
+            ).filter((p): p is { text: string; mono: boolean } => p !== null)}
+            <div class="mt-1 flex items-center gap-1.5">
+              {#each metaParts as part, i (i)}
+                {#if i > 0}
+                  <span class="text-xs text-gray-300">·</span>
+                {/if}
+                <span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span>
+              {/each}
+            </div>
+          {/if}
+          {#if !!activeVersionJobs[version.tag]}
+            <IndexingProgress
+              jobId={activeVersionJobs[version.tag]!}
+              oncomplete={() => {
+                const { [version.tag]: _, ...rest } = activeVersionJobs;
+                activeVersionJobs = rest;
+                loadVersions();
+                refreshRepo();
+              }}
+            />
+          {/if}
+        </div>
       {/each}
     </div>
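The `activeVersionJobs` bookkeeping in the component above follows an immutable-map discipline so Svelte's `$state` always sees a fresh object reference. The add and remove patterns in isolation, as plain TypeScript with no Svelte runtime (`trackJob`/`untrackJob` are illustrative names, not from the component):

```typescript
type JobMap = Record<string, string | undefined>;

// Adding a job: spread into a new object instead of mutating in place.
function trackJob(jobs: JobMap, tag: string, jobId: string): JobMap {
  return { ...jobs, [tag]: jobId };
}

// Removing a job on completion: rest-destructure the finished tag away,
// matching the `const { [version.tag]: _, ...rest }` pattern in oncomplete.
function untrackJob(jobs: JobMap, tag: string): JobMap {
  const { [tag]: _, ...rest } = jobs;
  return rest;
}

let jobs: JobMap = {};
jobs = trackJob(jobs, 'v1.0.0', 'job-123'); // jobs now maps 'v1.0.0' -> 'job-123'
console.log(jobs['v1.0.0']);                // → job-123
jobs = untrackJob(jobs, 'v1.0.0');
console.log('v1.0.0' in jobs);              // → false
```

Returning new objects rather than deleting keys in place is what lets the `disabled` and `{#if}` expressions keyed on `activeVersionJobs[version.tag]` re-evaluate when a job starts or finishes.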