Compare commits


10 Commits

Author SHA1 Message Date
Giancarmine Salucci
e63279fcf6 improve readme, untrack agents 2026-03-29 18:35:47 +02:00
Giancarmine Salucci
a426f4305c Merge branch 'fix/MULTIVERSION-0001-trueref-config-crawl-result' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
23ea8f2b4b Merge branch 'fix/MULTIVERSION-0001-multi-version-indexing' 2026-03-29 12:44:47 +02:00
Giancarmine Salucci
0bf01e3057 last fix 2026-03-29 12:44:06 +02:00
Giancarmine Salucci
09c6f9f7c1 fix(MULTIVERSION-0001): eliminate NULL-row contamination in getRules
When a versioned query is made, getRules() now returns only the
version-specific repository_configs row. The NULL (HEAD/repo-wide)
row is no longer merged in, preventing v4 rules from bleeding into
v1/v2/v3 versioned context responses.

Tests updated to assert the isolation: versioned queries return only
their own rules row; a new test verifies that a version with no
config row returns an empty rules array even when a NULL row exists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 11:47:31 +02:00
Giancarmine Salucci
bbc67f8064 fix(MULTIVERSION-0001): prevent version jobs from overwriting repo-wide NULL rules entry
Version jobs now write rules only to the version-specific (repo, versionId)
row. Previously every version job unconditionally wrote to the (repo, NULL)
row as well, causing whichever version indexed last to contaminate the
repo-wide rules that the context API merges into every query response.

Adds a regression test (Bug5b) that indexes the main branch, then indexes a
version with different rules, and asserts the NULL row still holds the
main-branch rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 01:15:58 +01:00
Giancarmine Salucci
cd4ea7112c fix(MULTIVERSION-0001): surface pre-parsed config in CrawlResult to fix rules persistence
When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
shouldIndexFile() excludes trueref.json itself because it lives at the
repo root. The indexing pipeline then searches crawlResult.files for the
config file, finds nothing, and never writes rules to repository_configs.

Fix (Option B): add a `config` field to CrawlResult so LocalCrawler
returns the pre-parsed config directly. The indexing pipeline now reads
crawlResult.config first instead of scanning files[], which resolves the
regression for all repos with a folders allowlist.

- Add `config?: RepoConfig` to CrawlResult in crawler/types.ts
- Return `config` from LocalCrawler.crawlDirectory()
- Update IndexingPipeline.crawl() to propagate CrawlResult.config
- Update IndexingPipeline.run() to prefer crawlResult.config over files
- Add regression tests covering the folders-allowlist exclusion scenario

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 17:27:53 +01:00
Giancarmine Salucci
666ec7d55f feat(MULTIVERSION-0001): wire trueref.json into pipeline + per-version rules
- Add migration 0003: recreate repository_configs with nullable version_id
  column and two partial unique indexes (repo-wide: version_id IS NULL,
  per-version: (repository_id, version_id) WHERE version_id IS NOT NULL)
- Update schema.ts to reflect the new composite structure with uniqueIndex
  partial constraints via drizzle-orm sql helper
- IndexingPipeline: parse trueref.json / context7.json after crawl, apply
  excludeFiles filter before diff computation, update totalFiles accordingly
- IndexingPipeline: persist repo-wide rules (version_id=null) and
  version-specific rules (when versionId set) via upsertRepoConfig helper
- Add matchesExcludePattern static helper supporting plain filename,
  glob prefix (docs/legacy*), and exact path patterns
- context endpoint: split getRules into repo-wide + version-specific lookup
  with dedup merge; pass versionId at call site
- Update test DB loaders to include migration 0003
- Add pipeline tests for excludeFiles, repo-wide rules persistence, and
  per-version rules persistence
- Add integration tests for merged rules, repo-only rules, and dedup logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:44:30 +01:00
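The three `excludeFiles` pattern forms this commit describes (plain filename, glob prefix such as `docs/legacy*`, and exact path) can be sketched as follows. This is a hypothetical reconstruction of the described behavior, not the actual `matchesExcludePattern` implementation:

```typescript
// Hypothetical sketch of the three pattern forms named in the commit
// message; the real static helper may differ in detail.
function matchesExcludePattern(relPath: string, pattern: string): boolean {
  // Exact path pattern: contains a slash and no trailing glob star.
  if (pattern.includes('/') && !pattern.endsWith('*')) {
    return relPath === pattern;
  }
  // Glob-prefix pattern: a trailing '*' matches any continuation.
  if (pattern.endsWith('*')) {
    return relPath.startsWith(pattern.slice(0, -1));
  }
  // Plain filename: matches the basename anywhere in the tree.
  return relPath.split('/').pop() === pattern;
}
```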
Giancarmine Salucci
255838dcc0 fix(MULTIVERSION-0001): fix version isolation, 404 on unknown version, commit-hash lookup, and searchModeUsed
Bug 1: Thread version tag from run() into crawl() via getVersionTag() helper so
LocalCrawler and GithubCrawler receive the correct ref when indexing a named
version instead of always crawling HEAD.

Bug 2: Return HTTP 404 with code VERSION_NOT_FOUND when a requested version tag
is not found in repository_versions, instead of silently falling back to a
cross-version mixed result set.

Bug 4: Before returning 404, attempt a commit_hash prefix match (min 7 chars)
so callers can request a version by full or short SHA.

Bug 3: Change HybridSearchService.search() to return
{ results, searchModeUsed } and propagate searchModeUsed through
ContextResponseMetadata and ContextJsonResponseDto so callers can see which
strategy (keyword / semantic / hybrid / keyword_fallback) was actually used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:31:15 +01:00
Giancarmine Salucci
417c6fd072 fix(MULTIVERSION-0001): fix version indexing pipeline state and UI reactivity
- Add updateVersion() helper to IndexingPipeline that writes to repository_versions
- Set version state to indexing/indexed/error at the appropriate pipeline stages
- Add computeVersionStats() to count snippets for a specific version
- Replace Map<string,string> with Record<string,string|undefined> for activeVersionJobs to fix Svelte 5 reactivity edge cases
- Remove premature loadVersions() call from handleIndexVersion (oncomplete fires it instead)
- Add refreshRepo() to version oncomplete callback so stat badges update after indexing
- Disable Index button when activeVersionJobs has an entry for that tag (not just version.state)
- Add three pipeline test cases covering versionId indexing, error, and no-touch paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:03:44 +01:00
26 changed files with 2094 additions and 142 deletions

.github/agents vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/agents

.github/schemas vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/schemas

.github/skills vendored

@@ -1 +0,0 @@
/home/moze/Sources/copilot-agents/.github/skills

.gitignore vendored

@@ -36,3 +36,8 @@ docs/docs_cache_state.yaml
# Claude Code — ignore local/machine-specific settings, keep project rules
.claude/
!.claude/rules/
# Github Copilot
.github/agents
.github/schemas
.github/skills

README.md

@@ -16,9 +16,12 @@ The goal is straightforward: give your assistants accurate, current, version-awa
- Stores metadata in SQLite.
- Supports keyword search out of the box with SQLite FTS5.
- Supports semantic and hybrid search when an embedding provider is configured.
- Exposes REST endpoints for library discovery and documentation retrieval.
- Supports multi-version indexing: index specific git tags independently, query a version by appending it to the library ID.
- Discovers available git tags from local repositories automatically.
- Stores per-version rules from `trueref.json` and prepends them to every `query-docs` response.
- Exposes REST endpoints for library discovery, documentation retrieval, and version management.
- Exposes an MCP server over stdio and HTTP for AI clients.
- Provides a SvelteKit web UI for repository management, search, indexing jobs, and embedding settings.
- Provides a SvelteKit web UI for repository management, version management, search, indexing jobs, and embedding settings.
- Supports repository-level configuration through `trueref.json` or `context7.json`.
## Project status
@@ -28,10 +31,12 @@ TrueRef is under active development. The current codebase already includes:
- repository management
- indexing jobs and recovery on restart
- local and GitHub crawling
- version registration support
- multi-version indexing with git tag isolation
- automatic tag discovery for local git repositories
- per-version rules from `trueref.json` prepended to context responses
- context7-compatible API endpoints
- MCP stdio and HTTP transports
- configurable embedding providers
- configurable embedding providers (none / OpenAI-compatible / local ONNX)
## Architecture
@@ -66,7 +71,15 @@ Each indexed repository becomes a library with an ID such as `/facebook/react`.
### Versions
Libraries can register version tags. Queries can target a specific version by using a library ID such as `/facebook/react/v18.3.0`.
Libraries can register version tags. Each version is indexed independently so snippets from different releases never mix.
Query a specific version by appending the tag to the library ID:
```
/facebook/react/v18.3.0
```
For local repositories, TrueRef can discover all available git tags automatically via the versions/discover endpoint. Tags can be added through the UI on the repository detail page or via the REST API.
### Snippets
@@ -76,6 +89,8 @@ Documents are split into code and info snippets. These snippets are what search
Repository rules defined in `trueref.json` are prepended to `query-docs` responses so assistants get usage constraints along with the retrieved content.
Rules are stored per version when a version-specific config is found during indexing, so different releases can carry different usage guidance.
## Requirements
- Node.js 20+
@@ -153,6 +168,12 @@ Use the main page to:
- delete an indexed repository
- monitor active indexing jobs
Open a repository's detail page to:
- view registered version tags
- discover available git tags (local repositories)
- trigger version-specific indexing jobs
### Search
Use the Search page to:
@@ -175,21 +196,40 @@ If no embedding provider is configured, TrueRef still works with FTS5-only searc
## Repository configuration
TrueRef supports a repository-local config file named `trueref.json`.
You can place a `trueref.json` file at the **root** of any repository you index. TrueRef reads it during every indexing run to control what gets indexed and what gets shown to AI assistants.
For compatibility with existing context7-style repositories, `context7.json` is also supported.
For backward compatibility with repositories that already have a `context7.json`, that file is also supported. When both files are present, `trueref.json` takes precedence.
### What the config controls
### Where to place it
- project display title
- project description
- included folders
- excluded folders
- excluded file names
- assistant-facing usage rules
- previously released versions
```
my-library/
├── trueref.json ← here, at the repository root
├── src/
├── docs/
└── ...
```
### Example `trueref.json`
For GitHub repositories, TrueRef fetches the file from the default branch root. For local repositories, it reads it from the filesystem root of the indexed folder.
### Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `$schema` | string | No | URL to the live JSON Schema for editor validation |
| `projectTitle` | string | No | Display name override (max 100 chars) |
| `description` | string | No | Library description used for search ranking (10–500 chars) |
| `folders` | string[] | No | Path prefixes or regex strings to **include** (max 50 items). If absent, all folders are included |
| `excludeFolders` | string[] | No | Path prefixes or regex strings to **exclude** after the `folders` allowlist (max 50 items) |
| `excludeFiles` | string[] | No | Exact filenames to skip — no path, no glob (max 100 items) |
| `rules` | string[] | No | Best-practice rules prepended to every `query-docs` response (max 20 rules, 5–500 chars each) |
| `previousVersions` | object[] | No | Version tags to register when the repository is indexed (max 50 entries) |
`previousVersions` entries each require a `tag` (e.g. `"v1.2.3"`) and a `title` (e.g. `"Version 1.2.3"`).
The parser is intentionally lenient: unknown keys are silently ignored, mistyped values are skipped with a warning, and oversized strings or arrays are truncated. Only invalid JSON or a non-object root is a hard error.
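The lenient strategy above can be sketched for a single field. This is an illustrative helper, not the actual parser; the field limits (20 rules, 500 chars) are taken from the table above:

```typescript
// Hypothetical sketch of lenient field parsing: mistyped values are
// skipped, non-string entries dropped, oversized strings and arrays
// truncated — only invalid JSON or a non-object root is a hard error.
function parseRules(raw: unknown, maxRules = 20, maxLen = 500): string[] {
  if (!Array.isArray(raw)) return []; // mistyped value: skip, fall back to default
  return raw
    .filter((r): r is string => typeof r === 'string') // drop non-strings
    .map((r) => r.slice(0, maxLen)) // truncate oversized strings
    .slice(0, maxRules); // truncate oversized arrays
}
```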
### Full example
```json
{
@@ -197,30 +237,76 @@ For compatibility with existing context7-style repositories, `context7.json` is
"projectTitle": "My Internal SDK",
"description": "Internal SDK for billing, auth, and event ingestion.",
"folders": ["src/", "docs/"],
"excludeFolders": ["tests/", "fixtures/", "node_modules/"],
"excludeFiles": ["CHANGELOG.md"],
"excludeFolders": ["tests/", "fixtures/", "node_modules/", "__mocks__/"],
"excludeFiles": ["CHANGELOG.md", "jest.config.ts"],
"rules": [
"Prefer named imports over wildcard imports.",
"Use the async client API for all network calls."
"Use the async client API for all network calls.",
"Never import from internal sub-paths — use the package root only."
],
"previousVersions": [
{
"tag": "v1.2.3",
"title": "Version 1.2.3"
}
{ "tag": "v2.0.0", "title": "Version 2.0.0" },
{ "tag": "v1.2.3", "title": "Version 1.2.3 (legacy)" }
]
}
```
### JSON schema
### How `folders` and `excludeFolders` are matched
You can point your editor to the live schema served by TrueRef:
Both fields accept strings that are matched against the full relative file path within the repository. A string is treated as a path prefix unless it starts with `^`, in which case it is compiled as a regex:
```text
```json
{
"folders": ["src/", "docs/", "^packages/core"],
"excludeFolders": ["src/internal/", "__tests__"]
}
```
- `"src/"` — includes any file whose path starts with `src/`
- `"^packages/core"` — regex, includes only `packages/core` not `packages/core-utils`
`excludeFolders` is applied **after** the `folders` allowlist, so you can narrow a broad include with a targeted exclude.
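The matching described above can be sketched like this. The semantics are assumed from the prose (prefix unless the string starts with `^`, exclude applied after the allowlist); the real implementation may differ:

```typescript
// Sketch of the assumed matching rule: '^'-prefixed strings are compiled
// as regexes, everything else is a path-prefix match on the relative path.
function matchesPattern(relPath: string, pattern: string): boolean {
  if (pattern.startsWith('^')) return new RegExp(pattern).test(relPath);
  return relPath.startsWith(pattern);
}

function isIncluded(
  relPath: string,
  folders: string[] | undefined,
  excludeFolders: string[] | undefined
): boolean {
  // No `folders` field means every folder is included.
  const included = !folders || folders.some((p) => matchesPattern(relPath, p));
  // `excludeFolders` is applied after the allowlist.
  const excluded = (excludeFolders ?? []).some((p) => matchesPattern(relPath, p));
  return included && !excluded;
}
```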
### How `rules` are used
Rules are stored in the database at index time and automatically prepended to every `query-docs` response for that library (and version). This means AI assistants receive them alongside the retrieved snippets without any extra configuration.
When a version is indexed, the rules from the config found at that version's checkout are stored separately. Different version tags can therefore carry different rules.
Example context response with rules prepended:
```
RULES:
- Prefer named imports over wildcard imports.
- Use the async client API for all network calls.
LIBRARY DOCUMENTATION:
...
```
### How `previousVersions` works
When TrueRef indexes a repository and finds `previousVersions`, it registers those tags in the versions table. The tags are then available for version-specific indexing and queries without any further manual registration.
This is useful when you want all historical releases available from a fresh TrueRef setup without manually triggering one indexing job per version.
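The registration step can be sketched as a dedup merge into the versions list. This is a hypothetical illustration of the behavior described above, not the actual database code:

```typescript
// Hypothetical sketch: tags from `previousVersions` are registered,
// skipping any tag that already exists in the versions table.
interface VersionEntry {
  tag: string;
  title: string;
}

function registerPreviousVersions(
  existing: VersionEntry[],
  fromConfig: VersionEntry[]
): VersionEntry[] {
  const known = new Set(existing.map((v) => v.tag));
  const added = fromConfig.filter((v) => !known.has(v.tag));
  return [...existing, ...added];
}
```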
### JSON Schema for editor support
TrueRef serves a live JSON Schema at:
```
http://localhost:5173/api/v1/schema/trueref-config.json
```
That enables validation and autocomplete in editors that support JSON Schema references.
Add it to your `trueref.json` via the `$schema` field to get inline validation and autocomplete in VS Code, IntelliJ, and any other editor that supports JSON Schema Draft 07:
```json
{
"$schema": "http://localhost:5173/api/v1/schema/trueref-config.json"
}
```
If you are running TrueRef on a server, replace `localhost:5173` with your actual host and port. The schema endpoint always reflects the version of TrueRef you are running.
## REST API
@@ -299,6 +385,36 @@ curl "http://localhost:5173/api/v1/jobs"
curl "http://localhost:5173/api/v1/jobs/<job-id>"
```
### Version management
List registered versions for a library:
```sh
curl "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions"
```
Index a specific version tag:
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Ffacebook%2Freact/versions/v18.3.0/index"
```
Discover available git tags (local repositories only):
```sh
curl -X POST "http://localhost:5173/api/v1/libs/%2Fpath%2Fto%2Fmy-lib/versions/discover"
```
Returns `{ "tags": [{ "tag": "v1.0.0", "commitHash": "abc123" }, ...] }`. Returns an empty array for GitHub repositories.
### Version-targeted context retrieval
Append the version tag to the library ID to retrieve snippets from a specific indexed version:
```sh
curl "http://localhost:5173/api/v1/context?libraryId=/facebook/react/v18.3.0&query=how%20to%20use%20useEffect&type=txt"
```
### Response formats
The two search endpoints support:


@@ -24,7 +24,12 @@ import type { Handle } from '@sveltejs/kit';
try {
initializeDatabase();
} catch (err) {
console.error('[hooks.server] FATAL: database initialisation failed:', err);
process.exit(1);
}
try {
const db = getClient();
const activeProfileRow = db
.prepare<[], EmbeddingProfileEntityProps>(
@@ -46,7 +51,8 @@ try {
console.log('[hooks.server] Indexing pipeline initialised.');
} catch (err) {
console.error(
`[hooks.server] Failed to initialise server: ${err instanceof Error ? err.message : String(err)}`
'[hooks.server] Failed to initialise pipeline:',
err instanceof Error ? err.message : String(err)
);
}


@@ -1,13 +1,14 @@
<script lang="ts">
import type { IndexingJob } from '$lib/types';
let { jobId }: { jobId: string } = $props();
let { jobId, oncomplete }: { jobId: string; oncomplete?: () => void } = $props();
let job = $state<IndexingJob | null>(null);
$effect(() => {
job = null;
let stopped = false;
let completeFired = false;
async function poll() {
if (stopped) return;
@@ -16,6 +17,10 @@
if (res.ok) {
const data = await res.json();
job = data.job;
if (!completeFired && (job?.status === 'done' || job?.status === 'failed')) {
completeFired = true;
oncomplete?.();
}
}
} catch {
// ignore transient errors


@@ -5,7 +5,7 @@ import RepositoryCard from './RepositoryCard.svelte';
describe('RepositoryCard.svelte', () => {
it('encodes slash-bearing repository ids in the details href', async () => {
render(RepositoryCard, {
const { container } = await render(RepositoryCard, {
repo: {
id: '/facebook/react',
title: 'React',
@@ -26,7 +26,8 @@ describe('RepositoryCard.svelte', () => {
.element(page.getByRole('link', { name: 'Details' }))
.toHaveAttribute('href', '/repos/%2Ffacebook%2Freact');
await expect.element(page.getByText('1,200 embeddings')).toBeInTheDocument();
await expect.element(page.getByText('Indexed: main, v18.3.0')).toBeInTheDocument();
const text = container.textContent ?? '';
expect(text).toMatch(/1[,.\u00a0\u202f]?200 embeddings/);
expect(text).toContain('Indexed: main, v18.3.0');
});
});


@@ -143,6 +143,9 @@ export function formatContextTxt(
}
noResults.push(`Result count: ${metadata?.resultCount ?? 0}`);
if (metadata?.searchModeUsed) {
noResults.push(`Search mode: ${metadata.searchModeUsed}`);
}
parts.push(noResults.join('\n'));
return parts.join('\n\n');


@@ -413,6 +413,59 @@ describe('LocalCrawler.crawl() — config file detection', () => {
const result = await crawlRoot();
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
});
it('populates CrawlResult.config with the parsed trueref.json even when folders allowlist excludes the root', async () => {
// Regression test for MULTIVERSION-0001:
// When folders: ["src/"] is set, trueref.json at the root is excluded from
// files[] by shouldIndexFile(). The config must still be returned in
// CrawlResult.config so the indexing pipeline can persist rules.
root = await makeTempRepo({
'trueref.json': JSON.stringify({
folders: ['src/'],
rules: ['Always document public APIs.']
}),
'src/index.ts': 'export {};',
'docs/guide.md': '# Guide'
});
const result = await crawlRoot();
// trueref.json must NOT appear in files (excluded by folders allowlist).
expect(result.files.some((f) => f.path === 'trueref.json')).toBe(false);
// docs/guide.md must NOT appear (outside src/).
expect(result.files.some((f) => f.path === 'docs/guide.md')).toBe(false);
// src/index.ts must appear (inside src/).
expect(result.files.some((f) => f.path === 'src/index.ts')).toBe(true);
// CrawlResult.config must carry the parsed config.
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Always document public APIs.']);
});
it('populates CrawlResult.config with the parsed context7.json', async () => {
root = await makeTempRepo({
'context7.json': JSON.stringify({ rules: ['Rule from context7.'] }),
'src/index.ts': 'export {};'
});
const result = await crawlRoot();
expect(result.config).toBeDefined();
expect(result.config?.rules).toEqual(['Rule from context7.']);
});
it('CrawlResult.config is undefined when no config file is present', async () => {
root = await makeTempRepo({ 'src/index.ts': 'export {};' });
const result = await crawlRoot();
expect(result.config).toBeUndefined();
});
it('CrawlResult.config is undefined when caller supplies config (caller-provided takes precedence, no auto-detect)', async () => {
root = await makeTempRepo({
'trueref.json': JSON.stringify({ rules: ['From file.'] }),
'src/index.ts': 'export {};'
});
// Caller-supplied config prevents auto-detection; CrawlResult.config
// should carry the caller config (not the file content).
const result = await crawlRoot({ config: { rules: ['From caller.'] } });
expect(result.config?.rules).toEqual(['From caller.']);
});
});
// ---------------------------------------------------------------------------


@@ -230,7 +230,11 @@ export class LocalCrawler {
totalFiles: filteredPaths.length,
skippedFiles: allRelPaths.length - filteredPaths.length,
branch,
commitSha
commitSha,
// Surface the pre-parsed config so the indexing pipeline can read rules
// without needing to find trueref.json inside crawledFiles (which fails
// when a `folders` allowlist excludes the repo root).
config: config ?? undefined
};
}


@@ -35,6 +35,13 @@ export interface CrawlResult {
branch: string;
/** HEAD commit SHA */
commitSha: string;
/**
* Pre-parsed trueref.json / context7.json configuration found at the repo
* root during crawling. Carried here so the indexing pipeline can consume it
* directly without having to locate the config file in `files` — which fails
* when a `folders` allowlist excludes the repo root.
*/
config?: RepoConfig;
}
export interface CrawlOptions {


@@ -30,6 +30,7 @@ const __dirname = dirname(fileURLToPath(import.meta.url));
*/
export function initializeDatabase(): void {
const migrationsFolder = join(__dirname, 'migrations');
console.log(`[db] Running migrations from ${migrationsFolder}...`);
migrate(db, { migrationsFolder });
// Apply FTS5 virtual table and trigger DDL (not expressible via Drizzle).


@@ -0,0 +1,30 @@
PRAGMA foreign_keys=OFF;
--> statement-breakpoint
CREATE TABLE `__new_repository_configs` (
`repository_id` text NOT NULL,
`version_id` text,
`project_title` text,
`description` text,
`folders` text,
`exclude_folders` text,
`exclude_files` text,
`rules` text,
`previous_versions` text,
`updated_at` integer NOT NULL,
FOREIGN KEY (`repository_id`) REFERENCES `repositories`(`id`) ON UPDATE no action ON DELETE cascade
);
--> statement-breakpoint
INSERT INTO `__new_repository_configs`
(repository_id, version_id, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at)
SELECT repository_id, NULL, project_title, description, folders, exclude_folders, exclude_files, rules, previous_versions, updated_at
FROM `repository_configs`;
--> statement-breakpoint
DROP TABLE `repository_configs`;
--> statement-breakpoint
ALTER TABLE `__new_repository_configs` RENAME TO `repository_configs`;
--> statement-breakpoint
PRAGMA foreign_keys=ON;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_base` ON `repository_configs` (`repository_id`) WHERE `version_id` IS NULL;
--> statement-breakpoint
CREATE UNIQUE INDEX `uniq_repo_config_version` ON `repository_configs` (`repository_id`, `version_id`) WHERE `version_id` IS NOT NULL;


@@ -0,0 +1,835 @@
{
"version": "6",
"dialect": "sqlite",
"id": "a7c2e4f8-3b1d-4e9a-8f0c-6d5e2a1b9c7f",
"prevId": "31531dab-a199-4fc5-a889-1884940039cd",
"tables": {
"documents": {
"name": "documents",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"file_path": {
"name": "file_path",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"checksum": {
"name": "checksum",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"documents_repository_id_repositories_id_fk": {
"name": "documents_repository_id_repositories_id_fk",
"tableFrom": "documents",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"documents_version_id_repository_versions_id_fk": {
"name": "documents_version_id_repository_versions_id_fk",
"tableFrom": "documents",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"embedding_profiles": {
"name": "embedding_profiles",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"provider_kind": {
"name": "provider_kind",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"enabled": {
"name": "enabled",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": true
},
"is_default": {
"name": "is_default",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"config": {
"name": "config",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"indexing_jobs": {
"name": "indexing_jobs",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"status": {
"name": "status",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'queued'"
},
"progress": {
"name": "progress",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_files": {
"name": "total_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"processed_files": {
"name": "processed_files",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"error": {
"name": "error",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"started_at": {
"name": "started_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"completed_at": {
"name": "completed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"indexing_jobs_repository_id_repositories_id_fk": {
"name": "indexing_jobs_repository_id_repositories_id_fk",
"tableFrom": "indexing_jobs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repositories": {
"name": "repositories",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"source": {
"name": "source",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"source_url": {
"name": "source_url",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"branch": {
"name": "branch",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": "'main'"
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"total_tokens": {
"name": "total_tokens",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"trust_score": {
"name": "trust_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"benchmark_score": {
"name": "benchmark_score",
"type": "real",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"stars": {
"name": "stars",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"github_token": {
"name": "github_token",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"last_indexed_at": {
"name": "last_indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_configs": {
"name": "repository_configs",
"columns": {
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"project_title": {
"name": "project_title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"description": {
"name": "description",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"folders": {
"name": "folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_folders": {
"name": "exclude_folders",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"exclude_files": {
"name": "exclude_files",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"rules": {
"name": "rules",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"previous_versions": {
"name": "previous_versions",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {
"uniq_repo_config_base": {
"name": "uniq_repo_config_base",
"columns": ["repository_id"],
"isUnique": true,
"where": "`version_id` IS NULL"
},
"uniq_repo_config_version": {
"name": "uniq_repo_config_version",
"columns": ["repository_id", "version_id"],
"isUnique": true,
"where": "`version_id` IS NOT NULL"
}
},
"foreignKeys": {
"repository_configs_repository_id_repositories_id_fk": {
"name": "repository_configs_repository_id_repositories_id_fk",
"tableFrom": "repository_configs",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"repository_versions": {
"name": "repository_versions",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"tag": {
"name": "tag",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"commit_hash": {
"name": "commit_hash",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"state": {
"name": "state",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false,
"default": "'pending'"
},
"total_snippets": {
"name": "total_snippets",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"indexed_at": {
"name": "indexed_at",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"repository_versions_repository_id_repositories_id_fk": {
"name": "repository_versions_repository_id_repositories_id_fk",
"tableFrom": "repository_versions",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"settings": {
"name": "settings",
"columns": {
"key": {
"name": "key",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"value": {
"name": "value",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"updated_at": {
"name": "updated_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippet_embeddings": {
"name": "snippet_embeddings",
"columns": {
"snippet_id": {
"name": "snippet_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"profile_id": {
"name": "profile_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"model": {
"name": "model",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"dimensions": {
"name": "dimensions",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"embedding": {
"name": "embedding",
"type": "blob",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippet_embeddings_snippet_id_snippets_id_fk": {
"name": "snippet_embeddings_snippet_id_snippets_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "snippets",
"columnsFrom": ["snippet_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippet_embeddings_profile_id_embedding_profiles_id_fk": {
"name": "snippet_embeddings_profile_id_embedding_profiles_id_fk",
"tableFrom": "snippet_embeddings",
"tableTo": "embedding_profiles",
"columnsFrom": ["profile_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {
"snippet_embeddings_snippet_id_profile_id_pk": {
"columns": ["snippet_id", "profile_id"],
"name": "snippet_embeddings_snippet_id_profile_id_pk"
}
},
"uniqueConstraints": {},
"checkConstraints": {}
},
"snippets": {
"name": "snippets",
"columns": {
"id": {
"name": "id",
"type": "text",
"primaryKey": true,
"notNull": true,
"autoincrement": false
},
"document_id": {
"name": "document_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"repository_id": {
"name": "repository_id",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"version_id": {
"name": "version_id",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"type": {
"name": "type",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"title": {
"name": "title",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"content": {
"name": "content",
"type": "text",
"primaryKey": false,
"notNull": true,
"autoincrement": false
},
"language": {
"name": "language",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"breadcrumb": {
"name": "breadcrumb",
"type": "text",
"primaryKey": false,
"notNull": false,
"autoincrement": false
},
"token_count": {
"name": "token_count",
"type": "integer",
"primaryKey": false,
"notNull": false,
"autoincrement": false,
"default": 0
},
"created_at": {
"name": "created_at",
"type": "integer",
"primaryKey": false,
"notNull": true,
"autoincrement": false
}
},
"indexes": {},
"foreignKeys": {
"snippets_document_id_documents_id_fk": {
"name": "snippets_document_id_documents_id_fk",
"tableFrom": "snippets",
"tableTo": "documents",
"columnsFrom": ["document_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_repository_id_repositories_id_fk": {
"name": "snippets_repository_id_repositories_id_fk",
"tableFrom": "snippets",
"tableTo": "repositories",
"columnsFrom": ["repository_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
},
"snippets_version_id_repository_versions_id_fk": {
"name": "snippets_version_id_repository_versions_id_fk",
"tableFrom": "snippets",
"tableTo": "repository_versions",
"columnsFrom": ["version_id"],
"columnsTo": ["id"],
"onDelete": "cascade",
"onUpdate": "no action"
}
},
"compositePrimaryKeys": {},
"uniqueConstraints": {},
"checkConstraints": {}
}
},
"views": {},
"enums": {},
"schemas": {},
"sequences": {},
"_meta": {
"schemas": {},
"tables": {},
"columns": {}
},
"internal": {
"indexes": {}
}
}


@@ -22,6 +22,13 @@
"when": 1774461897742,
"tag": "0002_silky_stellaris",
"breakpoints": true
},
{
"idx": 3,
"version": "6",
"when": 1743155877000,
"tag": "0003_multiversion_config",
"breakpoints": true
}
]
}


@@ -1,4 +1,13 @@
import { blob, integer, primaryKey, real, sqliteTable, text } from 'drizzle-orm/sqlite-core';
import { sql } from 'drizzle-orm';
import {
blob,
integer,
primaryKey,
real,
sqliteTable,
text,
uniqueIndex
} from 'drizzle-orm/sqlite-core';
// ---------------------------------------------------------------------------
// repositories
@@ -148,21 +157,33 @@ export const indexingJobs = sqliteTable('indexing_jobs', {
// ---------------------------------------------------------------------------
// repository_configs
// ---------------------------------------------------------------------------
export const repositoryConfigs = sqliteTable('repository_configs', {
repositoryId: text('repository_id')
.primaryKey()
.references(() => repositories.id, { onDelete: 'cascade' }),
projectTitle: text('project_title'),
description: text('description'),
folders: text('folders', { mode: 'json' }).$type<string[]>(),
excludeFolders: text('exclude_folders', { mode: 'json' }).$type<string[]>(),
excludeFiles: text('exclude_files', { mode: 'json' }).$type<string[]>(),
rules: text('rules', { mode: 'json' }).$type<string[]>(),
previousVersions: text('previous_versions', { mode: 'json' }).$type<
{ tag: string; title: string; commitHash?: string }[]
>(),
updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
});
export const repositoryConfigs = sqliteTable(
'repository_configs',
{
repositoryId: text('repository_id')
.notNull()
.references(() => repositories.id, { onDelete: 'cascade' }),
versionId: text('version_id'),
projectTitle: text('project_title'),
description: text('description'),
folders: text('folders', { mode: 'json' }).$type<string[]>(),
excludeFolders: text('exclude_folders', { mode: 'json' }).$type<string[]>(),
excludeFiles: text('exclude_files', { mode: 'json' }).$type<string[]>(),
rules: text('rules', { mode: 'json' }).$type<string[]>(),
previousVersions: text('previous_versions', { mode: 'json' }).$type<
{ tag: string; title: string; commitHash?: string }[]
>(),
updatedAt: integer('updated_at', { mode: 'timestamp' }).notNull()
},
(table) => [
uniqueIndex('uniq_repo_config_base')
.on(table.repositoryId)
.where(sql`${table.versionId} IS NULL`),
uniqueIndex('uniq_repo_config_version')
.on(table.repositoryId, table.versionId)
.where(sql`${table.versionId} IS NOT NULL`)
]
);
// ---------------------------------------------------------------------------
// settings


@@ -15,6 +15,7 @@ import { LibrarySearchResult, SnippetSearchResult } from '$lib/server/models/sea
export interface ContextResponseMetadata {
localSource: boolean;
resultCount: number;
searchModeUsed: string;
repository: {
id: string;
title: string;
@@ -130,7 +131,8 @@ export class ContextResponseMapper {
id: metadata.version.id
})
: null,
resultCount: metadata?.resultCount ?? snippets.length
resultCount: metadata?.resultCount ?? snippets.length,
searchModeUsed: metadata?.searchModeUsed ?? 'keyword'
});
}
}


@@ -173,6 +173,7 @@ export class ContextJsonResponseDto {
repository: ContextRepositoryJsonDto | null;
version: ContextVersionJsonDto | null;
resultCount: number;
searchModeUsed: string;
constructor(props: ContextJsonResponseDto) {
this.snippets = props.snippets;
@@ -182,5 +183,6 @@ export class ContextJsonResponseDto {
this.repository = props.repository;
this.version = props.version;
this.resultCount = props.resultCount;
this.searchModeUsed = props.searchModeUsed;
}
}


@@ -26,7 +26,8 @@ function createTestDb(): Database.Database {
for (const migrationFile of [
'0000_large_master_chief.sql',
'0001_quick_nighthawk.sql',
'0002_silky_stellaris.sql'
'0002_silky_stellaris.sql',
'0003_multiversion_config.sql'
]) {
const migrationSql = readFileSync(join(migrationsFolder, migrationFile), 'utf-8');
@@ -75,6 +76,28 @@ function insertRepo(db: Database.Database, overrides: Partial<Record<string, unk
);
}
function insertVersion(
db: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
): string {
const id = crypto.randomUUID();
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, title, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)`
).run(
overrides.id ?? id,
overrides.repository_id ?? '/test/repo',
overrides.tag ?? 'v1.0.0',
overrides.title ?? null,
overrides.state ?? 'pending',
overrides.total_snippets ?? 0,
overrides.indexed_at ?? null,
overrides.created_at ?? now
);
return (overrides.id as string) ?? id;
}
function insertJob(
db: Database.Database,
overrides: Partial<Record<string, unknown>> = {}
@@ -245,6 +268,8 @@ describe('IndexingPipeline', () => {
crawlResult: {
files: Array<{ path: string; content: string; sha: string; language: string }>;
totalFiles: number;
/** Optional pre-parsed config — simulates LocalCrawler returning CrawlResult.config. */
config?: Record<string, unknown>;
} = { files: [], totalFiles: 0 },
embeddingService: EmbeddingService | null = null
) {
@@ -272,8 +297,12 @@ describe('IndexingPipeline', () => {
);
}
function makeJob(repositoryId = '/test/repo') {
const jobId = insertJob(db, { repository_id: repositoryId, status: 'queued' });
function makeJob(repositoryId = '/test/repo', versionId?: string) {
const jobId = insertJob(db, {
repository_id: repositoryId,
version_id: versionId ?? null,
status: 'queued'
});
return db.prepare(`SELECT * FROM indexing_jobs WHERE id = ?`).get(jobId) as {
id: string;
repositoryId?: string;
@@ -644,4 +673,349 @@ describe('IndexingPipeline', () => {
expect(finalJob.status).toBe('done');
expect(finalJob.progress).toBe(100);
});
it('updates repository_versions state to indexing then indexed when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const files = [
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: 1 });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state, total_snippets, indexed_at FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string; total_snippets: number; indexed_at: number | null };
expect(version.state).toBe('indexed');
expect(version.total_snippets).toBeGreaterThan(0);
expect(version.indexed_at).not.toBeNull();
});
it('updates repository_versions state to error when pipeline throws and job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const errorCrawl = vi.fn().mockRejectedValue(new Error('crawl failed'));
const pipeline = new IndexingPipeline(
db,
errorCrawl as never,
{ crawl: errorCrawl } as never,
null
);
const job = makeJob('/test/repo', versionId);
await expect(pipeline.run(job as never)).rejects.toThrow('crawl failed');
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
expect(version.state).toBe('error');
});
it('does not touch repository_versions when job has no versionId', async () => {
const versionId = insertVersion(db, { tag: 'v1.0.0', state: 'pending' });
const pipeline = makePipeline({ files: [], totalFiles: 0 });
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
const version = db
.prepare(`SELECT state FROM repository_versions WHERE id = ?`)
.get(versionId) as { state: string };
// State should remain 'pending' — pipeline with no versionId must not touch it
expect(version.state).toBe('pending');
});
it('calls LocalCrawler with ref=v1.2.0 when job has a versionId with tag v1.2.0', async () => {
const versionId = insertVersion(db, { tag: 'v1.2.0', state: 'pending' });
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: 'v1.2.0'
});
});
it('calls LocalCrawler with ref=undefined when job has no versionId (main-branch)', async () => {
const crawl = vi.fn().mockResolvedValue({
files: [],
totalFiles: 0,
skippedFiles: 0,
branch: 'main',
commitSha: 'abc'
});
const pipeline = new IndexingPipeline(db, vi.fn() as never, { crawl } as never, null);
const job = makeJob('/test/repo'); // no versionId
await pipeline.run(job as never);
expect(crawl).toHaveBeenCalledWith({
rootPath: '/tmp/test-repo',
ref: undefined
});
});
it('excludes files matching excludeFiles patterns from trueref.json', async () => {
const truerefConfig = JSON.stringify({
excludeFiles: ['migration-guide.md', 'docs/legacy*']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
},
{
path: 'README.md',
content: '# Hello\n\nThis is documentation.',
sha: 'sha-readme',
language: 'markdown'
},
{
path: 'migration-guide.md',
content: '# Migration Guide\n\nThis should be excluded.',
sha: 'sha-migration',
language: 'markdown'
},
{
path: 'docs/legacy-api.md',
content: '# Legacy API\n\nShould be excluded by glob prefix.',
sha: 'sha-legacy',
language: 'markdown'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const docs = db
.prepare(`SELECT file_path FROM documents ORDER BY file_path`)
.all() as { file_path: string }[];
const filePaths = docs.map((d) => d.file_path);
// migration-guide.md and docs/legacy-api.md must be absent.
expect(filePaths).not.toContain('migration-guide.md');
expect(filePaths).not.toContain('docs/legacy-api.md');
// README.md must still be indexed.
expect(filePaths).toContain('README.md');
});
it('persists repo-wide rules from trueref.json to repository_configs after indexing', async () => {
const truerefConfig = JSON.stringify({
rules: ['Always use TypeScript strict mode', 'Prefer async/await over callbacks']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual(['Always use TypeScript strict mode', 'Prefer async/await over callbacks']);
});
it('persists version-specific rules under (repositoryId, versionId) when job has versionId', async () => {
const versionId = insertVersion(db, { tag: 'v2.0.0', state: 'pending' });
const truerefConfig = JSON.stringify({
rules: ['This is v2. Use the new Builder API.']
});
const files = [
{
path: 'trueref.json',
content: truerefConfig,
sha: 'sha-config',
language: 'json'
}
];
const pipeline = makePipeline({ files, totalFiles: files.length });
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
// Repo-wide row (version_id IS NULL) must NOT be written by a version job —
// writing it here would contaminate the NULL entry with version-specific rules
// (Bug 5b regression guard).
const repoRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(repoRow).toBeUndefined();
// Version-specific row must exist with the correct rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['This is v2. Use the new Builder API.']);
});
it('regression(Bug5b): version job does not overwrite the repo-wide NULL rules entry', async () => {
// Arrange: index the main branch first to establish a repo-wide rules entry.
const mainBranchRules = ['Always use TypeScript strict mode.'];
const mainPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: mainBranchRules }),
sha: 'sha-main-config',
language: 'json'
}
],
totalFiles: 1
});
const mainJob = makeJob('/test/repo'); // no versionId → main-branch job
await mainPipeline.run(mainJob as never);
// Confirm the repo-wide entry was written.
const afterMain = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterMain).toBeDefined();
expect(JSON.parse(afterMain!.rules)).toEqual(mainBranchRules);
// Act: index a version with different rules.
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const versionRules = ['v3 only: use the streaming API.'];
const versionPipeline = makePipeline({
files: [
{
path: 'trueref.json',
content: JSON.stringify({ rules: versionRules }),
sha: 'sha-v3-config',
language: 'json'
}
],
totalFiles: 1
});
const versionJob = makeJob('/test/repo', versionId);
await versionPipeline.run(versionJob as never);
// Assert: the repo-wide NULL entry must still contain the main-branch rules,
// not the version-specific ones.
const afterVersion = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(afterVersion).toBeDefined();
expect(JSON.parse(afterVersion!.rules)).toEqual(mainBranchRules);
// And the version-specific row must contain the version rules.
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
expect(JSON.parse(versionRow!.rules)).toEqual(versionRules);
});
it('persists rules from CrawlResult.config even when trueref.json is absent from files (folders allowlist bug)', async () => {
// Regression test for MULTIVERSION-0001:
// When trueref.json specifies a `folders` allowlist (e.g. ["src/"]),
// shouldIndexFile() excludes trueref.json itself because it lives at the
// repo root. The LocalCrawler now carries the pre-parsed config in
// CrawlResult.config so the pipeline no longer needs to find the file in
// crawlResult.files[].
const pipeline = makePipeline({
// trueref.json is NOT in files — simulates it being excluded by folders allowlist.
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
// The pre-parsed config is carried here instead (set by LocalCrawler).
config: { rules: ['Use strict TypeScript.', 'Avoid any.'] }
});
const job = makeJob();
await pipeline.run(job as never);
const row = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id IS NULL`
)
.get() as { rules: string } | undefined;
expect(row).toBeDefined();
const rules = JSON.parse(row!.rules);
expect(rules).toEqual(['Use strict TypeScript.', 'Avoid any.']);
});
it('persists version-specific rules from CrawlResult.config when trueref.json is excluded by folders allowlist', async () => {
const versionId = insertVersion(db, { tag: 'v3.0.0', state: 'pending' });
const pipeline = makePipeline({
files: [
{
path: 'src/index.ts',
content: 'export const x = 1;',
sha: 'sha-src',
language: 'typescript'
}
],
totalFiles: 1,
config: { rules: ['v3: use the streaming API.'] }
});
const job = makeJob('/test/repo', versionId);
await pipeline.run(job as never);
const versionRow = db
.prepare(
`SELECT rules FROM repository_configs WHERE repository_id = '/test/repo' AND version_id = ?`
)
.get(versionId) as { rules: string } | undefined;
expect(versionRow).toBeDefined();
const rules = JSON.parse(versionRow!.rules);
expect(rules).toEqual(['v3: use the streaming API.']);
});
});


@@ -15,13 +15,14 @@
import { createHash, randomUUID } from 'node:crypto';
import type Database from 'better-sqlite3';
import type { Document, NewDocument, NewSnippet } from '$lib/types';
import type { Document, NewDocument, NewSnippet, TrueRefConfig } from '$lib/types';
import type { crawl as GithubCrawlFn } from '$lib/server/crawler/github.crawler.js';
import type { LocalCrawler } from '$lib/server/crawler/local.crawler.js';
import type { EmbeddingService } from '$lib/server/embeddings/embedding.service.js';
import { RepositoryMapper } from '$lib/server/mappers/repository.mapper.js';
import { IndexingJob } from '$lib/server/models/indexing-job.js';
import { Repository, RepositoryEntity } from '$lib/server/models/repository.js';
import { resolveConfig, type ParsedConfig } from '$lib/server/config/config-parser.js';
import { parseFile } from '$lib/server/parser/index.js';
import { computeTrustScore } from '$lib/server/search/trust-score.js';
import { computeDiff } from './diff.js';
@@ -90,18 +91,53 @@ export class IndexingPipeline {
// Mark repo as actively indexing.
this.updateRepo(repo.id, { state: 'indexing' });
if (normJob.versionId) {
this.updateVersion(normJob.versionId, { state: 'indexing' });
}
// ---- Stage 1: Crawl -------------------------------------------------
const crawlResult = await this.crawl(repo);
const totalFiles = crawlResult.totalFiles;
const versionTag = normJob.versionId
? this.getVersionTag(normJob.versionId)
: undefined;
const crawlResult = await this.crawl(repo, versionTag);
// Resolve trueref.json / context7.json configuration.
// Prefer the pre-parsed config carried in the CrawlResult (set by
// LocalCrawler so it is available even when a `folders` allowlist
// excludes the repo root and trueref.json never appears in files[]).
// Fall back to locating the file in crawlResult.files for GitHub crawls
// which do not yet populate CrawlResult.config.
let parsedConfig: ReturnType<typeof resolveConfig> | null = null;
if (crawlResult.config) {
// Config was pre-parsed by the crawler — wrap it in a ParsedConfig
// shell so the rest of the pipeline can use it uniformly.
parsedConfig = {
config: crawlResult.config,
source: 'trueref.json',
warnings: []
} satisfies ParsedConfig;
} else {
const configFile = crawlResult.files.find(
(f) => f.path === 'trueref.json' || f.path === 'context7.json'
);
parsedConfig = configFile
? resolveConfig([{ filename: configFile.path, content: configFile.content }])
: null;
}
const excludeFiles: string[] = parsedConfig?.config.excludeFiles ?? [];
// Filter out excluded files before diff computation.
const filteredFiles =
excludeFiles.length > 0
? crawlResult.files.filter(
(f) => !excludeFiles.some((pattern) => IndexingPipeline.matchesExcludePattern(f.path, pattern))
)
: crawlResult.files;
const totalFiles = filteredFiles.length;
this.updateJob(job.id, { totalFiles });
// ---- Stage 2: Parse & diff ------------------------------------------
// Load all existing documents for this repo so computeDiff can
// classify every crawled file and detect deletions.
const existingDocs = this.getExistingDocuments(repo.id, normJob.versionId);
const diff = computeDiff(crawlResult.files, existingDocs);
const diff = computeDiff(filteredFiles, existingDocs);
// Accumulate new documents/snippets; skip unchanged files.
const newDocuments: NewDocument[] = [];
@@ -229,6 +265,28 @@ export class IndexingPipeline {
lastIndexedAt: Math.floor(Date.now() / 1000)
});
if (normJob.versionId) {
const versionStats = this.computeVersionStats(normJob.versionId);
this.updateVersion(normJob.versionId, {
state: 'indexed',
totalSnippets: versionStats.totalSnippets,
indexedAt: Math.floor(Date.now() / 1000)
});
}
// ---- Stage 6: Persist rules from config ----------------------------
if (parsedConfig?.config.rules?.length) {
if (!normJob.versionId) {
// Main-branch job: write the repo-wide entry only.
this.upsertRepoConfig(repo.id, null, parsedConfig.config.rules);
} else {
// Version job: write only the version-specific entry.
// Writing to the NULL row here would overwrite repo-wide rules
// with whatever the last-indexed version happened to carry.
this.upsertRepoConfig(repo.id, normJob.versionId, parsedConfig.config.rules);
}
}
this.updateJob(job.id, {
status: 'done',
progress: 100,
@@ -246,6 +304,9 @@ export class IndexingPipeline {
// Restore repo to error state but preserve any existing indexed data.
this.updateRepo(repositoryId, { state: 'error' });
if (normJob.versionId) {
this.updateVersion(normJob.versionId, { state: 'error' });
}
throw error;
}
@@ -255,9 +316,11 @@ export class IndexingPipeline {
// Private — crawl
// -------------------------------------------------------------------------
private async crawl(repo: Repository): Promise<{
private async crawl(repo: Repository, ref?: string): Promise<{
files: Array<{ path: string; content: string; sha: string; size: number; language: string }>;
totalFiles: number;
/** Pre-parsed trueref.json / context7.json, or undefined when absent. */
config?: TrueRefConfig;
}> {
if (repo.source === 'github') {
// Parse owner/repo from the canonical ID: "/owner/repo"
@@ -272,7 +335,7 @@ export class IndexingPipeline {
const result = await this.githubCrawl({
owner,
repo: repoName,
ref: repo.branch ?? undefined,
ref: ref ?? repo.branch ?? undefined,
token: repo.githubToken ?? undefined
});
@@ -281,13 +344,20 @@ export class IndexingPipeline {
// Local filesystem crawl.
const result = await this.localCrawler.crawl({
rootPath: repo.sourceUrl,
ref: repo.branch !== 'main' ? (repo.branch ?? undefined) : undefined
ref: ref ?? (repo.branch !== 'main' ? (repo.branch ?? undefined) : undefined)
});
return { files: result.files, totalFiles: result.totalFiles };
return { files: result.files, totalFiles: result.totalFiles, config: result.config };
}
}
private getVersionTag(versionId: string): string | undefined {
const row = this.db
.prepare<[string], { tag: string }>(`SELECT tag FROM repository_versions WHERE id = ?`)
.get(versionId);
return row?.tag;
}
// -------------------------------------------------------------------------
// Private — atomic snippet replacement
// -------------------------------------------------------------------------
@@ -384,6 +454,16 @@ export class IndexingPipeline {
};
}
private computeVersionStats(versionId: string): { totalSnippets: number } {
const row = this.db
.prepare<[string], { total_snippets: number }>(
`SELECT COUNT(*) as total_snippets FROM snippets WHERE version_id = ?`
)
.get(versionId);
return { totalSnippets: row?.total_snippets ?? 0 };
}
// -------------------------------------------------------------------------
// Private — DB helpers
// -------------------------------------------------------------------------
@@ -433,6 +513,73 @@ export class IndexingPipeline {
const values = [...Object.values(allFields), id];
this.db.prepare(`UPDATE repositories SET ${sets} WHERE id = ?`).run(...values);
}
private updateVersion(id: string, fields: Record<string, unknown>): void {
const sets = Object.keys(fields)
.map((k) => `${toSnake(k)} = ?`)
.join(', ');
const values = [...Object.values(fields), id];
this.db.prepare(`UPDATE repository_versions SET ${sets} WHERE id = ?`).run(...values);
}
private upsertRepoConfig(
repositoryId: string,
versionId: string | null,
rules: string[]
): void {
const now = Math.floor(Date.now() / 1000);
// Use DELETE + INSERT because ON CONFLICT … DO UPDATE doesn't work reliably
// with partial unique indexes in all SQLite versions.
if (versionId === null) {
this.db
.prepare(
`DELETE FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`
)
.run(repositoryId);
} else {
this.db
.prepare(
`DELETE FROM repository_configs WHERE repository_id = ? AND version_id = ?`
)
.run(repositoryId, versionId);
}
this.db
.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
)
.run(repositoryId, versionId, JSON.stringify(rules), now);
}
// -------------------------------------------------------------------------
// Private — static helpers
// -------------------------------------------------------------------------
/**
* Returns true when `filePath` matches the given exclude `pattern`.
*
* Supported patterns:
* - Plain filename: `migration-guide.md` matches any path ending in `/migration-guide.md`
* or equal to `migration-guide.md`.
* - Glob prefix with wildcard: `docs/migration*` matches paths that start with `docs/migration`.
* - Exact path: `src/legacy/old-api.ts` matches exactly that path.
*/
private static matchesExcludePattern(filePath: string, pattern: string): boolean {
if (pattern.includes('*')) {
// Glob-style: treat everything before the '*' as a required prefix.
const prefix = pattern.slice(0, pattern.indexOf('*'));
return filePath.startsWith(prefix);
}
// No wildcard — treat as plain name or exact path.
if (!pattern.includes('/')) {
// Plain filename: match basename (path ends with /<pattern> or equals pattern).
return filePath === pattern || filePath.endsWith('/' + pattern);
}
// Contains a slash — exact path match.
return filePath === pattern;
}
}
// ---------------------------------------------------------------------------
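The three pattern shapes documented on `matchesExcludePattern` can be checked directly; this is a standalone restatement of the method from the diff above so its doc-comment examples are runnable:

```typescript
// Standalone restatement of matchesExcludePattern (see diff above) so the
// doc-comment examples can be exercised.
function matchesExcludePattern(filePath: string, pattern: string): boolean {
  if (pattern.includes('*')) {
    // Glob-style: everything before the first '*' is a required prefix.
    return filePath.startsWith(pattern.slice(0, pattern.indexOf('*')));
  }
  if (!pattern.includes('/')) {
    // Plain filename: match the basename anywhere in the tree.
    return filePath === pattern || filePath.endsWith('/' + pattern);
  }
  // Contains a slash: exact path match.
  return filePath === pattern;
}
```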

View File

@@ -395,7 +395,7 @@ describe('HybridSearchService', () => {
seedSnippet(client, { repositoryId: repoId, documentId: docId, content: 'hello world' });
const svc = new HybridSearchService(client, searchService, null);
const results = await svc.search('hello', { repositoryId: repoId });
const { results } = await svc.search('hello', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
expect(results[0].snippet.content).toBe('hello world');
@@ -406,14 +406,14 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('alpha zero', { repositoryId: repoId, alpha: 0 });
const { results } = await svc.search('alpha zero', { repositoryId: repoId, alpha: 0 });
expect(results.length).toBeGreaterThan(0);
});
it('returns empty array when FTS5 query is blank and no provider', async () => {
const svc = new HybridSearchService(client, searchService, null);
const results = await svc.search(' ', { repositoryId: repoId });
const { results } = await svc.search(' ', { repositoryId: repoId });
expect(results).toHaveLength(0);
});
@@ -425,7 +425,7 @@ describe('HybridSearchService', () => {
});
const svc = new HybridSearchService(client, searchService, makeNoopProvider());
const results = await svc.search('noop fallback', { repositoryId: repoId });
const { results } = await svc.search('noop fallback', { repositoryId: repoId });
expect(results.length).toBeGreaterThan(0);
});
@@ -445,7 +445,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0, 0, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('hybrid search', {
const { results } = await svc.search('hybrid search', {
repositoryId: repoId,
alpha: 0.5
});
@@ -464,7 +464,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('deduplicate snippet', {
const { results } = await svc.search('deduplicate snippet', {
repositoryId: repoId,
alpha: 0.5
});
@@ -487,7 +487,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('pagination test', {
const { results } = await svc.search('pagination test', {
repositoryId: repoId,
limit: 3,
alpha: 0.5
@@ -519,7 +519,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('anything', {
const { results } = await svc.search('anything', {
repositoryId: repoId,
alpha: 1
});
@@ -543,7 +543,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('metadata check', {
const { results } = await svc.search('metadata check', {
repositoryId: repoId,
alpha: 0.5
});
@@ -580,7 +580,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const results = await svc.search('repository keyword', {
const { results } = await svc.search('repository keyword', {
repositoryId: repoId,
alpha: 0.5
});
@@ -607,7 +607,7 @@ describe('HybridSearchService', () => {
const provider = makeMockProvider([[1, 0]]);
const svc = new HybridSearchService(client, searchService, provider);
const codeResults = await svc.search('function example', {
const { results: codeResults } = await svc.search('function example', {
repositoryId: repoId,
type: 'code',
alpha: 0.5
@@ -632,7 +632,7 @@ describe('HybridSearchService', () => {
const svc = new HybridSearchService(client, searchService, provider);
// Should not throw and should return results.
const results = await svc.search('default alpha hybrid', { repositoryId: repoId });
const { results } = await svc.search('default alpha hybrid', { repositoryId: repoId });
expect(Array.isArray(results)).toBe(true);
});
@@ -761,7 +761,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('keyword', {
const { results } = await hybridService.search('keyword', {
repositoryId: repoId,
searchMode: 'keyword'
});
@@ -820,7 +820,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('semantic', {
const { results } = await hybridService.search('semantic', {
repositoryId: repoId,
searchMode: 'semantic',
profileId: 'test-profile'
@@ -848,7 +848,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const results = await hybridService.search('test query', {
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -867,7 +867,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search(' ', {
const { results } = await hybridService.search(' ', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -885,7 +885,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, noopProvider);
const results = await hybridService.search('test query', {
const { results } = await hybridService.search('test query', {
repositoryId: repoId,
searchMode: 'semantic'
});
@@ -951,7 +951,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query with heavy punctuation that preprocesses to nothing.
const results = await hybridService.search('!!!@@@###', {
const { results } = await hybridService.search('!!!@@@###', {
repositoryId: repoId,
searchMode: 'auto',
profileId: 'test-profile'
@@ -978,7 +978,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, mockProvider);
const results = await hybridService.search('hello', {
const { results } = await hybridService.search('hello', {
repositoryId: repoId,
searchMode: 'auto'
});
@@ -1038,7 +1038,7 @@ describe('HybridSearchService', () => {
const hybridService = new HybridSearchService(client, searchService, mockProvider);
// Query that won't match through FTS after punctuation normalization.
const results = await hybridService.search('%%%vector%%%', {
const { results } = await hybridService.search('%%%vector%%%', {
repositoryId: repoId,
searchMode: 'hybrid',
alpha: 0.5,
@@ -1064,7 +1064,7 @@ describe('HybridSearchService', () => {
const searchService = new SearchService(client);
const hybridService = new HybridSearchService(client, searchService, null);
const results = await hybridService.search('!!!@@@###$$$', {
const { results } = await hybridService.search('!!!@@@###$$$', {
repositoryId: repoId
});

View File

@@ -101,9 +101,12 @@ export class HybridSearchService {
*
* @param query - Raw search string (preprocessing handled by SearchService).
* @param options - Search parameters including repositoryId and alpha blend.
* @returns Ranked array of SnippetSearchResult, deduplicated by snippet ID.
* @returns Object with ranked results array and the search mode actually used.
*/
async search(query: string, options: HybridSearchOptions): Promise<SnippetSearchResult[]> {
async search(
query: string,
options: HybridSearchOptions
): Promise<{ results: SnippetSearchResult[]; searchModeUsed: string }> {
const limit = options.limit ?? 20;
const mode = options.searchMode ?? 'auto';
@@ -127,12 +130,12 @@ export class HybridSearchService {
// Semantic mode: skip FTS entirely and use vector search only.
if (mode === 'semantic') {
if (!this.embeddingProvider || !query.trim()) {
return [];
return { results: [], searchModeUsed: 'semantic' };
}
const embeddings = await this.embeddingProvider.embed([query]);
if (embeddings.length === 0) {
return [];
return { results: [], searchModeUsed: 'semantic' };
}
const queryEmbedding = embeddings[0].values;
@@ -144,7 +147,10 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'semantic'
};
}
// FTS5 mode (keyword) or hybrid/auto modes: try FTS first.
@@ -157,7 +163,7 @@ export class HybridSearchService {
// Degenerate cases: no provider or pure FTS5 mode.
if (!this.embeddingProvider || alpha === 0) {
return ftsResults.slice(0, limit);
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
// For auto/hybrid modes: if FTS yielded results, use them; otherwise try vector.
@@ -168,14 +174,14 @@ export class HybridSearchService {
// No FTS results: try vector search as a fallback in auto/hybrid modes.
if (!query.trim()) {
// Query is empty; no point embedding it.
return [];
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const embeddings = await this.embeddingProvider.embed([query]);
// If provider fails (Noop returns empty array), we're done.
if (embeddings.length === 0) {
return [];
return { results: [], searchModeUsed: 'keyword_fallback' };
}
const queryEmbedding = embeddings[0].values;
@@ -187,7 +193,10 @@ export class HybridSearchService {
});
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'keyword_fallback'
};
}
// FTS has results: use RRF to blend with vector search (if alpha < 1).
@@ -195,7 +204,7 @@ export class HybridSearchService {
// Provider may be a Noop (returns empty array) — fall back to FTS gracefully.
if (embeddings.length === 0) {
return ftsResults.slice(0, limit);
return { results: ftsResults.slice(0, limit), searchModeUsed: 'keyword' };
}
const queryEmbedding = embeddings[0].values;
@@ -210,7 +219,10 @@ export class HybridSearchService {
// Pure vector mode: skip RRF and return vector results directly.
if (alpha === 1) {
const topIds = vectorResults.slice(0, limit).map((r) => r.snippetId);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'semantic'
};
}
// Build ranked lists for RRF. Score field is unused by RRF — only
@@ -221,7 +233,10 @@ export class HybridSearchService {
const fused = reciprocalRankFusion(ftsRanked, vecRanked);
const topIds = fused.slice(0, limit).map((r) => r.id);
return this.fetchSnippetsByIds(topIds, options.repositoryId, options.type);
return {
results: this.fetchSnippetsByIds(topIds, options.repositoryId, options.type),
searchModeUsed: 'hybrid'
};
}
// -------------------------------------------------------------------------
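The `hybrid` branch above blends the FTS and vector ranked lists with `reciprocalRankFusion`. The diff does not show that function's internals, so the sketch below assumes the conventional RRF form with the usual k = 60 smoothing constant; the real implementation may differ in signature and constant:

```typescript
// Minimal reciprocal-rank-fusion sketch (assumed form; k = 60 is the
// conventional default, not confirmed by the diff). Each list contributes
// 1 / (k + rank + 1) per item; items high in either list float to the top.
type Ranked = { id: string };

function reciprocalRankFusion(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An item ranked in both lists (like a snippet found by both FTS and vector search) accumulates two reciprocal terms and outranks items found by only one retriever.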

View File

@@ -55,6 +55,7 @@ function createTestDb(): Database.Database {
const migration0 = readFileSync(join(migrationsFolder, '0000_large_master_chief.sql'), 'utf-8');
const migration1 = readFileSync(join(migrationsFolder, '0001_quick_nighthawk.sql'), 'utf-8');
const migration2 = readFileSync(join(migrationsFolder, '0002_silky_stellaris.sql'), 'utf-8');
const migration3 = readFileSync(join(migrationsFolder, '0003_multiversion_config.sql'), 'utf-8');
// Apply first migration
const statements0 = migration0
@@ -85,6 +86,15 @@ function createTestDb(): Database.Database {
client.exec(statement);
}
const statements3 = migration3
.split('--> statement-breakpoint')
.map((statement) => statement.trim())
.filter(Boolean);
for (const statement of statements3) {
client.exec(statement);
}
client.exec(readFileSync(ftsFile, 'utf-8'));
return client;
@@ -436,7 +446,11 @@ describe('API contract integration', () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v18.3.0');
const documentId = seedDocument(db, repositoryId, versionId);
seedRules(db, repositoryId, ['Prefer hooks over classes']);
// Insert version-specific rules (versioned queries no longer inherit the NULL row).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Prefer hooks over classes']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
@@ -486,4 +500,198 @@ describe('API contract integration', () => {
isLocal: false
});
});
it('GET /api/v1/context returns only version-specific rules for versioned queries (no NULL row contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v2.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Insert repo-wide rules (version_id IS NULL) — these must NOT appear in versioned queries.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule']), NOW_S);
// Insert version-specific rules.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify(['Version-specific rule']), NOW_S);
seedSnippet(db, {
documentId,
repositoryId,
versionId,
content: 'some versioned content'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v2.0.0`)}&query=${encodeURIComponent('versioned content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Only the version-specific rule should appear — NULL row must not contaminate.
expect(body.rules).toEqual(['Version-specific rule']);
});
it('GET /api/v1/context returns only repo-wide rules when no version is requested', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
// Insert repo-wide rules (version_id IS NULL).
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['Repo-wide rule only']), NOW_S);
seedSnippet(db, { documentId, repositoryId, content: 'some content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('some content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.rules).toEqual(['Repo-wide rule only']);
});
it('GET /api/v1/context versioned query returns only the version-specific rules row', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v3.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
const sharedRule = 'Use TypeScript strict mode';
// Insert repo-wide NULL row — must NOT bleed into versioned query results.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify([sharedRule]), NOW_S);
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, ?, ?, ?)`
).run(repositoryId, versionId, JSON.stringify([sharedRule, 'Version-only rule']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'dedup test content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v3.0.0`)}&query=${encodeURIComponent('dedup test')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// Returns only the version-specific row as stored — no NULL row merge.
expect(body.rules).toEqual([sharedRule, 'Version-only rule']);
});
it('GET /api/v1/context versioned query returns empty rules when only NULL row exists (no NULL contamination)', async () => {
const repositoryId = seedRepo(db);
const versionId = seedVersion(db, repositoryId, 'v1.0.0');
const documentId = seedDocument(db, repositoryId, versionId);
// Only a repo-wide NULL row exists — no version-specific config.
db.prepare(
`INSERT INTO repository_configs (repository_id, version_id, rules, updated_at)
VALUES (?, NULL, ?, ?)`
).run(repositoryId, JSON.stringify(['HEAD rules that must not contaminate v1']), NOW_S);
seedSnippet(db, { documentId, repositoryId, versionId, content: 'v1 content' });
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v1.0.0`)}&query=${encodeURIComponent('v1 content')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
// No version-specific config row → empty rules. NULL row must not bleed in.
expect(body.rules).toEqual([]);
});
it('GET /api/v1/context returns 404 with VERSION_NOT_FOUND when version does not exist', async () => {
const repositoryId = seedRepo(db);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/v99.0.0`)}&query=${encodeURIComponent('foo')}`
)
} as never);
expect(response.status).toBe(404);
const body = await response.json();
expect(body.code).toBe('VERSION_NOT_FOUND');
});
it('GET /api/v1/context resolves a version by full commit SHA', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'a'.repeat(40);
// Insert version with a commit_hash
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v2.0.0`, repositoryId, 'v2.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${fullSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v2.0.0');
});
it('GET /api/v1/context resolves a version by short SHA prefix (8 chars)', async () => {
const repositoryId = seedRepo(db);
const fullSha = 'b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0';
const shortSha = fullSha.slice(0, 8);
db.prepare(
`INSERT INTO repository_versions
(id, repository_id, tag, commit_hash, state, total_snippets, indexed_at, created_at)
VALUES (?, ?, ?, ?, 'indexed', 0, ?, ?)`
).run(`${repositoryId}/v3.0.0`, repositoryId, 'v3.0.0', fullSha, NOW_S, NOW_S);
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(`${repositoryId}/${shortSha}`)}&query=${encodeURIComponent('anything')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.version?.resolved).toBe('v3.0.0');
});
it('GET /api/v1/context includes searchModeUsed in JSON response', async () => {
const repositoryId = seedRepo(db);
const documentId = seedDocument(db, repositoryId);
seedSnippet(db, {
documentId,
repositoryId,
content: 'search mode used test snippet'
});
const response = await getContext({
url: new URL(
`http://test/api/v1/context?libraryId=${encodeURIComponent(repositoryId)}&query=${encodeURIComponent('search mode used')}`
)
} as never);
expect(response.status).toBe(200);
const body = await response.json();
expect(body.searchModeUsed).toBeDefined();
expect(['keyword', 'semantic', 'hybrid', 'keyword_fallback']).toContain(body.searchModeUsed);
});
});
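The resolution order the SHA tests above exercise — exact tag match first, then commit-hash prefix of at least 7 characters, otherwise VERSION_NOT_FOUND — can be modeled without the database (hypothetical in-memory sketch):

```typescript
// In-memory model of the version resolution order tested above: exact tag,
// then commit-hash prefix (>= 7 chars, mirroring the `commit_hash LIKE ?`
// query with a `${version}%` pattern), else undefined -> 404.
type VersionRow = { id: string; tag: string; commitHash: string };

function resolveVersion(versions: VersionRow[], requested: string): VersionRow | undefined {
  const byTag = versions.find((v) => v.tag === requested);
  if (byTag) return byTag;
  if (requested.length >= 7) {
    return versions.find((v) => v.commitHash.startsWith(requested));
  }
  return undefined;
}
```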

View File

@@ -54,24 +54,42 @@ interface RawRepoConfig {
rules: string | null;
}
function getRules(db: ReturnType<typeof getClient>, repositoryId: string): string[] {
const row = db
.prepare<
[string],
RawRepoConfig
>(`SELECT rules FROM repository_configs WHERE repository_id = ?`)
.get(repositoryId);
if (!row?.rules) return [];
function parseRulesJson(raw: string | null | undefined): string[] {
if (!raw) return [];
try {
const parsed = JSON.parse(row.rules);
const parsed = JSON.parse(raw);
return Array.isArray(parsed) ? (parsed as string[]) : [];
} catch {
return [];
}
}
function getRules(
db: ReturnType<typeof getClient>,
repositoryId: string,
versionId?: string
): string[] {
if (!versionId) {
// Unversioned query: return repo-wide (HEAD) rules only.
const row = db
.prepare<
[string],
RawRepoConfig
>(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id IS NULL`)
.get(repositoryId);
return parseRulesJson(row?.rules);
}
// Versioned query: return only version-specific rules (no NULL row merge).
const row = db
.prepare<
[string, string],
RawRepoConfig
>(`SELECT rules FROM repository_configs WHERE repository_id = ? AND version_id = ?`)
.get(repositoryId, versionId);
return parseRulesJson(row?.rules);
}
interface RawRepoState {
state: 'pending' | 'indexing' | 'indexed' | 'error';
id: string;
@@ -198,6 +216,7 @@ export const GET: RequestHandler = async ({ url }) => {
let versionId: string | undefined;
let resolvedVersion: RawVersionRow | undefined;
if (parsed.version) {
// Try exact tag match first.
resolvedVersion = db
.prepare<
[string, string],
@@ -205,12 +224,33 @@ export const GET: RequestHandler = async ({ url }) => {
>(`SELECT id, tag FROM repository_versions WHERE repository_id = ? AND tag = ?`)
.get(parsed.repositoryId, parsed.version);
// Version not found is not fatal — fall back to default branch.
versionId = resolvedVersion?.id;
// Fall back to commit hash prefix match (min 7 chars).
if (!resolvedVersion && parsed.version.length >= 7) {
resolvedVersion = db
.prepare<
[string, string],
RawVersionRow
>(
`SELECT id, tag FROM repository_versions
WHERE repository_id = ? AND commit_hash LIKE ?`
)
.get(parsed.repositoryId, `${parsed.version}%`);
}
if (!resolvedVersion) {
return new Response(
JSON.stringify({
error: `Version ${parsed.version} not found for library ${parsed.repositoryId}`,
code: 'VERSION_NOT_FOUND'
}),
{ status: 404, headers: { 'Content-Type': 'application/json', ...CORS_HEADERS } }
);
}
versionId = resolvedVersion.id;
}
// Execute hybrid search (falls back to FTS5 when no embedding provider is set).
const searchResults = await hybridService.search(query, {
const { results: searchResults, searchModeUsed } = await hybridService.search(query, {
repositoryId: parsed.repositoryId,
versionId,
limit: 50, // fetch more than needed; token budget will trim
@@ -242,6 +282,7 @@ export const GET: RequestHandler = async ({ url }) => {
const metadata: ContextResponseMetadata = {
localSource: repo.source === 'local',
resultCount: selectedResults.length,
searchModeUsed,
repository: {
id: repo.id,
title: repo.title,
@@ -260,8 +301,8 @@ export const GET: RequestHandler = async ({ url }) => {
snippetVersions
};
// Load rules from repository_configs.
const rules = getRules(db, parsed.repositoryId);
// Load rules from repository_configs (repo-wide for unversioned queries, version-specific for versioned — no merge).
const rules = getRules(db, parsed.repositoryId, versionId);
if (responseType === 'txt') {
const text = formatContextTxt(selectedResults, rules, metadata);
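The no-merge contract that `getRules` now enforces reduces to a single row lookup keyed on nullable versionId; a hypothetical in-memory model:

```typescript
// In-memory model of the getRules contract: unversioned queries read only the
// NULL (repo-wide) row; versioned queries read only their own row. A version
// with no config row gets [] even when a NULL row exists — no contamination.
type ConfigRow = { versionId: string | null; rules: string[] };

function getRules(rows: ConfigRow[], versionId?: string): string[] {
  const wanted = versionId ?? null;
  return rows.find((r) => r.versionId === wanted)?.rules ?? [];
}
```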

View File

@@ -52,6 +52,9 @@
let showDiscoverPanel = $state(false);
let registerBusy = $state(false);
// Active version indexing jobs: tag -> jobId
let activeVersionJobs = $state<Record<string, string | undefined>>({});
// Remove confirm
let removeTag = $state<string | null>(null);
@@ -115,6 +118,16 @@
activeJobId = d.job.id;
}
const versionCount = d.versionJobs?.length ?? 0;
if (versionCount > 0) {
let next = { ...activeVersionJobs };
for (const vj of d.versionJobs) {
const matched = versions.find((v) => v.id === vj.versionId);
if (matched) {
next = { ...next, [matched.tag]: vj.id };
}
}
activeVersionJobs = next;
}
successMessage =
versionCount > 0
? `Re-indexing started. Also queued ${versionCount} version job${versionCount === 1 ? '' : 's'}.`
@@ -157,6 +170,10 @@
const d = await res.json();
throw new Error(d.error ?? 'Failed to add version');
}
const d = await res.json();
if (d.job?.id) {
activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
}
addVersionTag = '';
await loadVersions();
} catch (e) {
@@ -177,7 +194,10 @@
const d = await res.json();
throw new Error(d.error ?? 'Failed to queue version indexing');
}
await loadVersions();
const d = await res.json();
if (d.job?.id) {
activeVersionJobs = { ...activeVersionJobs, [tag]: d.job.id };
}
} catch (e) {
errorMessage = (e as Error).message;
}
@@ -244,8 +264,9 @@
registerBusy = true;
errorMessage = null;
try {
await Promise.all(
[...selectedDiscoveredTags].map((tag) =>
const tags = [...selectedDiscoveredTags];
const responses = await Promise.all(
tags.map((tag) =>
fetch(`/api/v1/libs/${encodeURIComponent(repo.id)}/versions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
@@ -253,6 +274,15 @@
})
)
);
const results = await Promise.all(responses.map((r) => (r.ok ? r.json() : null)));
let next = { ...activeVersionJobs };
for (let i = 0; i < tags.length; i++) {
const result = results[i];
if (result?.job?.id) {
next = { ...next, [tags[i]]: result.job.id };
}
}
activeVersionJobs = next;
showDiscoverPanel = false;
discoveredTags = [];
selectedDiscoveredTags = new Set();
@@ -346,7 +376,13 @@
{#if activeJobId}
<div class="mt-4 rounded-xl border border-blue-100 bg-blue-50 p-4">
<p class="mb-2 text-sm font-medium text-blue-700">Indexing in progress</p>
<IndexingProgress jobId={activeJobId} />
<IndexingProgress
jobId={activeJobId}
oncomplete={() => {
activeJobId = null;
refreshRepo();
}}
/>
</div>
{:else if repo.state === 'error'}
<div class="mt-4 rounded-xl border border-red-100 bg-red-50 p-4">
@@ -461,31 +497,67 @@
{:else}
<div class="divide-y divide-gray-100">
{#each versions as version (version.id)}
<div class="flex items-center justify-between py-2.5">
<div class="flex items-center gap-3">
<span class="font-mono text-sm font-medium text-gray-900">{version.tag}</span>
<span
class="rounded-full px-2 py-0.5 text-xs font-medium {stateColors[version.state] ??
'bg-gray-100 text-gray-600'}"
>
{stateLabels[version.state] ?? version.state}
</span>
</div>
<div class="flex items-center gap-2">
<button
onclick={() => handleIndexVersion(version.tag)}
disabled={version.state === 'indexing'}
class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
>
{version.state === 'indexing' ? 'Indexing...' : 'Index'}
</button>
<button
onclick={() => (removeTag = version.tag)}
class="rounded-lg border border-red-100 px-3 py-1 text-xs font-medium text-red-500 hover:bg-red-50"
>
Remove
</button>
<div class="py-2.5">
<div class="flex items-center justify-between">
<div class="flex items-center gap-3">
<span class="font-mono text-sm font-medium text-gray-900">{version.tag}</span>
<span
class="rounded-full px-2 py-0.5 text-xs font-medium {stateColors[version.state] ??
'bg-gray-100 text-gray-600'}"
>
{stateLabels[version.state] ?? version.state}
</span>
</div>
<div class="flex items-center gap-2">
<button
onclick={() => handleIndexVersion(version.tag)}
disabled={version.state === 'indexing' || !!activeVersionJobs[version.tag]}
class="rounded-lg border border-blue-200 px-3 py-1 text-xs font-medium text-blue-600 hover:bg-blue-50 disabled:cursor-not-allowed disabled:opacity-50"
>
{version.state === 'indexing' || !!activeVersionJobs[version.tag] ? 'Indexing...' : 'Index'}
</button>
<button
onclick={() => (removeTag = version.tag)}
class="rounded-lg border border-red-100 px-3 py-1 text-xs font-medium text-red-500 hover:bg-red-50"
>
Remove
</button>
</div>
</div>
{#if version.totalSnippets > 0 || version.commitHash || version.indexedAt}
{@const metaParts = (
[
version.totalSnippets > 0
? { text: `${version.totalSnippets} snippets`, mono: false }
: null,
version.commitHash
? { text: version.commitHash.slice(0, 8), mono: true }
: null,
version.indexedAt
? { text: formatDate(version.indexedAt), mono: false }
: null
] as Array<{ text: string; mono: boolean } | null>
).filter((p): p is { text: string; mono: boolean } => p !== null)}
<div class="mt-1 flex items-center gap-1.5">
{#each metaParts as part, i (i)}
{#if i > 0}
<span class="text-xs text-gray-300">·</span>
{/if}
<span class="text-xs text-gray-400{part.mono ? ' font-mono' : ''}">{part.text}</span>
{/each}
</div>
{/if}
{#if !!activeVersionJobs[version.tag]}
<IndexingProgress
jobId={activeVersionJobs[version.tag]!}
oncomplete={() => {
const { [version.tag]: _, ...rest } = activeVersionJobs;
activeVersionJobs = rest;
loadVersions();
refreshRepo();
}}
/>
{/if}
</div>
{/each}
</div>
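The `activeVersionJobs` bookkeeping in the Svelte diff above is a plain tag → jobId record: add an entry when a version job is queued, drop it via rest-spread when `IndexingProgress` reports completion. A minimal sketch of those two updates (helper names are illustrative, not from the component):

```typescript
// Sketch of the activeVersionJobs record updates used in the component above.
// queueJob mirrors `{ ...activeVersionJobs, [tag]: jobId }`; completeJob
// mirrors the `const { [tag]: _, ...rest }` removal in the oncomplete handler.
type JobMap = Record<string, string | undefined>;

function queueJob(jobs: JobMap, tag: string, jobId: string): JobMap {
  return { ...jobs, [tag]: jobId };
}

function completeJob(jobs: JobMap, tag: string): JobMap {
  const { [tag]: _removed, ...rest } = jobs;
  return rest;
}
```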