# TRUEREF-0020 — Embedding Profiles, Default Local Embeddings, and Version-Scoped Semantic Retrieval

**Priority:** P1
**Status:** Pending
**Depends On:** TRUEREF-0007, TRUEREF-0008, TRUEREF-0009, TRUEREF-0010, TRUEREF-0011, TRUEREF-0012, TRUEREF-0014, TRUEREF-0018
**Blocks:** TRUEREF-0019

---

## Overview

TrueRef already has the main ingredients for embeddings and hybrid search, but the current design is still centered on a single hard-coded provider configuration and does not guarantee version-safe semantic retrieval at query time. This feature formalizes the full provider-registry approach and makes semantic retrieval production-ready for both the REST API and MCP surfaces.

The scope is intentionally narrow:

1. Introduce first-class embedding profiles so custom AI providers can be registered without hard-coding provider names throughout the API, UI, and runtime.
2. Enable embeddings by default using the local `@xenova/transformers` model so a fresh install provides semantic retrieval out of the box.
3. Make semantic and hybrid retrieval version-scoped, so a query for a specific library and version only searches snippets indexed for that exact version.
4. Extend the API and MCP `query-docs` path to use the active embedding profile at query time.

Out of scope:

- semantic repository discovery or reranking for `libs/search`
- inferring the repository from the query text
- adding multi-tenant provider isolation

Consumers are expected to pass an exact library or repository identifier and the needed version when they want version-specific retrieval.

---

## Problem Statement

Current semantic search support has four structural gaps:

1. Query-time semantic retrieval is not reliably wired to the configured provider.
2. The embedding configuration shape is fixed to `openai | local | none`, which does not scale to custom provider adapters.
3. Stored embeddings are keyed too narrowly to support multiple profiles or safe provider migration.
4. The vector search path does not enforce version scoping as strictly as the keyword search path, which risks snippets from other versions or the default branch leaking into versioned queries.

As a result, embeddings may be generated at indexing time while retrieval behavior, provider flexibility, and version guarantees remain weaker than required.

---

## Goals

- Make semantic retrieval work by default on a fresh install.
- Keep the default self-hosted path fully local.
- Support custom AI providers through a provider registry plus profile system.
- Keep the API as the source of truth for retrieval behavior.
- Keep MCP as a thin compatibility layer over the API.
- Guarantee version-scoped hybrid retrieval when a versioned library ID is provided.

---

## Non-Goals

- semantic repository search
- automatic repo selection from free-text intent
- remote provider secrets management beyond the current settings persistence model
- support for non-embedding rerankers in this ticket

---

## Default Local Embeddings

Embeddings should be enabled by default with the local model path instead of shipping in FTS-only mode.

### Default Runtime Behavior

- Install `@xenova/transformers` as a normal runtime dependency rather than treating it as optional for the default setup.
- Seed the default embedding profile to the local provider.
- Default model: `Xenova/all-MiniLM-L6-v2`
- Default dimensions: `384`
- New repositories index snippets with embeddings automatically unless the user explicitly disables embeddings.
- Query-time retrieval uses hybrid mode automatically when the active profile is healthy.
- If the local model cannot be loaded, the system should surface a clear startup or settings error instead of silently pretending semantic search is enabled.
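
The seeded default can be pictured as a plain record. The following is a minimal sketch, assuming the `EmbeddingProfile` shape proposed in this ticket and the `local-transformers` provider kind. The `local-default` ID, the title, and the `buildDefaultLocalProfile` helper are hypothetical names chosen for illustration, not existing TrueRef code.

```typescript
// Field names follow the EmbeddingProfile shape proposed in this ticket.
interface EmbeddingProfile {
  id: string;
  providerKind: string;
  title: string;
  enabled: boolean;
  isDefault: boolean;
  config: Record<string, unknown>;
  model: string;
  dimensions: number;
  createdAt: number;
  updatedAt: number;
}

// Builds the profile a fresh install could be seeded with.
// The id and title are illustrative values; the spec does not fix them.
function buildDefaultLocalProfile(now: number = Date.now()): EmbeddingProfile {
  return {
    id: 'local-default',
    providerKind: 'local-transformers',
    title: 'Local (Xenova/all-MiniLM-L6-v2)',
    enabled: true,
    isDefault: true,
    config: {},
    model: 'Xenova/all-MiniLM-L6-v2',
    dimensions: 384,
    createdAt: now,
    updatedAt: now
  };
}
```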

### Acceptance Criteria

- [ ] `@xenova/transformers` is installed by default for production/runtime use
- [ ] Fresh installations default to an active local embedding profile
- [ ] No manual provider configuration is required to get semantic search on a clean setup
- [ ] The settings UI shows local embeddings as the default active profile
- [ ] Disabling embeddings remains possible from settings

---

## Embedding Profile Registry

Replace the single enum-style config with a registry-oriented model.

### Core Concepts

#### Provider Adapter

A provider adapter is code registered in the server runtime that knows how to validate config and generate embeddings for one provider kind.

Examples:

- `local-transformers`
- `openai-compatible`
- future custom adapters added in code without redesigning the API contract

#### Embedding Profile

An embedding profile is persisted configuration selecting one provider adapter plus its runtime settings.

```typescript
interface EmbeddingProfile {
  id: string;
  providerKind: string;
  title: string;
  enabled: boolean;
  isDefault: boolean;
  config: Record<string, unknown>;
  model: string;
  dimensions: number;
  createdAt: number;
  updatedAt: number;
}
```

### Registry Responsibilities

- create a provider instance from a profile
- validate profile config
- expose provider metadata to the settings API and UI
- allow future custom providers without widening TypeScript unions across the app
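
The responsibilities above could be met by a small registry keyed by `providerKind`. This is a hedged sketch rather than the final design: `ProviderAdapter`, `ProviderRegistry`, `ProfileLike`, and their method names are assumptions introduced here for illustration.

```typescript
// Hypothetical interfaces; only `providerKind`, `model`, `dimensions`, and `config`
// come from the profile shape in this ticket.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

interface ProfileLike {
  providerKind: string;
  model: string;
  dimensions: number;
  config: Record<string, unknown>;
}

interface ProviderAdapter {
  kind: string;
  title: string;
  // Returns human-readable problems; an empty list means the config is valid.
  validateConfig(config: Record<string, unknown>): string[];
  create(profile: ProfileLike): EmbeddingProvider;
}

class ProviderRegistry {
  private adapters = new Map<string, ProviderAdapter>();

  register(adapter: ProviderAdapter): void {
    if (this.adapters.has(adapter.kind)) {
      throw new Error(`Provider kind already registered: ${adapter.kind}`);
    }
    this.adapters.set(adapter.kind, adapter);
  }

  // Metadata that a settings API or UI could list.
  listKinds(): { kind: string; title: string }[] {
    return Array.from(this.adapters.values()).map(({ kind, title }) => ({ kind, title }));
  }

  // Validates the profile config, then delegates construction to the adapter.
  createProvider(profile: ProfileLike): EmbeddingProvider {
    const adapter = this.adapters.get(profile.providerKind);
    if (!adapter) throw new Error(`Unknown provider kind: ${profile.providerKind}`);
    const problems = adapter.validateConfig(profile.config);
    if (problems.length > 0) {
      throw new Error(`Invalid config for ${profile.providerKind}: ${problems.join('; ')}`);
    }
    return adapter.create(profile);
  }
}
```

Because adapters are looked up by string key, adding a provider means registering one more adapter instead of widening a union type across the app.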

### Acceptance Criteria

- [ ] Provider selection is no longer hard-coded to `openai | local | none`
- [ ] Providers are instantiated through a registry keyed by `providerKind`
- [ ] Profiles are stored as first-class records rather than a single settings blob
- [ ] One profile can be marked as the default active profile for indexing and retrieval
- [ ] Settings endpoints return profile data and provider metadata cleanly

---

## Data Model Changes

The current `snippet_embeddings` shape is insufficient for multiple profiles because it allows only one embedding row per snippet.

### New Tables / Changes

#### `embedding_profiles`

```typescript
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';

// Assumes the Drizzle SQLite schema style implied by the column helpers below.
export const embeddingProfiles = sqliteTable('embedding_profiles', {
  id: text('id').primaryKey(),
  providerKind: text('provider_kind').notNull(),
  title: text('title').notNull(),
  enabled: integer('enabled', { mode: 'boolean' }).notNull().default(true),
  isDefault: integer('is_default', { mode: 'boolean' }).notNull().default(false),
  model: text('model').notNull(),
  dimensions: integer('dimensions').notNull(),
  config: text('config', { mode: 'json' }).notNull(),
  createdAt: integer('created_at').notNull(),
  updatedAt: integer('updated_at').notNull(),
});
```

#### `snippet_embeddings`

Add `profile_id` and replace the single-row-per-snippet constraint with a composite key or unique index on `(snippet_id, profile_id)`.

```typescript
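import { sqliteTable, text, integer, blob, primaryKey } from 'drizzle-orm/sqlite-core';

export const snippetEmbeddings = sqliteTable('snippet_embeddings', {
  snippetId: text('snippet_id').notNull(),
  profileId: text('profile_id').notNull(),
  model: text('model').notNull(),
  dimensions: integer('dimensions').notNull(),
  embedding: blob('embedding').notNull(),
  createdAt: integer('created_at').notNull(),
}, (table) => ({
  // One embedding row per (snippet, profile) pair.
  pk: primaryKey({ columns: [table.snippetId, table.profileId] }),
}));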
```
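
Since `embedding` is a blob and `dimensions` is stored beside it, writes can check the vector length against the profile before persisting. The helpers below are a sketch only: the ticket does not specify the byte layout, so little-endian float32 is an assumption, and `encodeEmbedding` and `decodeEmbedding` are hypothetical names.

```typescript
// Assumption: vectors are stored as consecutive little-endian float32 values.
function encodeEmbedding(vector: number[], expectedDimensions: number): Uint8Array {
  if (vector.length !== expectedDimensions) {
    throw new Error(`Expected ${expectedDimensions} dimensions, got ${vector.length}`);
  }
  const bytes = new Uint8Array(vector.length * 4);
  const view = new DataView(bytes.buffer);
  vector.forEach((value, i) => view.setFloat32(i * 4, value, true));
  return bytes;
}

// Reverses encodeEmbedding; rejects blobs whose length is not a whole number of floats.
function decodeEmbedding(bytes: Uint8Array): number[] {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('Blob length is not a multiple of 4 bytes');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out: number[] = [];
  for (let i = 0; i < bytes.byteLength; i += 4) {
    out.push(view.getFloat32(i, true));
  }
  return out;
}
```

Rejecting a length mismatch at write time keeps vectors from a differently sized model out of a profile's rows.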

### Migration Requirements

- [ ] migration adds `embedding_profiles`
- [ ] migration updates `snippet_embeddings` for profile scoping
- [ ] migration seeds a default local profile using `Xenova/all-MiniLM-L6-v2`
- [ ] migration safely maps existing single-provider configs into one default profile when upgrading

---

## Query-Time Semantic Retrieval

The API must resolve the active embedding profile at request time instead of baking provider selection into startup-only flows.

### API Behavior

`GET /api/v1/context`

- keeps the existing `libraryId`, `query`, `tokens`, and `type` parameters
- adds optional `searchMode=auto|keyword|semantic|hybrid`
- adds optional `alpha` for hybrid blending
- uses the default active embedding profile when `searchMode` is `auto`, `semantic`, or `hybrid`
- falls back to keyword mode only when embeddings are disabled or the caller explicitly requests keyword mode
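
The parameter handling and fallback rule above can be sketched as two small functions. This is illustrative only: the function names are hypothetical, the `[0, 1]` range for `alpha` is an assumption the ticket does not state, and resolving `auto` to `hybrid` follows the default runtime behavior described earlier.

```typescript
type SearchMode = 'auto' | 'keyword' | 'semantic' | 'hybrid';
type EffectiveMode = 'keyword' | 'semantic' | 'hybrid';

const SEARCH_MODES: readonly SearchMode[] = ['auto', 'keyword', 'semantic', 'hybrid'];

// Reads the optional `searchMode` and `alpha` query parameters.
// Assumption: alpha is a blending weight in [0, 1].
function parseRetrievalParams(params: URLSearchParams): { searchMode: SearchMode; alpha?: number } {
  const rawMode = params.get('searchMode') ?? 'auto';
  if (!SEARCH_MODES.includes(rawMode as SearchMode)) {
    throw new Error(`Invalid searchMode: ${rawMode}`);
  }
  const searchMode = rawMode as SearchMode;
  const rawAlpha = params.get('alpha');
  if (rawAlpha === null) return { searchMode };
  const alpha = Number(rawAlpha);
  if (rawAlpha.trim() === '' || !Number.isFinite(alpha) || alpha < 0 || alpha > 1) {
    throw new Error(`Invalid alpha: ${rawAlpha}`);
  }
  return { searchMode, alpha };
}

// Keyword fallback happens only when embeddings are disabled or keyword is requested.
function resolveSearchMode(requested: SearchMode, embeddingsEnabled: boolean): EffectiveMode {
  if (requested === 'keyword' || !embeddingsEnabled) return 'keyword';
  return requested === 'auto' ? 'hybrid' : requested;
}
```

A request handler could call `parseRetrievalParams(url.searchParams)` and pass the result to `resolveSearchMode` together with the enabled state of the active profile.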

### Version-Scoped Retrieval Rules

- when `libraryId` includes a version, both FTS and vector retrieval must filter to the resolved `versionId`
- re-fetching snippets after ranking must also preserve `versionId`
- default-branch snippets must not bleed into versioned queries
- one version's embeddings must not be compared against another version's snippets for the same repository
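
One way to honor these rules is to filter candidates by repository, version, and profile before any similarity is computed. The in-memory sketch below only illustrates that ordering; a real implementation would likely apply the same filter in SQL. The `EmbeddedSnippet` shape and function names are hypothetical, and treating a `null` `versionId` as the default branch is an assumption.

```typescript
interface EmbeddedSnippet {
  snippetId: string;
  repositoryId: string;
  versionId: string | null; // assumption: null marks default-branch snippets
  profileId: string;
  embedding: number[];
}

interface SearchScope {
  repositoryId: string;
  versionId: string | null;
  profileId: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scope is applied BEFORE scoring, so rows from other versions, the default
// branch, or other profiles never enter the ranking.
function versionScopedVectorSearch(
  rows: EmbeddedSnippet[],
  queryEmbedding: number[],
  scope: SearchScope,
  limit = 10
): { snippetId: string; score: number }[] {
  return rows
    .filter(
      (row) =>
        row.repositoryId === scope.repositoryId &&
        row.versionId === scope.versionId &&
        row.profileId === scope.profileId
    )
    .map((row) => ({
      snippetId: row.snippetId,
      score: cosineSimilarity(row.embedding, queryEmbedding)
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```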

### Acceptance Criteria

- [ ] `/api/v1/context` loads the active embedding profile at request time
- [ ] hybrid retrieval works without restarting the server after profile changes
- [ ] `searchMode` is supported for context queries
- [ ] versioned `libraryId` queries enforce version filters in both FTS and vector phases
- [ ] JSON responses can include retrieval metadata such as mode, profile ID, model, and alpha

---

## MCP Surface

MCP should stay thin and inherit semantic behavior from the API.

### `query-docs`

Extend the MCP tool schema to support:

- `searchMode?: 'auto' | 'keyword' | 'semantic' | 'hybrid'`
- `alpha?: number`

The MCP server should forward these options directly to `/api/v1/context`.
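
Forwarding can stay trivial if the MCP client only appends the new options when they are present. The sketch below assumes the tool arguments mirror the API parameters; `QueryDocsArgs` and `buildContextUrl` are hypothetical names, not the existing `src/mcp/client.ts` API.

```typescript
// Assumed argument shape; only searchMode and alpha are new in this ticket.
interface QueryDocsArgs {
  libraryId: string;
  query: string;
  tokens?: number;
  searchMode?: 'auto' | 'keyword' | 'semantic' | 'hybrid';
  alpha?: number;
}

// Builds the /api/v1/context URL. Optional fields are appended only when provided,
// so callers that omit them send the same request as before.
function buildContextUrl(baseUrl: string, args: QueryDocsArgs): string {
  const url = new URL('/api/v1/context', baseUrl);
  url.searchParams.set('libraryId', args.libraryId);
  url.searchParams.set('query', args.query);
  if (args.tokens !== undefined) url.searchParams.set('tokens', String(args.tokens));
  if (args.searchMode !== undefined) url.searchParams.set('searchMode', args.searchMode);
  if (args.alpha !== undefined) url.searchParams.set('alpha', String(args.alpha));
  return url.toString();
}
```

Leaving validation of `searchMode` and `alpha` to the API keeps the MCP layer thin, in line with the goal of treating the API as the source of truth.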

### Explicitly Out of Scope

- semantic reranking for `resolve-library-id`
- automatic library detection from the query text

### Acceptance Criteria

- [ ] MCP `query-docs` supports the same retrieval mode controls as the API
- [ ] MCP stdio and HTTP transports both preserve the new options
- [ ] MCP remains backward compatible when the new fields are omitted

---

## Settings and Profile Management

The existing settings page must evolve from a single provider switcher into profile management for the supported provider kinds.

### Required UX Changes

- show the default local profile as the initial active profile
- allow enabling/disabling embeddings globally
- allow creating additional custom profiles for supported provider adapters
- allow selecting exactly one default profile
- show provider health and profile test results
- warn when changing the default profile requires re-embedding to preserve semantic quality
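
The "exactly one default" rule and the re-embedding warning can be derived together when the default is switched. A minimal sketch, assuming profiles carry the `id` and `isDefault` fields from this ticket; `setDefaultProfile` and `requiresReembeddingWarning` are hypothetical names.

```typescript
interface ProfileFlags {
  id: string;
  isDefault: boolean;
}

// Returns a new list in which exactly one profile is the default, plus a flag
// the UI could use to warn that existing embeddings belong to another profile.
function setDefaultProfile<T extends ProfileFlags>(
  profiles: T[],
  id: string
): { profiles: T[]; requiresReembeddingWarning: boolean } {
  if (!profiles.some((p) => p.id === id)) {
    throw new Error(`Unknown profile: ${id}`);
  }
  const previous = profiles.find((p) => p.isDefault);
  return {
    profiles: profiles.map((p) => ({ ...p, isDefault: p.id === id })),
    // Warn only when a different profile was the default before the switch.
    requiresReembeddingWarning: previous !== undefined && previous.id !== id
  };
}
```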

### Acceptance Criteria

- [ ] `/settings` supports profile-based embedding configuration
- [ ] users can create an `openai-compatible` custom profile with arbitrary base URL and model
- [ ] the local default profile is visible and editable
- [ ] switching the default profile triggers a re-embedding workflow or explicit warning state

---

## Indexing and Re-Embedding

Indexing must embed snippets against the default active profile, and profile changes must be operationally explicit.

### Required Behavior

- new indexing jobs use the current default profile
- re-indexing stores embeddings under that profile ID
- changing the default profile does not silently reuse embeddings from another profile
- if a profile is changed in a way that invalidates stored embeddings, affected repositories must be marked as needing re-embedding or re-indexing
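
The behavior above implies two checks: whether a profile edit invalidates stored vectors, and which snippets still lack an embedding under the current default profile. The sketch below illustrates both under stated assumptions: treating a change to provider kind, model, or dimensions as invalidating is a guess at the rule, and both function names are hypothetical.

```typescript
interface EmbeddingIdentity {
  providerKind: string;
  model: string;
  dimensions: number;
}

// Assumption: only these three fields affect the vectors a profile produces.
function invalidatesEmbeddings(before: EmbeddingIdentity, after: EmbeddingIdentity): boolean {
  return (
    before.providerKind !== after.providerKind ||
    before.model !== after.model ||
    before.dimensions !== after.dimensions
  );
}

interface EmbeddingRow {
  snippetId: string;
  profileId: string;
}

// Returns the snippet IDs that have no embedding row under the given profile.
// Rows produced by other profiles are ignored, so nothing is reused across profiles.
function snippetsNeedingEmbedding(
  snippetIds: string[],
  rows: EmbeddingRow[],
  profileId: string
): string[] {
  const embedded = new Set(
    rows.filter((row) => row.profileId === profileId).map((row) => row.snippetId)
  );
  return snippetIds.filter((id) => !embedded.has(id));
}
```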

### Acceptance Criteria

- [ ] indexing records which profile produced each embedding row
- [ ] re-embedding can be triggered after default-profile changes
- [ ] no cross-profile embedding reuse occurs

---

## Test Coverage

- [ ] migration tests for `embedding_profiles` and `snippet_embeddings`
- [ ] unit tests for provider registry resolution
- [ ] unit tests for version-scoped vector search
- [ ] unit tests for hybrid retrieval with explicit `searchMode`
- [ ] API tests covering default local profile behavior on fresh setup
- [ ] MCP tests covering `query-docs` semantic and hybrid forwarding

---

## Files to Modify

- `package.json` — install `@xenova/transformers` as a runtime dependency
- `src/lib/server/db/schema.ts`
- `src/lib/server/db/migrations/*`
- `src/lib/server/embeddings/provider.ts`
- `src/lib/server/embeddings/local.provider.ts`
- `src/lib/server/embeddings/openai.provider.ts`
- `src/lib/server/embeddings/factory.ts` or replacement registry module
- `src/lib/server/embeddings/embedding.service.ts`
- `src/lib/server/search/vector.search.ts`
- `src/lib/server/search/hybrid.search.service.ts`
- `src/routes/api/v1/context/+server.ts`
- `src/routes/api/v1/settings/embedding/+server.ts`
- `src/routes/api/v1/settings/embedding/test/+server.ts`
- `src/routes/settings/+page.svelte`
- `src/mcp/client.ts`
- `src/mcp/tools/query-docs.ts`
- `src/mcp/index.ts`

---

## Notes

This ticket intentionally leaves `libs/search` as keyword-only. The caller is expected to identify the target library and, when needed, pass a version-qualified library ID such as `/owner/repo/v1.2.3` before requesting semantic retrieval.
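
For illustration, a version-qualified ID of that form could be split as follows. This is a sketch, assuming IDs always look like `/owner/repo` or `/owner/repo/<version>`; `parseLibraryId` is a hypothetical helper and the actual resolution to a `versionId` is not shown.

```typescript
// Splits a library ID into its repository part and an optional version segment.
function parseLibraryId(libraryId: string): { repositoryId: string; version?: string } {
  const match = /^\/([^/]+)\/([^/]+)(?:\/(.+))?$/.exec(libraryId);
  if (!match) {
    throw new Error(`Invalid library ID: ${libraryId}`);
  }
  const [, owner, repo, version] = match;
  const repositoryId = `/${owner}/${repo}`;
  // Omit `version` entirely when the ID is not version-qualified.
  return version ? { repositoryId, version } : { repositoryId };
}
```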