trueref/docs/features/TRUEREF-0020.md
2026-03-27 02:23:01 +01:00

TRUEREF-0020 — Embedding Profiles, Default Local Embeddings, and Version-Scoped Semantic Retrieval

Priority: P1
Status: Pending
Depends On: TRUEREF-0007, TRUEREF-0008, TRUEREF-0009, TRUEREF-0010, TRUEREF-0011, TRUEREF-0012, TRUEREF-0014, TRUEREF-0018
Blocks: TRUEREF-0019


Overview

TrueRef already has the main ingredients for embeddings and hybrid search, but the current design is still centered on a single hard-coded provider configuration and does not guarantee version-safe semantic retrieval at query time. This feature formalizes the full provider-registry approach and makes semantic retrieval production-ready for both the REST API and MCP surfaces.

The scope is intentionally narrow:

  1. Introduce first-class embedding profiles so custom AI providers can be registered without hard-coding provider names throughout the API, UI, and runtime.
  2. Enable embeddings by default using the local @xenova/transformers model so a fresh install provides semantic retrieval out of the box.
  3. Make semantic and hybrid retrieval version-scoped, so a query for a specific library and version only searches snippets indexed for that exact version.
  4. Extend the API and MCP query-docs path to use the active embedding profile at query time.

Out of scope:

  • semantic repository discovery or reranking for libs/search
  • inferring the repository from the query text
  • adding multi-tenant provider isolation

Consumers are expected to pass an exact library or repository identifier and the needed version when they want version-specific retrieval.


Problem Statement

Current semantic search support has four structural gaps:

  1. Query-time semantic retrieval is not reliably wired to the configured provider.
  2. The embedding configuration shape is fixed to openai | local | none, which does not scale to custom provider adapters.
  3. Stored embeddings are keyed too narrowly to support multiple profiles or safe provider migration.
  4. The vector search path does not enforce version scoping as strongly as the keyword search path.

That leaves TrueRef in a state where embeddings may be generated at indexing time, but retrieval behavior, provider flexibility, and version guarantees are still weaker than required.


Goals

  • Make semantic retrieval work by default on a fresh install.
  • Keep the default self-hosted path fully local.
  • Support custom AI providers through a provider registry plus profile system.
  • Keep the API as the source of truth for retrieval behavior.
  • Keep MCP as a thin compatibility layer over the API.
  • Guarantee version-scoped hybrid retrieval when a versioned library ID is provided.

Non-Goals

  • semantic repository search
  • automatic repo selection from free-text intent
  • remote provider secrets management beyond the current settings persistence model
  • support for non-embedding rerankers in this ticket

Default Local Embeddings

Embeddings should be enabled by default through the local model path, so that a fresh install does not ship in keyword-only full-text search (FTS) mode.

Default Runtime Behavior

  • Install @xenova/transformers as a normal runtime dependency rather than treating it as optional for the default setup.
  • Seed the default embedding profile to the local provider.
  • Default model: Xenova/all-MiniLM-L6-v2
  • Default dimensions: 384
  • New repositories index snippets with embeddings automatically unless the user explicitly disables embeddings.
  • Query-time retrieval uses hybrid mode automatically when the active profile is healthy.
  • If the local model cannot be loaded, the system should surface a clear startup or settings error instead of silently pretending semantic search is enabled.
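
The seeding behavior above could look roughly like the following. This is a minimal sketch, not the ticket's implementation: the profile shape is taken from the Embedding Profile section of this ticket, while the `local-default` ID, the title string, and the `seedProfiles` helper are hypothetical names introduced for illustration.

```typescript
// Profile shape as defined in the "Embedding Profile" section of this ticket.
interface EmbeddingProfile {
  id: string;
  providerKind: string;
  title: string;
  enabled: boolean;
  isDefault: boolean;
  config: Record<string, unknown>;
  model: string;
  dimensions: number;
  createdAt: number;
  updatedAt: number;
}

// Hypothetical ID for the seeded profile; the ticket does not prescribe one.
const DEFAULT_LOCAL_PROFILE_ID = 'local-default';

// Builds the default local profile with the model and dimensions named above.
function buildDefaultLocalProfile(now: number): EmbeddingProfile {
  return {
    id: DEFAULT_LOCAL_PROFILE_ID,
    providerKind: 'local-transformers',
    title: 'Local (Xenova/all-MiniLM-L6-v2)',
    enabled: true,
    isDefault: true,
    config: {},
    model: 'Xenova/all-MiniLM-L6-v2',
    dimensions: 384,
    createdAt: now,
    updatedAt: now
  };
}

// Seeds only when no profile exists, so later user changes (including
// disabling embeddings) are never overwritten on restart.
function seedProfiles(existing: EmbeddingProfile[], now: number): EmbeddingProfile[] {
  return existing.length > 0 ? existing : [buildDefaultLocalProfile(now)];
}
```

Seeding only into an empty profile list keeps the operation idempotent, which matters if the same logic runs both in a migration and at startup.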

Acceptance Criteria

  • @xenova/transformers is installed by default for production/runtime use
  • Fresh installations default to an active local embedding profile
  • No manual provider configuration is required to get semantic search on a clean setup
  • The settings UI shows local embeddings as the default active profile
  • Disabling embeddings remains possible from settings

Embedding Profile Registry

Replace the single enum-style config with a registry-oriented model.

Core Concepts

Provider Adapter

A provider adapter is code registered in the server runtime that knows how to validate config and generate embeddings for one provider kind.

Examples:

  • local-transformers
  • openai-compatible
  • future custom adapters added in code without redesigning the API contract

Embedding Profile

An embedding profile is persisted configuration selecting one provider adapter plus its runtime settings.

interface EmbeddingProfile {
	id: string;
	providerKind: string;
	title: string;
	enabled: boolean;
	isDefault: boolean;
	config: Record<string, unknown>;
	model: string;
	dimensions: number;
	createdAt: number;
	updatedAt: number;
}

Registry Responsibilities

  • create provider instance from profile
  • validate profile config
  • expose provider metadata to the settings API and UI
  • allow future custom providers without widening TypeScript unions across the app
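
One way to picture these responsibilities is a small registry keyed by `providerKind`. The sketch below is illustrative only: `ProviderAdapter`, `EmbeddingProvider`, `ProfileRef`, and `EmbeddingProviderRegistry` are assumed names, and the `embed` signature is an assumption rather than the existing interface in `provider.ts`.

```typescript
// Minimal view of a profile needed to build a provider (subset of EmbeddingProfile).
interface ProfileRef {
  id: string;
  providerKind: string;
  config: Record<string, unknown>;
}

// Assumed provider contract; the real interface may differ.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<Float32Array[]>;
}

// Hypothetical adapter contract: one adapter per provider kind.
interface ProviderAdapter {
  kind: string;
  displayName: string;
  // Returns a list of problems; an empty list means the config is valid.
  validateConfig(config: Record<string, unknown>): string[];
  create(profile: ProfileRef): EmbeddingProvider;
}

class EmbeddingProviderRegistry {
  private adapters = new Map<string, ProviderAdapter>();

  // Kinds are plain strings, so adding an adapter needs no union type changes.
  register(adapter: ProviderAdapter): void {
    if (this.adapters.has(adapter.kind)) {
      throw new Error(`Adapter already registered: ${adapter.kind}`);
    }
    this.adapters.set(adapter.kind, adapter);
  }

  // Metadata for the settings API and UI.
  listMetadata(): Array<{ kind: string; displayName: string }> {
    return Array.from(this.adapters.values()).map((a) => ({
      kind: a.kind,
      displayName: a.displayName
    }));
  }

  // Validates the profile config, then instantiates the provider.
  createProvider(profile: ProfileRef): EmbeddingProvider {
    const adapter = this.adapters.get(profile.providerKind);
    if (!adapter) throw new Error(`Unknown provider kind: ${profile.providerKind}`);
    const problems = adapter.validateConfig(profile.config);
    if (problems.length > 0) {
      throw new Error(`Invalid config for profile ${profile.id}: ${problems.join('; ')}`);
    }
    return adapter.create(profile);
  }
}
```

Because validation runs inside `createProvider`, a misconfigured profile fails loudly at instantiation instead of producing a provider that errors on first use.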

Acceptance Criteria

  • Provider selection is no longer hard-coded to openai | local | none
  • Providers are instantiated through a registry keyed by providerKind
  • Profiles are stored as first-class records rather than a single settings blob
  • One profile can be marked as the default active profile for indexing and retrieval
  • Settings endpoints return profile data and provider metadata cleanly

Data Model Changes

The current snippet_embeddings shape is insufficient for multiple profiles because it allows only one embedding row per snippet.

New Tables / Changes

embedding_profiles

import { integer, sqliteTable, text } from 'drizzle-orm/sqlite-core';

// Assumes the Drizzle ORM SQLite schema style used in src/lib/server/db/schema.ts.
export const embeddingProfiles = sqliteTable('embedding_profiles', {
  id: text('id').primaryKey(),
  providerKind: text('provider_kind').notNull(),
  title: text('title').notNull(),
  enabled: integer('enabled', { mode: 'boolean' }).notNull().default(true),
  isDefault: integer('is_default', { mode: 'boolean' }).notNull().default(false),
  model: text('model').notNull(),
  dimensions: integer('dimensions').notNull(),
  config: text('config', { mode: 'json' }).notNull(),
  createdAt: integer('created_at').notNull(),
  updatedAt: integer('updated_at').notNull(),
});

snippet_embeddings

Add profile_id and replace the single-row-per-snippet constraint with a composite key or unique index on (snippet_id, profile_id).

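import { blob, integer, primaryKey, sqliteTable, text } from 'drizzle-orm/sqlite-core';

export const snippetEmbeddings = sqliteTable('snippet_embeddings', {
  snippetId: text('snippet_id').notNull(),
  profileId: text('profile_id').notNull(),
  model: text('model').notNull(),
  dimensions: integer('dimensions').notNull(),
  embedding: blob('embedding').notNull(),
  createdAt: integer('created_at').notNull(),
}, (table) => ({
  // Composite key: at most one embedding row per (snippet, profile) pair.
  pk: primaryKey({ columns: [table.snippetId, table.profileId] }),
}));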
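
Since the embedding is stored as a blob next to an explicit `dimensions` column, reads can check that the two agree. The sketch below assumes vectors are serialized as raw 32-bit floats in platform byte order; the ticket does not specify the encoding, and `encodeEmbedding` and `decodeEmbedding` are hypothetical helper names.

```typescript
// Serializes a vector as raw float32 bytes (assumed storage format).
function encodeEmbedding(vector: ArrayLike<number>): Uint8Array {
  const floats = Float32Array.from(vector);
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Restores a vector and verifies it matches the stored dimensions column.
function decodeEmbedding(blob: Uint8Array, dimensions: number): Float32Array {
  if (blob.byteLength !== dimensions * 4) {
    throw new Error(
      `Embedding blob has ${blob.byteLength} bytes, expected ${dimensions * 4} for ${dimensions} dimensions`
    );
  }
  // Copy first: a driver-provided buffer may be a view at an offset that is
  // not 4-byte aligned, which Float32Array cannot wrap directly.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, dimensions);
}
```

A length check like this catches rows written under a different model before they reach similarity scoring, where a silent dimension mismatch would corrupt rankings.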

Migration Requirements

  • migration adds embedding_profiles
  • migration updates snippet_embeddings for profile scoping
  • migration seeds a default local profile using Xenova/all-MiniLM-L6-v2
  • migration safely maps existing single-provider configs into one default profile when upgrading
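
The last requirement, mapping an existing single-provider config into one default profile, might be sketched as follows. Only the `openai | local | none` discriminator comes from this ticket; the remaining legacy fields, the profile IDs, and the choice to keep a `none` installation disabled after upgrade are assumptions made for illustration.

```typescript
// Assumed legacy shape; only the provider discriminator is taken from the ticket.
type LegacyEmbeddingConfig =
  | { provider: 'none' }
  | { provider: 'local' }
  | { provider: 'openai'; baseUrl: string; apiKey: string; model: string; dimensions: number };

interface ProfileSeed {
  id: string;
  providerKind: string;
  title: string;
  enabled: boolean;
  isDefault: boolean;
  model: string;
  dimensions: number;
  config: Record<string, unknown>;
}

function localSeed(enabled: boolean): ProfileSeed {
  return {
    id: 'local-default', // hypothetical ID
    providerKind: 'local-transformers',
    title: 'Local (Xenova/all-MiniLM-L6-v2)',
    enabled,
    isDefault: true,
    model: 'Xenova/all-MiniLM-L6-v2',
    dimensions: 384,
    config: {}
  };
}

// Maps a legacy config to exactly one default profile.
function mapLegacyConfig(legacy: LegacyEmbeddingConfig): ProfileSeed {
  switch (legacy.provider) {
    case 'openai':
      return {
        id: 'legacy-openai', // hypothetical ID
        providerKind: 'openai-compatible',
        title: 'OpenAI (migrated)',
        enabled: true,
        isDefault: true,
        model: legacy.model,
        dimensions: legacy.dimensions,
        config: { baseUrl: legacy.baseUrl, apiKey: legacy.apiKey }
      };
    case 'local':
      return localSeed(true);
    case 'none':
      // Assumption: an explicit opt-out survives the upgrade as a disabled profile.
      return localSeed(false);
  }
}
```

Whether an upgraded `none` installation should stay disabled or adopt the new enabled-by-default behavior is a product decision the migration has to make explicitly; the sketch picks the conservative option.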

Query-Time Semantic Retrieval

The API must resolve the active embedding profile at request time instead of baking provider selection into startup-only flows.

API Behavior

GET /api/v1/context

  • keeps libraryId, query, tokens, and type
  • adds optional searchMode=auto|keyword|semantic|hybrid
  • adds optional alpha for hybrid blending
  • uses the default active embedding profile when searchMode is auto, semantic, or hybrid
  • falls back to keyword mode only when embeddings are disabled or the caller explicitly requests keyword mode
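
The mode rules above can be expressed as a small pure function. This is a sketch under stated assumptions: the helper names are hypothetical, and both the `alpha` range of 0 to 1 and its default of 0.5 are assumptions, since the ticket does not fix them.

```typescript
type SearchMode = 'auto' | 'keyword' | 'semantic' | 'hybrid';
type EffectiveMode = Exclude<SearchMode, 'auto'>;

const SEARCH_MODES: readonly SearchMode[] = ['auto', 'keyword', 'semantic', 'hybrid'];

// Parses the optional searchMode query parameter; a missing value means auto.
function parseSearchMode(raw: string | null): SearchMode {
  if (raw === null || raw === '') return 'auto';
  if (SEARCH_MODES.indexOf(raw as SearchMode) !== -1) return raw as SearchMode;
  throw new Error(`Invalid searchMode: ${raw}`);
}

// Parses the optional alpha parameter (assumed range [0, 1], assumed default 0.5).
function parseAlpha(raw: string | null, fallback = 0.5): number {
  if (raw === null || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error(`Invalid alpha: ${raw}`);
  }
  return value;
}

// Keyword mode is used only when requested explicitly or when embeddings
// are disabled; auto resolves to hybrid otherwise.
function resolveSearchMode(requested: SearchMode, embeddingsEnabled: boolean): EffectiveMode {
  if (requested === 'keyword' || !embeddingsEnabled) return 'keyword';
  return requested === 'auto' ? 'hybrid' : requested;
}
```

An unhealthy but enabled profile is deliberately not treated as a reason to fall back here; per the default runtime behavior, that case should surface an error rather than degrade silently.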

Version-Scoped Retrieval Rules

  • when libraryId includes a version, both FTS and vector retrieval must filter to the resolved versionId
  • re-fetching snippets after ranking must also preserve versionId
  • default-branch snippets must not bleed into versioned queries
  • one version's embeddings must not be compared against another version's snippets for the same repository
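
To make the scoping rules concrete, here is an in-memory sketch of a vector search that filters by version and profile before scoring. A real implementation would presumably push the filter into SQL; the row shape, the use of `null` for default-branch rows, and the function names are assumptions for illustration.

```typescript
interface EmbeddingRow {
  snippetId: string;
  versionId: string | null; // assumption: null marks default-branch snippets
  profileId: string;
  embedding: Float32Array;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error('Embedding dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores only rows that match the exact version and profile, so neither
// default-branch rows nor other versions can enter the candidate set.
function versionScopedVectorSearch(
  rows: EmbeddingRow[],
  queryEmbedding: Float32Array,
  scope: { versionId: string | null; profileId: string },
  limit: number
): Array<{ snippetId: string; score: number }> {
  return rows
    .filter((row) => row.versionId === scope.versionId && row.profileId === scope.profileId)
    .map((row) => ({
      snippetId: row.snippetId,
      score: cosineSimilarity(queryEmbedding, row.embedding)
    }))
    .sort((left, right) => right.score - left.score)
    .slice(0, limit);
}
```

Filtering before scoring, rather than after ranking, is what prevents a high-scoring snippet from another version from crowding a correct result out of the top `limit` hits.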

Acceptance Criteria

  • /api/v1/context loads the active embedding profile at request time
  • hybrid retrieval works without restarting the server after profile changes
  • searchMode is supported for context queries
  • versioned libraryId queries enforce version filters in both FTS and vector phases
  • JSON responses can include retrieval metadata such as mode, profile ID, model, and alpha

MCP Surface

MCP should stay thin and inherit semantic behavior from the API.

query-docs

Extend the MCP tool schema to support:

  • searchMode?: 'auto' | 'keyword' | 'semantic' | 'hybrid'
  • alpha?: number

The MCP server should forward these options directly to /api/v1/context.
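
Forwarding could be as simple as copying the optional fields into the query string only when they are present. The sketch below assumes the tool arguments carry `libraryId`, `query`, and `tokens` alongside the new fields; `QueryDocsArgs` and `buildContextUrl` are hypothetical names, not the existing code in `src/mcp`.

```typescript
// Assumed argument shape for the query-docs tool.
interface QueryDocsArgs {
  libraryId: string;
  query: string;
  tokens?: number;
  searchMode?: 'auto' | 'keyword' | 'semantic' | 'hybrid';
  alpha?: number;
}

// Builds the /api/v1/context URL, omitting optional fields that were not
// supplied so older callers produce exactly the same request as before.
function buildContextUrl(baseUrl: string, args: QueryDocsArgs): string {
  const url = new URL('/api/v1/context', baseUrl);
  url.searchParams.set('libraryId', args.libraryId);
  url.searchParams.set('query', args.query);
  if (args.tokens !== undefined) url.searchParams.set('tokens', String(args.tokens));
  if (args.searchMode !== undefined) url.searchParams.set('searchMode', args.searchMode);
  if (args.alpha !== undefined) url.searchParams.set('alpha', String(args.alpha));
  return url.toString();
}
```

Leaving defaults to the API, instead of filling them in on the MCP side, keeps the API as the single source of truth for retrieval behavior.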

Explicitly Out of Scope

  • semantic reranking for resolve-library-id
  • automatic library detection from the query text

Acceptance Criteria

  • MCP query-docs supports the same retrieval mode controls as the API
  • MCP stdio and HTTP transports both preserve the new options
  • MCP remains backward compatible when the new fields are omitted

Settings and Profile Management

The existing settings page must evolve from a single provider switcher into profile management for the supported provider kinds.

Required UX Changes

  • show the default local profile as the initial active profile
  • allow enabling/disabling embeddings globally
  • allow creating additional custom profiles for supported provider adapters
  • allow selecting exactly one default profile
  • show provider health and profile test results
  • warn when changing the default profile requires re-embedding to preserve semantic quality
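
The "exactly one default profile" rule is easy to get wrong when two flags are updated separately. A minimal sketch of an atomic switch follows; `setDefaultProfile` is a hypothetical helper, and rejecting disabled profiles as defaults is an assumption rather than something this ticket states.

```typescript
interface ProfileFlags {
  id: string;
  enabled: boolean;
  isDefault: boolean;
}

// Returns a new list in which only the requested profile is the default.
// The input list is left unmodified.
function setDefaultProfile<T extends ProfileFlags>(profiles: T[], id: string): T[] {
  const target = profiles.find((profile) => profile.id === id);
  if (!target) throw new Error(`Unknown profile: ${id}`);
  // Assumption: a disabled profile cannot become the default.
  if (!target.enabled) throw new Error(`Profile ${id} is disabled and cannot be the default`);
  return profiles.map((profile) => ({ ...profile, isDefault: profile.id === id }));
}
```

In the database this would correspond to clearing and setting `is_default` inside a single transaction, so readers never observe zero or two defaults.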

Acceptance Criteria

  • /settings supports profile-based embedding configuration
  • users can create an openai-compatible custom profile with arbitrary base URL and model
  • the local default profile is visible and editable
  • switching the default profile triggers a re-embedding workflow or explicit warning state

Indexing and Re-Embedding

Indexing must embed snippets against the default active profile, and profile changes must be operationally explicit.

Required Behavior

  • new indexing jobs use the current default profile
  • re-indexing stores embeddings under that profile ID
  • changing the default profile does not silently reuse embeddings from another profile
  • if a profile is changed in a way that invalidates stored embeddings, affected repositories must be marked as needing re-embedding or re-indexing
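
Deciding when a profile edit invalidates stored embeddings could be sketched as below. Treating `providerKind`, `model`, and `dimensions` as the identity of an embedding space is an assumption, as are the `RepositoryEmbeddingState` shape and the helper names; some config changes (for example a different base URL serving a different model) may also need to count.

```typescript
interface EmbeddingIdentity {
  providerKind: string;
  model: string;
  dimensions: number;
}

// True when vectors produced before and after the edit are not comparable.
function invalidatesEmbeddings(before: EmbeddingIdentity, after: EmbeddingIdentity): boolean {
  return (
    before.providerKind !== after.providerKind ||
    before.model !== after.model ||
    before.dimensions !== after.dimensions
  );
}

// Hypothetical per-repository bookkeeping for re-embedding.
interface RepositoryEmbeddingState {
  repositoryId: string;
  profileId: string;
  needsReembedding: boolean;
}

// Flags every repository embedded under the changed profile.
function markStale(
  states: RepositoryEmbeddingState[],
  profileId: string
): RepositoryEmbeddingState[] {
  return states.map((state) =>
    state.profileId === profileId ? { ...state, needsReembedding: true } : state
  );
}
```

Cosmetic edits such as renaming a profile leave the identity unchanged, so they would not trigger re-embedding under this rule.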

Acceptance Criteria

  • indexing records which profile produced each embedding row
  • re-embedding can be triggered after default-profile changes
  • no cross-profile embedding reuse occurs

Test Coverage

  • migration tests for embedding_profiles and snippet_embeddings
  • unit tests for provider registry resolution
  • unit tests for version-scoped vector search
  • unit tests for hybrid retrieval with explicit searchMode
  • API tests covering default local profile behavior on fresh setup
  • MCP tests covering query-docs semantic and hybrid forwarding

Files to Modify

  • package.json — install @xenova/transformers as a runtime dependency
  • src/lib/server/db/schema.ts
  • src/lib/server/db/migrations/*
  • src/lib/server/embeddings/provider.ts
  • src/lib/server/embeddings/local.provider.ts
  • src/lib/server/embeddings/openai.provider.ts
  • src/lib/server/embeddings/factory.ts or replacement registry module
  • src/lib/server/embeddings/embedding.service.ts
  • src/lib/server/search/vector.search.ts
  • src/lib/server/search/hybrid.search.service.ts
  • src/routes/api/v1/context/+server.ts
  • src/routes/api/v1/settings/embedding/+server.ts
  • src/routes/api/v1/settings/embedding/test/+server.ts
  • src/routes/settings/+page.svelte
  • src/mcp/client.ts
  • src/mcp/tools/query-docs.ts
  • src/mcp/index.ts

Notes

This ticket intentionally leaves libs/search as keyword-only. The caller is expected to identify the target library and, when needed, pass a version-qualified library ID such as /owner/repo/v1.2.3 before requesting semantic retrieval.
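
For illustration, a version-qualified ID of that form could be split as follows. This sketch assumes IDs are always `/owner/repo` with an optional single version segment, which goes beyond the one example given in this ticket; `parseLibraryId` is a hypothetical helper.

```typescript
interface ParsedLibraryId {
  repositoryId: string;
  version: string | null; // null when no version segment is present
}

// Splits "/owner/repo" or "/owner/repo/version" into its parts.
// Assumption: the version is a single path segment without slashes.
function parseLibraryId(libraryId: string): ParsedLibraryId {
  const parts = libraryId.split('/').filter((segment) => segment.length > 0);
  if (!libraryId.startsWith('/') || parts.length < 2 || parts.length > 3) {
    throw new Error(`Invalid library ID: ${libraryId}`);
  }
  const [owner, repo, version] = parts;
  return { repositoryId: `/${owner}/${repo}`, version: version ?? null };
}
```

Resolving the version segment to a `versionId` once, up front, lets both the FTS and vector phases apply the same filter.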