13 KiB
TRUEREF-0019 — Git-Native Version Indexing and Corporate Deployment Support
Priority: P1 Status: Pending Depends On: TRUEREF-0001, TRUEREF-0002, TRUEREF-0020 Blocks: —
Overview
TrueRef is intended for corporate environments where developers work with private repositories hosted on Bitbucket Server/Data Center and self-hosted GitLab instances. This feature covers two tightly related concerns:
- Git-native version indexing — resolve version tags to specific commit hashes, extract file trees per commit using
git archive, and store indexed snippets against an exact commit rather than a floating label. - Corporate deployment support — run TrueRef inside Docker with access to private remotes, per-host credentials, and corporate CA certificates without requiring changes to the host machine beyond environment variables and file mounts.
Together these allow a team to deploy TrueRef once, point it at their internal repositories, and have it return version-accurate documentation to LLM assistants.
This ticket depends on TRUEREF-0020 so that version-targeted retrieval remains semantically correct after version indexing is made commit-accurate. Without version-scoped hybrid retrieval, semantic results can still leak across versions even if version metadata is stored correctly.
Part 1: Git-Native Version Indexing
Background
Currently, versions are registered manually with a tag and a title. There is no link between a version tag and a specific git commit, so re-indexing the same version after a branch has moved produces different results silently. For documentation retrieval to be trustworthy, each indexed version must be pinned to an immutable commit hash.
Version → Commit Hash Resolution
Git tags are the canonical mapping. Both lightweight and annotated tags must be handled:
# Resolves any tag, branch, or ref to the underlying commit hash
git -C /path/to/repo rev-parse <tag>^{commit}
The ^{commit} peel dereferences annotated tag objects to the commit they point at.
For GitHub-hosted repositories the GitHub REST API provides the same mapping without requiring a local clone:
GET /repos/{owner}/{repo}/git/refs/tags/{tag}
If the returned object type is tag (annotated), a second request is required to dereference it:
GET /repos/{owner}/{repo}/git/tags/{sha}
→ { object: { sha: "<commit-sha>", type: "commit" } }
Schema Change
Add commit_hash to the versions table:
ALTER TABLE versions ADD COLUMN commit_hash TEXT;
Populated at version registration time. Re-indexing a version always uses the stored commit hash, never re-resolves the tag (tags can be moved; the stored hash is immutable).
Tag Auto-Discovery
When a repository is added or re-fetched, TrueRef should discover all version tags automatically and propose them to the user rather than requiring manual entry:
git -C /path/to/repo tag -l
# → ["v1.0.0", "v1.1.0", "v2.0.0", ...]
For each tag, resolve to a commit hash and pre-populate the versions table. The user can then select which versions to index from the UI.
Per-Version File Extraction via git archive
git archive extracts a clean file tree from any commit without disturbing the working directory or requiring a separate clone:
git -C /path/to/repo archive <commit-hash> | tar -x -C /tmp/trueref-idx/<repo>-<tag>/
Advantages over git checkout or worktrees:
- Working directory is completely untouched
- No
.gitdirectory in the output (cleaner for parsing) - Temp directory deleted after indexing with no git state to clean up
- Multiple versions can be extracted in parallel
The indexing pipeline receives the extracted temp directory as its source path — identical to the existing local crawler interface.
trigger index v2.0.0
→ look up commit_hash = "a3f9c12"
→ git archive a3f9c12 | tar -x /tmp/trueref/repo-v2.0.0/
→ run crawler on /tmp/trueref/repo-v2.0.0/
→ store snippets { repo_id, version_tag, commit_hash }
→ rm -rf /tmp/trueref/repo-v2.0.0/
trueref.json Explicit Override
Allow commit hashes to be pinned explicitly per version, overriding tag resolution. Useful when tags are mutable, versioning is non-standard, or a specific patch within a release must be targeted:
{
"previousVersions": [
{
"tag": "v2.0.0",
"title": "Version 2.0.0",
"commitHash": "a3f9c12abc..."
}
]
}
Edge Cases
| Case | Handling |
|---|---|
| Annotated tags | rev-parse <tag>^{commit} peels to commit automatically |
Mutable tags (e.g. latest) |
Re-resolve on re-index; warn in UI if hash has changed |
| Branch as version | rev-parse origin/<branch>^{commit} gives tip; re-resolves on re-index |
| Shallow clone | Run git fetch --unshallow before git archive if commit is unavailable |
| Submodules | git archive --recurse-submodules or document as a known limitation |
| Git LFS | git lfs pull required after archive if LFS-tracked files are needed for indexing |
Acceptance Criteria
versionstable has acommit_hashcolumn with a migrationresolveTagToCommit(repoPath, tag): stringutility usingchild_process- Tag auto-discovery runs on repo registration and on manual re-fetch
- Discovered tags appear in the UI with their resolved commit hashes
- Indexing a version extracts files via
git archiveto a temp directory - Temp directory is deleted after indexing completes (success or failure)
- Snippets are stored with both
version_tagandcommit_hash trueref.jsoncommitHashfield overrides tag resolution when present- Re-indexing a version with a moved tag warns the user if the hash has changed
Part 2: Corporate Deployment Support
Background
Corporate environments introduce three constraints not present in standard cloud setups:
- Multiple private git remotes — typically Bitbucket Server/Data Center and self-hosted GitLab on separate hostnames, each requiring independent credentials.
- Credential management on Windows — Git Credential Manager stores credentials in the Windows Credential Manager (DPAPI-encrypted), which is inaccessible from inside a Linux container. Personal Access Tokens passed as environment variables are the correct substitute.
- Custom CA certificates — on-premise servers use certificates issued by corporate CAs not trusted by default in Linux container images. All git operations, HTTP requests, and Node.js
fetchcalls fail until the CA is registered at the OS level.
Multi-Remote Credential Configuration
Two environment variables carry the HTTPS tokens, one per remote. Git's per-host credential scoping routes each token to the correct server:
# In docker-entrypoint.sh, before any git operation
if [ -n "$GIT_TOKEN_BITBUCKET" ] && [ -n "$BITBUCKET_HOST" ]; then
git config --global \
"credential.https://${BITBUCKET_HOST}.helper" \
"!f() { echo username=x-token-auth; echo password=\$GIT_TOKEN_BITBUCKET; }; f"
fi
if [ -n "$GIT_TOKEN_GITLAB" ] && [ -n "$GITLAB_HOST" ]; then
git config --global \
"credential.https://${GITLAB_HOST}.helper" \
"!f() { echo username=oauth2; echo password=\$GIT_TOKEN_GITLAB; }; f"
fi
Username conventions by server type:
| Server | HTTPS username | Password |
|---|---|---|
| Bitbucket Server / Data Center | x-token-auth |
HTTP access token |
| Bitbucket Cloud | account username | App password |
| GitLab (self-hosted or cloud) | oauth2 |
Personal access token |
| GitLab deploy token | gitlab-deploy-token |
Deploy token secret |
SSH authentication is also supported and preferred for long-lived deployments. The host SSH configuration (~/.ssh/config) handles per-host key selection and travels into the container via volume mount.
Corporate CA Certificate Handling
The CA certificate is mounted as a read-only file at a staging path. The entrypoint detects the encoding (PEM or DER) and installs it at the system level before any other operation runs:
if [ -f /certs/corp-ca.crt ]; then
if openssl x509 -inform PEM -in /certs/corp-ca.crt -noout 2>/dev/null; then
cp /certs/corp-ca.crt /usr/local/share/ca-certificates/corp-ca.crt
else
openssl x509 -inform DER -in /certs/corp-ca.crt \
-out /usr/local/share/ca-certificates/corp-ca.crt
fi
update-ca-certificates 2>/dev/null
fi
Installing at the OS level means git, curl, and Node.js fetch all trust the certificate without per-tool flags. GIT_SSL_NO_VERIFY is explicitly not used.
openssl must be added to the production image:
RUN apk add --no-cache openssl
SSH Key Permissions
Windows NTFS does not track Unix permissions. SSH keys mounted from %USERPROFILE%\.ssh arrive with world-readable permissions that the SSH client rejects. The entrypoint corrects this before any git operation:
if [ -d /root/.ssh ]; then
chmod 700 /root/.ssh
chmod 600 /root/.ssh/* 2>/dev/null || true
fi
docker-compose.yml
services:
web:
build: .
ports:
- '3000:3000'
volumes:
- trueref-data:/data
- ${USERPROFILE}/.ssh:/root/.ssh:ro
- ${USERPROFILE}/.gitconfig:/root/.gitconfig:ro
- ${CORP_CA_CERT}:/certs/corp-ca.crt:ro
environment:
DATABASE_URL: /data/trueref.db
GIT_TOKEN_BITBUCKET: '${BITBUCKET_TOKEN}'
GIT_TOKEN_GITLAB: '${GITLAB_TOKEN}'
BITBUCKET_HOST: '${BITBUCKET_HOST}'
GITLAB_HOST: '${GITLAB_HOST}'
restart: unless-stopped
mcp:
build: .
command: mcp
ports:
- '3001:3001'
environment:
TRUEREF_API_URL: http://web:3000
MCP_PORT: '3001'
depends_on:
- web
restart: unless-stopped
volumes:
trueref-data:
.env Template
# Corporate CA certificate (PEM or DER, auto-detected)
CORP_CA_CERT=C:/path/to/corp-ca.crt
# Git remote hostnames
BITBUCKET_HOST=bitbucket.corp.example.com
GITLAB_HOST=gitlab.corp.example.com
# Personal access tokens (never commit these)
BITBUCKET_TOKEN=
GITLAB_TOKEN=
Full docker-entrypoint.sh
#!/bin/sh
set -e
# 1. Trust corporate CA — must run first
if [ -f /certs/corp-ca.crt ]; then
if openssl x509 -inform PEM -in /certs/corp-ca.crt -noout 2>/dev/null; then
cp /certs/corp-ca.crt /usr/local/share/ca-certificates/corp-ca.crt
else
openssl x509 -inform DER -in /certs/corp-ca.crt \
-out /usr/local/share/ca-certificates/corp-ca.crt
fi
update-ca-certificates 2>/dev/null
fi
# 2. Fix SSH key permissions (Windows mounts arrive world-readable)
if [ -d /root/.ssh ]; then
chmod 700 /root/.ssh
chmod 600 /root/.ssh/* 2>/dev/null || true
fi
# 3. Per-host HTTPS credential helpers
if [ -n "$GIT_TOKEN_BITBUCKET" ] && [ -n "$BITBUCKET_HOST" ]; then
git config --global \
"credential.https://${BITBUCKET_HOST}.helper" \
"!f() { echo username=x-token-auth; echo password=\$GIT_TOKEN_BITBUCKET; }; f"
fi
if [ -n "$GIT_TOKEN_GITLAB" ] && [ -n "$GITLAB_HOST" ]; then
git config --global \
"credential.https://${GITLAB_HOST}.helper" \
"!f() { echo username=oauth2; echo password=\$GIT_TOKEN_GITLAB; }; f"
fi
case "${1:-web}" in
web)
echo "Running database migrations..."
DATABASE_URL="$DATABASE_URL" npx drizzle-kit migrate
echo "Starting TrueRef web app on port ${PORT:-3000}..."
exec node build
;;
mcp)
MCP_PORT="${MCP_PORT:-3001}"
echo "Starting TrueRef MCP HTTP server on port ${MCP_PORT}..."
exec npx tsx src/mcp/index.ts --transport http --port "$MCP_PORT"
;;
*)
exec "$@"
;;
esac
Acceptance Criteria
docker-entrypoint.shruns CA install, SSH permission fix, and credential helpers in that order before any other operation- CA cert mount path is
/certs/corp-ca.crt; PEM and DER are both handled without user intervention opensslis installed in the production Docker image- Per-host credential helpers are configured only when the relevant env vars are set (no-op if absent)
- SSH key permissions are corrected unconditionally when
/root/.sshexists docker-compose.ymlexposesBITBUCKET_HOST,GITLAB_HOST,GIT_TOKEN_BITBUCKET,GIT_TOKEN_GITLAB, andCORP_CA_CERTas first-class configuration points.env.exampleincludes all corporate deployment variables with placeholder values- README Docker section documents the corporate setup (cert export on Windows, SSH mount, per-host tokens)
Files to Create
Dockerfiledocker-compose.ymldocker-entrypoint.sh.dockerignore.env.example(update)
Files to Modify
src/lib/server/db/schema.ts— addcommit_hashto versions tablesrc/lib/server/db/migrations/— new migration forcommit_hashsrc/lib/server/services/version.service.ts—resolveTagToCommit, tag auto-discoverysrc/lib/server/pipeline/indexing.pipeline.ts—git archiveextraction, temp dir lifecycleREADME.md— Docker section, corporate deployment guide