docs: add TRUEREF-0019 feature request for git-native versioning and corporate deployment
Covers two related concerns: Part 1 — Git-native version indexing: resolve version tags to commit hashes via git rev-parse, store commit_hash in the versions table, auto- discover tags on repo registration, extract per-version file trees with git archive to avoid disturbing the working directory, and support explicit commitHash overrides in trueref.json. Part 2 — Corporate deployment support: per-host HTTPS credential helpers for Bitbucket Server and GitLab, SSH key mounting with Windows permission fix, CA certificate handling (PEM/DER auto-detection), and the full docker-compose and entrypoint reference configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
347
docs/features/TRUEREF-0019.md
Normal file
347
docs/features/TRUEREF-0019.md
Normal file
@@ -0,0 +1,347 @@
|
||||
# TRUEREF-0019 — Git-Native Version Indexing and Corporate Deployment Support
|
||||
|
||||
**Priority:** P1
|
||||
**Status:** Pending
|
||||
**Depends On:** TRUEREF-0001, TRUEREF-0002
|
||||
**Blocks:** —
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
TrueRef is intended for corporate environments where developers work with private repositories hosted on Bitbucket Server/Data Center and self-hosted GitLab instances. This feature covers two tightly related concerns:
|
||||
|
||||
1. **Git-native version indexing** — resolve version tags to specific commit hashes, extract file trees per commit using `git archive`, and store indexed snippets against an exact commit rather than a floating label.
|
||||
2. **Corporate deployment support** — run TrueRef inside Docker with access to private remotes, per-host credentials, and corporate CA certificates without requiring changes to the host machine beyond environment variables and file mounts.
|
||||
|
||||
Together these allow a team to deploy TrueRef once, point it at their internal repositories, and have it return version-accurate documentation to LLM assistants.
|
||||
|
||||
---
|
||||
|
||||
## Part 1: Git-Native Version Indexing
|
||||
|
||||
### Background
|
||||
|
||||
Currently, versions are registered manually with a tag and a title. There is no link between a version tag and a specific git commit, so re-indexing the same version after a branch has moved produces different results silently. For documentation retrieval to be trustworthy, each indexed version must be pinned to an immutable commit hash.
|
||||
|
||||
### Version → Commit Hash Resolution
|
||||
|
||||
Git tags are the canonical mapping. Both lightweight and annotated tags must be handled:
|
||||
|
||||
```sh
|
||||
# Resolves any tag, branch, or ref to the underlying commit hash
|
||||
git -C /path/to/repo rev-parse <tag>^{commit}
|
||||
```
|
||||
|
||||
The `^{commit}` peel dereferences annotated tag objects to the commit they point at.
|
||||
|
||||
For GitHub-hosted repositories the GitHub REST API provides the same mapping without requiring a local clone:
|
||||
|
||||
```
|
||||
GET /repos/{owner}/{repo}/git/refs/tags/{tag}
|
||||
```
|
||||
|
||||
If the returned object type is `tag` (annotated), a second request is required to dereference it:
|
||||
|
||||
```
|
||||
GET /repos/{owner}/{repo}/git/tags/{sha}
|
||||
→ { object: { sha: "<commit-sha>", type: "commit" } }
|
||||
```
|
||||
|
||||
### Schema Change
|
||||
|
||||
Add `commit_hash` to the versions table:
|
||||
|
||||
```sql
|
||||
ALTER TABLE versions ADD COLUMN commit_hash TEXT;
|
||||
```
|
||||
|
||||
Populated at version registration time. Re-indexing a version always uses the stored commit hash, never re-resolves the tag (tags can be moved; the stored hash is immutable).
|
||||
|
||||
### Tag Auto-Discovery
|
||||
|
||||
When a repository is added or re-fetched, TrueRef should discover all version tags automatically and propose them to the user rather than requiring manual entry:
|
||||
|
||||
```sh
|
||||
git -C /path/to/repo tag -l
|
||||
# → ["v1.0.0", "v1.1.0", "v2.0.0", ...]
|
||||
```
|
||||
|
||||
For each tag, resolve to a commit hash and pre-populate the versions table. The user can then select which versions to index from the UI.
|
||||
|
||||
### Per-Version File Extraction via `git archive`
|
||||
|
||||
`git archive` extracts a clean file tree from any commit without disturbing the working directory or requiring a separate clone:
|
||||
|
||||
```sh
|
||||
git -C /path/to/repo archive <commit-hash> | tar -x -C /tmp/trueref-idx/<repo>-<tag>/
|
||||
```
|
||||
|
||||
Advantages over `git checkout` or worktrees:
|
||||
- Working directory is completely untouched
|
||||
- No `.git` directory in the output (cleaner for parsing)
|
||||
- Temp directory deleted after indexing with no git state to clean up
|
||||
- Multiple versions can be extracted in parallel
|
||||
|
||||
The indexing pipeline receives the extracted temp directory as its source path — identical to the existing local crawler interface.
|
||||
|
||||
```
|
||||
trigger index v2.0.0
|
||||
→ look up commit_hash = "a3f9c12"
|
||||
→ git archive a3f9c12 | tar -x /tmp/trueref/repo-v2.0.0/
|
||||
→ run crawler on /tmp/trueref/repo-v2.0.0/
|
||||
→ store snippets { repo_id, version_tag, commit_hash }
|
||||
→ rm -rf /tmp/trueref/repo-v2.0.0/
|
||||
```
|
||||
|
||||
### trueref.json Explicit Override
|
||||
|
||||
Allow commit hashes to be pinned explicitly per version, overriding tag resolution. Useful when tags are mutable, versioning is non-standard, or a specific patch within a release must be targeted:
|
||||
|
||||
```json
|
||||
{
|
||||
"previousVersions": [
|
||||
{
|
||||
"tag": "v2.0.0",
|
||||
"title": "Version 2.0.0",
|
||||
"commitHash": "a3f9c12abc..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Edge Cases
|
||||
|
||||
| Case | Handling |
|
||||
|------|----------|
|
||||
| Annotated tags | `rev-parse <tag>^{commit}` peels to commit automatically |
|
||||
| Mutable tags (e.g. `latest`) | Re-resolve on re-index; warn in UI if hash has changed |
|
||||
| Branch as version | `rev-parse origin/<branch>^{commit}` gives tip; re-resolves on re-index |
|
||||
| Shallow clone | Run `git fetch --unshallow` before `git archive` if commit is unavailable |
|
||||
| Submodules | `git archive --recurse-submodules` or document as a known limitation |
|
||||
| Git LFS | `git lfs pull` required after archive if LFS-tracked files are needed for indexing |
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- [ ] `versions` table has a `commit_hash` column with a migration
|
||||
- [ ] `resolveTagToCommit(repoPath, tag): string` utility using `child_process`
|
||||
- [ ] Tag auto-discovery runs on repo registration and on manual re-fetch
|
||||
- [ ] Discovered tags appear in the UI with their resolved commit hashes
|
||||
- [ ] Indexing a version extracts files via `git archive` to a temp directory
|
||||
- [ ] Temp directory is deleted after indexing completes (success or failure)
|
||||
- [ ] Snippets are stored with both `version_tag` and `commit_hash`
|
||||
- [ ] `trueref.json` `commitHash` field overrides tag resolution when present
|
||||
- [ ] Re-indexing a version with a moved tag warns the user if the hash has changed
|
||||
|
||||
---
|
||||
|
||||
## Part 2: Corporate Deployment Support
|
||||
|
||||
### Background
|
||||
|
||||
Corporate environments introduce three constraints not present in standard cloud setups:
|
||||
|
||||
1. **Multiple private git remotes** — typically Bitbucket Server/Data Center and self-hosted GitLab on separate hostnames, each requiring independent credentials.
|
||||
2. **Credential management on Windows** — Git Credential Manager stores credentials in the Windows Credential Manager (DPAPI-encrypted), which is inaccessible from inside a Linux container. Personal Access Tokens passed as environment variables are the correct substitute.
|
||||
3. **Custom CA certificates** — on-premise servers use certificates issued by corporate CAs not trusted by default in Linux container images. All git operations, HTTP requests, and Node.js `fetch` calls fail until the CA is registered at the OS level.
|
||||
|
||||
### Multi-Remote Credential Configuration
|
||||
|
||||
Two environment variables carry the HTTPS tokens, one per remote. Git's per-host credential scoping routes each token to the correct server:
|
||||
|
||||
```sh
|
||||
# In docker-entrypoint.sh, before any git operation
|
||||
|
||||
if [ -n "$GIT_TOKEN_BITBUCKET" ] && [ -n "$BITBUCKET_HOST" ]; then
|
||||
git config --global \
|
||||
"credential.https://${BITBUCKET_HOST}.helper" \
|
||||
"!f() { echo username=x-token-auth; echo password=\$GIT_TOKEN_BITBUCKET; }; f"
|
||||
fi
|
||||
|
||||
if [ -n "$GIT_TOKEN_GITLAB" ] && [ -n "$GITLAB_HOST" ]; then
|
||||
git config --global \
|
||||
"credential.https://${GITLAB_HOST}.helper" \
|
||||
"!f() { echo username=oauth2; echo password=\$GIT_TOKEN_GITLAB; }; f"
|
||||
fi
|
||||
```
|
||||
|
||||
Username conventions by server type:
|
||||
|
||||
| Server | HTTPS username | Password |
|
||||
|--------|---------------|----------|
|
||||
| Bitbucket Server / Data Center | `x-token-auth` | HTTP access token |
|
||||
| Bitbucket Cloud | account username | App password |
|
||||
| GitLab (self-hosted or cloud) | `oauth2` | Personal access token |
|
||||
| GitLab deploy token | `gitlab-deploy-token` | Deploy token secret |
|
||||
|
||||
SSH authentication is also supported and preferred for long-lived deployments. The host SSH configuration (`~/.ssh/config`) handles per-host key selection and travels into the container via volume mount.
|
||||
|
||||
### Corporate CA Certificate Handling
|
||||
|
||||
The CA certificate is mounted as a read-only file at a staging path. The entrypoint detects the encoding (PEM or DER) and installs it at the system level before any other operation runs:
|
||||
|
||||
```sh
|
||||
if [ -f /certs/corp-ca.crt ]; then
|
||||
if openssl x509 -inform PEM -in /certs/corp-ca.crt -noout 2>/dev/null; then
|
||||
cp /certs/corp-ca.crt /usr/local/share/ca-certificates/corp-ca.crt
|
||||
else
|
||||
openssl x509 -inform DER -in /certs/corp-ca.crt \
|
||||
-out /usr/local/share/ca-certificates/corp-ca.crt
|
||||
fi
|
||||
update-ca-certificates 2>/dev/null
|
||||
fi
|
||||
```
|
||||
|
||||
Installing at the OS level means git, curl, and Node.js `fetch` all trust the certificate without per-tool flags. `GIT_SSL_NO_VERIFY` is explicitly not used.
|
||||
|
||||
`openssl` must be added to the production image:
|
||||
|
||||
```dockerfile
|
||||
RUN apk add --no-cache openssl
|
||||
```
|
||||
|
||||
### SSH Key Permissions
|
||||
|
||||
Windows NTFS does not track Unix permissions. SSH keys mounted from `%USERPROFILE%\.ssh` arrive with world-readable permissions that the SSH client rejects. The entrypoint corrects this before any git operation:
|
||||
|
||||
```sh
|
||||
if [ -d /root/.ssh ]; then
|
||||
chmod 700 /root/.ssh
|
||||
chmod 600 /root/.ssh/* 2>/dev/null || true
|
||||
fi
|
||||
```
|
||||
|
||||
### docker-compose.yml
|
||||
|
||||
```yaml
|
||||
services:
|
||||
web:
|
||||
build: .
|
||||
ports:
|
||||
- "3000:3000"
|
||||
volumes:
|
||||
- trueref-data:/data
|
||||
- ${USERPROFILE}/.ssh:/root/.ssh:ro
|
||||
- ${USERPROFILE}/.gitconfig:/root/.gitconfig:ro
|
||||
- ${CORP_CA_CERT}:/certs/corp-ca.crt:ro
|
||||
environment:
|
||||
DATABASE_URL: /data/trueref.db
|
||||
GIT_TOKEN_BITBUCKET: "${BITBUCKET_TOKEN}"
|
||||
GIT_TOKEN_GITLAB: "${GITLAB_TOKEN}"
|
||||
BITBUCKET_HOST: "${BITBUCKET_HOST}"
|
||||
GITLAB_HOST: "${GITLAB_HOST}"
|
||||
restart: unless-stopped
|
||||
|
||||
mcp:
|
||||
build: .
|
||||
command: mcp
|
||||
ports:
|
||||
- "3001:3001"
|
||||
environment:
|
||||
TRUEREF_API_URL: http://web:3000
|
||||
MCP_PORT: "3001"
|
||||
depends_on:
|
||||
- web
|
||||
restart: unless-stopped
|
||||
|
||||
volumes:
|
||||
trueref-data:
|
||||
```
|
||||
|
||||
### .env Template
|
||||
|
||||
```env
|
||||
# Corporate CA certificate (PEM or DER, auto-detected)
|
||||
CORP_CA_CERT=C:/path/to/corp-ca.crt
|
||||
|
||||
# Git remote hostnames
|
||||
BITBUCKET_HOST=bitbucket.corp.example.com
|
||||
GITLAB_HOST=gitlab.corp.example.com
|
||||
|
||||
# Personal access tokens (never commit these)
|
||||
BITBUCKET_TOKEN=
|
||||
GITLAB_TOKEN=
|
||||
```
|
||||
|
||||
### Full docker-entrypoint.sh
|
||||
|
||||
```sh
|
||||
#!/bin/sh
|
||||
set -e
|
||||
|
||||
# 1. Trust corporate CA — must run first
|
||||
if [ -f /certs/corp-ca.crt ]; then
|
||||
if openssl x509 -inform PEM -in /certs/corp-ca.crt -noout 2>/dev/null; then
|
||||
cp /certs/corp-ca.crt /usr/local/share/ca-certificates/corp-ca.crt
|
||||
else
|
||||
openssl x509 -inform DER -in /certs/corp-ca.crt \
|
||||
-out /usr/local/share/ca-certificates/corp-ca.crt
|
||||
fi
|
||||
update-ca-certificates 2>/dev/null
|
||||
fi
|
||||
|
||||
# 2. Fix SSH key permissions (Windows mounts arrive world-readable)
|
||||
if [ -d /root/.ssh ]; then
|
||||
chmod 700 /root/.ssh
|
||||
chmod 600 /root/.ssh/* 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# 3. Per-host HTTPS credential helpers
|
||||
if [ -n "$GIT_TOKEN_BITBUCKET" ] && [ -n "$BITBUCKET_HOST" ]; then
|
||||
git config --global \
|
||||
"credential.https://${BITBUCKET_HOST}.helper" \
|
||||
"!f() { echo username=x-token-auth; echo password=\$GIT_TOKEN_BITBUCKET; }; f"
|
||||
fi
|
||||
|
||||
if [ -n "$GIT_TOKEN_GITLAB" ] && [ -n "$GITLAB_HOST" ]; then
|
||||
git config --global \
|
||||
"credential.https://${GITLAB_HOST}.helper" \
|
||||
"!f() { echo username=oauth2; echo password=\$GIT_TOKEN_GITLAB; }; f"
|
||||
fi
|
||||
|
||||
case "${1:-web}" in
|
||||
web)
|
||||
echo "Running database migrations..."
|
||||
DATABASE_URL="$DATABASE_URL" npx drizzle-kit migrate
|
||||
echo "Starting TrueRef web app on port ${PORT:-3000}..."
|
||||
exec node build
|
||||
;;
|
||||
mcp)
|
||||
MCP_PORT="${MCP_PORT:-3001}"
|
||||
echo "Starting TrueRef MCP HTTP server on port ${MCP_PORT}..."
|
||||
exec npx tsx src/mcp/index.ts --transport http --port "$MCP_PORT"
|
||||
;;
|
||||
*)
|
||||
exec "$@"
|
||||
;;
|
||||
esac
|
||||
```
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- [ ] `docker-entrypoint.sh` runs CA install, SSH permission fix, and credential helpers in that order before any other operation
|
||||
- [ ] CA cert mount path is `/certs/corp-ca.crt`; PEM and DER are both handled without user intervention
|
||||
- [ ] `openssl` is installed in the production Docker image
|
||||
- [ ] Per-host credential helpers are configured only when the relevant env vars are set (no-op if absent)
|
||||
- [ ] SSH key permissions are corrected unconditionally when `/root/.ssh` exists
|
||||
- [ ] `docker-compose.yml` exposes `BITBUCKET_HOST`, `GITLAB_HOST`, `GIT_TOKEN_BITBUCKET`, `GIT_TOKEN_GITLAB`, and `CORP_CA_CERT` as first-class configuration points
|
||||
- [ ] `.env.example` includes all corporate deployment variables with placeholder values
|
||||
- [ ] README Docker section documents the corporate setup (cert export on Windows, SSH mount, per-host tokens)
|
||||
|
||||
---
|
||||
|
||||
## Files to Create
|
||||
|
||||
- `Dockerfile`
|
||||
- `docker-compose.yml`
|
||||
- `docker-entrypoint.sh`
|
||||
- `.dockerignore`
|
||||
- `.env.example` (update)
|
||||
|
||||
## Files to Modify
|
||||
|
||||
- `src/lib/server/db/schema.ts` — add `commit_hash` to versions table
|
||||
- `src/lib/server/db/migrations/` — new migration for `commit_hash`
|
||||
- `src/lib/server/services/version.service.ts` — `resolveTagToCommit`, tag auto-discovery
|
||||
- `src/lib/server/pipeline/indexing.pipeline.ts` — `git archive` extraction, temp dir lifecycle
|
||||
- `README.md` — Docker section, corporate deployment guide
|
||||
Reference in New Issue
Block a user