Initial commit: tuned multi-model llama.cpp stack

- 5 models: SmolLM3-3B, Gemma4-E2B/E4B, Qwen3-4B, Qwen3.5-9B
- TurboQuant image (FORCE_MMQ): +6-11% free speed on Turing GPUs
- Bigctx profiles (-nkvo KV in RAM): 2-16x context gain
- turbo2 KV: 2x smaller, benchmarked against PPL quality gate
- Per-model env files with justified parameters
- kv_quant_test.sh + cpu_ctx_test.sh benchmark scripts
- docs/FINDINGS.md: surprises, pitfalls, recommendations
- docs/ARCHITECTURE.md: compose + test script design
2026-05-06 15:56:40 +02:00
commit 4ad296608b
22 changed files with 2530 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,34 @@
+# Model files — large binaries, download with scripts/download_models.sh
+models/*.gguf
+models/*.bin
+models/*.safetensors
+
+# Benchmark outputs (logs, CSVs, text reports, env snapshots): generated, not source
+benchmark-results/*.log
+benchmark-results/*.csv
+benchmark-results/*.txt
+benchmark-results/*.env
+# Keep the .gitkeep placeholder so the otherwise empty directory stays tracked
+!benchmark-results/.gitkeep
+
+# Docker build cache artifacts
+.docker/
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+.venv/
+
+# Editor / OS artifacts
+.DS_Store
+Thumbs.db
+*.swp
+*.swo
+*~
+.idea/
+.vscode/
+
+# Local overrides (never commit secrets or machine-specific tweaks)
+.env.local
+envs/.env.*.local