Initial commit: tuned multi-model llama.cpp stack
- 5 models: SmolLM3-3B, Gemma4-E2B/E4B, Qwen3-4B, Qwen3.5-9B
- TurboQuant image (FORCE_MMQ): +6-11% free speed on Turing GPUs
- Bigctx profiles (-nkvo KV in RAM): 2-16x context gain
- turbo2 KV: 2x smaller, benchmarked against PPL quality gate
- Per-model env files with justified parameters
- kv_quant_test.sh + cpu_ctx_test.sh benchmark scripts
- docs/FINDINGS.md: surprises, pitfalls, recommendations
- docs/ARCHITECTURE.md: compose + test script design
34
.gitignore
vendored
Normal file
@@ -0,0 +1,34 @@
# Model files: large binaries, download with scripts/download_models.sh
models/*.gguf
models/*.bin
models/*.safetensors

# Benchmark output logs, CSVs, and env snapshots: generated, not source
benchmark-results/*.log
benchmark-results/*.csv
benchmark-results/*.txt
benchmark-results/*.env
# Keep the .gitkeep placeholder
!benchmark-results/.gitkeep

# Docker build cache artifacts
.docker/

# Python cache
__pycache__/
*.pyc
*.pyo
.venv/

# Editor / OS artifacts
.DS_Store
Thumbs.db
*.swp
*.swo
*~
.idea/
.vscode/

# Local overrides (never commit secrets or machine-specific tweaks)
.env.local
envs/.env.*.local