- 5 models: SmolLM3-3B, Gemma4-E2B/E4B, Qwen3-4B, Qwen3.5-9B
- TurboQuant image (FORCE_MMQ): +6-11% speed at no cost on Turing GPUs
- Bigctx profiles (-nkvo, KV cache in RAM): 2-16x larger context
- turbo2 KV cache: 2x smaller, benchmarked against a perplexity (PPL) quality gate
- Per-model env files with justified parameters
- kv_quant_test.sh + cpu_ctx_test.sh benchmark scripts
- docs/FINDINGS.md: surprises, pitfalls, recommendations
- docs/ARCHITECTURE.md: compose + test script design

# Model files: large binaries, download with scripts/download_models.sh
models/*.gguf
models/*.bin
models/*.safetensors

# Benchmark output (logs, CSVs, text files, env snapshots): generated, not source
benchmark-results/*.log
benchmark-results/*.csv
benchmark-results/*.txt
benchmark-results/*.env
# Keep the .gitkeep placeholder so the directory exists in fresh clones
!benchmark-results/.gitkeep

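# To see which line of this file ignores a given path, ask git directly.
# With -v it prints the source file, line number and matching pattern.
# (run.log below is a hypothetical file name, used only for illustration.)
#   git check-ignore -v benchmark-results/run.log
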
# Docker build cache artifacts
.docker/

# Python cache and virtual environment
__pycache__/
*.pyc
*.pyo
.venv/

# Editor / OS artifacts
.DS_Store
Thumbs.db
*.swp
*.swo
*~
.idea/
.vscode/

# Local overrides (never commit secrets or machine-specific tweaks)
.env.local
envs/.env.*.local
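
# Ignore rules do not apply to files git already tracks. If a local override
# was committed by mistake, untrack it while keeping the working copy:
#   git rm --cached .env.local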