llama-cpp/.env
Giancarmine Salucci 4ad296608b Initial commit: tuned multi-model llama.cpp stack
- 5 models: SmolLM3-3B, Gemma4-E2B/E4B, Qwen3-4B, Qwen3.5-9B
- TurboQuant image (FORCE_MMQ): +6-11% free speed on Turing GPUs
- Bigctx profiles (-nkvo KV in RAM): 2-16x context gain
- turbo2 KV: 2x smaller, benchmarked against PPL quality gate
- Per-model env files with justified parameters
- kv_quant_test.sh + cpu_ctx_test.sh benchmark scripts
- docs/FINDINGS.md: surprises, pitfalls, recommendations
- docs/ARCHITECTURE.md: compose + test script design
2026-05-06 15:56:40 +02:00

# =============================================================================
# llama.cpp project root .env
#
# Per-model parameters have moved to envs/.env.<model>:
#   envs/.env.qwen35-9b   - Qwen3.5-9B Q8_0 TurboQuant   ~4.4 t/s
#   envs/.env.gemma4-e2b  - Gemma 4 E2B Q4_K_M           ~65 t/s
#   envs/.env.gemma4-e4b  - Gemma 4 E4B Q4_K_M (split)   ~30 t/s
#   envs/.env.smollm3-3b  - SmolLM3 3B Q4_K_M            ~90 t/s
#   envs/.env.qwen3-4b    - Qwen3 4B Q4_K_M              ~75 t/s
#
# This file is loaded by Docker Compose for project-level interpolation.
# Add project-wide overrides here (e.g. COMPOSE_PROJECT_NAME).
# =============================================================================
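
# Example of a project-wide override (illustrative only: the value "llama-cpp"
# is an assumption, not taken from this repo). Compose reads COMPOSE_PROJECT_NAME
# from this file and uses it as the prefix for container, network, and volume names.
# Uncomment to enable:
# COMPOSE_PROJECT_NAME=llama-cpp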