- 5 models: SmolLM3-3B, Gemma4-E2B/E4B, Qwen3-4B, Qwen3.5-9B
- TurboQuant image (FORCE_MMQ): +6-11% free speed on Turing GPUs
- Bigctx profiles (-nkvo KV in RAM): 2-16x context gain
- turbo2 KV: 2x smaller, benchmarked against PPL quality gate
- Per-model env files with justified parameters
- kv_quant_test.sh + cpu_ctx_test.sh benchmark scripts
- docs/FINDINGS.md: surprises, pitfalls, recommendations
- docs/ARCHITECTURE.md: compose + test script design
14 lines
723 B
Bash
# =============================================================================
# llama.cpp project root .env
#
# Per-model parameters have moved to envs/.env.<model>:
# envs/.env.qwen35-9b — Qwen3.5-9B Q8_0 TurboQuant ~4.4 t/s
# envs/.env.gemma4-e2b — Gemma 4 E2B Q4_K_M ~65 t/s
# envs/.env.gemma4-e4b — Gemma 4 E4B Q4_K_M (split) ~30 t/s
# envs/.env.smollm3-3b — SmolLM3 3B Q4_K_M ~90 t/s
# envs/.env.qwen3-4b — Qwen3 4B Q4_K_M ~75 t/s
#
# This file is loaded by Docker Compose for project-level interpolation.
# Add project-wide overrides here (e.g. COMPOSE_PROJECT_NAME).
# =============================================================================
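
The header above says per-model parameters live in envs/.env.<model> while this root file handles project-level interpolation. A minimal sketch of how the two might be combined at launch follows. The helper names `env_file_for` and `compose_cmd_for` are hypothetical, and passing the files through `--env-file` is an assumption about how this repository is started (the compose file may instead reference them via `env_file:`). The script only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Sketch only: resolve a per-model env file and print a Compose command.
# Assumption: the repo is launched with --env-file; check docs/ARCHITECTURE.md.
set -eu

env_file_for() {
  # Map a model slug (e.g. "gemma4-e2b") to envs/.env.<model>.
  printf 'envs/.env.%s\n' "$1"
}

compose_cmd_for() {
  # Root .env first, per-model file second, so per-model values
  # take precedence when a variable is defined in both files.
  printf 'docker compose --env-file .env --env-file %s up -d\n' \
    "$(env_file_for "$1")"
}

# Print (do not execute) the command for the requested model.
compose_cmd_for "${1:-gemma4-e2b}"
```

Passing `--env-file` explicitly stops Compose from loading the root `.env` by default, which is why the sketch lists it first. Support for repeating the flag depends on the installed Compose version.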