llama+compose: fix bigctx startup timing
- compose: increase start_period for bigctx services
  - gemma4-e4b-bigctx: 60s -> 150s (5 GiB model + warmup + 163840 ctx takes ~90-120s)
  - gemma4-e2b-bigctx: 60s -> 120s (large ctx 393216 allocation)
  - smollm3/qwen3-4b bigctx: 60s -> 90s
- llama: extend health poll from 30x2s=60s to 75x2s=150s
- llama: require 3 consecutive unhealthy before giving up (avoids false positives during Docker start_period window)
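
The two llama changes in the message (a 75x2s poll budget and giving up only after 3 consecutive unhealthy reports) can be sketched as below. The llama script itself is not part of this diff, so this is only an illustration of the described behavior: the function name `wait_until_healthy`, its parameters, and the injected `get_status` callable are hypothetical. The status strings follow Docker's container health states (`starting`, `healthy`, `unhealthy`).

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str],
    attempts: int = 75,          # 75 polls ...
    interval_s: float = 2.0,     # ... 2 s apart = 150 s budget
    unhealthy_limit: int = 3,    # consecutive "unhealthy" reports tolerated
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until the service is healthy or we give up.

    Returns True as soon as the status is "healthy". Returns False when
    `unhealthy_limit` consecutive polls report "unhealthy", or when all
    attempts are used up. Any other status (for example "starting")
    resets the unhealthy streak.
    """
    streak = 0
    for _ in range(attempts):
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            streak += 1
            if streak >= unhealthy_limit:
                return False
        else:
            # Hypothetical choice: a non-unhealthy report clears the streak,
            # so isolated blips during startup are not counted together.
            streak = 0
        sleep(interval_s)
    return False
```

With the defaults, the poll budget is 75 x 2 s = 150 s, which matches the longest start_period in this commit (150s for gemma4-e4b-bigctx). In practice `get_status` would wrap whatever the script uses to read container health; that wiring is left out here because the source does not show it.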
@@ -183,7 +183,7 @@ services:
     env_file: envs/.env.smollm3-3b-bigctx
     healthcheck:
       <<: *hc
-      start_period: 60s
+      start_period: 90s
 
   llama-gemma4-e2b-bigctx:
     image: local/llama-cpp-turboquant:server-cuda-sm75-mmq
@@ -192,7 +192,7 @@ services:
     env_file: envs/.env.gemma4-e2b-bigctx
     healthcheck:
       <<: *hc
-      start_period: 60s
+      start_period: 120s  # large ctx (393216) allocation takes extra time
 
   llama-gemma4-e4b-bigctx:
     image: local/llama-cpp-turboquant:server-cuda-sm75-mmq
@@ -201,7 +201,7 @@ services:
     env_file: envs/.env.gemma4-e4b-bigctx
     healthcheck:
       <<: *hc
-      start_period: 60s
+      start_period: 150s  # 5 GiB model + warmup + 163840 ctx takes ~90-120s
 
   llama-qwen3-4b-bigctx:
     image: local/llama-cpp-turboquant:server-cuda-sm75-mmq
@@ -210,7 +210,7 @@ services:
     env_file: envs/.env.qwen3-4b-bigctx
     healthcheck:
       <<: *hc
-      start_period: 60s
+      start_period: 90s
 
   # ── OPEN WEBUI ─────────────────────────────────────────────────────────────
   # Separate profile — add to any running model: