GPU warmup (src/transcriber.rs):
After creating WhisperState, run a 1s silent inference pass in load().
CUDA JIT-compiles device kernels on the first whisper_full_with_state call.
On a cold GPU this compilation disrupts the decode pipeline mid-inference,
returning 0 segments in ~0.5s. The warmup forces all kernel compilation at
startup so the first real job runs on fully compiled kernels.
test_all.sh:
- Fix submit response field: 'id' → 'job_id' (was breaking all downstream steps)
- Remove language=auto: not a valid ISO 639-1 code; omit field for auto-detect
- Make BASE and AUDIO configurable via env vars (WHISPER_BASE_URL, TEST_AUDIO)
- Fix DELETE assertion: completed jobs return 409 Conflict, not 204
- Add explicit zero-segments failure check in quality inspection (step 9)
- Add progress reporting to poll loop
docs/FINDINGS.md + KNOWLEDGE.md:
Document cold GPU warmup issue, root cause, and fix.
Document language=auto as invalid API usage.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>