whisper-rtx2080

mozempk/whisper-rtx2080

Fork 0

Commit Graph

Author	SHA1	Message	Date
mozempk	fd8d4deefb	fix: GPU warmup on startup + fix test_all.sh + document cold-GPU finding All checks were successful Build & Push Docker Image / build-and-push (push) Successful in 6m39s Details GPU warmup (src/transcriber.rs): After creating WhisperState, run a 1s silent inference pass in load(). CUDA JIT-compiles device kernels on the first whisper_full_with_state call. On a cold GPU this compilation disrupts the decode pipeline mid-inference, returning 0 segments in ~0.5s. The warmup forces all kernel compilation at startup so the first real job runs on fully compiled kernels. test_all.sh: - Fix submit response field: 'id' → 'job_id' (was breaking all downstream steps) - Remove language=auto: not a valid ISO 639-1 code; omit field for auto-detect - Make BASE and AUDIO configurable via env vars (WHISPER_BASE_URL, TEST_AUDIO) - Fix DELETE assertion: completed jobs return 409 Conflict, not 204 - Add explicit zero-segments failure check in quality inspection (step 9) - Add progress reporting to poll loop docs/FINDINGS.md + KNOWLEDGE.md: Document cold GPU warmup issue, root cause, and fix. Document language=auto as invalid API usage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 11:57:30 +02:00
mozempk	8fc45ee86f	docs: add KNOWLEDGE.md with lessons learned and improvement notes All checks were successful Build & Push Docker Image / build-and-push (push) Successful in 18s Details Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 10:09:27 +02:00

Author

SHA1

Message

Date

mozempk

fd8d4deefb

fix: GPU warmup on startup + fix test_all.sh + document cold-GPU finding

Build & Push Docker Image / build-and-push (push) Successful in 6m39s

Details

GPU warmup (src/transcriber.rs):
  After creating WhisperState, run a 1s silent inference pass in load().
  CUDA JIT-compiles device kernels on the first whisper_full_with_state call.
  On a cold GPU this compilation disrupts the decode pipeline mid-inference,
  returning 0 segments in ~0.5s. The warmup forces all kernel compilation at
  startup so the first real job runs on fully compiled kernels.

test_all.sh:
  - Fix submit response field: 'id' → 'job_id' (was breaking all downstream steps)
  - Remove language=auto: not a valid ISO 639-1 code; omit field for auto-detect
  - Make BASE and AUDIO configurable via env vars (WHISPER_BASE_URL, TEST_AUDIO)
  - Fix DELETE assertion: completed jobs return 409 Conflict, not 204
  - Add explicit zero-segments failure check in quality inspection (step 9)
  - Add progress reporting to poll loop

docs/FINDINGS.md + KNOWLEDGE.md:
  Document cold GPU warmup issue, root cause, and fix.
  Document language=auto as invalid API usage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 11:57:30 +02:00

mozempk

8fc45ee86f

docs: add KNOWLEDGE.md with lessons learned and improvement notes

Build & Push Docker Image / build-and-push (push) Successful in 18s

Details

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 10:09:27 +02:00

2 Commits