whisper-rtx2080

mozempk/whisper-rtx2080

Fork 0

Commit Graph

Author	SHA1	Message	Date
mozempk	78c6fab81b	fix: remove duplicate old test suite and fix step 9 pipe/heredoc bug All checks were successful Build & Push Docker Image / build-and-push (push) Successful in 16s Details Step 9 used 'echo $RESULT \| python3 - << HEREDOC' which is a bash gotcha: the heredoc takes over stdin (as the script source), so the pipe is silently ignored and sys.stdin.read() returns empty string → JSONDecodeError. Fix: write RESULT to a temp file and pass it as sys.argv[1] to the script. Also removed the old buggy test suite that was accidentally left appended at lines 181-327 (had language=auto, ['id'] field, wrong DELETE assertion). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 12:13:15 +02:00
mozempk	fd8d4deefb	fix: GPU warmup on startup + fix test_all.sh + document cold-GPU finding All checks were successful Build & Push Docker Image / build-and-push (push) Successful in 6m39s Details GPU warmup (src/transcriber.rs): After creating WhisperState, run a 1s silent inference pass in load(). CUDA JIT-compiles device kernels on the first whisper_full_with_state call. On a cold GPU this compilation disrupts the decode pipeline mid-inference, returning 0 segments in ~0.5s. The warmup forces all kernel compilation at startup so the first real job runs on fully compiled kernels. test_all.sh: - Fix submit response field: 'id' → 'job_id' (was breaking all downstream steps) - Remove language=auto: not a valid ISO 639-1 code; omit field for auto-detect - Make BASE and AUDIO configurable via env vars (WHISPER_BASE_URL, TEST_AUDIO) - Fix DELETE assertion: completed jobs return 409 Conflict, not 204 - Add explicit zero-segments failure check in quality inspection (step 9) - Add progress reporting to poll loop docs/FINDINGS.md + KNOWLEDGE.md: Document cold GPU warmup issue, root cause, and fix. Document language=auto as invalid API usage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 11:57:30 +02:00
mozempk	16cb6ca661	feat: GPU-accelerated Whisper API for RTX 2080 (sm_75) All checks were successful Build & Push Docker Image / build-and-push (push) Successful in 11m13s Details - Pure Rust: Axum 0.7 + whisper-rs 0.13 (CUDA FFI) - Async job queue with SSE progress streaming - Webhook delivery with 5x exponential backoff - Disk-persisted job state (survives restarts) - Anti-hallucination params: no_speech_thold, entropy_thold, suppress_blank - CUDA sm_75 flags: GGML_CUDA_FORCE_MMQ, GGML_CUDA_GRAPHS, GGML_CUDA_FA_ALL_QUANTS - Configurable via env: CUDA_DEVICE, WHISPER_MODEL_PATH, PORT, DATA_DIR - Gitea Actions CI: build + push to git.sal.giize.com registry - Multi-stage Dockerfile with customizable CUDA_VERSION ARG Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-05 22:47:24 +02:00

Author

SHA1

Message

Date

mozempk

78c6fab81b

fix: remove duplicate old test suite and fix step 9 pipe/heredoc bug

Build & Push Docker Image / build-and-push (push) Successful in 16s

Details

Step 9 used 'echo $RESULT | python3 - << HEREDOC' which is a bash gotcha:
the heredoc takes over stdin (as the script source), so the pipe is
silently ignored and sys.stdin.read() returns empty string → JSONDecodeError.

Fix: write RESULT to a temp file and pass it as sys.argv[1] to the script.

Also removed the old buggy test suite that was accidentally left appended
at lines 181-327 (had language=auto, ['id'] field, wrong DELETE assertion).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 12:13:15 +02:00

mozempk

fd8d4deefb

fix: GPU warmup on startup + fix test_all.sh + document cold-GPU finding

Build & Push Docker Image / build-and-push (push) Successful in 6m39s

Details

GPU warmup (src/transcriber.rs):
  After creating WhisperState, run a 1s silent inference pass in load().
  CUDA JIT-compiles device kernels on the first whisper_full_with_state call.
  On a cold GPU this compilation disrupts the decode pipeline mid-inference,
  returning 0 segments in ~0.5s. The warmup forces all kernel compilation at
  startup so the first real job runs on fully compiled kernels.

test_all.sh:
  - Fix submit response field: 'id' → 'job_id' (was breaking all downstream steps)
  - Remove language=auto: not a valid ISO 639-1 code; omit field for auto-detect
  - Make BASE and AUDIO configurable via env vars (WHISPER_BASE_URL, TEST_AUDIO)
  - Fix DELETE assertion: completed jobs return 409 Conflict, not 204
  - Add explicit zero-segments failure check in quality inspection (step 9)
  - Add progress reporting to poll loop

docs/FINDINGS.md + KNOWLEDGE.md:
  Document cold GPU warmup issue, root cause, and fix.
  Document language=auto as invalid API usage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 11:57:30 +02:00

mozempk

16cb6ca661

feat: GPU-accelerated Whisper API for RTX 2080 (sm_75)

Build & Push Docker Image / build-and-push (push) Successful in 11m13s

Details

- Pure Rust: Axum 0.7 + whisper-rs 0.13 (CUDA FFI)
- Async job queue with SSE progress streaming
- Webhook delivery with 5x exponential backoff
- Disk-persisted job state (survives restarts)
- Anti-hallucination params: no_speech_thold, entropy_thold, suppress_blank
- CUDA sm_75 flags: GGML_CUDA_FORCE_MMQ, GGML_CUDA_GRAPHS, GGML_CUDA_FA_ALL_QUANTS
- Configurable via env: CUDA_DEVICE, WHISPER_MODEL_PATH, PORT, DATA_DIR
- Gitea Actions CI: build + push to git.sal.giize.com registry
- Multi-stage Dockerfile with customizable CUDA_VERSION ARG

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-05 22:47:24 +02:00

3 Commits