Files
whisper-rtx2080/src
mozempk fb8556441c
All checks were successful
Build & Push Docker Image / build-and-push (push) Successful in 6m40s
feat: silence-based audio chunking before transcription
Run ffmpeg silencedetect (n=-35dB, d=0.4s) on the original audio to
find silence midpoints. Build chunk boundaries every 180s, snapping to
the nearest silence midpoint within ±30s (fallback: hard cut).

Each chunk is transcribed independently with its own CUDA context;
timestamps are shifted by chunk_start before merging. Progress is
scaled per-chunk across the overall 0-100% job range.

Result on 101-min YouTube audio (34 chunks, 1714 silence points):
- Previous: x1025 'Yeah.' + x1008 sentence-length loops (hallucinations)
- After:    x4 max consecutive run, all repetitions verified genuine

Also refactored TranscribeRequest to carry on_progress: Box<dyn Fn(u8)>
instead of a raw ProgressTx so each chunk can independently scale its
contribution to the job's broadcast channel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 01:08:06 +02:00
..