fix: create WhisperState once at load time, reuse across all chunks
Some checks failed
Build & Push Docker Image / build-and-push (push) Has been cancelled
Previously create_state() was called for every 60s audio chunk, triggering whisper_init_state() each time. This allocates ~700 MB of GPU compute buffers (KV caches, CUDA workspace) and re-initialises the CUDA backend per chunk. For a 101-minute audio file (102 chunks), this caused 102 GPU re-initialisations and VRAM allocation cycles. Under VRAM pressure from concurrent processes, CUDA allocations failed silently: whisper returned language detection results but 0 segments.

Fix: create WhisperState once in Transcriber::load() and reuse it for every transcription call. GPU memory usage stays stable; no_context=true prevents KV-cache contamination between chunks.

WhisperState is Send+Sync (explicitly declared in whisper-rs) and holds its own Arc<WhisperInnerContext>, so the model weights stay alive even after WhisperContext is dropped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
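The difference between the two strategies can be sketched with mock types. This is only an illustration under stated assumptions: `MockInner`, `MockContext`, `MockState`, `inits_per_chunk` and `inits_reused` are hypothetical stand-ins, not the whisper-rs API, and the expensive GPU allocation is modelled as a counter increment. The sketch also shows why a state that holds its own `Arc` stays usable after the context is dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the shared model weights.
struct MockInner {
    /// Counts how many times an "expensive" state was initialised.
    state_inits: AtomicUsize,
}

/// Hypothetical stand-in for a context that owns the model.
struct MockContext {
    inner: Arc<MockInner>,
}

/// Hypothetical stand-in for a state; it holds its own Arc to the weights.
struct MockState {
    inner: Arc<MockInner>,
}

impl MockContext {
    fn new() -> Self {
        MockContext {
            inner: Arc::new(MockInner { state_inits: AtomicUsize::new(0) }),
        }
    }

    /// Models the expensive per-state allocation as a counter increment.
    fn create_state(&self) -> MockState {
        self.inner.state_inits.fetch_add(1, Ordering::SeqCst);
        MockState { inner: Arc::clone(&self.inner) }
    }
}

impl MockState {
    /// Pretend transcription: just reports the number of samples seen.
    fn transcribe(&mut self, chunk: &[f32]) -> usize {
        chunk.len()
    }

    fn init_count(&self) -> usize {
        self.inner.state_inits.load(Ordering::SeqCst)
    }
}

/// Old behaviour: a fresh state for every chunk.
fn inits_per_chunk(chunks: &[Vec<f32>]) -> usize {
    let ctx = MockContext::new();
    for chunk in chunks {
        let mut state = ctx.create_state();
        state.transcribe(chunk);
    }
    ctx.inner.state_inits.load(Ordering::SeqCst)
}

/// New behaviour: one state created up front and reused for all chunks.
fn inits_reused(chunks: &[Vec<f32>]) -> usize {
    let ctx = MockContext::new();
    let mut state = ctx.create_state();
    drop(ctx); // the state's own Arc keeps the shared data alive
    for chunk in chunks {
        state.transcribe(chunk);
    }
    state.init_count()
}

fn main() {
    let chunks: Vec<Vec<f32>> = vec![vec![0.0; 16]; 102];
    println!("per-chunk inits: {}", inits_per_chunk(&chunks));
    println!("reused inits:    {}", inits_reused(&chunks));
}
```

With 102 mock chunks, the per-chunk variant performs 102 initialisations and the reused variant performs one, which mirrors the counts given in the commit message.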
@@ -68,12 +68,17 @@ pub fn start(
 }
 
 /// Dedicated OS thread that owns the Transcriber (non-Send) and runs inference.
+///
+/// The Transcriber holds a single `WhisperState` that is reused for every chunk.
+/// GPU compute buffers (~700 MB) are allocated once at startup rather than on
+/// every call, eliminating per-chunk `whisper_init_state` overhead and the
+/// VRAM churn that caused intermittent 0-segment results.
 fn transcriber_thread(
     rx: std::sync::mpsc::Receiver<TranscribeRequest>,
     model_path: PathBuf,
     gpu_device: u32,
 ) {
-    let transcriber = match Transcriber::load(&model_path, gpu_device) {
+    let mut transcriber = match Transcriber::load(&model_path, gpu_device) {
         Ok(t) => t,
         Err(e) => {
             tracing::error!(error = %e, "failed to load whisper model — transcriber thread exiting");
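The switch from `let transcriber` to `let mut transcriber` fits a worker that mutates one long-lived state on every request. A minimal sketch of that thread pattern follows, assuming a mock worker: `Request`, `MockTranscriber`, `worker_thread` and `run_chunks` are hypothetical names for illustration, not the project's actual types, and the real `Transcriber::load` takes a model path and GPU device that are omitted here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request: one audio chunk plus a channel for the reply.
struct Request {
    samples: Vec<f32>,
    reply: mpsc::Sender<usize>,
}

/// Hypothetical transcriber whose internal state is set up once in `load`.
struct MockTranscriber {
    calls: usize,
}

impl MockTranscriber {
    fn load() -> Result<Self, String> {
        Ok(MockTranscriber { calls: 0 })
    }

    /// Takes `&mut self` because the reused state is mutated on every call.
    fn transcribe(&mut self, samples: &[f32]) -> usize {
        let _ = samples; // a real implementation would run inference here
        self.calls += 1;
        self.calls
    }
}

/// Dedicated thread body: load once, then serve requests until the
/// sending side of the channel is dropped.
fn worker_thread(rx: mpsc::Receiver<Request>) {
    let mut transcriber = match MockTranscriber::load() {
        Ok(t) => t,
        Err(e) => {
            eprintln!("failed to load model: {e}");
            return;
        }
    };
    for req in rx {
        let n = transcriber.transcribe(&req.samples);
        // Ignore the error if the requester has already gone away.
        let _ = req.reply.send(n);
    }
}

/// Sends `count` chunks to the worker and collects the replies in order.
fn run_chunks(count: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || worker_thread(rx));
    let mut results = Vec::new();
    for _ in 0..count {
        let (reply_tx, reply_rx) = mpsc::channel();
        tx.send(Request { samples: vec![0.0; 8], reply: reply_tx }).unwrap();
        results.push(reply_rx.recv().unwrap());
    }
    drop(tx); // closing the channel lets the worker loop end
    handle.join().unwrap();
    results
}

fn main() {
    println!("{:?}", run_chunks(3));
}
```

Because the mock counts calls on the same instance, the replies increase across requests, which shows that a single loaded transcriber served every chunk.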