feat: dynamic model loading/unloading with GPU polling
All checks were successful
Build & Push Docker Image / build-and-push (push) Successful in 8m41s
- Model starts unloaded (lazy); loads on first job or POST /model/load
- Auto-unloads after IDLE_TIMEOUT_SECS (default 300) of inactivity
- POST /model/unload for immediate manual release
- GPU-busy detection: on VRAM OOM, enters WaitingForGpu and retries
  every GPU_POLL_INTERVAL_SECS (default 30) indefinitely
- POST /jobs when unloaded → 503 + Retry-After header, triggers load
- AppError::OutOfMemory and AppError::ModelNotReady variants
- WorkerCmd channel (SyncSender<WorkerCmd>) replaces bare tx_req channel
- Idle timer via recv_timeout(1s) tick inside OS thread (no extra thread)
- Model lifecycle events broadcast via tokio broadcast channel (SSE + webhooks)
- webhook_registry: all clients that ever submitted a webhook_url receive
  model_ready and model_unloaded webhooks
- GPU warmup retained on every (re)load
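
The idle timer and command channel described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the code from this commit: only `WorkerCmd::Load` is confirmed by the diff, so the `Unload`, `Job` and `Shutdown` variants, the `run_worker` and `spawn_worker` helpers, the event strings, and the lazy load on `Job` are all assumptions made for the example. Model loading is simulated with a boolean.

```rust
use std::sync::mpsc::{sync_channel, Receiver, RecvTimeoutError, SyncSender};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Commands sent to the worker thread. Only `Load` appears in the diff;
/// the other variants are hypothetical.
#[derive(Debug)]
pub enum WorkerCmd {
    Load,
    Unload,
    Job(String),
    Shutdown,
}

/// Runs the worker loop on the calling thread and returns the lifecycle
/// events it emitted, in order. `tick` plays the role of the 1s
/// `recv_timeout` mentioned in the commit message.
pub fn run_worker(rx: Receiver<WorkerCmd>, idle_timeout: Duration, tick: Duration) -> Vec<String> {
    let mut events: Vec<String> = Vec::new();
    let mut loaded = false; // stand-in for the real model handle
    let mut last_used = Instant::now();

    loop {
        match rx.recv_timeout(tick) {
            Ok(WorkerCmd::Load) => {
                // Idempotent: a second Load while loaded emits nothing.
                if !loaded {
                    loaded = true;
                    events.push("model_ready".to_string());
                }
                last_used = Instant::now();
            }
            Ok(WorkerCmd::Job(name)) => {
                // Assumption: the worker loads lazily if a job arrives unloaded.
                if !loaded {
                    loaded = true;
                    events.push("model_ready".to_string());
                }
                events.push(format!("job_done:{}", name));
                last_used = Instant::now();
            }
            Ok(WorkerCmd::Unload) => {
                if loaded {
                    loaded = false;
                    events.push("model_unloaded".to_string());
                }
            }
            Ok(WorkerCmd::Shutdown) | Err(RecvTimeoutError::Disconnected) => break,
            Err(RecvTimeoutError::Timeout) => {
                // The timeout doubles as the idle-timer tick, so no extra thread is needed.
                if loaded && last_used.elapsed() >= idle_timeout {
                    loaded = false;
                    events.push("model_unloaded".to_string());
                }
            }
        }
    }
    events
}

/// Spawns the worker on an OS thread behind a bounded channel.
pub fn spawn_worker(
    idle_timeout: Duration,
    tick: Duration,
) -> (SyncSender<WorkerCmd>, JoinHandle<Vec<String>>) {
    let (tx, rx) = sync_channel::<WorkerCmd>(8);
    let handle = thread::spawn(move || run_worker(rx, idle_timeout, tick));
    (tx, handle)
}
```

Because the loop wakes at least once per tick, the unload can lag the configured idle timeout by up to one tick. With a 1s tick and the 300s default, that lag is small.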
New routes:
  GET  /model/status — current state + VRAM stats
  POST /model/load   — trigger load (idempotent)
  POST /model/unload — immediate unload
  GET  /model/events — SSE stream of model lifecycle events
New env vars:
  IDLE_TIMEOUT_SECS (default 300)
  GPU_POLL_INTERVAL_SECS (default 30)
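
One way the two variables above could be read with their documented defaults is sketched below. The helper names `parse_secs`, `idle_timeout` and `gpu_poll_interval` are hypothetical, and falling back to the default on an unparsable value is an assumption. The commit message does not say how invalid input is handled.

```rust
use std::time::Duration;

/// Parses a whole number of seconds, falling back to `default_secs` when the
/// value is missing or not a valid unsigned integer (assumed behaviour).
pub fn parse_secs(raw: Option<&str>, default_secs: u64) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}

/// IDLE_TIMEOUT_SECS, default 300 as stated in the commit message.
pub fn idle_timeout() -> Duration {
    parse_secs(std::env::var("IDLE_TIMEOUT_SECS").ok().as_deref(), 300)
}

/// GPU_POLL_INTERVAL_SECS, default 30 as stated in the commit message.
pub fn gpu_poll_interval() -> Duration {
    parse_secs(std::env::var("GPU_POLL_INTERVAL_SECS").ok().as_deref(), 30)
}
```

Keeping the parsing in a pure function that takes `Option<&str>` makes the defaults testable without mutating the process environment.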
Tests:
  tests/test_model_lifecycle.sh — 18 integration tests (full state machine,
    SSE events, webhooks, concurrency, unload-during-load)
  tests/test_idle_timeout.sh — 5 tests with short IDLE_TIMEOUT_SECS=5
  test_all.sh updated: loads model before job submission, asserts
    model_state in /health, adds POST /model/unload at end
Docs:
  docs/USAGE.md: model lifecycle section, new env vars, 503 retry pattern,
    updated /health response shape
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -19,7 +19,7 @@ use uuid::Uuid;

 use crate::{
     models::{Job, JobId, JobStatus, SubmitResponse},
-    worker::{audio_path_for, ProgressEvent},
+    worker::{audio_path_for, ProgressEvent, WorkerCmd},
     AppError, AppState, Result,
 };

@@ -107,6 +107,36 @@ pub async fn submit_job(
         ));
     }

+    // Check model state before accepting the job.
+    let (model_ready, retry_after_secs, state_tag) = {
+        let ms = state.model_state.read().await;
+        let ready = ms.is_ready();
+        let retry = ms.retry_after_secs();
+        let tag = ms.tag().to_string();
+        (ready, retry, tag)
+    };
+
+    // Register the webhook URL regardless of model state — so model lifecycle
+    // events are delivered even if the job itself is rejected.
+    if let Some(url) = &webhook_url {
+        state.webhook_registry.lock()
+            .unwrap_or_else(|e| e.into_inner())
+            .insert(url.clone());
+    }
+
+    if !model_ready {
+        // Trigger a load if the model is simply unloaded (not already loading).
+        if state_tag == "unloaded" {
+            let _ = state.cmd_tx.try_send(WorkerCmd::Load);
+        }
+        // Clean up the audio file we already wrote to disk.
+        let _ = tokio::fs::remove_file(&audio_path).await;
+        return Err(AppError::ModelNotReady {
+            state: state_tag,
+            retry_after_secs,
+        });
+    }
+
     let mut job = Job::new(id, task, webhook_url, filename);
     job.language = language;

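
The admission check in `submit_job` relies on three methods of the model state (`is_ready`, `retry_after_secs`, `tag`) whose definitions are not part of this hunk. A rough sketch of what such a type might look like follows. The method names, the `"unloaded"` tag and the `WaitingForGpu` state come from the commit. The `Loading` variant, the other tag strings, the `u64` type, the 10-second retry hint, and the `Rejection` struct and `admit` helper are assumptions made for illustration.

```rust
/// Hypothetical model lifecycle state; the real enum may carry more data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelState {
    Unloaded,
    Loading,
    Ready,
    WaitingForGpu { poll_interval_secs: u64 },
}

impl ModelState {
    pub fn is_ready(&self) -> bool {
        matches!(self, ModelState::Ready)
    }

    /// Short machine-readable name; only "unloaded" is confirmed by the diff.
    pub fn tag(&self) -> &'static str {
        match self {
            ModelState::Unloaded => "unloaded",
            ModelState::Loading => "loading",
            ModelState::Ready => "ready",
            ModelState::WaitingForGpu { .. } => "waiting_for_gpu",
        }
    }

    /// Hint for the Retry-After header. The 10s value is an assumed
    /// placeholder, not a number taken from the commit.
    pub fn retry_after_secs(&self) -> u64 {
        match self {
            ModelState::Ready => 0,
            ModelState::Unloaded | ModelState::Loading => 10,
            ModelState::WaitingForGpu { poll_interval_secs } => *poll_interval_secs,
        }
    }
}

/// Framework-free stand-in for the 503 response built from
/// `AppError::ModelNotReady`.
#[derive(Debug, PartialEq, Eq)]
pub struct Rejection {
    pub status: u16,
    pub retry_after: String,
    pub state: String,
}

/// Accepts the job only when the model is ready; otherwise returns the data
/// needed for a 503 with a Retry-After header.
pub fn admit(state: ModelState) -> Result<(), Rejection> {
    if state.is_ready() {
        return Ok(());
    }
    Err(Rejection {
        status: 503,
        retry_after: state.retry_after_secs().to_string(),
        state: state.tag().to_string(),
    })
}
```

Tying the retry hint to the state lets a client waiting on a busy GPU back off for the poll interval instead of retrying on a fixed short delay.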