mozempk/whisper-rtx2080

Files

Build & Push Docker Image / build-and-push (push) Successful in 17s

Details

docs: add ARCHITECTURE, CODE_STYLE, FINDINGS, USAGE under docs/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 10:17:53 +02:00

6.8 KiB

Raw Blame History

Code Style

This document describes the conventions used in this codebase. Follow them when adding or modifying code.

Language & Edition

Rust 2021 edition
Stable toolchain only — no nightly features
rustfmt is installed in the Docker builder; run it before committing

Error Handling

Use `AppError` everywhere

All fallible functions in the HTTP layer return crate::Result<T> which is std::result::Result<T, AppError>.

// Good
pub async fn get_job(...) -> Result<Json<Job>> {
    let job = state.storage.get(&id).await?;
    Ok(Json(job))
}

// Bad — don't use anyhow inside handlers
pub async fn get_job(...) -> anyhow::Result<...> { ... }

AppError has four variants that map cleanly to HTTP status codes:

Variant	HTTP
`NotFound(String)`	404
`BadRequest(String)`	400
`Conflict(String)`	409
`Internal(String)`	500

`anyhow` is only for `main()`

anyhow::Result is used in main() for startup failures where a single error type is not needed. It must not leak into handler or worker code.

Propagate with `?`, map errors explicitly

Prefer .map_err(|e| AppError::Internal(format!("context: {e}"))) over .unwrap() or .expect() in any code path that can be reached at runtime.

// Good
fs::write(&path, payload).await.map_err(|e| {
    AppError::Internal(format!("failed to write job {}: {e}", job.id))
})?;

// Bad
fs::write(&path, payload).await.unwrap();

Async & Concurrency

Tokio for I/O, std thread for GPU

All file I/O, HTTP, ffmpeg subprocess calls: use tokio::fs, tokio::process::Command
GPU inference runs on a dedicated std::thread — never spawn a tokio::task that blocks on WhisperContext

Don't block the async runtime

Long CPU-bound or blocking operations must not run on Tokio tasks. Currently the only such operation is whisper inference, which lives on the whisper-gpu OS thread and communicates back via oneshot channels.

Channel conventions

Channel type	Used for
`tokio::sync::mpsc::UnboundedSender<JobId>`	HTTP → worker: new job notifications
`std::sync::mpsc::Sender<TranscribeRequest>`	Worker → GPU thread: inference requests
`tokio::sync::oneshot`	GPU thread → worker: inference result
`tokio::sync::broadcast`	Worker → SSE subscribers: progress events

State Management

AppState is Clone and passed via Axum's State extractor. All fields are either Arc<T> or cheap-to-clone types (mpsc::UnboundedSender is Clone).

#[derive(Clone)]
pub struct AppState {
    pub job_tx:      mpsc::UnboundedSender<models::JobId>,  // cheap clone
    pub storage:     Arc<storage::Storage>,                   // shared ref
    pub progress:    worker::ProgressRegistry,                // Arc<DashMap>
    pub model_name:  Arc<str>,                                // shared ref
    pub queue_depth: Arc<std::sync::atomic::AtomicUsize>,    // shared atomic
    pub gpu_device:  u32,                                     // copy
}

Never put Mutex<T> in AppState for hot paths. Prefer atomics (AtomicUsize) for counters and DashMap for concurrent maps.

Naming

Types: UpperCamelCase
Functions / methods / locals: snake_case
Constants: SCREAMING_SNAKE_CASE
Module files match the concept they contain (storage.rs, transcriber.rs)

const TARGET_CHUNK_SECS: f32 = 60.0;
const SNAP_WINDOW_SECS:  f32 = 30.0;

Align related constant definitions vertically with spaces — the compiler doesn't care and it's significantly easier to scan.

Comments

Comment why, not what. The code shows what; the comment explains decisions, non-obvious constraints, and gotchas.

// Good — explains the constraint
// no_context: do not use previous call's transcript as initial prompt.
// Each silence-chunked audio segment is independent; cross-chunk context
// would re-anchor the decoder to any hallucinations from the prior chunk.
fp.set_no_context(true);

// Bad — restates the code
// set no_context to true
fp.set_context(true);

For disabled code (commented-out functionality): always include a reason.

// Flash Attention disabled: causes silent 0-segment output on some
// real-world audio (conference recordings, noisy MP3s).
// params.flash_attn(true);

Logging

Use tracing macros at the appropriate level:

Level	When
`error!`	Unrecoverable failure; always investigated
`warn!`	Recoverable unexpected state (e.g. ffmpeg unavailable, job not found at dequeue)
`info!`	Significant lifecycle events (server start, model load, job queued/done)
`debug!`	Per-chunk detail, useful when diagnosing a specific job
`trace!`	Very high-frequency detail (sample-level, buffer operations)

Always use structured fields, not string interpolation, so logs are machine-parseable (the subscriber emits JSON):

// Good
tracing::info!(job_id = %id, model = path, "job queued");

// Bad
tracing::info!("job {} queued, model: {}", id, path);

Serialisation

All public API types derive Serialize/Deserialize from serde
Use #[serde(rename_all = "snake_case")] on enums used in JSON responses
Use #[serde(skip_serializing_if = "Option::is_none")] on optional fields to keep the JSON clean
Timestamps are chrono::DateTime<Utc> — they serialise to ISO 8601 automatically

OpenAPI Annotations

All route handlers are annotated with #[utoipa::path(...)]. Keep the annotations up to date when adding or changing endpoints. The Swagger UI is served at /docs in development.

File Structure Conventions

src/
  main.rs          — entry point only; no business logic
  models.rs        — pure data types; no logic
  error.rs         — AppError + IntoResponse
  storage.rs       — I/O only; no business logic
  transcriber.rs   — whisper inference only
  worker.rs        — orchestration: audio pipeline, job lifecycle
  webhook.rs       — HTTP delivery only
  routes/
    mod.rs         — router assembly only
    jobs.rs        — HTTP handlers for job endpoints
    health.rs      — HTTP handler for /health

Keep each file focused on a single concern. Business logic lives in worker.rs. Data shapes live in models.rs. Avoid cross-cutting imports in the wrong direction (e.g. storage.rs must not import from worker.rs).

Release Profile

[profile.release]
opt-level     = 3
lto           = "thin"
codegen-units = 1
strip         = "symbols"

lto = "thin" is a good balance between link time and binary size/performance
codegen-units = 1 maximises inlining across crate boundaries — important for hot inference paths
strip = "symbols" keeps the Docker runtime image small; debug info is not needed in production

6.8 KiB Raw Blame History