mozempk/whisper-rtx2080

mozempk bcaf8680db

docs: add FRONTEND_INTEGRATION.md developer guide

Comprehensive integration guide for frontend/full-stack developers:

- Architecture overview diagram
- Quick start (submit + poll in ~20 lines)
- Model lifecycle: state machine diagram, all 4 /model/* endpoints,
  SSE event subscription with JS examples
- Job submission: multipart fields, 503 model_not_ready handling,
  retry-with-auto-load pattern
- Job progress: polling vs SSE, all event types with payloads
- Webhooks: job completion + model lifecycle, Express receiver example,
  how to distinguish job vs model payloads
- Health check field reference
- Cancellation semantics (GPU inference not interruptible)
- Full TypeScript type definitions for all API shapes
- React hooks: useModelStatus, useJobStream, useTranscribe
- Complete WhisperClient class example with ensureModelReady,
  streamProgress, and end-to-end transcribe()
- Error reference table with all 400/404/409/503/500 shapes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-08 23:47:13 +02:00

Frontend Integration Guide

Audience: Frontend / full-stack developers integrating the whisper transcription API into a web application.
Base URL: http://your-server:8080 (configurable via the PORT env var on the server).
Interactive docs: http://your-server:8080/docs (Swagger UI — try every endpoint live).

1. Architecture Overview
2. Quick Start — submit and poll
3. Model Lifecycle
- 3.1 State machine
- 3.2 GET /model/status
- 3.3 POST /model/load
- 3.4 POST /model/unload
- 3.5 GET /model/events (SSE)
4. Submitting Jobs
- 4.1 POST /jobs
- 4.2 Handling 503 Model Not Ready
- 4.3 Retry pattern with auto-load
5. Tracking Job Progress
- 5.1 GET /jobs/:id (poll)
- 5.2 GET /jobs/:id/stream (SSE)
6. Webhooks
- 6.1 Job completion webhook
- 6.2 Model lifecycle webhooks
7. Health Check
8. Cancelling Jobs
9. TypeScript Types
10. React Hooks
11. Complete Integration Example
12. Error Reference

1. Architecture Overview

┌─────────────────────────────────────────────────────────┐
│  whisper-server                                         │
│                                                         │
│  HTTP / SSE          Worker thread (GPU)                │
│  ────────────        ───────────────────                │
│  POST /jobs    ───►  job queue (FIFO)                   │
│  GET  /jobs/:id      ↕                                  │
│  GET  /jobs/:id/stream ◄── progress broadcast           │
│                                                         │
│  POST /model/load  ─►  load whisper into VRAM           │
│  POST /model/unload ►  free VRAM                        │
│  GET  /model/status    read state                       │
│  GET  /model/events ◄── lifecycle SSE broadcast         │
└─────────────────────────────────────────────────────────┘

Key behaviours to understand before building:

- The model starts unloaded on every server restart. No inference is possible until it loads (~15–25 seconds for large-v3 on an RTX 2080).
- Submitting a job when the model is not ready returns 503 with a Retry-After header and automatically triggers a load. You can retry the submission; no separate load call is needed.
- The worker processes jobs sequentially (one at a time). Queue depth is visible via /health.
- Long audio is split into silence-bounded chunks internally. SSE progress events reflect chunk completion, not raw GPU progress.

2. Quick Start — submit and poll

The simplest possible integration — no SSE, no model management, just submit and poll:

const BASE = 'http://your-server:8080';

async function transcribe(audioBlob: Blob): Promise<Job> {
  // 1. Submit
  const form = new FormData();
  form.append('audio', audioBlob, 'audio.wav');

  let submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

  // 2. If model isn't loaded yet, keep retrying until it is
  while (submitResp.status === 503) {
    const retryAfter = parseInt(submitResp.headers.get('Retry-After') ?? '15');
    await sleep(retryAfter * 1000);
    submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  }
  if (!submitResp.ok) throw new Error(`Submit failed: ${submitResp.status}`);

  const { job_id } = await submitResp.json();

  // 3. Poll until done
  while (true) {
    await sleep(2000);
    const pollResp = await fetch(`${BASE}/jobs/${job_id}`);
    // Without this check a 404/500 error body would be polled forever
    if (!pollResp.ok) throw new Error(`Poll failed: ${pollResp.status}`);
    const job: Job = await pollResp.json();
    if (job.status === 'done')      return job;
    if (job.status === 'failed')    throw new Error(job.error ?? 'transcription failed');
    if (job.status === 'cancelled') throw new Error('job was cancelled');
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

For a better UX — real-time progress bar, model state indicator — read the full sections below.

3. Model Lifecycle

3.1 State machine

The model moves through four states:

          job submit
         or POST /model/load
               │
   ┌──────────▼───────────┐
   │       Unloaded        │◄──────────────────────────┐
   └──────────┬───────────┘                            │
              │ load triggered                         │
   ┌──────────▼───────────┐                            │
   │        Loading        │                            │ idle timeout
   └──┬──────────────┬────┘                            │ or POST /model/unload
      │ success      │ VRAM full                       │
      │              │                                  │
   ┌──▼────┐  ┌──────▼────────────────┐                │
   │ Ready │  │   WaitingForGpu       │────────────────►│
   └──┬────┘  └──────────────┬────────┘                │
      │         retry ok ────┘                         │
      └────────────────────────────────────────────────►┘

State	`state` value	Can accept jobs?
Unloaded	`"unloaded"`	❌ → triggers load, returns 503
Loading	`"loading"`	❌ → returns 503
Waiting for GPU	`"waiting_for_gpu"`	❌ → returns 503
Ready	`"ready"`	✅
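The table above maps directly onto a small UI helper. The following is a minimal sketch, not part of the server or any published client: `canAcceptJobs` and `describeModelState` are hypothetical names, and the label wording is an assumption. The field names come from the `GET /model/status` payloads in §3.2.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface ModelStatusLike {
  state: ModelStateTag;
  vram_needed_mb?: number;
  vram_free_mb?: number;
  retry_in_secs?: number;
}

// Only "ready" accepts jobs; every other state answers POST /jobs with 503.
function canAcceptJobs(state: ModelStateTag): boolean {
  return state === 'ready';
}

// Hypothetical helper: turn a status document into a short UI label.
function describeModelState(s: ModelStatusLike): string {
  switch (s.state) {
    case 'ready':
      return 'Model ready';
    case 'loading':
      return 'Loading model…';
    case 'unloaded':
      return 'Model unloaded (the next job submission triggers a load)';
    case 'waiting_for_gpu': {
      const need = s.vram_needed_mb ?? '?';
      const free = s.vram_free_mb ?? '?';
      return `Waiting for GPU memory (${free} MiB free, ${need} MiB needed)`;
    }
  }
}
```

A status indicator can call `describeModelState` on every `/model/status` response and disable the upload button while `canAcceptJobs` is false.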

3.2 `GET /model/status`

Returns the current model state and live VRAM figures (from nvidia-smi).

Unloaded:

{ "state": "unloaded" }

Loading:

{ "state": "loading" }

Waiting for GPU (VRAM contention):

{
  "state": "waiting_for_gpu",
  "vram_needed_mb": 3951,
  "vram_free_mb": 512,
  "retry_in_secs": 30
}

Ready:

{
  "state": "ready",
  "loaded_at": "2026-05-10T14:00:00.000Z",
  "vram_used_mb": 4096,
  "vram_total_mb": 8192
}

vram_used_mb / vram_total_mb are omitted when nvidia-smi is unavailable.

3.3 `POST /model/load`

Tells the server to load the model. Idempotent — safe to call multiple times.

curl -X POST http://your-server:8080/model/load

Responses:

Status	Body	Meaning
202	`{"status":"load_initiated"}`	Load queued
200	`{"status":"already_ready"}`	Already loaded

The load happens asynchronously. Subscribe to /model/events or poll /model/status to know when ready.
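For callers that prefer polling over SSE, the wait can be sketched as below. This is an illustration under assumptions: `waitForModelReady`, `StatusFetcher`, and the default interval and poll budget are hypothetical and not defined by the server. The status fetcher is injected so the loop can be exercised without a network.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

// Hypothetical signature: anything that resolves to the /model/status body.
type StatusFetcher = () => Promise<{ state: ModelStateTag }>;

const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Polls until the state is "ready" or the poll budget runs out.
// Resolves with the number of polls it took.
async function waitForModelReady(
  fetchStatus: StatusFetcher,
  intervalMs = 2000,
  maxPolls = 60,
): Promise<number> {
  for (let poll = 1; poll <= maxPolls; poll++) {
    const { state } = await fetchStatus();
    if (state === 'ready') return poll;
    if (poll < maxPolls) await delay(intervalMs);
  }
  throw new Error(`model not ready after ${maxPolls} polls`);
}

// Usage against the real endpoints (BASE_URL is a placeholder):
// const BASE_URL = 'http://your-server:8080';
// await fetch(`${BASE_URL}/model/load`, { method: 'POST' });
// await waitForModelReady(() => fetch(`${BASE_URL}/model/status`).then((r) => r.json()));
```

The bounded poll count keeps a caller from waiting forever when the model is stuck in `waiting_for_gpu`.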

3.4 `POST /model/unload`

Asks the server to free the model from GPU memory. If a job is in flight, the current inference finishes first and the model is dropped afterwards; otherwise the model is freed immediately.

curl -X POST http://your-server:8080/model/unload

Response: 200 {"status":"unload_requested"} (always, regardless of current state).

Use this if you know transcription won't happen for a while and you want to free VRAM for other workloads on the same GPU.

3.5 `GET /model/events` (SSE)

A persistent Server-Sent Events stream that emits every model lifecycle transition.

curl -N http://your-server:8080/model/events

Events emitted:

event: model_loading
data: {"type":"model_loading"}

event: model_ready
data: {"type":"model_ready","loaded_at":"2026-05-10T14:00:00.000Z"}

event: model_unloaded
data: {"type":"model_unloaded"}

event: model_waiting_for_gpu
data: {"type":"model_waiting_for_gpu","vram_needed_mb":3951,"vram_free_mb":512,"retry_in_secs":30}

JavaScript:

function subscribeModelEvents(
  onReady:       (loadedAt: string) => void,
  onUnloaded:    () => void,
  onLoading:     () => void,
  onWaitingGpu:  (info: { vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number }) => void,
): () => void {
  const es = new EventSource(`${BASE}/model/events`);

  es.addEventListener('model_ready',          (e) => onReady(JSON.parse(e.data).loaded_at));
  es.addEventListener('model_unloaded',       ()  => onUnloaded());
  es.addEventListener('model_loading',        ()  => onLoading());
  es.addEventListener('model_waiting_for_gpu',(e) => onWaitingGpu(JSON.parse(e.data)));

  es.onerror = () => {
    // The browser reconnects automatically after a short delay (the exact
    // timing is browser-defined). Log the error but don't tear down the listener.
    console.warn('model/events connection dropped, reconnecting…');
  };

  return () => es.close(); // call this to clean up (e.g. in React useEffect return)
}

The server sends an SSE keepalive comment every 15 seconds so proxies don't close idle connections.
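Browsers discard keepalive comments inside `EventSource`, so they never reach your listeners. A client that reads the stream by hand (for example from a runtime without `EventSource`) has to skip them itself. The sketch below parses one SSE frame following the standard SSE field rules. `parseSseFrame` and `SseMessage` are hypothetical names, and the exact text of the server's keepalive comment is not documented here, so `: keepalive` in the usage note is only an assumed example.

```typescript
interface SseMessage {
  event: string; // defaults to "message" when the frame has no event: line
  data: string;
}

// Parses one SSE frame (the text between two blank lines).
// Returns null for frames that carry no data, such as keepalive comments.
function parseSseFrame(frame: string): SseMessage | null {
  let event = 'message';
  const dataLines: string[] = [];

  for (const line of frame.split(/\r\n|\r|\n/)) {
    if (line === '' || line.startsWith(':')) continue; // blank line or comment
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
    // id: and retry: fields are ignored in this sketch
  }

  return dataLines.length === 0 ? null : { event, data: dataLines.join('\n') };
}
```

Feeding it a `model_ready` frame yields `{ event: 'model_ready', data: '...' }`, while a comment-only frame such as `: keepalive` yields `null`.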

4. Submitting Jobs

4.1 `POST /jobs`

Content-Type: multipart/form-data

Field	Required	Type	Notes
`audio`	✅	file	Any format ffmpeg understands: WAV, MP3, M4A, OGG, FLAC, MP4, MKV … No size limit.
`language`	❌	string	ISO 639-1 code (`"en"`, `"it"`, `"fr"` …). Omit for auto-detection.
`task`	❌	string	`"transcribe"` (default) or `"translate"` (→ English)
`webhook_url`	❌	string	URL to POST the completed job to. Also registers the URL for model lifecycle webhooks.

202 Accepted:

{ "job_id": "550e8400-e29b-41d4-a716-446655440000" }

async function submitJob(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {}
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  if (!resp.ok) throw await toApiError(resp);

  const { job_id } = await resp.json();
  return job_id;
}
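The snippet above calls `toApiError`, which this guide does not define. One possible shape is sketched below, assuming the `{ "error": "..." }` body described in §12. `ApiError` and `ResponseLike` are hypothetical names, not part of the server API. The helper accepts any object with `status` and `json()`, so a real Fetch `Response` works as well.

```typescript
// Hypothetical error class; the name and fields are not part of the server API.
class ApiError extends Error {
  readonly status: number;
  readonly body: unknown;

  constructor(status: number, message: string, body: unknown) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
    this.body = body;
  }
}

// Structural subset of the Fetch API Response, so the helper is easy to test.
interface ResponseLike {
  status: number;
  json(): Promise<unknown>;
}

async function toApiError(resp: ResponseLike): Promise<ApiError> {
  let body: unknown = null;
  try {
    body = await resp.json();
  } catch {
    // Non-JSON body (for example an HTML page from a reverse proxy): keep null.
  }
  const hasErrorString =
    typeof body === 'object' &&
    body !== null &&
    typeof (body as { error?: unknown }).error === 'string';
  const message = hasErrorString ? (body as { error: string }).error : `HTTP ${resp.status}`;
  return new ApiError(resp.status, message, body);
}
```

Callers can then branch on `err.status` (for example 503) and read extra fields such as `retry_after_secs` from `err.body`.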

4.2 Handling 503 Model Not Ready

When the model isn't loaded, POST /jobs returns:

HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/json

{
  "error": "model_not_ready",
  "state": "unloaded",
  "retry_after_secs": 30
}

retry_after_secs by state:

`state`	`retry_after_secs`	Why
`unloaded`	30	Load just triggered; RTX 2080 + large-v3 loads in ~15–25s
`loading`	10	Already loading; check again soon
`waiting_for_gpu`	`GPU_POLL_INTERVAL_SECS` (default 30)	VRAM busy; retry later

Submitting a job when the model is unloaded automatically triggers a load. You do NOT need a separate POST /model/load call for the normal happy path.
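The wait time is available both in the `Retry-After` header and in the body's `retry_after_secs` field. A defensive client might read them in that order, as in this sketch. `retryDelayMs` and the 15-second fallback are assumptions chosen for illustration. An unparseable header value (for instance an HTTP date, which the server is not documented to send) is skipped, not treated as zero.

```typescript
interface ModelNotReadyBody {
  error: 'model_not_ready';
  state: 'unloaded' | 'loading' | 'waiting_for_gpu';
  retry_after_secs: number;
}

// Picks a wait time in milliseconds for a 503 response.
// Order of preference: Retry-After header, body field, caller-supplied fallback.
function retryDelayMs(
  retryAfterHeader: string | null,
  body: Partial<ModelNotReadyBody> | null,
  fallbackSecs = 15,
): number {
  const raw = retryAfterHeader?.trim() ?? '';
  const fromHeader = raw === '' ? NaN : Number(raw);
  if (Number.isFinite(fromHeader) && fromHeader >= 0) return fromHeader * 1000;

  const fromBody = body?.retry_after_secs;
  if (typeof fromBody === 'number' && Number.isFinite(fromBody) && fromBody >= 0) {
    return fromBody * 1000;
  }
  return fallbackSecs * 1000;
}
```

With a Fetch response this would be called as `retryDelayMs(resp.headers.get('Retry-After'), await resp.json())`.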

4.3 Retry pattern with auto-load

async function submitWithRetry(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {},
  maxAttempts = 20,
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

    if (resp.status === 202) {
      const { job_id } = await resp.json();
      return job_id;
    }

    if (resp.status === 503) {
      const body = await resp.json();
      const waitMs = (parseInt(resp.headers.get('Retry-After') ?? '15') + 1) * 1000;
      console.log(`Model ${body.state} — waiting ${waitMs / 1000}s (attempt ${attempt}/${maxAttempts})`);
      await sleep(waitMs);
      continue;
    }

    throw await toApiError(resp);
  }

  throw new Error(`Model did not become ready after ${maxAttempts} attempts`);
}

Tip: For a better UX, subscribe to GET /model/events and wait for the model_ready event instead of sleeping blindly — then submit immediately when ready.

5. Tracking Job Progress

Two patterns: SSE (real-time push) or polling (simpler). SSE is preferred for UX.

5.1 `GET /jobs/:id` (poll)

Returns the full job document. Poll every 2–5 seconds while status is queued or running.

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "task": "transcribe",
  "language": "en",
  "progress": 42,
  "duration_secs": 120.5,
  "segments": [],
  "created_at": "2026-05-10T14:00:00.000Z"
}

When status === "done":

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] },
    { "index": 1, "start": 3.6, "end": 7.2, "text": "How are you?", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}

Terminal statuses: done, failed, cancelled — stop polling when you see one.
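The stop condition can be captured in a reusable loop. This is a minimal sketch under assumptions: `isTerminal` and `pollUntilTerminal` are hypothetical helpers, and the job fetcher is injected so the loop does not depend on a particular HTTP client.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';

const TERMINAL: ReadonlySet<JobStatus> = new Set<JobStatus>(['done', 'failed', 'cancelled']);

function isTerminal(status: JobStatus): boolean {
  return TERMINAL.has(status);
}

// Hypothetical helper: calls `fetchJob` repeatedly until a terminal status appears.
// `onProgress` receives the job's progress field (0-100) after every poll.
async function pollUntilTerminal<J extends { status: JobStatus; progress: number }>(
  fetchJob: () => Promise<J>,
  onProgress: (percent: number) => void = () => {},
  intervalMs = 2000,
): Promise<J> {
  for (;;) {
    const job = await fetchJob();
    onProgress(job.progress);
    if (isTerminal(job.status)) return job;
    await new Promise<void>((r) => setTimeout(r, intervalMs));
  }
}

// Usage with the real endpoint (BASE_URL and jobId are placeholders):
// const job = await pollUntilTerminal(
//   () => fetch(`${BASE_URL}/jobs/${jobId}`).then((r) => r.json()),
//   (pct) => console.log(`${pct}%`),
// );
```

The caller still has to inspect the returned `status`, since `failed` and `cancelled` end the loop just like `done`.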

5.2 `GET /jobs/:id/stream` (SSE)

Subscribe immediately after submission. The connection is held open and events are pushed as they occur.

Event types:

event: progress
data: {"type":"progress","percent":42,"chunk":3,"chunks_total":7}

event: done
data: {"type":"done","job":{...full Job object...}}

event: error
data: {"type":"error","message":"whisper inference failed: ..."}

- percent: overall job progress 0–100 (derived from chunks completed / total).
- chunk / chunks_total: the audio is split on silences; each chunk is one whisper inference call.
- If you open the stream after the job is already finished, you immediately receive a single done event.
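Before handing a progress payload to the UI, it can help to validate it at runtime, since `JSON.parse` returns untyped data. The sketch below assumes the `progress` payload shape shown above. `JobProgress`, `parseJobProgress`, and `progressLabel` are hypothetical names.

```typescript
interface JobProgress {
  type: 'progress';
  percent: number;
  chunk: number;
  chunks_total: number;
}

// Runtime check for the data payload of an `event: progress` frame.
// Returns null when the payload is not valid JSON or has the wrong shape.
function parseJobProgress(data: string): JobProgress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;
  const v = value as Record<string, unknown>;
  if (v.type !== 'progress') return null;
  if (
    typeof v.percent !== 'number' ||
    typeof v.chunk !== 'number' ||
    typeof v.chunks_total !== 'number'
  ) {
    return null;
  }
  return { type: 'progress', percent: v.percent, chunk: v.chunk, chunks_total: v.chunks_total };
}

// Clamps to 0-100 and renders a label for a progress bar.
function progressLabel(p: JobProgress): string {
  const percent = Math.min(100, Math.max(0, Math.round(p.percent)));
  return `${percent}% (chunk ${p.chunk} of ${p.chunks_total})`;
}
```

For the example payload above, `progressLabel` produces `42% (chunk 3 of 7)`.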

function streamJobProgress(
  jobId: string,
  onProgress: (percent: number, chunk: number, total: number) => void,
  onDone:     (job: Job) => void,
  onError:    (message: string) => void,
): () => void {
  const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

  es.addEventListener('progress', (e) => {
    const { percent, chunk, chunks_total } = JSON.parse(e.data);
    onProgress(percent, chunk, chunks_total);
  });

  es.addEventListener('done', (e) => {
    const { job } = JSON.parse(e.data);
    es.close();
    onDone(job);
  });

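  es.addEventListener('error', (e) => {
    // Application error (server-sent, carries data) vs. connection error (plain Event, no data)
    if ('data' in e) {
      const { message } = JSON.parse((e as MessageEvent).data);
      onError(message);
    } else {
      // Report the drop so the caller is not left waiting after close()
      onError('connection to job stream lost');
    }
    es.close();
  });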

  return () => es.close();
}

Note: Do not confuse the SSE error event (connection drop — no data) with the application error event (transcription failure — has data). The example above handles both.

6. Webhooks

Webhooks are sent as HTTP POST requests with Content-Type: application/json to the webhook_url you supply at job submission. On a non-2xx response the server retries with exponential backoff, for up to 3 attempts in total (waiting 1s, then 2s, between attempts).
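Because delivery is retried, a receiver that handles a payload but fails to answer with 2xx may plausibly see the same payload again. That is an inference from the retry behaviour, not something this guide documents. A receiver can guard against it by remembering which job results it has already handled. In the sketch below, `WebhookDeduper` is a hypothetical class and its in-memory `Set` is an assumption suited only to a single process.

```typescript
// Remembers which job webhooks were already handled.
// In-memory only; a multi-process deployment would need shared storage (assumption).
class WebhookDeduper {
  private readonly seen = new Set<string>();

  // Returns true the first time a (job id, status) pair is offered, false afterwards.
  firstDelivery(job: { id: string; status: string }): boolean {
    const key = `${job.id}:${job.status}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Inside a webhook handler this would wrap the processing step: `if (deduper.firstDelivery(body)) { /* handle the job */ }`.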

6.1 Job completion webhook

Fired when a job reaches done, failed, or cancelled.
Payload: the full Job object (same as GET /jobs/:id).

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}

6.2 Model lifecycle webhooks

Any URL that has ever appeared as a webhook_url in a job submission also receives model lifecycle webhooks for the lifetime of the server process. This lets your backend know when the model comes up or goes down without polling.

Only two events are delivered via webhook (the others are SSE-only):

Model ready:

{ "type": "model_ready", "loaded_at": "2026-05-10T14:00:00.000Z" }

Model unloaded:

{ "type": "model_unloaded" }

Express.js receiver example:

import express from 'express';
const app = express();
app.use(express.json());

app.post('/webhooks/whisper', (req, res) => {
  res.sendStatus(200); // acknowledge quickly — retries on non-2xx

  const body = req.body;

  if ('type' in body) {
    // Model lifecycle event
    if (body.type === 'model_ready') {
      console.log('Whisper model ready at', body.loaded_at);
    } else if (body.type === 'model_unloaded') {
      console.log('Whisper model freed GPU memory');
    }
    return;
  }

  // Job completion event — body is a Job object
  if (body.status === 'done') {
    console.log(`Job ${body.id} done — ${body.segments.length} segments`);
    processTranscript(body.segments);
  } else if (body.status === 'failed') {
    console.error(`Job ${body.id} failed:`, body.error);
  }
});

Distinguish job vs. model webhook: Job payloads have an id and status field. Model payloads have a type field at the top level (model_ready / model_unloaded).

7. Health Check

curl http://your-server:8080/health

{
  "status": "ok",
  "gpu_name": "NVIDIA GeForce RTX 2080",
  "vram_total_mb": 8192,
  "model": "large-v3",
  "queue_depth": 2,
  "model_state": "ready"
}

Field	Notes
`status`	Always `"ok"` when the server is reachable
`gpu_name`	From `nvidia-smi`; `null` if unavailable
`vram_total_mb`	Total VRAM in MiB; `null` if unavailable
`model`	Model name string (server config)
`queue_depth`	Jobs waiting (not counting the currently running one)
`model_state`	`"unloaded"` / `"loading"` / `"waiting_for_gpu"` / `"ready"`
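The types in §9 do not include the health payload. A type for it can be derived from the field table above, as in this sketch. `HealthResponse` and `summarizeHealth` are hypothetical names, and the summary wording is an assumption.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface HealthResponse {
  status: 'ok';
  gpu_name: string | null;      // null when nvidia-smi is unavailable
  vram_total_mb: number | null; // MiB; null when nvidia-smi is unavailable
  model: string;
  queue_depth: number;          // waiting jobs, excluding the running one
  model_state: ModelStateTag;
}

// Hypothetical helper: one-line summary for a status bar.
function summarizeHealth(h: HealthResponse): string {
  const gpu = h.gpu_name ?? 'unknown GPU';
  const vram = h.vram_total_mb === null ? 'VRAM unknown' : `${h.vram_total_mb} MiB`;
  const waiting = h.queue_depth === 1 ? '1 job waiting' : `${h.queue_depth} jobs waiting`;
  return `${h.model} on ${gpu} (${vram}): ${h.model_state}, ${waiting}`;
}
```

For the example response above this yields `large-v3 on NVIDIA GeForce RTX 2080 (8192 MiB): ready, 2 jobs waiting`.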

8. Cancelling Jobs

curl -X DELETE http://your-server:8080/jobs/550e8400-e29b-41d4-a716-446655440000

- 200: job marked cancelled. Returns the updated Job object.
- 404: job not found.
- 409: job already in a terminal state (done / failed / cancelled).

Important: whisper.cpp does not support mid-inference cancellation. If the job is currently running, the GPU inference will finish before the cancellation takes effect — the result is simply discarded and the status set to cancelled.
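A client-side wrapper can turn the three documented status codes into a typed outcome. This is a sketch under assumptions: `cancelJob`, `CancelOutcome`, and the injected `HttpDelete` function are hypothetical and exist only to keep the example testable without a network.

```typescript
type CancelOutcome =
  | { kind: 'cancelled'; job: unknown }
  | { kind: 'not_found' }
  | { kind: 'already_finished' };

// Minimal structural types so a fake HTTP function can be passed in tests.
interface HttpResponse {
  status: number;
  json(): Promise<unknown>;
}
type HttpDelete = (url: string) => Promise<HttpResponse>;

async function cancelJob(baseUrl: string, jobId: string, del: HttpDelete): Promise<CancelOutcome> {
  const resp = await del(`${baseUrl}/jobs/${encodeURIComponent(jobId)}`);
  switch (resp.status) {
    case 200:
      return { kind: 'cancelled', job: await resp.json() };
    case 404:
      return { kind: 'not_found' };
    case 409:
      return { kind: 'already_finished' };
    default:
      throw new Error(`unexpected status ${resp.status} from DELETE /jobs/${jobId}`);
  }
}

// With the real Fetch API:
// cancelJob('http://your-server:8080', id, (url) => fetch(url, { method: 'DELETE' }));
```

Treating 409 as a normal outcome and not as an exception matches the "No action needed" guidance in §12.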

9. TypeScript Types

type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';
type JobStatus     = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';
type Task          = 'transcribe' | 'translate';

interface ModelStatus {
  state: ModelStateTag;
  // ready only
  loaded_at?: string;
  // waiting_for_gpu only
  vram_needed_mb?: number;
  vram_free_mb?:   number;
  retry_in_secs?:  number;
  // always (when nvidia-smi available)
  vram_used_mb?:   number;
  vram_total_mb?:  number;
}

interface Word {
  text:        string;
  start:       number; // seconds
  end:         number; // seconds
  probability: number; // 0–1
}

interface Segment {
  index: number;
  start: number; // seconds
  end:   number; // seconds
  text:  string;
  words: Word[];
}

// Optional fields may be absent from the JSON (as in the §5.1 examples) or null.
interface Job {
  id:            string;
  status:        JobStatus;
  task:          Task;
  language?:     string | null;  // ISO 639-1; unset until detected/set
  progress:      number;         // 0–100
  duration_secs?: number | null; // unset until processing starts
  segments:      Segment[];      // populated when status = 'done'
  error?:        string | null;  // populated when status = 'failed'
  webhook_url?:  string | null;
  filename?:     string | null;
  created_at:    string;         // ISO 8601
  completed_at?: string | null;  // ISO 8601; unset until terminal
}

// SSE payloads from GET /jobs/:id/stream
type JobSseEvent =
  | { type: 'progress'; percent: number; chunk: number; chunks_total: number }
  | { type: 'done';     job: Job }
  | { type: 'error';    message: string };

// SSE payloads from GET /model/events
type ModelSseEvent =
  | { type: 'model_loading' }
  | { type: 'model_ready';           loaded_at: string }
  | { type: 'model_unloaded' }
  | { type: 'model_waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };

// Webhook payload — union of job completion and model lifecycle events
type WebhookPayload = Job | { type: 'model_ready'; loaded_at: string } | { type: 'model_unloaded' };

// Helpers
function isJobPayload(p: WebhookPayload): p is Job {
  return 'id' in p && 'status' in p;
}
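// The predicate type must be assignable to WebhookPayload, so narrow to its non-Job members.
function isModelPayload(p: WebhookPayload): p is Exclude<WebhookPayload, Job> {
  return 'type' in p;
}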

10. React Hooks

// useModelStatus.ts
import { useEffect, useState } from 'react';

const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export function useModelStatus() {
  const [status, setStatus] = useState<ModelStatus | null>(null);

  // Initial fetch
  useEffect(() => {
    fetch(`${BASE}/model/status`)
      .then(r => r.json())
      .then(setStatus)
      .catch(console.error);
  }, []);

  // Live updates via SSE
  useEffect(() => {
    const es = new EventSource(`${BASE}/model/events`);

    const refresh = () => {
      fetch(`${BASE}/model/status`)
        .then(r => r.json())
        .then(setStatus)
        .catch(console.error);
    };

    es.addEventListener('model_loading',        refresh);
    es.addEventListener('model_ready',          refresh);
    es.addEventListener('model_unloaded',       refresh);
    es.addEventListener('model_waiting_for_gpu',refresh);
    es.onerror = () => console.warn('model/events reconnecting…');

    return () => es.close();
  }, []);

  return status;
}

// useJobStream.ts
import { useEffect, useRef, useState } from 'react';

type ProgressState = {
  percent: number;
  chunk: number;
  chunks_total: number;
};

export function useJobStream(jobId: string | null) {
  const [progress, setProgress] = useState<ProgressState | null>(null);
  const [job,      setJob]      = useState<Job | null>(null);
  const [error,    setError]    = useState<string | null>(null);
  const esRef = useRef<EventSource | null>(null);

  useEffect(() => {
    if (!jobId) return;

    esRef.current?.close();
    setProgress(null); setJob(null); setError(null);

    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
    esRef.current = es;

    es.addEventListener('progress', (e) => {
      setProgress(JSON.parse(e.data));
    });

    es.addEventListener('done', (e) => {
      setJob(JSON.parse(e.data).job);
      setProgress({ percent: 100, chunk: 0, chunks_total: 0 });
      es.close();
    });

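    es.addEventListener('error', (e) => {
      // Server-sent error events carry data; a dropped connection does not.
      if ('data' in e) setError(JSON.parse((e as MessageEvent).data).message);
      else setError('connection to job stream lost');
      es.close();
    });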

    return () => es.close();
  }, [jobId]);

  return { progress, job, error };
}

// useTranscribe.ts — ties it all together
import { useState, useCallback } from 'react';

export function useTranscribe() {
  const [jobId,  setJobId]  = useState<string | null>(null);
  const [loading, setLoading] = useState(false);
  const [error,  setError]  = useState<string | null>(null);

  const submit = useCallback(async (
    audio: Blob,
    opts: { language?: string; task?: Task } = {}
  ) => {
    setLoading(true);
    setError(null);
    setJobId(null);

    try {
      const id = await submitWithRetry(audio, opts); // see §4.3
      setJobId(id);
    } catch (e) {
      setError(String(e));
    } finally {
      setLoading(false);
    }
  }, []);

  const { progress, job, error: streamError } = useJobStream(jobId);

  return { submit, loading, jobId, progress, job, error: error ?? streamError };
}

11. Complete Integration Example

A full transcription flow with model warm-up indicator and real-time progress:

// whisperClient.ts
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export class WhisperClient {
  /** Wait for the model to be ready, triggering a load if needed. */
  async ensureModelReady(timeoutMs = 120_000): Promise<void> {
    const status = await this.getModelStatus();
    if (status.state === 'ready') return;

    return new Promise((resolve, reject) => {
      // Subscribe before triggering the load so a fast model_ready is less likely to be missed.
      const es = new EventSource(`${BASE}/model/events`);
      const deadline = setTimeout(() => {
        es.close();
        reject(new Error('Model did not become ready within timeout'));
      }, timeoutMs);
      const finish = () => {
        clearTimeout(deadline);
        es.close();
        resolve();
      };

      es.addEventListener('model_ready', finish);
      es.onerror = () => {
        // Reconnects automatically; don't reject on transient drops.
      };

      // Trigger load (idempotent). A 200 "already_ready" reply means the model
      // came up between the status check and this call.
      fetch(`${BASE}/model/load`, { method: 'POST' })
        .then((r) => { if (r.status === 200) finish(); })
        .catch(() => { /* the deadline above still applies */ });
    });
  }

  async getModelStatus(): Promise<ModelStatus> {
    const r = await fetch(`${BASE}/model/status`);
    if (!r.ok) throw new Error(`/model/status ${r.status}`);
    return r.json();
  }

  async submit(
    audio: Blob,
    opts: { language?: string; task?: Task; webhookUrl?: string } = {}
  ): Promise<string> {
    return submitWithRetry(audio, opts);
  }

  streamProgress(
    jobId: string,
    callbacks: {
      onProgress?: (p: { percent: number; chunk: number; total: number }) => void;
      onDone?:     (job: Job) => void;
      onError?:    (msg: string) => void;
    }
  ): () => void {
    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

    es.addEventListener('progress', (e) => {
      const d = JSON.parse(e.data);
      callbacks.onProgress?.({ percent: d.percent, chunk: d.chunk, total: d.chunks_total });
    });

    es.addEventListener('done', (e) => {
      callbacks.onDone?.(JSON.parse(e.data).job);
      es.close();
    });

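    es.addEventListener('error', (e) => {
      // Server-sent error events carry data; a dropped connection does not.
      // Report both so transcribe() never hangs on a closed stream.
      if ('data' in e) callbacks.onError?.(JSON.parse((e as MessageEvent).data).message);
      else callbacks.onError?.('connection to job stream lost');
      es.close();
    });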

    return () => es.close();
  }

  async transcribe(
    audio: Blob,
    opts: {
      language?: string;
      task?: Task;
      webhookUrl?: string;
      onProgress?: (percent: number) => void;
    } = {}
  ): Promise<Job> {
    const jobId = await this.submit(audio, opts);

    return new Promise((resolve, reject) => {
      this.streamProgress(jobId, {
        onProgress: (p) => opts.onProgress?.(p.percent),
        onDone:     resolve,
        onError:    (msg) => reject(new Error(msg)),
      });
    });
  }
}

// Usage
const whisper = new WhisperClient();

const job = await whisper.transcribe(audioBlob, {
  language: 'en',
  onProgress: (pct) => console.log(`${pct}%`),
});

for (const seg of job.segments) {
  console.log(`[${seg.start.toFixed(1)}s → ${seg.end.toFixed(1)}s] ${seg.text}`);
}

12. Error Reference

All error responses follow this shape:

{ "error": "human-readable message" }

With the following additions for specific errors:

503 model_not_ready:

{ "error": "model_not_ready", "state": "loading", "retry_after_secs": 10 }

HTTP	`error` value	When	What to do
400	`"missing 'audio' field"`	`audio` not in form	Fix the form
400	`"audio field is empty"`	Zero-byte file uploaded	Fix the file
400	`"task must be 'transcribe' or 'translate'"`	Bad `task` value	Fix the value
400	`"multipart error: …"`	Malformed request	Check content-type header
404	`"job … not found"`	Unknown job ID	Check the ID
409	`"job … is already in terminal state …"`	Cancelling a finished job	No action needed
503	`"model_not_ready"`	Model not loaded	See §4.2 — retry with `Retry-After`
500	`"worker channel closed"`	Server crash	Contact server admin

Network / SSE errors:

- EventSource onerror with no .data = connection dropped. The browser reconnects automatically; no action is needed unless you want to show a UI indicator.
- HTTP 502/503/504 from a reverse proxy = the container is restarting. Wait and retry.

Last updated: 2026-05-08. Corresponds to whisper-server v0.1.0 commit d014826.
