whisper-rtx2080/docs/FRONTEND_INTEGRATION.md
mozempk bcaf8680db
docs: add FRONTEND_INTEGRATION.md developer guide
Comprehensive integration guide for frontend/full-stack developers:

- Architecture overview diagram
- Quick start (submit + poll in ~20 lines)
- Model lifecycle: state machine diagram, all 4 /model/* endpoints,
  SSE event subscription with JS examples
- Job submission: multipart fields, 503 model_not_ready handling,
  retry-with-auto-load pattern
- Job progress: polling vs SSE, all event types with payloads
- Webhooks: job completion + model lifecycle, Express receiver example,
  how to distinguish job vs model payloads
- Health check field reference
- Cancellation semantics (GPU inference not interruptible)
- Full TypeScript type definitions for all API shapes
- React hooks: useModelStatus, useJobStream, useTranscribe
- Complete WhisperClient class example with ensureModelReady,
  streamProgress, and end-to-end transcribe()
- Error reference table with all 400/404/409/503/500 shapes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-08 23:47:13 +02:00


Frontend Integration Guide

Audience: Frontend / full-stack developers integrating the whisper transcription API into a web application.
Base URL: http://your-server:8080 (configurable via the PORT env var on the server).
Interactive docs: http://your-server:8080/docs (Swagger UI — try every endpoint live).


Table of Contents

  1. Architecture Overview
  2. Quick Start — submit and poll
  3. Model Lifecycle
  4. Submitting Jobs
  5. Tracking Job Progress
  6. Webhooks
  7. Health Check
  8. Cancelling Jobs
  9. TypeScript Types
  10. React Hooks
  11. Complete Integration Example
  12. Error Reference

1. Architecture Overview

┌─────────────────────────────────────────────────────────┐
│  whisper-server                                         │
│                                                         │
│  HTTP / SSE          Worker thread (GPU)                │
│  ────────────        ───────────────────                │
│  POST /jobs    ───►  job queue (FIFO)                   │
│  GET  /jobs/:id      ↕                                  │
│  GET  /jobs/:id/stream ◄── progress broadcast           │
│                                                         │
│  POST /model/load  ─►  load whisper into VRAM           │
│  POST /model/unload ►  free VRAM                        │
│  GET  /model/status    read state                       │
│  GET  /model/events ◄── lifecycle SSE broadcast         │
└─────────────────────────────────────────────────────────┘

Key behaviours to understand before building:

  • The model starts unloaded on every server restart. No inference is possible until it loads (~15–25 seconds for large-v3 on an RTX 2080).
  • Submitting a job when the model is not ready returns 503 with a Retry-After header and automatically triggers a load. You can retry the submission; no separate load call is needed.
  • The worker processes jobs sequentially (one at a time). Queue depth is visible via /health.
  • Long audio is split into silence-bounded chunks internally. SSE progress events reflect chunk completion, not raw GPU progress.

2. Quick Start — submit and poll

The simplest possible integration — no SSE, no model management, just submit and poll:

const BASE = 'http://your-server:8080';

async function transcribe(audioBlob: Blob): Promise<Job> {
  // 1. Submit
  const form = new FormData();
  form.append('audio', audioBlob, 'audio.wav');

  let submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

  // 2. If model isn't loaded yet, keep retrying until it is
  while (submitResp.status === 503) {
    const retryAfter = parseInt(submitResp.headers.get('Retry-After') ?? '15');
    await sleep(retryAfter * 1000);
    submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  }
  if (!submitResp.ok) throw new Error(`Submit failed: ${submitResp.status}`);

  const { job_id } = await submitResp.json();

  // 3. Poll until done
  while (true) {
    await sleep(2000);
    const job: Job = await fetch(`${BASE}/jobs/${job_id}`).then(r => r.json());
    if (job.status === 'done')      return job;
    if (job.status === 'failed')    throw new Error(job.error ?? 'transcription failed');
    if (job.status === 'cancelled') throw new Error('job was cancelled');
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

For a better UX — real-time progress bar, model state indicator — read the full sections below.


3. Model Lifecycle

3.1 State machine

The model moves through four states:

          job submit
         or POST /model/load
               │
   ┌──────────▼───────────┐
   │       Unloaded        │◄──────────────────────────┐
   └──────────┬───────────┘                            │
              │ load triggered                         │
   ┌──────────▼───────────┐                            │
   │        Loading        │                            │ idle timeout
   └──┬──────────────┬────┘                            │ or POST /model/unload
      │ success      │ VRAM full                       │
      │              │                                  │
   ┌──▼────┐  ┌──────▼────────────────┐                │
   │ Ready │  │   WaitingForGpu       │────────────────►│
   └──┬────┘  └──────────────┬────────┘                │
      │         retry ok ────┘                         │
      └────────────────────────────────────────────────►┘

| State | state value | Can accept jobs? |
|---|---|---|
| Unloaded | "unloaded" | No → triggers load, returns 503 |
| Loading | "loading" | No → returns 503 |
| Waiting for GPU | "waiting_for_gpu" | No → returns 503 |
| Ready | "ready" | Yes |

3.2 GET /model/status

Returns the current model state and live VRAM figures (from nvidia-smi).

Unloaded:

{ "state": "unloaded" }

Loading:

{ "state": "loading" }

Waiting for GPU (VRAM contention):

{
  "state": "waiting_for_gpu",
  "vram_needed_mb": 3951,
  "vram_free_mb": 512,
  "retry_in_secs": 30
}

Ready:

{
  "state": "ready",
  "loaded_at": "2026-05-10T14:00:00.000Z",
  "vram_used_mb": 4096,
  "vram_total_mb": 8192
}

vram_used_mb / vram_total_mb are omitted when nvidia-smi is unavailable.


3.3 POST /model/load

Tells the server to load the model. Idempotent — safe to call multiple times.

curl -X POST http://your-server:8080/model/load

Responses:

| Status | Body | Meaning |
|---|---|---|
| 202 | {"status":"load_initiated"} | Load queued |
| 200 | {"status":"already_ready"} | Already loaded |

The load happens asynchronously. Subscribe to /model/events or poll /model/status to know when ready.
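
Where SSE is not an option, polling /model/status until the state is "ready" could look roughly like the sketch below. The names waitUntilReady, nextPollDelayMs, and the injected getStatus function are hypothetical, and the 2-second default interval is an assumption rather than server guidance.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface ModelStatusLike {
  state: ModelStateTag;
  retry_in_secs?: number; // only present while waiting_for_gpu
}

/** Choose the next polling delay; honours retry_in_secs when the server sends it. */
function nextPollDelayMs(status: ModelStatusLike): number {
  if (status.state === 'waiting_for_gpu' && status.retry_in_secs !== undefined) {
    return status.retry_in_secs * 1000;
  }
  return 2000; // assumed default, not a documented value
}

/** Poll until the model is ready or the timeout elapses. */
async function waitUntilReady(
  getStatus: () => Promise<ModelStatusLike>,
  timeoutMs = 120_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status.state === 'ready') return;
    // Never sleep past the deadline.
    const delay = Math.min(nextPollDelayMs(status), Math.max(0, deadline - Date.now()));
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
  throw new Error('Model did not become ready within timeout');
}
```

A caller would typically pass something like `() => fetch(`${BASE}/model/status`).then(r => r.json())` as getStatus.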


3.4 POST /model/unload

Requests that the model be freed from GPU memory. If a job is currently running, it finishes first and the model is dropped after that inference completes; otherwise the model is freed right away.

curl -X POST http://your-server:8080/model/unload

Response: 200 {"status":"unload_requested"} (always, regardless of current state).

Use this if you know transcription won't happen for a while and you want to free VRAM for other workloads on the same GPU.


3.5 GET /model/events (SSE)

A persistent Server-Sent Events stream that emits every model lifecycle transition.

curl -N http://your-server:8080/model/events

Events emitted:

event: model_loading
data: {"type":"model_loading"}

event: model_ready
data: {"type":"model_ready","loaded_at":"2026-05-10T14:00:00.000Z"}

event: model_unloaded
data: {"type":"model_unloaded"}

event: model_waiting_for_gpu
data: {"type":"model_waiting_for_gpu","vram_needed_mb":3951,"vram_free_mb":512,"retry_in_secs":30}

JavaScript:

function subscribeModelEvents(
  onReady:       (loadedAt: string) => void,
  onUnloaded:    () => void,
  onLoading:     () => void,
  onWaitingGpu:  (info: { vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number }) => void,
): () => void {
  const es = new EventSource(`${BASE}/model/events`);

  es.addEventListener('model_ready',          (e) => onReady(JSON.parse(e.data).loaded_at));
  es.addEventListener('model_unloaded',       ()  => onUnloaded());
  es.addEventListener('model_loading',        ()  => onLoading());
  es.addEventListener('model_waiting_for_gpu',(e) => onWaitingGpu(JSON.parse(e.data)));

  es.onerror = () => {
    // The browser reconnects automatically with exponential backoff.
    // Log the error but don't tear down the listener.
    console.warn('model/events connection dropped, reconnecting…');
  };

  return () => es.close(); // call this to clean up (e.g. in React useEffect return)
}

The server sends an SSE keepalive comment every 15 seconds so proxies don't close idle connections.


4. Submitting Jobs

4.1 POST /jobs

Content-Type: multipart/form-data

| Field | Required | Type | Notes |
|---|---|---|---|
| audio | Yes | file | Any format ffmpeg understands: WAV, MP3, M4A, OGG, FLAC, MP4, MKV … No size limit. |
| language | No | string | ISO 639-1 code ("en", "it", "fr" …). Omit for auto-detection. |
| task | No | string | "transcribe" (default) or "translate" (→ English) |
| webhook_url | No | string | URL to POST the completed job to. Also registers the URL for model lifecycle webhooks. |

202 Accepted:

{ "job_id": "550e8400-e29b-41d4-a716-446655440000" }
async function submitJob(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {}
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  if (!resp.ok) throw await toApiError(resp);

  const { job_id } = await resp.json();
  return job_id;
}
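
The snippet above calls toApiError, which this guide does not define. One possible shape, assuming error bodies follow the { "error": "..." } format described in §12, is sketched here. ApiError, ResponseLike, and errorMessage are hypothetical names introduced for illustration.

```typescript
/** Error carrying the HTTP status and the parsed JSON body, if any. */
class ApiError extends Error {
  readonly status: number;
  readonly body: unknown;

  constructor(message: string, status: number, body: unknown) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
    this.body = body;
  }
}

/** Minimal structural view of a fetch Response, so the helper is easy to fake. */
interface ResponseLike {
  status: number;
  json(): Promise<unknown>;
}

/** Use the body's "error" string when present, else fall back to the status code. */
function errorMessage(status: number, body: unknown): string {
  if (typeof body === 'object' && body !== null && 'error' in body) {
    const err = (body as { error: unknown }).error;
    if (typeof err === 'string') return err;
  }
  return `HTTP ${status}`;
}

async function toApiError(resp: ResponseLike): Promise<ApiError> {
  let body: unknown = null;
  try {
    body = await resp.json();
  } catch {
    // Non-JSON body (for example an HTML page from a reverse proxy); keep null.
  }
  return new ApiError(errorMessage(resp.status, body), resp.status, body);
}
```

Keeping the parsed body on the error lets callers read extra fields such as state or retry_after_secs without parsing the response a second time.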

4.2 Handling 503 Model Not Ready

When the model isn't loaded, POST /jobs returns:

HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/json
{
  "error": "model_not_ready",
  "state": "unloaded",
  "retry_after_secs": 30
}

retry_after_secs by state:

| state | retry_after_secs | Why |
|---|---|---|
| unloaded | 30 | Load just triggered; RTX 2080 + large-v3 loads in ~15–25s |
| loading | 10 | Already loading; check again soon |
| waiting_for_gpu | GPU_POLL_INTERVAL_SECS (default 30) | VRAM busy; retry later |

Submitting a job when the model is unloaded automatically triggers a load. You do NOT need a separate POST /model/load call for the normal happy path.
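
Since the delay is available both in the Retry-After header and in the body's retry_after_secs field, a client can read one and fall back to the other. The helper below is a sketch of that idea; retryDelayMs and its 15-second fallback are assumptions for illustration. HTTP also allows Retry-After to be a date, which this sketch treats as unparseable and skips.

```typescript
/**
 * Work out how long to wait before resubmitting after a 503.
 * Prefers the Retry-After header, then the JSON body, then a fallback.
 */
function retryDelayMs(
  retryAfterHeader: string | null,
  body: { retry_after_secs?: number } | null,
  fallbackSecs = 15, // assumed default, mirrors the '15' used elsewhere in this guide
): number {
  const fromHeader = retryAfterHeader === null ? NaN : Number.parseInt(retryAfterHeader, 10);
  if (Number.isFinite(fromHeader) && fromHeader >= 0) return fromHeader * 1000;

  const fromBody = body?.retry_after_secs;
  if (typeof fromBody === 'number' && Number.isFinite(fromBody) && fromBody >= 0) {
    return fromBody * 1000;
  }
  return fallbackSecs * 1000;
}
```

With a fetch response this would be called as `retryDelayMs(resp.headers.get('Retry-After'), await resp.json())`.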


4.3 Retry pattern with auto-load

async function submitWithRetry(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {},
  maxAttempts = 20,
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

    if (resp.status === 202) {
      const { job_id } = await resp.json();
      return job_id;
    }

    if (resp.status === 503) {
      const body = await resp.json();
      const waitMs = (parseInt(resp.headers.get('Retry-After') ?? '15') + 1) * 1000;
      console.log(`Model ${body.state} — waiting ${waitMs / 1000}s (attempt ${attempt}/${maxAttempts})`);
      await sleep(waitMs);
      continue;
    }

    throw await toApiError(resp);
  }

  throw new Error(`Model did not become ready after ${maxAttempts} attempts`);
}

Tip: For a better UX, subscribe to GET /model/events and wait for the model_ready event instead of sleeping blindly — then submit immediately when ready.
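
A sketch of the event-driven approach from the tip follows. It takes any object with addEventListener and close, so a browser EventSource can be passed in directly; whenModelReady and EventSourceLike are hypothetical names, and the 120-second timeout is an assumption.

```typescript
/** The subset of EventSource this helper needs. */
interface EventSourceLike {
  addEventListener(type: string, listener: () => void): void;
  close(): void;
}

/**
 * Call onReady once when model_ready arrives, or onTimeout if it never does.
 * Returns a cancel function that closes the source without firing either callback.
 */
function whenModelReady(
  source: EventSourceLike,
  onReady: () => void,
  onTimeout: () => void,
  timeoutMs = 120_000,
): () => void {
  let settled = false;
  const settle = (callback?: () => void) => {
    if (settled) return; // ignore duplicate events and late timeouts
    settled = true;
    clearTimeout(timer);
    source.close();
    callback?.();
  };
  const timer = setTimeout(() => settle(onTimeout), timeoutMs);
  source.addEventListener('model_ready', () => settle(onReady));
  return () => settle();
}
```

In a browser this might be used as `whenModelReady(new EventSource(`${BASE}/model/events`), () => submitJob(audio), showTimeoutMessage)`. Because the stream only reports transitions, it is worth checking GET /model/status once after subscribing in case the model was already ready.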


5. Tracking Job Progress

Two patterns: SSE (real-time push) or polling (simpler). SSE is preferred for UX.

5.1 GET /jobs/:id (poll)

Returns the full job document. Poll every 2–5 seconds while status is queued or running.

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "task": "transcribe",
  "language": "en",
  "progress": 42,
  "duration_secs": 120.5,
  "segments": [],
  "created_at": "2026-05-10T14:00:00.000Z"
}

When status === "done":

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] },
    { "index": 1, "start": 3.6, "end": 7.2, "text": "How are you?", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}

Terminal statuses: done, failed, cancelled — stop polling when you see one.
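
The stop condition can be captured in a small reusable poller, sketched below. isTerminal and pollJob are hypothetical names; getJob is an injected function that would normally wrap GET /jobs/:id, and the 2-second interval is an assumption.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';

const TERMINAL_STATUSES: readonly JobStatus[] = ['done', 'failed', 'cancelled'];

/** True once the job can no longer change state. */
function isTerminal(status: JobStatus): boolean {
  return TERMINAL_STATUSES.includes(status);
}

/** Fetch the job repeatedly until it reaches a terminal status, then return it. */
async function pollJob<J extends { status: JobStatus }>(
  getJob: () => Promise<J>,
  intervalMs = 2000,
  onUpdate?: (job: J) => void,
): Promise<J> {
  for (;;) {
    const job = await getJob();
    onUpdate?.(job); // lets the UI show intermediate progress
    if (isTerminal(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike the Quick Start loop, this returns failed and cancelled jobs to the caller instead of throwing, so the caller decides how to present each outcome.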


5.2 GET /jobs/:id/stream (SSE)

Subscribe immediately after submission. The connection is held open and events are pushed as they occur.

Event types:

event: progress
data: {"type":"progress","percent":42,"chunk":3,"chunks_total":7}

event: done
data: {"type":"done","job":{...full Job object...}}

event: error
data: {"type":"error","message":"whisper inference failed: ..."}
  • percent — overall job progress 0–100 (derived from chunks completed / total).
  • chunk / chunks_total — the audio is split on silences; each chunk is one whisper inference call.
  • If you open the stream after the job is already finished, you immediately receive a single done event.
function streamJobProgress(
  jobId: string,
  onProgress: (percent: number, chunk: number, total: number) => void,
  onDone:     (job: Job) => void,
  onError:    (message: string) => void,
): () => void {
  const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

  es.addEventListener('progress', (e) => {
    const { percent, chunk, chunks_total } = JSON.parse(e.data);
    onProgress(percent, chunk, chunks_total);
  });

  es.addEventListener('done', (e) => {
    const { job } = JSON.parse(e.data);
    es.close();
    onDone(job);
  });

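  es.addEventListener('error', (e) => {
    // Application error events carry data; SSE connection errors do not.
    if ('data' in e) {
      const { message } = JSON.parse((e as MessageEvent).data);
      onError(message);
    } else {
      // Connection dropped: report it so the caller is not left waiting forever.
      onError('connection to job stream lost');
    }
    es.close();
  });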

  return () => es.close();
}

Note: Do not confuse the SSE error event (connection drop — no data) with the application error event (transcription failure — has data). The example above handles both.
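
For a progress bar, it can help to fold the three event types into one piece of UI state. The reducer below is a sketch under that assumption; ProgressView and reduceJobEvent are hypothetical names, and the label simply echoes the chunk numbers as the server sends them.

```typescript
// Event payloads as documented above; the job object on 'done' is omitted for brevity.
type JobStreamEvent =
  | { type: 'progress'; percent: number; chunk: number; chunks_total: number }
  | { type: 'done' }
  | { type: 'error'; message: string };

interface ProgressView {
  phase: 'running' | 'done' | 'failed';
  percent: number;
  label: string;
}

/** Compute the next UI state from the previous state and one stream event. */
function reduceJobEvent(prev: ProgressView, ev: JobStreamEvent): ProgressView {
  switch (ev.type) {
    case 'progress':
      return {
        phase: 'running',
        percent: ev.percent,
        label: `chunk ${ev.chunk} of ${ev.chunks_total}`,
      };
    case 'done':
      return { phase: 'done', percent: 100, label: 'done' };
    case 'error':
      // Keep the last known percentage so the bar does not jump back to zero.
      return { phase: 'failed', percent: prev.percent, label: ev.message };
  }
}
```

The same function fits React's useReducer or a plain variable updated inside the event listeners.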


6. Webhooks

Webhooks are fired as HTTP POST requests with Content-Type: application/json to the webhook_url you supply at job submission. The server retries up to 3 times with exponential backoff (1s, 2s) on non-2xx responses.

6.1 Job completion webhook

Fired when a job reaches done, failed, or cancelled.
Payload: the full Job object (same as GET /jobs/:id).

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}

6.2 Model lifecycle webhooks

Any URL that has ever appeared as a webhook_url in a job submission also receives model lifecycle webhooks for the lifetime of the server process. This lets your backend know when the model comes up or goes down without polling.

Only two events are delivered via webhook (the others are SSE-only):

Model ready:

{ "type": "model_ready", "loaded_at": "2026-05-10T14:00:00.000Z" }

Model unloaded:

{ "type": "model_unloaded" }

Express.js receiver example:

import express from 'express';
const app = express();
app.use(express.json());

app.post('/webhooks/whisper', (req, res) => {
  res.sendStatus(200); // acknowledge quickly — retries on non-2xx

  const body = req.body;

  if ('type' in body) {
    // Model lifecycle event
    if (body.type === 'model_ready') {
      console.log('Whisper model ready at', body.loaded_at);
    } else if (body.type === 'model_unloaded') {
      console.log('Whisper model freed GPU memory');
    }
    return;
  }

  // Job completion event — body is a Job object
  if (body.status === 'done') {
    console.log(`Job ${body.id} done — ${body.segments.length} segments`);
    processTranscript(body.segments);
  } else if (body.status === 'failed') {
    console.error(`Job ${body.id} failed:`, body.error);
  }
});

Distinguish job vs. model webhook: Job payloads have an id and status field. Model payloads have a type field at the top level (model_ready / model_unloaded).


7. Health Check

curl http://your-server:8080/health
{
  "status": "ok",
  "gpu_name": "NVIDIA GeForce RTX 2080",
  "vram_total_mb": 8192,
  "model": "large-v3",
  "queue_depth": 2,
  "model_state": "ready"
}

| Field | Notes |
|---|---|
| status | Always "ok" when the server is reachable |
| gpu_name | From nvidia-smi; null if unavailable |
| vram_total_mb | Total VRAM in MiB; null if unavailable |
| model | Model name string (server config) |
| queue_depth | Jobs waiting (not counting the currently running one) |
| model_state | "unloaded" / "loading" / "waiting_for_gpu" / "ready" |
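
As an illustration of consuming these fields, the sketch below turns a /health response into a one-line status string for a dashboard. The Health interface mirrors the fields listed above; describeHealth and its wording are hypothetical.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface Health {
  status: string;
  gpu_name: string | null;      // null when nvidia-smi is unavailable
  vram_total_mb: number | null; // null when nvidia-smi is unavailable
  model: string;
  queue_depth: number;          // excludes the job currently running
  model_state: ModelStateTag;
}

/** Summarise the health payload as a short human-readable line. */
function describeHealth(h: Health): string {
  const gpu = h.gpu_name ?? 'unknown GPU';
  const waiting = h.queue_depth === 1 ? '1 job waiting' : `${h.queue_depth} jobs waiting`;
  return `${h.model} on ${gpu}: ${h.model_state}, ${waiting}`;
}
```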

8. Cancelling Jobs

curl -X DELETE http://your-server:8080/jobs/550e8400-e29b-41d4-a716-446655440000
  • 200 — job marked cancelled. Returns the updated Job object.
  • 404 — job not found.
  • 409 — job already in a terminal state (done / failed / cancelled).

Important: whisper.cpp does not support mid-inference cancellation. If the job is currently running, the GPU inference will finish before the cancellation takes effect — the result is simply discarded and the status set to cancelled.
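
A client-side wrapper for the three documented outcomes might look like this sketch. cancelJob, interpretCancelStatus, and CancelOutcome are hypothetical names; the fetch function is injected so the mapping can be exercised without a server.

```typescript
type CancelOutcome = 'cancelled' | 'not_found' | 'already_finished';

/** Map the documented DELETE /jobs/:id status codes to an outcome. */
function interpretCancelStatus(httpStatus: number): CancelOutcome {
  switch (httpStatus) {
    case 200:
      return 'cancelled';
    case 404:
      return 'not_found';
    case 409:
      return 'already_finished';
    default:
      throw new Error(`Unexpected status ${httpStatus} from DELETE /jobs/:id`);
  }
}

/** Send the cancellation request and report what happened. */
async function cancelJob(
  baseUrl: string,
  jobId: string,
  doFetch: (url: string, init: { method: string }) => Promise<{ status: number }>,
): Promise<CancelOutcome> {
  const resp = await doFetch(`${baseUrl}/jobs/${encodeURIComponent(jobId)}`, { method: 'DELETE' });
  return interpretCancelStatus(resp.status);
}
```

In a browser, doFetch can be `(url, init) => fetch(url, init)`. Treating already_finished as a non-error is usually reasonable, since the job reached a terminal state either way.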


9. TypeScript Types

type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';
type JobStatus     = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';
type Task          = 'transcribe' | 'translate';

interface ModelStatus {
  state: ModelStateTag;
  // ready only
  loaded_at?: string;
  // waiting_for_gpu only
  vram_needed_mb?: number;
  vram_free_mb?:   number;
  retry_in_secs?:  number;
  // always (when nvidia-smi available)
  vram_used_mb?:   number;
  vram_total_mb?:  number;
}

interface Word {
  text:        string;
  start:       number; // seconds
  end:         number; // seconds
  probability: number; // 0–1
}

interface Segment {
  index: number;
  start: number; // seconds
  end:   number; // seconds
  text:  string;
  words: Word[];
}

interface Job {
  id:            string;
  status:        JobStatus;
  task:          Task;
  language?:     string;     // ISO 639-1; null until detected/set
  progress:      number;     // 0–100
  duration_secs?: number;   // null until processing starts
  segments:      Segment[];  // populated when status = 'done'
  error?:        string;     // populated when status = 'failed'
  webhook_url?:  string;
  filename?:     string;
  created_at:    string;     // ISO 8601
  completed_at?: string;     // ISO 8601; null until terminal
}

// SSE payloads from GET /jobs/:id/stream
type JobSseEvent =
  | { type: 'progress'; percent: number; chunk: number; chunks_total: number }
  | { type: 'done';     job: Job }
  | { type: 'error';    message: string };

// SSE payloads from GET /model/events
type ModelSseEvent =
  | { type: 'model_loading' }
  | { type: 'model_ready';           loaded_at: string }
  | { type: 'model_unloaded' }
  | { type: 'model_waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };

// Webhook payload — union of job completion and model lifecycle events
type WebhookPayload = Job | { type: 'model_ready'; loaded_at: string } | { type: 'model_unloaded' };

// Helpers
function isJobPayload(p: WebhookPayload): p is Job {
  return 'id' in p && 'status' in p;
}
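// A type predicate must be assignable to the parameter type, so narrow to the
// non-Job members of the union rather than to a bare { type: string }.
function isModelPayload(p: WebhookPayload): p is Exclude<WebhookPayload, Job> {
  return 'type' in p;
}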

10. React Hooks

// useModelStatus.ts
import { useEffect, useState } from 'react';

const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export function useModelStatus() {
  const [status, setStatus] = useState<ModelStatus | null>(null);

  // Initial fetch
  useEffect(() => {
    fetch(`${BASE}/model/status`)
      .then(r => r.json())
      .then(setStatus)
      .catch(console.error);
  }, []);

  // Live updates via SSE
  useEffect(() => {
    const es = new EventSource(`${BASE}/model/events`);

    const refresh = () => {
      fetch(`${BASE}/model/status`)
        .then(r => r.json())
        .then(setStatus)
        .catch(console.error);
    };

    es.addEventListener('model_loading',        refresh);
    es.addEventListener('model_ready',          refresh);
    es.addEventListener('model_unloaded',       refresh);
    es.addEventListener('model_waiting_for_gpu',refresh);
    es.onerror = () => console.warn('model/events reconnecting…');

    return () => es.close();
  }, []);

  return status;
}
// useJobStream.ts
import { useEffect, useRef, useState } from 'react';

type ProgressState = {
  percent: number;
  chunk: number;
  chunks_total: number;
};

export function useJobStream(jobId: string | null) {
  const [progress, setProgress] = useState<ProgressState | null>(null);
  const [job,      setJob]      = useState<Job | null>(null);
  const [error,    setError]    = useState<string | null>(null);
  const esRef = useRef<EventSource | null>(null);

  useEffect(() => {
    if (!jobId) return;

    esRef.current?.close();
    setProgress(null); setJob(null); setError(null);

    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
    esRef.current = es;

    es.addEventListener('progress', (e) => {
      setProgress(JSON.parse(e.data));
    });

    es.addEventListener('done', (e) => {
      setJob(JSON.parse(e.data).job);
      setProgress({ percent: 100, chunk: 0, chunks_total: 0 });
      es.close();
    });

    es.addEventListener('error', (e) => {
      // Application errors carry data; a dropped connection does not.
      setError('data' in e ? JSON.parse((e as MessageEvent).data).message : 'connection to job stream lost');
      es.close();
    });

    return () => es.close();
  }, [jobId]);

  return { progress, job, error };
}
// useTranscribe.ts — ties it all together
import { useState, useCallback } from 'react';

export function useTranscribe() {
  const [jobId,  setJobId]  = useState<string | null>(null);
  const [loading, setLoading] = useState(false);
  const [error,  setError]  = useState<string | null>(null);

  const submit = useCallback(async (
    audio: Blob,
    opts: { language?: string; task?: Task } = {}
  ) => {
    setLoading(true);
    setError(null);
    setJobId(null);

    try {
      const id = await submitWithRetry(audio, opts); // see §4.3
      setJobId(id);
    } catch (e) {
      setError(String(e));
    } finally {
      setLoading(false);
    }
  }, []);

  const { progress, job, error: streamError } = useJobStream(jobId);

  return { submit, loading, jobId, progress, job, error: error ?? streamError };
}

11. Complete Integration Example

A full transcription flow with model warm-up indicator and real-time progress:

// whisperClient.ts
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export class WhisperClient {
  /** Wait for the model to be ready, triggering a load if needed. */
  async ensureModelReady(timeoutMs = 120_000): Promise<void> {
    const status = await this.getModelStatus();
    if (status.state === 'ready') return;

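    return new Promise((resolve, reject) => {
      // Subscribe before triggering the load so a fast model_ready is not missed.
      const es = new EventSource(`${BASE}/model/events`);
      const finish = (err?: Error) => {
        clearTimeout(deadline);
        es.close();
        if (err) reject(err); else resolve();
      };
      const deadline = setTimeout(
        () => finish(new Error('Model did not become ready within timeout')),
        timeoutMs,
      );

      es.addEventListener('model_ready', () => finish());
      es.onerror = () => {
        // Reconnects automatically; don't reject on transient drops.
      };

      // Trigger load (idempotent). A 200 already_ready reply means the model
      // became ready since the status check, so resolve without waiting.
      fetch(`${BASE}/model/load`, { method: 'POST' })
        .then((r) => r.json())
        .then((body) => { if (body.status === 'already_ready') finish(); })
        .catch((e) => finish(e instanceof Error ? e : new Error(String(e))));
    });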
  }

  async getModelStatus(): Promise<ModelStatus> {
    const r = await fetch(`${BASE}/model/status`);
    if (!r.ok) throw new Error(`/model/status ${r.status}`);
    return r.json();
  }

  async submit(
    audio: Blob,
    opts: { language?: string; task?: Task; webhookUrl?: string } = {}
  ): Promise<string> {
    return submitWithRetry(audio, opts);
  }

  streamProgress(
    jobId: string,
    callbacks: {
      onProgress?: (p: { percent: number; chunk: number; total: number }) => void;
      onDone?:     (job: Job) => void;
      onError?:    (msg: string) => void;
    }
  ): () => void {
    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

    es.addEventListener('progress', (e) => {
      const d = JSON.parse(e.data);
      callbacks.onProgress?.({ percent: d.percent, chunk: d.chunk, total: d.chunks_total });
    });

    es.addEventListener('done', (e) => {
      callbacks.onDone?.(JSON.parse(e.data).job);
      es.close();
    });

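    es.addEventListener('error', (e) => {
      // Application errors carry data; a dropped connection does not. Report both
      // so that transcribe() below always settles.
      callbacks.onError?.(
        'data' in e ? JSON.parse((e as MessageEvent).data).message : 'connection to job stream lost',
      );
      es.close();
    });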

    return () => es.close();
  }

  async transcribe(
    audio: Blob,
    opts: {
      language?: string;
      task?: Task;
      webhookUrl?: string;
      onProgress?: (percent: number) => void;
    } = {}
  ): Promise<Job> {
    const jobId = await this.submit(audio, opts);

    return new Promise((resolve, reject) => {
      this.streamProgress(jobId, {
        onProgress: (p) => opts.onProgress?.(p.percent),
        onDone:     resolve,
        onError:    (msg) => reject(new Error(msg)),
      });
    });
  }
}

// Usage
const whisper = new WhisperClient();

const job = await whisper.transcribe(audioBlob, {
  language: 'en',
  onProgress: (pct) => console.log(`${pct}%`),
});

for (const seg of job.segments) {
  console.log(`[${seg.start.toFixed(1)}s → ${seg.end.toFixed(1)}s] ${seg.text}`);
}

12. Error Reference

All error responses follow this shape:

{ "error": "human-readable message" }

With the following additions for specific errors:

503 model_not_ready:

{ "error": "model_not_ready", "state": "loading", "retry_after_secs": 10 }

| HTTP | error value | When | What to do |
|---|---|---|---|
| 400 | "missing 'audio' field" | audio not in form | Fix the form |
| 400 | "audio field is empty" | Zero-byte file uploaded | Fix the file |
| 400 | "task must be 'transcribe' or 'translate'" | Bad task value | Fix the value |
| 400 | "multipart error: …" | Malformed request | Check content-type header |
| 404 | "job … not found" | Unknown job ID | Check the ID |
| 409 | "job … is already in terminal state …" | Cancelling a finished job | No action needed |
| 503 | "model_not_ready" | Model not loaded | See §4.2 — retry with Retry-After |
| 500 | "worker channel closed" | Server crash | Contact server admin |

Network / SSE errors:

  • EventSource onerror with no .data = connection dropped. The browser reconnects automatically — no action needed unless you want to show a UI indicator.
  • HTTP 502/503/504 from a reverse proxy = the container is restarting. Wait and retry.
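
The guidance in this section can be condensed into a single decision function, sketched below. ErrorAction and errorAction are hypothetical names, and treating every unlisted status as something to report is an assumption.

```typescript
type ErrorAction = 'fix_request' | 'retry' | 'ignore' | 'report';

/** Suggest what a client should do for a given HTTP status from the API or a proxy. */
function errorAction(status: number): ErrorAction {
  // 400 and 404 indicate a problem with the request itself.
  if (status === 400 || status === 404) return 'fix_request';
  // 409 means the job already finished; nothing further to do.
  if (status === 409) return 'ignore';
  // 503 model_not_ready, or 502/503/504 from a proxy while the container restarts.
  if (status === 502 || status === 503 || status === 504) return 'retry';
  // 500 and anything unexpected should be surfaced to an operator.
  return 'report';
}
```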

Last updated: 2026-05-08. Corresponds to whisper-server v0.1.0 commit d014826.