Frontend Integration Guide
Audience: Frontend / full-stack developers integrating the whisper transcription API into a web application.
Base URL: http://your-server:8080 (configurable via the PORT env var on the server).
Interactive docs: http://your-server:8080/docs (Swagger UI; try every endpoint live).
Table of Contents
- Architecture Overview
- Quick Start — submit and poll
- Model Lifecycle
- 3.1 State machine
- 3.2 GET /model/status
- 3.3 POST /model/load
- 3.4 POST /model/unload
- 3.5 GET /model/events (SSE)
- Submitting Jobs
- Tracking Job Progress
- Webhooks
- Health Check
- Cancelling Jobs
- TypeScript Types
- React Hooks
- Complete Integration Example
- Error Reference
1. Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ whisper-server │
│ │
│ HTTP / SSE Worker thread (GPU) │
│ ──────────── ─────────────────── │
│ POST /jobs ───► job queue (FIFO) │
│ GET /jobs/:id ↕ │
│ GET /jobs/:id/stream ◄── progress broadcast │
│ │
│ POST /model/load ─► load whisper into VRAM │
│ POST /model/unload ► free VRAM │
│ GET /model/status read state │
│ GET /model/events ◄── lifecycle SSE broadcast │
└─────────────────────────────────────────────────────────┘
Key behaviours to understand before building:
- The model starts unloaded on every server restart. No inference is possible until it loads (~15–25 seconds for large-v3 on an RTX 2080).
- Submitting a job when the model is not ready returns 503 with a Retry-After header and automatically triggers a load. You can retry the submission; no separate load call is needed.
- The worker processes jobs sequentially (one at a time). Queue depth is visible via /health.
- Long audio is split into silence-bounded chunks internally. SSE progress events reflect chunk completion, not raw GPU progress.
2. Quick Start — submit and poll
The simplest possible integration — no SSE, no model management, just submit and poll:
const BASE = 'http://your-server:8080';
async function transcribe(audioBlob: Blob): Promise<Job> {
// 1. Submit
const form = new FormData();
form.append('audio', audioBlob, 'audio.wav');
let submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
// 2. If model isn't loaded yet, keep retrying until it is
while (submitResp.status === 503) {
// Fall back to 15 s if the header is missing or not a plain number of seconds
const retryAfter = parseInt(submitResp.headers.get('Retry-After') ?? '15', 10) || 15;
await sleep(retryAfter * 1000);
submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
}
if (!submitResp.ok) throw new Error(`Submit failed: ${submitResp.status}`);
const { job_id } = await submitResp.json();
// 3. Poll until done
while (true) {
await sleep(2000);
const job: Job = await fetch(`${BASE}/jobs/${job_id}`).then(r => r.json());
if (job.status === 'done') return job;
if (job.status === 'failed') throw new Error(job.error ?? 'transcription failed');
if (job.status === 'cancelled') throw new Error('job was cancelled');
}
}
const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
For a better UX — real-time progress bar, model state indicator — read the full sections below.
3. Model Lifecycle
3.1 State machine
The model moves through four states:
job submit
or POST /model/load
│
┌──────────▼───────────┐
│ Unloaded │◄──────────────────────────┐
└──────────┬───────────┘ │
│ load triggered │
┌──────────▼───────────┐ │
│ Loading │ │ idle timeout
└──┬──────────────┬────┘ │ or POST /model/unload
│ success │ VRAM full │
│ │ │
┌──▼────┐ ┌──────▼────────────────┐ │
│ Ready │ │ WaitingForGpu │────────────────►│
└──┬────┘ └──────────────┬────────┘ │
│ retry ok ────┘ │
└────────────────────────────────────────────────►┘
| State | state value | Can accept jobs? |
|---|---|---|
| Unloaded | "unloaded" | ❌ → triggers load, returns 503 |
| Loading | "loading" | ❌ → returns 503 |
| Waiting for GPU | "waiting_for_gpu" | ❌ → returns 503 |
| Ready | "ready" | ✅ |
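The table above maps onto a small lookup that a UI could use, for example to decide whether to enable a submit button. This is a minimal sketch: the describeModelState name, the ModelStateInfo shape, and the label strings are illustrative assumptions, not part of the server API.

```typescript
// The four state values come from the table above.
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface ModelStateInfo {
  acceptsJobs: boolean;  // true only for "ready"
  triggersLoad: boolean; // true only for "unloaded" (a submission starts a load)
  label: string;         // hypothetical UI text
}

const MODEL_STATE_INFO: Record<ModelStateTag, ModelStateInfo> = {
  unloaded:        { acceptsJobs: false, triggersLoad: true,  label: 'Model is unloaded' },
  loading:         { acceptsJobs: false, triggersLoad: false, label: 'Model is loading' },
  waiting_for_gpu: { acceptsJobs: false, triggersLoad: false, label: 'Waiting for free VRAM' },
  ready:           { acceptsJobs: true,  triggersLoad: false, label: 'Ready' },
};

// Look up what a given state means for job submission.
function describeModelState(state: ModelStateTag): ModelStateInfo {
  return MODEL_STATE_INFO[state];
}
```

A submission is still allowed in every state; the lookup only tells the UI whether to expect a 202 or a 503.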
3.2 GET /model/status
Returns the current model state and live VRAM figures (from nvidia-smi).
Unloaded:
{ "state": "unloaded" }
Loading:
{ "state": "loading" }
Waiting for GPU (VRAM contention):
{
"state": "waiting_for_gpu",
"vram_needed_mb": 3951,
"vram_free_mb": 512,
"retry_in_secs": 30
}
Ready:
{
"state": "ready",
"loaded_at": "2026-05-10T14:00:00.000Z",
"vram_used_mb": 4096,
"vram_total_mb": 8192
}
vram_used_mb / vram_total_mb are omitted when nvidia-smi is unavailable.
3.3 POST /model/load
Tells the server to load the model. Idempotent — safe to call multiple times.
curl -X POST http://your-server:8080/model/load
Responses:
| Status | Body | Meaning |
|---|---|---|
| 202 | {"status":"load_initiated"} | Load queued |
| 200 | {"status":"already_ready"} | Already loaded |
The load happens asynchronously. Subscribe to /model/events or poll /model/status to know when ready.
3.4 POST /model/unload
Requests that the model be freed from GPU memory. An in-flight job finishes first; the model is dropped after the current inference completes.
curl -X POST http://your-server:8080/model/unload
Response: 200 {"status":"unload_requested"} (always, regardless of current state).
Use this if you know transcription won't happen for a while and you want to free VRAM for other workloads on the same GPU.
3.5 GET /model/events (SSE)
A persistent Server-Sent Events stream that emits every model lifecycle transition.
curl -N http://your-server:8080/model/events
Events emitted:
event: model_loading
data: {"type":"model_loading"}
event: model_ready
data: {"type":"model_ready","loaded_at":"2026-05-10T14:00:00.000Z"}
event: model_unloaded
data: {"type":"model_unloaded"}
event: model_waiting_for_gpu
data: {"type":"model_waiting_for_gpu","vram_needed_mb":3951,"vram_free_mb":512,"retry_in_secs":30}
JavaScript:
function subscribeModelEvents(
onReady: (loadedAt: string) => void,
onUnloaded: () => void,
onLoading: () => void,
onWaitingGpu: (info: { vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number }) => void,
): () => void {
const es = new EventSource(`${BASE}/model/events`);
es.addEventListener('model_ready', (e) => onReady(JSON.parse(e.data).loaded_at));
es.addEventListener('model_unloaded', () => onUnloaded());
es.addEventListener('model_loading', () => onLoading());
es.addEventListener('model_waiting_for_gpu',(e) => onWaitingGpu(JSON.parse(e.data)));
es.onerror = () => {
// The browser reconnects automatically after a short delay.
// Log the error but don't tear down the listener.
console.warn('model/events connection dropped, reconnecting…');
};
return () => es.close(); // call this to clean up (e.g. in React useEffect return)
}
The server sends an SSE keepalive comment every 15 seconds so proxies don't close idle connections.
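Outside the browser (for example in a test harness, or a runtime without EventSource) the stream can be read as plain text. The sketch below parses the standard SSE wire format: event and data lines, blank-line dispatch, and comment lines beginning with a colon, which is how keepalives are carried. The parseSseFrames name and the keepalive text in the usage note are assumptions for illustration; the server's exact comment text is not documented here.

```typescript
interface SseFrame {
  event: string; // "message" when the frame has no "event:" line
  data: string;  // multiple "data:" lines are joined with "\n"
}

// Parse a complete chunk of SSE text into frames. A trailing frame that is
// not terminated by a blank line is dropped, as the SSE format requires.
function parseSseFrames(text: string): SseFrame[] {
  const frames: SseFrame[] = [];
  let event = 'message';
  let dataLines: string[] = [];

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === '') {
      // Blank line: dispatch the frame if it carried any data.
      if (dataLines.length > 0) frames.push({ event, data: dataLines.join('\n') });
      event = 'message';
      dataLines = [];
    } else if (line[0] === ':') {
      // Comment line (e.g. a keepalive): ignored.
      continue;
    } else {
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value[0] === ' ') value = value.slice(1); // strip one leading space
      if (field === 'event') event = value;
      else if (field === 'data') dataLines.push(value);
      // Other fields (id, retry) are ignored in this sketch.
    }
  }
  return frames;
}
```

Feeding it a keepalive comment followed by a model_ready event yields a single frame; the comment produces nothing.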
4. Submitting Jobs
4.1 POST /jobs
Content-Type: multipart/form-data
| Field | Required | Type | Notes |
|---|---|---|---|
| audio | ✅ | file | Any format ffmpeg understands: WAV, MP3, M4A, OGG, FLAC, MP4, MKV … No size limit. |
| language | ❌ | string | ISO 639-1 code ("en", "it", "fr" …). Omit for auto-detection. |
| task | ❌ | string | "transcribe" (default) or "translate" (→ English) |
| webhook_url | ❌ | string | URL to POST the completed job to. Also registers the URL for model lifecycle webhooks. |
202 Accepted:
{ "job_id": "550e8400-e29b-41d4-a716-446655440000" }
async function submitJob(
audio: Blob,
opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {}
): Promise<string> {
const form = new FormData();
form.append('audio', audio, 'audio.wav');
if (opts.language) form.append('language', opts.language);
if (opts.task) form.append('task', opts.task);
if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);
const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
if (!resp.ok) throw await toApiError(resp);
const { job_id } = await resp.json();
return job_id;
}
4.2 Handling 503 Model Not Ready
When the model isn't loaded, POST /jobs returns:
HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/json
{
"error": "model_not_ready",
"state": "unloaded",
"retry_after_secs": 30
}
retry_after_secs by state:
| state | retry_after_secs | Why |
|---|---|---|
| unloaded | 30 | Load just triggered; RTX 2080 + large-v3 loads in ~15–25s |
| loading | 10 | Already loading; check again soon |
| waiting_for_gpu | GPU_POLL_INTERVAL_SECS (default 30) | VRAM busy; retry later |
Submitting a job when the model is unloaded automatically triggers a load. You do NOT need a separate POST /model/load call for the normal happy path.
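Since the 503 carries the wait time both in the Retry-After header and in the retry_after_secs body field, a client can derive its delay from whichever is usable. The sketch below is one way to do that; the retryDelayMs name and the 15-second fallback are assumptions (the fallback mirrors the default used in this guide's examples).

```typescript
interface ModelNotReadyBody {
  error: 'model_not_ready';
  state: 'unloaded' | 'loading' | 'waiting_for_gpu';
  retry_after_secs: number;
}

const DEFAULT_RETRY_SECS = 15; // assumption: same fallback as the examples in this guide

// Prefer the Retry-After header, then the JSON body, then the default.
// A Retry-After given as an HTTP date parses to NaN here and falls through.
function retryDelayMs(
  retryAfterHeader: string | null,
  body?: Partial<ModelNotReadyBody>,
): number {
  const fromHeader = retryAfterHeader === null ? NaN : parseInt(retryAfterHeader, 10);
  if (isFinite(fromHeader) && fromHeader >= 0) return fromHeader * 1000;

  const fromBody = body?.retry_after_secs;
  if (typeof fromBody === 'number' && isFinite(fromBody) && fromBody >= 0) {
    return fromBody * 1000;
  }
  return DEFAULT_RETRY_SECS * 1000;
}
```

With a fetch response this would be called as retryDelayMs(resp.headers.get('Retry-After'), await resp.json()).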
4.3 Retry pattern with auto-load
async function submitWithRetry(
audio: Blob,
opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {},
maxAttempts = 20,
): Promise<string> {
const form = new FormData();
form.append('audio', audio, 'audio.wav');
if (opts.language) form.append('language', opts.language);
if (opts.task) form.append('task', opts.task);
if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
if (resp.status === 202) {
const { job_id } = await resp.json();
return job_id;
}
if (resp.status === 503) {
const body = await resp.json();
// Wait one second longer than advertised; fall back to 15 s if the header is unusable
const waitMs = ((parseInt(resp.headers.get('Retry-After') ?? '15', 10) || 15) + 1) * 1000;
console.log(`Model ${body.state} — waiting ${waitMs / 1000}s (attempt ${attempt}/${maxAttempts})`);
await sleep(waitMs);
continue;
}
throw await toApiError(resp);
}
throw new Error(`Model did not become ready after ${maxAttempts} attempts`);
}
Tip: For a better UX, subscribe to GET /model/events and wait for the model_ready event instead of sleeping blindly, then submit immediately when ready.
5. Tracking Job Progress
Two patterns: SSE (real-time push) or polling (simpler). SSE is preferred for UX.
5.1 GET /jobs/:id (poll)
Returns the full job document. Poll every 2–5 seconds while status is queued or running.
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "running",
"task": "transcribe",
"language": "en",
"progress": 42,
"duration_secs": 120.5,
"segments": [],
"created_at": "2026-05-10T14:00:00.000Z"
}
When status === "done":
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "done",
"task": "transcribe",
"language": "en",
"progress": 100,
"duration_secs": 120.5,
"segments": [
{ "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] },
{ "index": 1, "start": 3.6, "end": 7.2, "text": "How are you?", "words": [] }
],
"created_at": "2026-05-10T14:00:00.000Z",
"completed_at": "2026-05-10T14:02:35.000Z"
}
Terminal statuses: done, failed, cancelled — stop polling when you see one.
5.2 GET /jobs/:id/stream (SSE)
Subscribe immediately after submission. The connection is held open and events are pushed as they occur.
Event types:
event: progress
data: {"type":"progress","percent":42,"chunk":3,"chunks_total":7}
event: done
data: {"type":"done","job":{...full Job object...}}
event: error
data: {"type":"error","message":"whisper inference failed: ..."}
- percent: overall job progress 0–100 (derived from chunks completed / total).
- chunk / chunks_total: the audio is split on silences; each chunk is one whisper inference call.
- If you open the stream after the job is already finished, you immediately receive a single done event.
function streamJobProgress(
jobId: string,
onProgress: (percent: number, chunk: number, total: number) => void,
onDone: (job: Job) => void,
onError: (message: string) => void,
): () => void {
const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
es.addEventListener('progress', (e) => {
const { percent, chunk, chunks_total } = JSON.parse(e.data);
onProgress(percent, chunk, chunks_total);
});
es.addEventListener('done', (e) => {
const { job } = JSON.parse(e.data);
es.close();
onDone(job);
});
es.addEventListener('error', (e) => {
// SSE protocol error vs application error — check if data exists
if ('data' in e) {
const { message } = JSON.parse((e as MessageEvent).data);
onError(message);
}
es.close();
});
return () => es.close();
}
Note: Do not confuse the SSE error event (connection drop; no data) with the application error event (transcription failure; has data). The example above handles both.
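One way to consume the three event types is to fold them into a single view state, which keeps progress-bar logic out of the EventSource callbacks. The sketch below is illustrative only: JobViewState, initialJobView, and reduceJobEvent are hypothetical names, and the job field is trimmed to the two properties the reducer reads.

```typescript
type JobEvent =
  | { type: 'progress'; percent: number; chunk: number; chunks_total: number }
  | { type: 'done'; job: { id: string; status: string } }
  | { type: 'error'; message: string };

interface JobViewState {
  phase: 'running' | 'done' | 'failed';
  percent: number;
  detail: string;
}

const initialJobView: JobViewState = {
  phase: 'running',
  percent: 0,
  detail: 'Waiting for first chunk',
};

// Fold one stream event into the view state without mutating the input.
function reduceJobEvent(state: JobViewState, ev: JobEvent): JobViewState {
  // Once a terminal event has been seen, ignore anything that arrives later.
  if (state.phase !== 'running') return state;
  switch (ev.type) {
    case 'progress':
      return {
        phase: 'running',
        percent: ev.percent,
        detail: `Chunk ${ev.chunk} of ${ev.chunks_total}`,
      };
    case 'done':
      return { phase: 'done', percent: 100, detail: `Job ${ev.job.id}: ${ev.job.status}` };
    case 'error':
      // Keep the last known percent so the bar does not jump back to zero.
      return { phase: 'failed', percent: state.percent, detail: ev.message };
  }
}
```

In a React component this reducer could be passed to useReducer, dispatching JSON.parse(e.data) from each listener.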
6. Webhooks
Webhooks are fired as HTTP POST requests with Content-Type: application/json to the webhook_url you supply at job submission. The server retries up to 3 times with exponential backoff (1s, 2s) on non-2xx responses.
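Because deliveries are retried on non-2xx responses, a receiver that does its work before acknowledging, or whose acknowledgement fails, may see the same job more than once. The sketch below shows one way to make job handling idempotent. DeliveryLog and jobDeliveryKey are hypothetical helpers, and keying on job id plus status is an assumption about what identifies a delivery; model lifecycle payloads are deliberately left out, since the same event can legitimately recur.

```typescript
// Hypothetical in-memory record of job deliveries already handled.
// A real receiver would likely bound its size or use persistent storage.
class DeliveryLog {
  private seen = new Set<string>();

  /** Returns true the first time a key is offered, false on repeats. */
  firstTime(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Assumption: a job id plus its terminal status identifies one delivery.
function jobDeliveryKey(job: { id: string; status: string }): string {
  return `${job.id}:${job.status}`;
}
```

A handler would then guard its side effects with if (log.firstTime(jobDeliveryKey(body))) before processing the transcript.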
6.1 Job completion webhook
Fired when a job reaches done, failed, or cancelled.
Payload: the full Job object (same as GET /jobs/:id).
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "done",
"task": "transcribe",
"language": "en",
"progress": 100,
"duration_secs": 120.5,
"segments": [
{ "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] }
],
"created_at": "2026-05-10T14:00:00.000Z",
"completed_at": "2026-05-10T14:02:35.000Z"
}
6.2 Model lifecycle webhooks
Any URL that has ever appeared as a webhook_url in a job submission also receives model lifecycle webhooks for the lifetime of the server process. This lets your backend know when the model comes up or goes down without polling.
Only two events are delivered via webhook (the others are SSE-only):
Model ready:
{ "type": "model_ready", "loaded_at": "2026-05-10T14:00:00.000Z" }
Model unloaded:
{ "type": "model_unloaded" }
Express.js receiver example:
import express from 'express';
const app = express();
app.use(express.json());
app.post('/webhooks/whisper', (req, res) => {
res.sendStatus(200); // acknowledge quickly — retries on non-2xx
const body = req.body;
if ('type' in body) {
// Model lifecycle event
if (body.type === 'model_ready') {
console.log('Whisper model ready at', body.loaded_at);
} else if (body.type === 'model_unloaded') {
console.log('Whisper model freed GPU memory');
}
return;
}
// Job completion event — body is a Job object
if (body.status === 'done') {
console.log(`Job ${body.id} done — ${body.segments.length} segments`);
processTranscript(body.segments);
} else if (body.status === 'failed') {
console.error(`Job ${body.id} failed:`, body.error);
}
});
Distinguish job vs. model webhook: Job payloads have an id and status field. Model payloads have a type field at the top level (model_ready / model_unloaded).
7. Health Check
curl http://your-server:8080/health
{
"status": "ok",
"gpu_name": "NVIDIA GeForce RTX 2080",
"vram_total_mb": 8192,
"model": "large-v3",
"queue_depth": 2,
"model_state": "ready"
}
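The response can be given a TypeScript shape, which section 9 does not include. The sketch below assumes the nullability described in this section's field notes (gpu_name and vram_total_mb may be null); the Health and summarizeHealth names and the summary wording are illustrative.

```typescript
interface Health {
  status: 'ok';
  gpu_name: string | null;      // null when nvidia-smi is unavailable
  vram_total_mb: number | null; // MiB; null when unavailable
  model: string;
  queue_depth: number;          // waiting jobs, excluding the running one
  model_state: 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';
}

// Build a one-line status string suitable for a dashboard or log line.
function summarizeHealth(h: Health): string {
  let gpu = 'GPU info unavailable';
  if (h.gpu_name !== null) {
    gpu = h.vram_total_mb === null ? h.gpu_name : `${h.gpu_name} (${h.vram_total_mb} MiB)`;
  }
  const queue = h.queue_depth === 1 ? '1 job waiting' : `${h.queue_depth} jobs waiting`;
  return `${h.model}: ${h.model_state}, ${queue}, ${gpu}`;
}
```

For the example response above, this produces "large-v3: ready, 2 jobs waiting, NVIDIA GeForce RTX 2080 (8192 MiB)".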
| Field | Notes |
|---|---|
| status | Always "ok" when the server is reachable |
| gpu_name | From nvidia-smi; null if unavailable |
| vram_total_mb | Total VRAM in MiB; null if unavailable |
| model | Model name string (server config) |
| queue_depth | Jobs waiting (not counting the currently running one) |
| model_state | "unloaded" / "loading" / "waiting_for_gpu" / "ready" |
8. Cancelling Jobs
curl -X DELETE http://your-server:8080/jobs/550e8400-e29b-41d4-a716-446655440000
- 200: job marked cancelled. Returns the updated Job object.
- 404: job not found.
- 409: job already in a terminal state (done / failed / cancelled).
Important: whisper.cpp does not support mid-inference cancellation. If the job is currently running, the GPU inference will finish before the cancellation takes effect; the result is simply discarded and the status set to cancelled.
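On the client, the three documented status codes of the DELETE call can be mapped to outcomes before deciding what to show the user. This is a minimal sketch; CancelOutcome and interpretCancelStatus are hypothetical names, and any other status code is reported as unexpected rather than guessed at.

```typescript
type CancelOutcome =
  | { kind: 'cancelled' }        // 200: job is now marked cancelled
  | { kind: 'not_found' }        // 404: unknown job id
  | { kind: 'already_finished' } // 409: job was already done / failed / cancelled
  | { kind: 'unexpected'; status: number };

// Map the HTTP status of DELETE /jobs/:id to a client-side outcome.
function interpretCancelStatus(status: number): CancelOutcome {
  switch (status) {
    case 200:
      return { kind: 'cancelled' };
    case 404:
      return { kind: 'not_found' };
    case 409:
      return { kind: 'already_finished' };
    default:
      return { kind: 'unexpected', status };
  }
}
```

A UI can usually treat already_finished as benign and simply refresh the job, since the job has reached a terminal state either way.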
9. TypeScript Types
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';
type JobStatus = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';
type Task = 'transcribe' | 'translate';
interface ModelStatus {
state: ModelStateTag;
// ready only
loaded_at?: string;
// waiting_for_gpu only
vram_needed_mb?: number;
vram_free_mb?: number;
retry_in_secs?: number;
// always (when nvidia-smi available)
vram_used_mb?: number;
vram_total_mb?: number;
}
interface Word {
text: string;
start: number; // seconds
end: number; // seconds
probability: number; // 0–1
}
interface Segment {
index: number;
start: number; // seconds
end: number; // seconds
text: string;
words: Word[];
}
interface Job {
id: string;
status: JobStatus;
task: Task;
language?: string | null; // ISO 639-1; absent or null until detected/set
progress: number; // 0–100
duration_secs?: number | null; // absent or null until processing starts
segments: Segment[]; // populated when status = 'done'
error?: string | null; // populated when status = 'failed'
webhook_url?: string | null;
filename?: string | null;
created_at: string; // ISO 8601
completed_at?: string | null; // ISO 8601; absent or null until terminal
}
// SSE payloads from GET /jobs/:id/stream
type JobSseEvent =
| { type: 'progress'; percent: number; chunk: number; chunks_total: number }
| { type: 'done'; job: Job }
| { type: 'error'; message: string };
// SSE payloads from GET /model/events
type ModelSseEvent =
| { type: 'model_loading' }
| { type: 'model_ready'; loaded_at: string }
| { type: 'model_unloaded' }
| { type: 'model_waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };
// Webhook payload — union of job completion and model lifecycle events
type WebhookPayload = Job | { type: 'model_ready'; loaded_at: string } | { type: 'model_unloaded' };
// Helpers
function isJobPayload(p: WebhookPayload): p is Job {
return 'id' in p && 'status' in p;
}
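// The predicate type must be assignable to WebhookPayload, so narrow to its non-Job members.
function isModelPayload(p: WebhookPayload): p is Exclude<WebhookPayload, Job> {
return 'type' in p;
}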
10. React Hooks
// useModelStatus.ts
import { useEffect, useState } from 'react';
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';
export function useModelStatus() {
const [status, setStatus] = useState<ModelStatus | null>(null);
// Initial fetch
useEffect(() => {
fetch(`${BASE}/model/status`)
.then(r => r.json())
.then(setStatus)
.catch(console.error);
}, []);
// Live updates via SSE
useEffect(() => {
const es = new EventSource(`${BASE}/model/events`);
const refresh = () => {
fetch(`${BASE}/model/status`)
.then(r => r.json())
.then(setStatus)
.catch(console.error);
};
es.addEventListener('model_loading', refresh);
es.addEventListener('model_ready', refresh);
es.addEventListener('model_unloaded', refresh);
es.addEventListener('model_waiting_for_gpu',refresh);
es.onerror = () => console.warn('model/events reconnecting…');
return () => es.close();
}, []);
return status;
}
// useJobStream.ts
import { useEffect, useRef, useState } from 'react';
type ProgressState = {
percent: number;
chunk: number;
chunks_total: number;
};
export function useJobStream(jobId: string | null) {
const [progress, setProgress] = useState<ProgressState | null>(null);
const [job, setJob] = useState<Job | null>(null);
const [error, setError] = useState<string | null>(null);
const esRef = useRef<EventSource | null>(null);
useEffect(() => {
if (!jobId) return;
esRef.current?.close();
setProgress(null); setJob(null); setError(null);
const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
esRef.current = es;
es.addEventListener('progress', (e) => {
setProgress(JSON.parse(e.data));
});
es.addEventListener('done', (e) => {
setJob(JSON.parse(e.data).job);
setProgress({ percent: 100, chunk: 0, chunks_total: 0 });
es.close();
});
es.addEventListener('error', (e) => {
if ('data' in e) setError(JSON.parse((e as MessageEvent).data).message);
es.close();
});
return () => es.close();
}, [jobId]);
return { progress, job, error };
}
// useTranscribe.ts — ties it all together
import { useState, useCallback } from 'react';
export function useTranscribe() {
const [jobId, setJobId] = useState<string | null>(null);
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const submit = useCallback(async (
audio: Blob,
opts: { language?: string; task?: Task } = {}
) => {
setLoading(true);
setError(null);
setJobId(null);
try {
const id = await submitWithRetry(audio, opts); // see §4.3
setJobId(id);
} catch (e) {
setError(String(e));
} finally {
setLoading(false);
}
}, []);
const { progress, job, error: streamError } = useJobStream(jobId);
return { submit, loading, jobId, progress, job, error: error ?? streamError };
}
11. Complete Integration Example
A full transcription flow with model warm-up indicator and real-time progress:
// whisperClient.ts
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';
export class WhisperClient {
/** Wait for the model to be ready, triggering a load if needed. */
async ensureModelReady(timeoutMs = 120_000): Promise<void> {
  const status = await this.getModelStatus();
  if (status.state === 'ready') return;
  return new Promise((resolve, reject) => {
    // Subscribe before triggering the load so a model_ready event is not missed.
    const es = new EventSource(`${BASE}/model/events`);
    const finish = (err?: Error) => {
      clearTimeout(deadline);
      es.close();
      if (err) reject(err);
      else resolve();
    };
    const deadline = setTimeout(
      () => finish(new Error('Model did not become ready within timeout')),
      timeoutMs,
    );
    es.addEventListener('model_ready', () => finish());
    es.onerror = () => {
      // Reconnects automatically; don't reject on transient drops.
    };
    // Trigger load (idempotent). A 200 means "already_ready", so no event will follow.
    fetch(`${BASE}/model/load`, { method: 'POST' })
      .then((r) => { if (r.status === 200) finish(); })
      .catch((e) => finish(e instanceof Error ? e : new Error(String(e))));
  });
}
async getModelStatus(): Promise<ModelStatus> {
const r = await fetch(`${BASE}/model/status`);
if (!r.ok) throw new Error(`/model/status ${r.status}`);
return r.json();
}
async submit(
audio: Blob,
opts: { language?: string; task?: Task; webhookUrl?: string } = {}
): Promise<string> {
return submitWithRetry(audio, opts);
}
streamProgress(
jobId: string,
callbacks: {
onProgress?: (p: { percent: number; chunk: number; total: number }) => void;
onDone?: (job: Job) => void;
onError?: (msg: string) => void;
}
): () => void {
const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
es.addEventListener('progress', (e) => {
const d = JSON.parse(e.data);
callbacks.onProgress?.({ percent: d.percent, chunk: d.chunk, total: d.chunks_total });
});
es.addEventListener('done', (e) => {
callbacks.onDone?.(JSON.parse(e.data).job);
es.close();
});
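es.addEventListener('error', (e) => {
  // Application errors carry data; a bare error event means the connection dropped.
  // Report both, otherwise transcribe() would never settle after a dropped stream.
  const msg = 'data' in e
    ? JSON.parse((e as MessageEvent).data).message
    : 'stream connection lost';
  callbacks.onError?.(msg);
  es.close();
});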
return () => es.close();
}
async transcribe(
audio: Blob,
opts: {
language?: string;
task?: Task;
webhookUrl?: string;
onProgress?: (percent: number) => void;
} = {}
): Promise<Job> {
const jobId = await this.submit(audio, opts);
return new Promise((resolve, reject) => {
this.streamProgress(jobId, {
onProgress: (p) => opts.onProgress?.(p.percent),
onDone: resolve,
onError: (msg) => reject(new Error(msg)),
});
});
}
}
// Usage
const whisper = new WhisperClient();
const job = await whisper.transcribe(audioBlob, {
language: 'en',
onProgress: (pct) => console.log(`${pct}%`),
});
for (const seg of job.segments) {
console.log(`[${seg.start.toFixed(1)}s → ${seg.end.toFixed(1)}s] ${seg.text}`);
}
12. Error Reference
All error responses follow this shape:
{ "error": "human-readable message" }
With the following additions for specific errors:
503 model_not_ready:
{ "error": "model_not_ready", "state": "loading", "retry_after_secs": 10 }
| HTTP | error value | When | What to do |
|---|---|---|---|
| 400 | "missing 'audio' field" | audio not in form | Fix the form |
| 400 | "audio field is empty" | Zero-byte file uploaded | Fix the file |
| 400 | "task must be 'transcribe' or 'translate'" | Bad task value | Fix the value |
| 400 | "multipart error: …" | Malformed request | Check content-type header |
| 404 | "job … not found" | Unknown job ID | Check the ID |
| 409 | "job … is already in terminal state …" | Cancelling a finished job | No action needed |
| 503 | "model_not_ready" | Model not loaded | See §4.2; retry with Retry-After |
| 500 | "worker channel closed" | Server crash | Contact server admin |
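The submit examples in §4 call a toApiError helper that this guide does not define. The sketch below is one possible implementation based on the error shapes above, not the project's own: the ApiError class and apiErrorFromBody are hypothetical names, and ResponseLike is a minimal structural stand-in for a fetch Response.

```typescript
// Minimal structural view of a fetch Response (an assumption for portability).
interface ResponseLike {
  status: number;
  json(): Promise<unknown>;
}

class ApiError extends Error {
  readonly status: number;
  readonly state: string | undefined;          // present on 503 model_not_ready
  readonly retryAfterSecs: number | undefined; // present on 503 model_not_ready

  constructor(status: number, message: string, state?: string, retryAfterSecs?: number) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
    this.state = state;
    this.retryAfterSecs = retryAfterSecs;
  }
}

// Build an ApiError from an already-parsed body; tolerate non-JSON bodies
// such as an HTML error page from a reverse proxy.
function apiErrorFromBody(status: number, body: unknown): ApiError {
  if (typeof body === 'object' && body !== null) {
    const b = body as { error?: unknown; state?: unknown; retry_after_secs?: unknown };
    if (typeof b.error === 'string') {
      return new ApiError(
        status,
        b.error,
        typeof b.state === 'string' ? b.state : undefined,
        typeof b.retry_after_secs === 'number' ? b.retry_after_secs : undefined,
      );
    }
  }
  return new ApiError(status, `HTTP ${status}`);
}

// Read the response body and convert it; never throws, so callers can `throw await toApiError(resp)`.
async function toApiError(resp: ResponseLike): Promise<ApiError> {
  try {
    return apiErrorFromBody(resp.status, await resp.json());
  } catch {
    return apiErrorFromBody(resp.status, undefined);
  }
}
```

Callers can then branch on err.status and, for a 503, read err.state and err.retryAfterSecs instead of re-parsing the body.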
Network / SSE errors:
- EventSource onerror with no .data = connection dropped. The browser reconnects automatically; no action needed unless you want to show a UI indicator.
- HTTP 502/503/504 from a reverse proxy = the container is restarting. Wait and retry.
Last updated: 2026-05-08. Corresponds to whisper-server v0.1.0 commit d014826.