# Frontend Integration Guide

> **Audience:** Frontend / full-stack developers integrating the whisper transcription API into a web application.  
> **Base URL:** `http://your-server:8080` (configurable via the `PORT` env var on the server).  
> **Interactive docs:** `http://your-server:8080/docs` (Swagger UI — try every endpoint live).

---

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Quick Start — submit and poll](#2-quick-start--submit-and-poll)
3. [Model Lifecycle](#3-model-lifecycle)
   - 3.1 [State machine](#31-state-machine)
   - 3.2 [GET /model/status](#32-get-modelstatus)
   - 3.3 [POST /model/load](#33-post-modelload)
   - 3.4 [POST /model/unload](#34-post-modelunload)
   - 3.5 [GET /model/events (SSE)](#35-get-modelevents-sse)
4. [Submitting Jobs](#4-submitting-jobs)
   - 4.1 [POST /jobs](#41-post-jobs)
   - 4.2 [Handling 503 Model Not Ready](#42-handling-503-model-not-ready)
   - 4.3 [Retry pattern with auto-load](#43-retry-pattern-with-auto-load)
5. [Tracking Job Progress](#5-tracking-job-progress)
   - 5.1 [GET /jobs/:id (poll)](#51-get-jobsid-poll)
   - 5.2 [GET /jobs/:id/stream (SSE)](#52-get-jobsidstream-sse)
6. [Webhooks](#6-webhooks)
   - 6.1 [Job completion webhook](#61-job-completion-webhook)
   - 6.2 [Model lifecycle webhooks](#62-model-lifecycle-webhooks)
7. [Health Check](#7-health-check)
8. [Cancelling Jobs](#8-cancelling-jobs)
9. [TypeScript Types](#9-typescript-types)
10. [React Hooks](#10-react-hooks)
11. [Complete Integration Example](#11-complete-integration-example)
12. [Error Reference](#12-error-reference)

---

## 1. Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│  whisper-server                                         │
│                                                         │
│  HTTP / SSE          Worker thread (GPU)                │
│  ────────────        ───────────────────                │
│  POST /jobs    ───►  job queue (FIFO)                   │
│  GET  /jobs/:id      ↕                                  │
│  GET  /jobs/:id/stream ◄── progress broadcast           │
│                                                         │
│  POST /model/load  ─►  load whisper into VRAM           │
│  POST /model/unload ►  free VRAM                        │
│  GET  /model/status    read state                       │
│  GET  /model/events ◄── lifecycle SSE broadcast         │
└─────────────────────────────────────────────────────────┘
```

**Key behaviours to understand before building:**

- The model starts **unloaded** on every server restart. No inference is possible until it loads (~15–25 seconds for large-v3 on an RTX 2080).
- Submitting a job when the model is not ready returns `503` with a `Retry-After` header **and automatically triggers a load**. You can retry the submission; no separate load call is needed.
- The worker processes jobs **sequentially** (one at a time). Queue depth is visible via `/health`.
- Long audio is split into silence-bounded chunks internally. SSE `progress` events reflect chunk completion, not raw GPU progress.

---

## 2. Quick Start — submit and poll

The simplest possible integration — no SSE, no model management, just submit and poll:

```typescript
const BASE = 'http://your-server:8080';

async function transcribe(audioBlob: Blob): Promise<Job> {
  // 1. Submit
  const form = new FormData();
  form.append('audio', audioBlob, 'audio.wav');

  let submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

  // 2. If model isn't loaded yet, keep retrying until it is
  while (submitResp.status === 503) {
    const retryAfter = parseInt(submitResp.headers.get('Retry-After') ?? '15');
    await sleep(retryAfter * 1000);
    submitResp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  }
  if (!submitResp.ok) throw new Error(`Submit failed: ${submitResp.status}`);

  const { job_id } = await submitResp.json();

  // 3. Poll until done
  while (true) {
    await sleep(2000);
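    const pollResp = await fetch(`${BASE}/jobs/${job_id}`);
    // Without this check a 404 or 5xx body has no `status` field and the loop never exits.
    if (!pollResp.ok) throw new Error(`Poll failed: ${pollResp.status}`);
    const job: Job = await pollResp.json();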
    if (job.status === 'done')      return job;
    if (job.status === 'failed')    throw new Error(job.error ?? 'transcription failed');
    if (job.status === 'cancelled') throw new Error('job was cancelled');
  }
}

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));
```

> For a better UX — real-time progress bar, model state indicator — read the full sections below.

---

## 3. Model Lifecycle

### 3.1 State machine

The model moves through four states:

```
          job submit
         or POST /model/load
               │
   ┌──────────▼───────────┐
   │       Unloaded        │◄──────────────────────────┐
   └──────────┬───────────┘                            │
              │ load triggered                         │
   ┌──────────▼───────────┐                            │
   │        Loading        │                            │ idle timeout
   └──┬──────────────┬────┘                            │ or POST /model/unload
      │ success      │ VRAM full                       │
      │              │                                  │
   ┌──▼────┐  ┌──────▼────────────────┐                │
   │ Ready │  │   WaitingForGpu       │────────────────►│
   └──┬────┘  └──────────────┬────────┘                │
      │         retry ok ────┘                         │
      └────────────────────────────────────────────────►┘
```

| State | `state` value | Can accept jobs? |
|-------|--------------|-----------------|
| Unloaded | `"unloaded"` | ❌ → triggers load, returns 503 |
| Loading | `"loading"` | ❌ → returns 503 |
| Waiting for GPU | `"waiting_for_gpu"` | ❌ → returns 503 |
| Ready | `"ready"` | ✅ |
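
The table above can be encoded as a small lookup so UI code does not scatter string comparisons. This is a minimal sketch: `MODEL_STATE_INFO` and `describeModelState` are hypothetical names, and the labels are illustrative UI text, not values returned by the server.

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';

interface ModelStateInfo {
  acceptsJobs: boolean; // false means POST /jobs answers 503
  label: string;        // illustrative UI text, not part of the API
}

// One entry per row of the state table.
const MODEL_STATE_INFO: Record<ModelStateTag, ModelStateInfo> = {
  unloaded:        { acceptsJobs: false, label: 'Model unloaded (next submit triggers a load)' },
  loading:         { acceptsJobs: false, label: 'Model loading' },
  waiting_for_gpu: { acceptsJobs: false, label: 'Waiting for free VRAM' },
  ready:           { acceptsJobs: true,  label: 'Ready' },
};

function describeModelState(state: ModelStateTag): ModelStateInfo {
  return MODEL_STATE_INFO[state];
}
```

Because the record is keyed by `ModelStateTag`, the compiler flags a missing entry if a new state is ever added to the union.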

---

### 3.2 `GET /model/status`

Returns the current model state and live VRAM figures (from `nvidia-smi`).

**Unloaded:**
```json
{ "state": "unloaded" }
```

**Loading:**
```json
{ "state": "loading" }
```

**Waiting for GPU (VRAM contention):**
```json
{
  "state": "waiting_for_gpu",
  "vram_needed_mb": 3951,
  "vram_free_mb": 512,
  "retry_in_secs": 30
}
```

**Ready:**
```json
{
  "state": "ready",
  "loaded_at": "2026-05-10T14:00:00.000Z",
  "vram_used_mb": 4096,
  "vram_total_mb": 8192
}
```

> `vram_used_mb` / `vram_total_mb` are omitted when `nvidia-smi` is unavailable.
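
Since the VRAM fields can be missing, client code that renders a usage bar should treat them as optional. A minimal sketch, assuming only the field names shown in the responses above (`vramUsagePercent` and `ModelStatusLike` are hypothetical names):

```typescript
interface ModelStatusLike {
  state: string;
  vram_used_mb?: number;
  vram_total_mb?: number;
}

// Returns used VRAM as a whole percentage, or null when the figures are missing.
function vramUsagePercent(status: ModelStatusLike): number | null {
  const used = status.vram_used_mb;
  const total = status.vram_total_mb;
  if (used === undefined || total === undefined || total <= 0) return null;
  return Math.round((used / total) * 100);
}
```

With the **Ready** example above (4096 of 8192 MiB) this yields `50`; for a response without the VRAM fields it yields `null`, which the UI can render as "unknown".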

---

### 3.3 `POST /model/load`

Tells the server to load the model. **Idempotent** — safe to call multiple times.

```bash
curl -X POST http://your-server:8080/model/load
```

**Responses:**

| Status | Body | Meaning |
|--------|------|---------|
| 202 | `{"status":"load_initiated"}` | Load queued |
| 200 | `{"status":"already_ready"}` | Already loaded |

The load happens asynchronously. Subscribe to `/model/events` or poll `/model/status` to know when ready.

---

### 3.4 `POST /model/unload`

Requests that the model be dropped from GPU memory. If a job is currently running, its inference finishes first and the model is freed afterwards; otherwise the VRAM is released right away.

```bash
curl -X POST http://your-server:8080/model/unload
```

**Response:** `200 {"status":"unload_requested"}` (always, regardless of current state).

> Use this if you know transcription won't happen for a while and you want to free VRAM for other workloads on the same GPU.

---

### 3.5 `GET /model/events` (SSE)

A persistent Server-Sent Events stream that emits every model lifecycle transition.

```bash
curl -N http://your-server:8080/model/events
```

**Events emitted:**

```
event: model_loading
data: {"type":"model_loading"}

event: model_ready
data: {"type":"model_ready","loaded_at":"2026-05-10T14:00:00.000Z"}

event: model_unloaded
data: {"type":"model_unloaded"}

event: model_waiting_for_gpu
data: {"type":"model_waiting_for_gpu","vram_needed_mb":3951,"vram_free_mb":512,"retry_in_secs":30}
```

**JavaScript:**
```typescript
function subscribeModelEvents(
  onReady:       (loadedAt: string) => void,
  onUnloaded:    () => void,
  onLoading:     () => void,
  onWaitingGpu:  (info: { vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number }) => void,
): () => void {
  const es = new EventSource(`${BASE}/model/events`);

  es.addEventListener('model_ready',          (e) => onReady(JSON.parse(e.data).loaded_at));
  es.addEventListener('model_unloaded',       ()  => onUnloaded());
  es.addEventListener('model_loading',        ()  => onLoading());
  es.addEventListener('model_waiting_for_gpu',(e) => onWaitingGpu(JSON.parse(e.data)));

  es.onerror = () => {
    // The browser reconnects automatically after a short delay
    // (the exact delay and any backoff are browser-dependent).
    // Log the error but don't tear down the listener.
    console.warn('model/events connection dropped, reconnecting…');
  };

  return () => es.close(); // call this to clean up (e.g. in React useEffect return)
}
```

> The server sends an SSE keepalive comment every 15 seconds so proxies don't close idle connections.
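
Each event payload carries enough information to update local state without an extra `GET /model/status` round trip. The sketch below maps a parsed event to a local state object; `LocalModelState` and `modelStateFromEvent` are hypothetical names, and the event shapes are copied from the list above.

```typescript
type ModelSseEvent =
  | { type: 'model_loading' }
  | { type: 'model_ready'; loaded_at: string }
  | { type: 'model_unloaded' }
  | { type: 'model_waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };

type LocalModelState =
  | { state: 'unloaded' }
  | { state: 'loading' }
  | { state: 'ready'; loaded_at: string }
  | { state: 'waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };

// Pure mapping from one lifecycle event to the state it implies.
function modelStateFromEvent(ev: ModelSseEvent): LocalModelState {
  switch (ev.type) {
    case 'model_loading':
      return { state: 'loading' };
    case 'model_ready':
      return { state: 'ready', loaded_at: ev.loaded_at };
    case 'model_unloaded':
      return { state: 'unloaded' };
    case 'model_waiting_for_gpu':
      return {
        state: 'waiting_for_gpu',
        vram_needed_mb: ev.vram_needed_mb,
        vram_free_mb: ev.vram_free_mb,
        retry_in_secs: ev.retry_in_secs,
      };
  }
}
```

Note that the events do not include `vram_used_mb` / `vram_total_mb`, so a UI that shows VRAM usage still needs `GET /model/status` for those figures.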

---

## 4. Submitting Jobs

### 4.1 `POST /jobs`

**Content-Type:** `multipart/form-data`

| Field | Required | Type | Notes |
|-------|----------|------|-------|
| `audio` | ✅ | file | Any format ffmpeg understands: WAV, MP3, M4A, OGG, FLAC, MP4, MKV … No size limit. |
| `language` | ❌ | string | ISO 639-1 code (`"en"`, `"it"`, `"fr"` …). Omit for auto-detection. |
| `task` | ❌ | string | `"transcribe"` (default) or `"translate"` (→ English) |
| `webhook_url` | ❌ | string | URL to POST the completed job to. Also registers the URL for model lifecycle webhooks. |

**202 Accepted:**
```json
{ "job_id": "550e8400-e29b-41d4-a716-446655440000" }
```

```typescript
async function submitJob(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {}
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });
  if (!resp.ok) throw await toApiError(resp);

  const { job_id } = await resp.json();
  return job_id;
}
```
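
The `toApiError` helper called above is not defined anywhere in this guide. One possible shape, as a sketch only: it assumes the `{ "error": "..." }` body described in §12, and the `ApiError` class and `buildApiError` function are hypothetical names.

```typescript
class ApiError extends Error {
  readonly status: number;
  readonly body: unknown;

  constructor(status: number, message: string, body: unknown) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
    this.body = body;
  }
}

// Pure helper: builds the error from a status code and the raw response text.
function buildApiError(status: number, rawBody: string): ApiError {
  let body: unknown = rawBody;
  let message = `HTTP ${status}`;
  try {
    body = JSON.parse(rawBody);
    if (typeof body === 'object' && body !== null && 'error' in body) {
      message = String((body as { error: unknown }).error);
    }
  } catch {
    // Not JSON (for example an HTML page from a reverse proxy): keep the raw text.
  }
  return new ApiError(status, message, body);
}

// Thin async wrapper matching how `toApiError(resp)` is called in this guide.
async function toApiError(resp: { status: number; text(): Promise<string> }): Promise<ApiError> {
  return buildApiError(resp.status, await resp.text());
}
```

Keeping the parsed body on the error lets callers read extra fields such as `state` and `retry_after_secs` from a 503 without parsing the response a second time.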

---

### 4.2 Handling 503 Model Not Ready

When the model isn't loaded, `POST /jobs` returns:

```
HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/json
```
```json
{
  "error": "model_not_ready",
  "state": "unloaded",
  "retry_after_secs": 30
}
```

**`retry_after_secs` by state:**

| `state` | `retry_after_secs` | Why |
|---------|-------------------|-----|
| `unloaded` | 30 | Load just triggered; RTX 2080 + large-v3 loads in ~15–25s |
| `loading` | 10 | Already loading; check again soon |
| `waiting_for_gpu` | `GPU_POLL_INTERVAL_SECS` (default 30) | VRAM busy; retry later |

> **Submitting a job when the model is `unloaded` automatically triggers a load.** You do NOT need a separate `POST /model/load` call for the normal happy path.
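
The wait time can be derived from the response in a fixed order of preference: the `Retry-After` header, then `retry_after_secs` in the body, then a per-state fallback taken from the table above. A minimal sketch (`retryDelayMs` is a hypothetical helper; the fallback of 30 for `waiting_for_gpu` assumes the documented default of `GPU_POLL_INTERVAL_SECS`):

```typescript
interface NotReadyBody {
  state?: string;
  retry_after_secs?: number;
}

// Returns how long to wait before resubmitting, in milliseconds.
function retryDelayMs(retryAfterHeader: string | null, body: NotReadyBody = {}): number {
  // 1. Prefer the Retry-After header when it is a non-negative integer number of seconds.
  const fromHeader = retryAfterHeader === null ? NaN : parseInt(retryAfterHeader, 10);
  if (!isNaN(fromHeader) && fromHeader >= 0) return fromHeader * 1000;

  // 2. Otherwise use the value from the JSON body.
  const fromBody = body.retry_after_secs;
  if (typeof fromBody === 'number' && fromBody >= 0) return fromBody * 1000;

  // 3. Last resort: per-state defaults from the table (loading = 10s, everything else = 30s).
  return (body.state === 'loading' ? 10 : 30) * 1000;
}
```

In a retry loop this would be called as `retryDelayMs(resp.headers.get('Retry-After'), await resp.json())`.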

---

### 4.3 Retry pattern with auto-load

```typescript
async function submitWithRetry(
  audio: Blob,
  opts: { language?: string; task?: 'transcribe' | 'translate'; webhookUrl?: string } = {},
  maxAttempts = 20,
): Promise<string> {
  const form = new FormData();
  form.append('audio', audio, 'audio.wav');
  if (opts.language)   form.append('language', opts.language);
  if (opts.task)       form.append('task', opts.task);
  if (opts.webhookUrl) form.append('webhook_url', opts.webhookUrl);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const resp = await fetch(`${BASE}/jobs`, { method: 'POST', body: form });

    if (resp.status === 202) {
      const { job_id } = await resp.json();
      return job_id;
    }

    if (resp.status === 503) {
      const body = await resp.json();
      const waitMs = (parseInt(resp.headers.get('Retry-After') ?? '15') + 1) * 1000;
      console.log(`Model ${body.state} — waiting ${waitMs / 1000}s (attempt ${attempt}/${maxAttempts})`);
      await sleep(waitMs);
      continue;
    }

    throw await toApiError(resp);
  }

  throw new Error(`Model did not become ready after ${maxAttempts} attempts`);
}
```

> **Tip:** For a better UX, subscribe to `GET /model/events` and wait for the `model_ready` event instead of sleeping blindly — then submit immediately when ready.

---

## 5. Tracking Job Progress

Two patterns: **SSE** (real-time push) or **polling** (simpler). SSE is preferred for UX.

### 5.1 `GET /jobs/:id` (poll)

Returns the full job document. Poll every 2–5 seconds while `status` is `queued` or `running`.

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "task": "transcribe",
  "language": "en",
  "progress": 42,
  "duration_secs": 120.5,
  "segments": [],
  "created_at": "2026-05-10T14:00:00.000Z"
}
```

When `status === "done"`:
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] },
    { "index": 1, "start": 3.6, "end": 7.2, "text": "How are you?", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}
```

**Terminal statuses:** `done`, `failed`, `cancelled` — stop polling when you see one.
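
A polling loop built on that rule might look like the sketch below. The job getter is injected so the loop stays independent of `fetch`; `isTerminal`, `pollUntilTerminal`, and the `maxPolls` cap are hypothetical and not part of the API.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';

// True for the three statuses after which the job document no longer changes.
function isTerminal(status: JobStatus): boolean {
  return status === 'done' || status === 'failed' || status === 'cancelled';
}

// Calls `getJob` every `intervalMs` until the job is terminal or `maxPolls` is exhausted.
async function pollUntilTerminal<J extends { status: JobStatus }>(
  getJob: () => Promise<J>,
  intervalMs = 2000,
  maxPolls = 1800,
): Promise<J> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await getJob();
    if (isTerminal(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job still not terminal after ${maxPolls} polls`);
}
```

In a browser, `getJob` could be `` () => fetch(`${BASE}/jobs/${jobId}`).then((r) => r.json()) ``. The cap keeps a lost job from polling forever; with the defaults it gives up after about an hour.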

---

### 5.2 `GET /jobs/:id/stream` (SSE)

Subscribe immediately after submission. The connection is held open and events are pushed as they occur.

**Event types:**

```
event: progress
data: {"type":"progress","percent":42,"chunk":3,"chunks_total":7}

event: done
data: {"type":"done","job":{...full Job object...}}

event: error
data: {"type":"error","message":"whisper inference failed: ..."}
```

- `percent` — overall job progress 0–100 (derived from chunks completed / total).
- `chunk` / `chunks_total` — the audio is split on silences; each chunk is one whisper inference call.
- If you open the stream after the job is already finished, you immediately receive a single `done` event.

```typescript
function streamJobProgress(
  jobId: string,
  onProgress: (percent: number, chunk: number, total: number) => void,
  onDone:     (job: Job) => void,
  onError:    (message: string) => void,
): () => void {
  const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

  es.addEventListener('progress', (e) => {
    const { percent, chunk, chunks_total } = JSON.parse(e.data);
    onProgress(percent, chunk, chunks_total);
  });

  es.addEventListener('done', (e) => {
    const { job } = JSON.parse(e.data);
    es.close();
    onDone(job);
  });

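  es.addEventListener('error', (e) => {
    // Application errors carry `data`; a bare Event means the connection dropped.
    // Report both, so the caller is never left waiting on a closed stream.
    const message = 'data' in e
      ? JSON.parse((e as MessageEvent).data).message
      : 'stream connection lost';
    onError(message);
    es.close();
  });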

  return () => es.close();
}
```

> **Note:** Do not confuse the SSE `error` event (connection drop — no `data`) with the application `error` event (transcription failure — has `data`). The example above handles both.
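
For display, a `progress` payload can be turned into a short label. The sketch below assumes `chunk` counts completed chunks, which is consistent with the sample event (3 of 7 is about 42%) but is not stated explicitly; `progressLabel` is a hypothetical helper.

```typescript
interface JobProgress {
  percent: number;
  chunk: number;
  chunks_total: number;
}

// Formats a progress event for a status line, clamping percent to 0..100.
function progressLabel(p: JobProgress): string {
  const pct = Math.min(100, Math.max(0, Math.round(p.percent)));
  return p.chunks_total > 0
    ? `${p.chunk} of ${p.chunks_total} chunks (${pct}%)`
    : `${pct}%`;
}
```

The fallback branch covers payloads where the chunk counts are zero, for example a synthetic 100% state set by the client after the `done` event.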

---

## 6. Webhooks

Webhooks are fired as HTTP `POST` requests with `Content-Type: application/json` to the `webhook_url` you supply at job submission. The server retries up to 3 times with exponential backoff (1s, 2s) on non-2xx responses.

### 6.1 Job completion webhook

Fired when a job reaches `done`, `failed`, or `cancelled`.  
**Payload:** the full `Job` object (same as `GET /jobs/:id`).

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "done",
  "task": "transcribe",
  "language": "en",
  "progress": 100,
  "duration_secs": 120.5,
  "segments": [
    { "index": 0, "start": 0.0, "end": 3.5, "text": "Hello, world.", "words": [] }
  ],
  "created_at": "2026-05-10T14:00:00.000Z",
  "completed_at": "2026-05-10T14:02:35.000Z"
}
```

### 6.2 Model lifecycle webhooks

**Any URL that has ever appeared as a `webhook_url` in a job submission** also receives model lifecycle webhooks for the lifetime of the server process. This lets your backend know when the model comes up or goes down without polling.

Only two events are delivered via webhook (the others are SSE-only):

**Model ready:**
```json
{ "type": "model_ready", "loaded_at": "2026-05-10T14:00:00.000Z" }
```

**Model unloaded:**
```json
{ "type": "model_unloaded" }
```

**Express.js receiver example:**
```typescript
import express from 'express';
const app = express();
app.use(express.json());

app.post('/webhooks/whisper', (req, res) => {
  res.sendStatus(200); // acknowledge quickly — retries on non-2xx

  const body = req.body;

  if ('type' in body) {
    // Model lifecycle event
    if (body.type === 'model_ready') {
      console.log('Whisper model ready at', body.loaded_at);
    } else if (body.type === 'model_unloaded') {
      console.log('Whisper model freed GPU memory');
    }
    return;
  }

  // Job completion event — body is a Job object
  if (body.status === 'done') {
    console.log(`Job ${body.id} done — ${body.segments.length} segments`);
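    processTranscript(body.segments); // your own handler, not defined in this guide
  } else if (body.status === 'failed') {
    console.error(`Job ${body.id} failed:`, body.error);
  } else if (body.status === 'cancelled') {
    console.log(`Job ${body.id} was cancelled`);
  }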
});
```

> **Distinguish job vs. model webhook:** Job payloads have an `id` and `status` field. Model payloads have a `type` field at the top level (`model_ready` / `model_unloaded`).
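
Because delivery is retried on non-2xx responses, a receiver that is slow or briefly failing could plausibly see the same job payload more than once. The guide does not say whether duplicates actually occur, so treat the following as a defensive sketch under that assumption; `WebhookDeduper` is a hypothetical in-memory helper.

```typescript
interface JobWebhook {
  id: string;
  status: string;
}

class WebhookDeduper {
  private readonly seen = new Set<string>();

  // Returns true the first time a given job/status pair is seen, false on repeats.
  firstDelivery(job: JobWebhook): boolean {
    const key = `${job.id}:${job.status}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

In the Express handler this would guard the job branch, for example `if (!deduper.firstDelivery(body)) return;`. The set grows without bound and is lost on restart, so a production receiver would more likely key on the job ID in its own database.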

---

## 7. Health Check

```bash
curl http://your-server:8080/health
```

```json
{
  "status": "ok",
  "gpu_name": "NVIDIA GeForce RTX 2080",
  "vram_total_mb": 8192,
  "model": "large-v3",
  "queue_depth": 2,
  "model_state": "ready"
}
```

| Field | Notes |
|-------|-------|
| `status` | Always `"ok"` when the server is reachable |
| `gpu_name` | From `nvidia-smi`; `null` if unavailable |
| `vram_total_mb` | Total VRAM in MiB; `null` if unavailable |
| `model` | Model name string (server config) |
| `queue_depth` | Jobs waiting (not counting the currently running one) |
| `model_state` | `"unloaded"` / `"loading"` / `"waiting_for_gpu"` / `"ready"` |
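
A status badge can be built from these fields alone. A minimal sketch, assuming the response shape shown above (`Health` and `healthSummary` are hypothetical names):

```typescript
interface Health {
  status: string;
  gpu_name: string | null;
  vram_total_mb: number | null;
  model: string;
  queue_depth: number;
  model_state: string;
}

// One-line summary for a status badge; tolerates a missing GPU name.
function healthSummary(h: Health): string {
  const gpu = h.gpu_name ?? 'unknown GPU';
  const waiting = h.queue_depth === 1 ? '1 job waiting' : `${h.queue_depth} jobs waiting`;
  return `${h.model} on ${gpu}: ${h.model_state}, ${waiting}`;
}
```

Remember that `queue_depth` excludes the job currently running, so "0 jobs waiting" does not necessarily mean the worker is idle.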

---

## 8. Cancelling Jobs

```bash
curl -X DELETE http://your-server:8080/jobs/550e8400-e29b-41d4-a716-446655440000
```

- `200` — job marked `cancelled`. Returns the updated `Job` object.
- `404` — job not found.
- `409` — job already in a terminal state (`done` / `failed` / `cancelled`).

> **Important:** whisper.cpp does not support mid-inference cancellation. If the job is currently `running`, the GPU inference will finish before the cancellation takes effect — the result is simply discarded and the status set to `cancelled`.
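
On the client, the three documented status codes map naturally onto a small result type. A sketch, with `CancelOutcome` and `cancelOutcome` as hypothetical names:

```typescript
type CancelOutcome = 'cancelled' | 'not_found' | 'already_terminal';

// Maps the HTTP status of DELETE /jobs/:id to an outcome; anything else is unexpected.
function cancelOutcome(httpStatus: number): CancelOutcome {
  switch (httpStatus) {
    case 200:
      return 'cancelled';
    case 404:
      return 'not_found';
    case 409:
      return 'already_terminal';
    default:
      throw new Error(`unexpected status ${httpStatus} from DELETE /jobs/:id`);
  }
}
```

Typical use is `` cancelOutcome((await fetch(`${BASE}/jobs/${jobId}`, { method: 'DELETE' })).status) ``. Treating `already_terminal` as a non-error keeps a "Cancel" button from showing a failure when the job simply finished first.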

---

## 9. TypeScript Types

```typescript
type ModelStateTag = 'unloaded' | 'loading' | 'waiting_for_gpu' | 'ready';
type JobStatus     = 'queued' | 'running' | 'done' | 'failed' | 'cancelled';
type Task          = 'transcribe' | 'translate';

interface ModelStatus {
  state: ModelStateTag;
  // ready only
  loaded_at?: string;
  // waiting_for_gpu only
  vram_needed_mb?: number;
  vram_free_mb?:   number;
  retry_in_secs?:  number;
  // always (when nvidia-smi available)
  vram_used_mb?:   number;
  vram_total_mb?:  number;
}

interface Word {
  text:        string;
  start:       number; // seconds
  end:         number; // seconds
  probability: number; // 0–1
}

interface Segment {
  index: number;
  start: number; // seconds
  end:   number; // seconds
  text:  string;
  words: Word[];
}

interface Job {
  id:            string;
  status:        JobStatus;
  task:          Task;
  language?:     string | null;  // ISO 639-1; unset (absent or null) until detected/set
  progress:      number;     // 0–100
  duration_secs?: number | null; // unset (absent or null) until processing starts
  segments:      Segment[];  // populated when status = 'done'
  error?:        string;     // populated when status = 'failed'
  webhook_url?:  string;
  filename?:     string;
  created_at:    string;     // ISO 8601
  completed_at?: string | null;  // ISO 8601; unset (absent or null) until terminal
}

// SSE payloads from GET /jobs/:id/stream
type JobSseEvent =
  | { type: 'progress'; percent: number; chunk: number; chunks_total: number }
  | { type: 'done';     job: Job }
  | { type: 'error';    message: string };

// SSE payloads from GET /model/events
type ModelSseEvent =
  | { type: 'model_loading' }
  | { type: 'model_ready';           loaded_at: string }
  | { type: 'model_unloaded' }
  | { type: 'model_waiting_for_gpu'; vram_needed_mb: number; vram_free_mb: number; retry_in_secs: number };

// Webhook payload — union of job completion and model lifecycle events
type WebhookPayload = Job | { type: 'model_ready'; loaded_at: string } | { type: 'model_unloaded' };

// Helpers
function isJobPayload(p: WebhookPayload): p is Job {
  return 'id' in p && 'status' in p;
}
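// The predicate type must be assignable to WebhookPayload, so narrow to its non-Job members.
function isModelPayload(p: WebhookPayload): p is Exclude<WebhookPayload, Job> {
  return 'type' in p;
}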
```

---

## 10. React Hooks

```typescript
// useModelStatus.ts
import { useEffect, useState } from 'react';

const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export function useModelStatus() {
  const [status, setStatus] = useState<ModelStatus | null>(null);

  // Initial fetch
  useEffect(() => {
    fetch(`${BASE}/model/status`)
      .then(r => r.json())
      .then(setStatus)
      .catch(console.error);
  }, []);

  // Live updates via SSE
  useEffect(() => {
    const es = new EventSource(`${BASE}/model/events`);

    const refresh = () => {
      fetch(`${BASE}/model/status`)
        .then(r => r.json())
        .then(setStatus)
        .catch(console.error);
    };

    es.addEventListener('model_loading',        refresh);
    es.addEventListener('model_ready',          refresh);
    es.addEventListener('model_unloaded',       refresh);
    es.addEventListener('model_waiting_for_gpu',refresh);
    es.onerror = () => console.warn('model/events reconnecting…');

    return () => es.close();
  }, []);

  return status;
}
```

```typescript
// useJobStream.ts
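import { useEffect, useRef, useState } from 'react';

// Same base URL as in useModelStatus.ts; each file needs its own definition or a shared import.
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';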

type ProgressState = {
  percent: number;
  chunk: number;
  chunks_total: number;
};

export function useJobStream(jobId: string | null) {
  const [progress, setProgress] = useState<ProgressState | null>(null);
  const [job,      setJob]      = useState<Job | null>(null);
  const [error,    setError]    = useState<string | null>(null);
  const esRef = useRef<EventSource | null>(null);

  useEffect(() => {
    if (!jobId) return;

    esRef.current?.close();
    setProgress(null); setJob(null); setError(null);

    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);
    esRef.current = es;

    es.addEventListener('progress', (e) => {
      setProgress(JSON.parse(e.data));
    });

    es.addEventListener('done', (e) => {
      setJob(JSON.parse(e.data).job);
      setProgress({ percent: 100, chunk: 0, chunks_total: 0 });
      es.close();
    });

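    es.addEventListener('error', (e) => {
      // Application errors carry `data`; a bare Event means the connection dropped.
      setError('data' in e
        ? JSON.parse((e as MessageEvent).data).message
        : 'stream connection lost');
      es.close();
    });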

    return () => es.close();
  }, [jobId]);

  return { progress, job, error };
}
```

```typescript
// useTranscribe.ts — ties it all together
import { useState, useCallback } from 'react';

export function useTranscribe() {
  const [jobId,  setJobId]  = useState<string | null>(null);
  const [loading, setLoading] = useState(false);
  const [error,  setError]  = useState<string | null>(null);

  const submit = useCallback(async (
    audio: Blob,
    opts: { language?: string; task?: Task } = {}
  ) => {
    setLoading(true);
    setError(null);
    setJobId(null);

    try {
      const id = await submitWithRetry(audio, opts); // see §4.3
      setJobId(id);
    } catch (e) {
      setError(String(e));
    } finally {
      setLoading(false);
    }
  }, []);

  const { progress, job, error: streamError } = useJobStream(jobId);

  return { submit, loading, jobId, progress, job, error: error ?? streamError };
}
```

---

## 11. Complete Integration Example

A full transcription flow with model warm-up indicator and real-time progress:

```typescript
// whisperClient.ts
const BASE = process.env.NEXT_PUBLIC_WHISPER_BASE_URL ?? '';

export class WhisperClient {
  /** Wait for the model to be ready, triggering a load if needed. */
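  async ensureModelReady(timeoutMs = 120_000): Promise<void> {
    const status = await this.getModelStatus();
    if (status.state === 'ready') return;

    return new Promise((resolve, reject) => {
      // Subscribe before triggering the load so a fast model_ready is not missed.
      const es = new EventSource(`${BASE}/model/events`);

      const finish = (err?: Error) => {
        clearTimeout(deadline);
        es.close();
        if (err) reject(err); else resolve();
      };
      const deadline = setTimeout(
        () => finish(new Error('Model did not become ready within timeout')),
        timeoutMs,
      );

      es.addEventListener('model_ready', () => finish());
      // Re-check on every (re)connect in case model_ready fired while we were not subscribed.
      es.onopen = () => {
        this.getModelStatus()
          .then((s) => { if (s.state === 'ready') finish(); })
          .catch(() => { /* keep waiting for the event */ });
      };
      es.onerror = () => {
        // Reconnects automatically; don't reject on transient drops.
      };

      // Trigger load (idempotent)
      fetch(`${BASE}/model/load`, { method: 'POST' })
        .catch((e) => finish(e instanceof Error ? e : new Error(String(e))));
    });
  }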

  async getModelStatus(): Promise<ModelStatus> {
    const r = await fetch(`${BASE}/model/status`);
    if (!r.ok) throw new Error(`/model/status ${r.status}`);
    return r.json();
  }

  async submit(
    audio: Blob,
    opts: { language?: string; task?: Task; webhookUrl?: string } = {}
  ): Promise<string> {
    return submitWithRetry(audio, opts);
  }

  streamProgress(
    jobId: string,
    callbacks: {
      onProgress?: (p: { percent: number; chunk: number; total: number }) => void;
      onDone?:     (job: Job) => void;
      onError?:    (msg: string) => void;
    }
  ): () => void {
    const es = new EventSource(`${BASE}/jobs/${jobId}/stream`);

    es.addEventListener('progress', (e) => {
      const d = JSON.parse(e.data);
      callbacks.onProgress?.({ percent: d.percent, chunk: d.chunk, total: d.chunks_total });
    });

    es.addEventListener('done', (e) => {
      callbacks.onDone?.(JSON.parse(e.data).job);
      es.close();
    });

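    es.addEventListener('error', (e) => {
      // Application errors carry `data`; a bare Event means the connection dropped.
      // Always report, otherwise the promise in transcribe() would never settle.
      callbacks.onError?.('data' in e
        ? JSON.parse((e as MessageEvent).data).message
        : 'stream connection lost');
      es.close();
    });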

    return () => es.close();
  }

  async transcribe(
    audio: Blob,
    opts: {
      language?: string;
      task?: Task;
      webhookUrl?: string;
      onProgress?: (percent: number) => void;
    } = {}
  ): Promise<Job> {
    const jobId = await this.submit(audio, opts);

    return new Promise((resolve, reject) => {
      this.streamProgress(jobId, {
        onProgress: (p) => opts.onProgress?.(p.percent),
        onDone:     resolve,
        onError:    (msg) => reject(new Error(msg)),
      });
    });
  }
}

// Usage
const whisper = new WhisperClient();

const job = await whisper.transcribe(audioBlob, {
  language: 'en',
  onProgress: (pct) => console.log(`${pct}%`),
});

for (const seg of job.segments) {
  console.log(`[${seg.start.toFixed(1)}s → ${seg.end.toFixed(1)}s] ${seg.text}`);
}
```
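
The usage loop above prints one line per segment. If the transcript is needed as subtitles instead, the same `start` / `end` / `text` fields are enough to produce SubRip (SRT) text. This is an illustrative sketch and not something the server provides; `srtTimestamp` and `toSrt` are hypothetical helpers.

```typescript
interface SegmentLike {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSecs = Math.floor(totalMs / 1000);
  const s = totalSecs % 60;
  const m = Math.floor(totalSecs / 60) % 60;
  const h = Math.floor(totalSecs / 3600);
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = '0' + out;
    return out;
  };
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds SRT text: a 1-based counter, a time range, and the cue text per segment.
function toSrt(segments: SegmentLike[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```

Calling `toSrt(job.segments)` on a finished job gives a string that can be offered as a `.srt` download via a `Blob`.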

---

## 12. Error Reference

All error responses follow this shape:

```json
{ "error": "human-readable message" }
```

With the following additions for specific errors:

**503 model_not_ready:**
```json
{ "error": "model_not_ready", "state": "loading", "retry_after_secs": 10 }
```

| HTTP | `error` value | When | What to do |
|------|--------------|------|-----------|
| 400 | `"missing 'audio' field"` | `audio` not in form | Fix the form |
| 400 | `"audio field is empty"` | Zero-byte file uploaded | Fix the file |
| 400 | `"task must be 'transcribe' or 'translate'"` | Bad `task` value | Fix the value |
| 400 | `"multipart error: …"` | Malformed request | Check content-type header |
| 404 | `"job … not found"` | Unknown job ID | Check the ID |
| 409 | `"job … is already in terminal state …"` | Cancelling a finished job | No action needed |
| 503 | `"model_not_ready"` | Model not loaded | See §4.2 — retry with `Retry-After` |
| 500 | `"worker channel closed"` | Server crash | Contact server admin |

**Network / SSE errors:**

- `EventSource` `onerror` with no `.data` = connection dropped. The browser reconnects automatically — no action needed unless you want to show a UI indicator.
- HTTP 502/504 from a reverse proxy, or a 503 whose body is not the `model_not_ready` JSON shown above, usually means the container is restarting. Wait and retry.

---

*Last updated: 2026-05-08. Corresponds to whisper-server v0.1.0 commit `d014826`.*