# whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.

## Features

- **Web Share Target** — share YouTube URLs, video or audio files directly from your phone or browser
- **Smart audio preparation** — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- **One job → one webhook** — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- **Live progress** — SSE stream showing chunk N of M + percentage
- **Post-processing** — collapse repeats, n-gram deduplication to clean up hallucinations
- **4 output formats** — SRT, plain TXT, Markdown (with timestamps), JSON
- **Web Push notifications** — get notified when the transcript is ready (works on mobile too)
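
The post-processing step listed above can be illustrated with a minimal sketch. The `Segment` shape and the function names `collapseRepeats` and `dedupeNgrams` are hypothetical and not taken from the project source; the real implementation may use different rules or thresholds.

```typescript
// Hypothetical segment shape; the stored segments may look different.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Compare text case-insensitively and ignore whitespace differences.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Merge runs of consecutive segments with identical normalized text,
// keeping the first segment and extending its end time.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end;
    } else {
      out.push({ start: seg.start, end: seg.end, text: seg.text });
    }
  }
  return out;
}

// Drop immediately repeated n-grams inside one string, longest n first.
// Note: with n = 1 this also collapses legitimate doubled words.
function dedupeNgrams(text: string, maxN: number = 4): string {
  let words = text.split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    const kept: string[] = [];
    for (const word of words) {
      kept.push(word);
      if (kept.length < 2 * n) continue;
      const last = kept.slice(-n).join(" ").toLowerCase();
      const before = kept.slice(-2 * n, -n).join(" ").toLowerCase();
      // The n words just added repeat the n words before them: drop them.
      if (last === before) kept.length -= n;
    }
    words = kept;
  }
  return words.join(" ");
}
```

For example, `dedupeNgrams("thank you thank you thank you")` returns `"thank you"`, a typical cleanup for looping Whisper output.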

## Requirements

- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- whisper-rtx2080 running under Docker locally, or otherwise reachable at `WHISPER_URL`

## Setup

### 1. Install dependencies

```bash
npm install
```

### 2. Generate VAPID keys (one-time)

```bash
npx web-push generate-vapid-keys
```

### 3. Create `.env`

```bash
cp .env.example .env
# Edit .env and fill in all values
```

Environment variables (all required except `DATA_DIR`, which has a default):

| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Base URL of this app, reachable from inside the whisper-rtx2080 Docker container |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |

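> **Important**: `WEBHOOK_BASE_URL` must use an IP address or hostname that is reachable from inside the whisper-rtx2080 Docker container. Do not use `localhost`: inside a container it normally resolves to the container itself, so the completion webhook would never reach this app.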
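
A startup check along these lines could catch the `localhost` mistake early. This is only a sketch: the function name `checkWebhookBaseUrl` is hypothetical, and the project may validate its configuration differently or not at all.

```typescript
// Hypothetical helper: returns a list of problems found in WEBHOOK_BASE_URL.
function checkWebhookBaseUrl(raw: string | undefined): string[] {
  if (!raw) return ["WEBHOOK_BASE_URL is not set"];

  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return ["WEBHOOK_BASE_URL is not a valid URL: " + raw];
  }

  const problems: string[] = [];
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    problems.push("WEBHOOK_BASE_URL must use http or https");
  }

  // Loopback hosts point at the container itself when used from inside Docker.
  const loopbackHosts = ["localhost", "127.0.0.1", "[::1]"];
  if (loopbackHosts.some((host) => host === url.hostname)) {
    problems.push("WEBHOOK_BASE_URL points at a loopback host: " + url.hostname);
  }
  return problems;
}
```

Calling `checkWebhookBaseUrl(process.env.WEBHOOK_BASE_URL)` at startup and logging the returned messages would be one way to use it.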

### 4. Build and run

```bash
npm run build
npm start
```

Visit `http://localhost:3000`.

### 5. For development

```bash
npm run dev
```

## Audio preparation modes

| Mode | Description |
|---|---|
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80 Hz + lowpass 8 kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16 kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.
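
As a rough illustration of how these modes might map to an FFmpeg `-af` filter chain, consider the sketch below. The filter names (`highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`, `silenceremove`) are real FFmpeg filters, but the filter order, the silence threshold, the boost amount, and the "quiet" cutoff are assumptions chosen for illustration, not the project's actual settings.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed values, for illustration only.
const QUIET_MEAN_DB = -30;
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";

// meanVolumeDb is only used by "auto"; it would come from a prior
// `volumedetect` pass, which reports mean_volume in dB.
function buildFilterChain(mode: AudioMode, meanVolumeDb: number = 0): string {
  const bandpass = ["highpass=f=80", "lowpass=f=8000"];
  let filters: string[] = [];
  if (mode === "standard") {
    filters = bandpass.concat(["loudnorm"]);
  } else if (mode === "aggressive") {
    filters = bandpass.concat(["afftdn", "agate", "loudnorm"]);
  } else if (mode === "auto") {
    const boost: string[] = meanVolumeDb < QUIET_MEAN_DB ? ["volume=10dB"] : [];
    filters = boost.concat(["afftdn", "loudnorm"]);
  }
  // Every mode, including "none", trims leading silence first.
  return [TRIM_LEADING_SILENCE].concat(filters).join(",");
}

// Arguments for an ffmpeg call that writes 16 kHz mono audio.
function buildFfmpegArgs(input: string, output: string, chain: string): string[] {
  return ["-y", "-i", input, "-af", chain, "-ar", "16000", "-ac", "1", output];
}
```

The resulting array could be passed to `child_process.spawn("ffmpeg", args)`. Keeping the chain builder separate from process spawning makes the mode logic easy to test in isolation.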

## API

| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
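
A client might build its requests roughly as follows. This is a sketch under assumptions: the body fields come from the table above, but the lowercase format names in the download path, the JSON content type, and the helper names are hypothetical, and the response shape is not documented here.

```typescript
// Body fields as listed in the table above; allowed values are assumed.
interface CreateJobBody {
  source: string; // a YouTube URL or a reference to an uploaded file (assumed)
  title: string;
  audioMode: "auto" | "standard" | "aggressive" | "none";
}

interface JsonRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the POST /api/jobs request without sending it.
function buildCreateJobRequest(baseUrl: string, body: CreateJobBody): JsonRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/jobs",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Build the download URL; lowercase format segments are an assumption.
function buildDownloadUrl(baseUrl: string, jobId: string, format: "srt" | "txt" | "md" | "json"): string {
  return baseUrl.replace(/\/+$/, "") + "/api/jobs/" + encodeURIComponent(jobId) + "/download/" + format;
}

// Usage (not executed here):
//   const req = buildCreateJobRequest("http://localhost:3000", { source, title, audioMode: "auto" });
//   const res = await fetch(req.url, req.init);
```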

## whisper-rtx2080 internals

The backend handles all audio chunking internally: it cuts the audio into 60 s chunks and snaps each boundary to silence (detected at −35 dB) within a 30 s window. We submit one WAV file per job and receive one webhook when all chunks are transcribed.

SSE progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser.
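
On the browser side, handling these events could look like the sketch below. It assumes each event's `data` field is a JSON object with exactly the three fields named above and that events arrive as default `message` events; the helper names are hypothetical.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse the `data` payload of one SSE event; returns null if it is malformed.
function parseProgress(data: string): Progress | null {
  try {
    const value = JSON.parse(data) as Partial<Progress>;
    if (
      typeof value.percent === "number" &&
      typeof value.chunk === "number" &&
      typeof value.total === "number"
    ) {
      return { percent: value.percent, chunk: value.chunk, total: value.total };
    }
  } catch {
    // Invalid JSON or a non-object payload: fall through to null.
  }
  return null;
}

// Human-readable label such as "chunk 3 of 7 (42%)".
function formatProgress(p: Progress): string {
  return "chunk " + p.chunk + " of " + p.total + " (" + Math.round(p.percent) + "%)";
}

// Browser usage (not executed here):
//   const es = new EventSource("/api/jobs/" + jobId + "/stream");
//   es.onmessage = (e) => {
//     const p = parseProgress(e.data);
//     if (p) statusEl.textContent = formatProgress(p);
//   };
```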