# whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.

## Features

- **Web Share Target** — share YouTube URLs, video or audio files directly from your phone or browser
- **Smart audio preparation** — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- **One job → one webhook** — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- **Live progress** — SSE stream showing chunk N of M + percentage
- **Post-processing** — collapse repeats, n-gram deduplication to clean up hallucinations
- **4 output formats** — SRT, plain TXT, Markdown (with timestamps), JSON
- **Web Push notifications** — get notified when the transcript is ready (works on mobile too)

## Requirements

- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)

## Setup

### 1. Install dependencies

```bash
npm install
```

### 2. Generate VAPID keys (one-time)

```bash
npx web-push generate-vapid-keys
```

### 3. Create `.env`

```bash
cp .env.example .env
# Edit .env and fill in all values
```

Required env vars:

| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |

> **Important**: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).

### 4. Build and run

```bash
npm run build
npm start
```

Visit `http://localhost:3000`.

### 5. For development

```bash
npm run dev
```

## Audio preparation modes

| Mode | Description |
|---|---|
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.

## API

| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |

## whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed. SSE progress events from the backend include `{ percent, chunk, total }` relayed live to the browser.
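
A client consuming the progress stream could handle the `{ percent, chunk, total }` payload roughly as follows. This is a minimal sketch, not the app's actual client code: it assumes each SSE message carries the payload as JSON in its `data` field and that `percent` is on a 0 to 100 scale. The names `JobProgress`, `parseProgress`, and `formatProgress` are hypothetical.

```typescript
// Progress payload, using the three field names listed above.
interface JobProgress {
  percent: number; // assumed to be on a 0-100 scale
  chunk: number;   // index of the chunk currently being transcribed
  total: number;   // total number of chunks for the job
}

// Parse the `data` string of one SSE message.
// Returns null for anything that is not a well-formed progress payload.
// Assumption: the payload is sent as a JSON object.
function parseProgress(data: string): JobProgress | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null; // not JSON, e.g. a keep-alive or malformed message
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { percent, chunk, total } = raw as Record<string, unknown>;
  if (
    typeof percent !== "number" ||
    typeof chunk !== "number" ||
    typeof total !== "number"
  ) {
    return null;
  }
  return { percent, chunk, total };
}

// Render a payload as a "chunk N of M" status line with a rounded percentage.
function formatProgress(p: JobProgress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the browser this could be wired to the stream endpoint with the standard `EventSource` API, for example `new EventSource("/api/jobs/" + id + "/stream")` with an `onmessage` handler that passes `event.data` to `parseProgress`. Whether the stream uses unnamed `message` events or a named event type is not stated here, so adjust the listener accordingly.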