# whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.
## Features
- Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
- Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- Live progress — SSE stream showing chunk N of M + percentage
- Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
- 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
- Web Push notifications — get notified when the transcript is ready (works on mobile too)
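The post-processing step can be pictured as two passes over the transcript's words. The sketch below is a hypothetical illustration, not the project's implementation: `collapseRepeats`, `dedupeNgrams`, the `maxRun` limit and the maximum n-gram length are all assumed.

```typescript
// Hypothetical hallucination cleanup; function names and thresholds are assumptions.

/** Keep at most `maxRun` consecutive copies of the same word (case-insensitive). */
function collapseRepeats(words: string[], maxRun = 2): string[] {
  const out: string[] = [];
  let run = 0;
  for (let i = 0; i < words.length; i++) {
    const same = i > 0 && words[i].toLowerCase() === words[i - 1].toLowerCase();
    run = same ? run + 1 : 1;
    if (run <= maxRun) out.push(words[i]);
  }
  return out;
}

/** True if the n words starting at `a` equal the n words starting at `b`. */
function sameSpan(words: string[], a: number, b: number, n: number): boolean {
  if (b + n > words.length) return false;
  for (let k = 0; k < n; k++) {
    if (words[a + k].toLowerCase() !== words[b + k].toLowerCase()) return false;
  }
  return true;
}

/**
 * Drop an n-gram when the very next n words repeat it, so a looped phrase
 * such as "thank you for watching thank you for watching" keeps one copy.
 * Longer n-grams are handled first so whole phrases go before fragments.
 */
function dedupeNgrams(words: string[], maxN = 6): string[] {
  let current = words;
  for (let n = maxN; n >= 2; n--) {
    const out: string[] = [];
    let i = 0;
    while (i < current.length) {
      if (sameSpan(current, i, i + n, n)) {
        i += n; // skip this copy; the identical copy that follows is examined next
      } else {
        out.push(current[i]);
        i++;
      }
    }
    current = out;
  }
  return current;
}
```

Calling `dedupeNgrams(collapseRepeats(text.split(/\s+/)))` applies both passes. Running them per segment rather than over the whole text would keep SRT timestamps aligned.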
## Requirements

- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)
## Setup

### 1. Install dependencies

```bash
npm install
```

### 2. Generate VAPID keys (one-time)

```bash
npx web-push generate-vapid-keys
```

### 3. Create `.env`

```bash
cp .env.example .env
# Edit .env and fill in all values
```
Environment variables (all are required except `DATA_DIR`, which has a default):

| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Base URL of this app, reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |
**Important:** `WEBHOOK_BASE_URL` must be an IP or hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
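A minimal sketch of loading these variables at startup: it applies the `DATA_DIR` default, rejects a loopback `WEBHOOK_BASE_URL`, and builds the per-job webhook URL from the `/api/webhook/[jobId]` route. The `loadConfig` and `webhookUrl` helpers and the `AppConfig` shape are hypothetical, not taken from the project's source.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical config shape; the variable names match the table above.
interface AppConfig {
  whisperUrl: string;
  webhookBaseUrl: string;
  outputDir: string;
  vapidPublicKey: string;
  vapidPrivateKey: string;
  vapidSubject: string;
  dataDir: string;
}

const REQUIRED = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

// Inside the whisper-rtx2080 container these resolve to the container itself.
const LOOPBACK = new Set(["localhost", "127.0.0.1", "[::1]"]);

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);

  const webhookBaseUrl = env.WEBHOOK_BASE_URL!.replace(/\/+$/, ""); // drop trailing slashes
  if (LOOPBACK.has(new URL(webhookBaseUrl).hostname)) {
    throw new Error("WEBHOOK_BASE_URL must be reachable from Docker, not localhost");
  }
  return {
    whisperUrl: env.WHISPER_URL!,
    webhookBaseUrl,
    outputDir: env.OUTPUT_DIR!,
    vapidPublicKey: env.VAPID_PUBLIC_KEY!,
    vapidPrivateKey: env.VAPID_PRIVATE_KEY!,
    vapidSubject: env.VAPID_SUBJECT!,
    dataDir: env.DATA_DIR || join(homedir(), ".whisper-pwa"),
  };
}

// URL that whisper-rtx2080 would POST to when a job finishes.
function webhookUrl(cfg: AppConfig, jobId: string): string {
  return `${cfg.webhookBaseUrl}/api/webhook/${encodeURIComponent(jobId)}`;
}
```

Failing fast at startup surfaces a `localhost` misconfiguration immediately instead of as a job whose webhook never arrives.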
### 4. Build and run

```bash
npm run build
npm start
```

Visit http://localhost:3000.

### 5. For development

```bash
npm run dev
```
## Audio preparation modes
| Mode | Description |
|---|---|
| `auto` (default) | `volumedetect` → boost quiet audio + denoise + EBU R128 `loudnorm` |
| `standard` | Highpass 80 Hz + lowpass 8 kHz + EBU R128 `loudnorm` |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16 kHz mono WAV only |
All modes trim leading silence to prevent Whisper hallucinations at file start.
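The fixed modes in the table translate naturally into an FFmpeg `-af` filter chain. This is a hedged sketch: `ffmpegArgs` is a hypothetical helper, the `silenceremove` threshold of −50 dB and the filter order are assumptions, and `auto` is left out because it depends on a prior `volumedetect` measurement whose thresholds are not documented here.

```typescript
type PrepMode = "standard" | "aggressive" | "none";

// Assumed trim settings: drop leading audio quieter than -50 dB.
const TRIM = "silenceremove=start_periods=1:start_threshold=-50dB";
// Keep roughly the speech band, as in the "standard" row.
const BAND = ["highpass=f=80", "lowpass=f=8000"];

// Filter chain per mode; every mode starts by trimming leading silence.
const FILTERS: Record<PrepMode, string[]> = {
  standard: [TRIM, ...BAND, "loudnorm"],
  // Denoise and gate before loudness normalisation (assumed order).
  aggressive: [TRIM, ...BAND, "afftdn", "agate", "loudnorm"],
  none: [TRIM],
};

/** Build an argument list producing 16 kHz mono 16-bit PCM WAV. */
function ffmpegArgs(input: string, output: string, mode: PrepMode): string[] {
  return [
    "-y", "-i", input,
    "-vn", // ignore any video stream
    "-af", FILTERS[mode].join(","),
    "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
    output,
  ];
}
```

Passing the array to Node's `child_process.spawn("ffmpeg", args)` avoids shell quoting of the filter string.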
## API
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
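A minimal client for the job endpoints could look like the sketch below. It assumes the `POST /api/jobs` response is JSON containing an `id` field and that `[format]` takes lowercase values such as `srt`; neither detail is documented above, and `apiCreateJob`, `apiCancelJob` and `downloadUrl` are hypothetical names. The fetch function is passed in so the client can be exercised without a running server.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
type Format = "srt" | "txt" | "md" | "json"; // assumed path values

// Just the parts of fetch/Response this client uses.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchFn = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<MinimalResponse>;

async function apiCreateJob(
  fetchFn: FetchFn,
  baseUrl: string,
  job: { source: string; title?: string; audioMode?: AudioMode },
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/jobs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  });
  if (!res.ok) throw new Error(`POST /api/jobs failed: ${res.status}`);
  const data = (await res.json()) as { id?: string }; // assumed response shape
  if (!data.id) throw new Error("response has no job id");
  return data.id;
}

async function apiCancelJob(fetchFn: FetchFn, baseUrl: string, id: string): Promise<void> {
  const res = await fetchFn(`${baseUrl}/api/jobs/${encodeURIComponent(id)}`, { method: "DELETE" });
  if (!res.ok) throw new Error(`DELETE /api/jobs/${id} failed: ${res.status}`);
}

function downloadUrl(baseUrl: string, id: string, format: Format): string {
  return `${baseUrl}/api/jobs/${encodeURIComponent(id)}/download/${format}`;
}
```

In a browser or Node 20+, pass the global `fetch`, for example `await apiCreateJob(fetch, "http://localhost:3000", { source: url, audioMode: "auto" })`.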
## whisper-rtx2080 internals
The backend handles all audio chunking internally: 60 s chunks with a 30 s snap window, with silence detected at −35 dB. We submit one WAV file per job and receive one webhook once every chunk has been transcribed.
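To make the chunking parameters concrete, here is one plausible reading of them: aim for a cut every 60 s and move each cut back to the latest silence found in the 30 s before it. This is an illustrative sketch, not whisper-rtx2080's code. The window placement, the `silences` input (times quieter than −35 dB, for instance as reported by FFmpeg's `silencedetect`) and `planChunks` itself are assumptions.

```typescript
// Hypothetical illustration of silence-snapped chunking (not the backend's code).
const CHUNK_S = 60;
const SNAP_WINDOW_S = 30; // assumed to lie just before each 60 s target

/** Split [0, durationS] into [start, end] pairs, snapping cuts to silences. */
function planChunks(durationS: number, silences: number[]): Array<[number, number]> {
  if (durationS <= 0) return [];
  const sorted = [...silences].sort((a, b) => a - b);
  const chunks: Array<[number, number]> = [];
  let start = 0;
  while (durationS - start > CHUNK_S) {
    const target = start + CHUNK_S;
    // Latest silence in (target - window, target]; otherwise a hard cut at the target.
    const inWindow = sorted.filter((t) => t > target - SNAP_WINDOW_S && t <= target);
    const cut = inWindow.length > 0 ? inWindow[inWindow.length - 1] : target;
    chunks.push([start, cut]);
    start = cut;
  }
  chunks.push([start, durationS]);
  return chunks;
}
```

Cutting at a pause rather than mid-word is what keeps words from being split across two chunks.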
SSE progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser.
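On the browser side, these events can be turned into a "chunk N of M" label. The sketch assumes each `data:` payload of `/api/jobs/[id]/stream` is JSON with exactly the `{ percent, chunk, total }` fields named above; `parseProgress` and `formatProgress` are hypothetical helpers.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

/** Parse one SSE `data:` payload; returns null if it is not a progress update. */
function parseProgress(data: string): Progress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null; // non-JSON payloads are ignored
  }
  if (typeof value !== "object" || value === null) return null;
  const { percent, chunk, total } = value as Record<string, unknown>;
  if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
    return null; // e.g. a status message without progress fields
  }
  return { percent, chunk, total };
}

function formatProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}

// Assumed browser wiring:
// const es = new EventSource(`/api/jobs/${jobId}/stream`);
// es.onmessage = (e) => {
//   const p = parseProgress(e.data);
//   if (p) label.textContent = formatProgress(p);
// };
```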