whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.
Features
- Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
- Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- Live progress — SSE stream showing chunk N of M + percentage
- Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
- 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
- Web Push notifications — get notified when the transcript is ready (works on mobile too)
Requirements
- Node.js 20+
- FFmpeg in $PATH
- yt-dlp in $PATH (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at WHISPER_URL)
Setup
1. Install dependencies
npm install
2. Generate VAPID keys (one-time)
npx web-push generate-vapid-keys
3. Create .env
cp .env.example .env
# Edit .env and fill in all values
Required env vars:
| Variable | Example | Description |
|---|---|---|
| WHISPER_URL | http://localhost:8080 | whisper-rtx2080 base URL |
| WEBHOOK_BASE_URL | http://192.168.1.x:3000 | Reachable from inside Docker |
| OUTPUT_DIR | /home/user/transcripts | Where to write output files |
| VAPID_PUBLIC_KEY | BNxx... | From npx web-push generate-vapid-keys |
| VAPID_PRIVATE_KEY | xxxx | From npx web-push generate-vapid-keys |
| VAPID_SUBJECT | mailto:you@example.com | Contact for push service |
| DATA_DIR | /home/user/.whisper-pwa | SQLite DB + tmp audio (default: ~/.whisper-pwa) |
Important: WEBHOOK_BASE_URL must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not localhost).
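
A startup check can catch a missing variable or a localhost webhook URL before the first job is submitted. The following is a minimal sketch, not the project's actual code: the variable names come from the table above, while `checkEnv`, its messages, and the simple host regex are hypothetical. DATA_DIR is left out because it has a default.

```typescript
// Hypothetical helper: variable names are from the README table,
// everything else here is an illustrative assumption.
const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

type EnvMap = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the env looks usable.
function checkEnv(env: EnvMap): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }

  const webhook = env["WEBHOOK_BASE_URL"];
  if (webhook) {
    // Simplified host extraction (assumption: plain http/https URL, no IPv6 literal).
    const match = /^https?:\/\/([^/:]+)/i.exec(webhook);
    const host = match ? match[1] : undefined;
    if (host === undefined) {
      problems.push("WEBHOOK_BASE_URL does not look like an http(s) URL");
    } else if (host.toLowerCase() === "localhost" || host === "127.0.0.1") {
      // Inside the container, localhost usually refers to the container itself.
      problems.push("WEBHOOK_BASE_URL points at localhost; use an address reachable from the container");
    }
  }

  return problems;
}
```

In a Node process this could be called as `checkEnv(process.env)` and the result logged before the server starts.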
4. Build and run
npm run build
npm start
Visit http://localhost:3000.
5. For development
npm run dev
Audio preparation modes
| Mode | Description |
|---|---|
| auto (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| standard | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| aggressive | Standard + FFT denoiser (afftdn) + noise gate (agate) |
| none | Convert to 16kHz mono WAV only |
All modes trim leading silence to prevent Whisper hallucinations at file start.
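
To make the modes concrete, here is a rough sketch of how each one might map to an FFmpeg `-af` filter chain. It is an illustration under assumptions, not the pipeline this project ships: the filter names (`silenceremove`, `highpass`, `lowpass`, `afftdn`, `agate`, `loudnorm`, `volume`) are real FFmpeg filters, but the parameter values, the filter order, the quiet-audio threshold, the boost amount, and the function names are all hypothetical. The auto mode is assumed to receive a mean volume measured in a separate `volumedetect` pass.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed values; the real thresholds are not documented in this README.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const LOUDNORM = "loudnorm"; // EBU R128 normalization with FFmpeg's default targets
const QUIET_THRESHOLD_DB = -30; // hypothetical cut-off for "quiet" audio
const QUIET_BOOST = "volume=10dB"; // hypothetical boost for quiet audio

// Builds the value passed to ffmpeg's -af option for a given mode.
function buildFilterChain(mode: AudioMode, meanVolumeDb?: number): string {
  // Every mode trims leading silence first.
  const filters: string[] = [TRIM_LEADING_SILENCE];

  switch (mode) {
    case "none":
      break; // conversion only, handled by -ar / -ac below
    case "standard":
      filters.push("highpass=f=80", "lowpass=f=8000", LOUDNORM);
      break;
    case "aggressive":
      // Assumption: denoise and gate run before loudness normalization.
      filters.push("highpass=f=80", "lowpass=f=8000", "afftdn", "agate", LOUDNORM);
      break;
    case "auto":
      if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_THRESHOLD_DB) {
        filters.push(QUIET_BOOST);
      }
      filters.push("afftdn", LOUDNORM);
      break;
  }
  return filters.join(",");
}

// Full argument list for converting any input to 16 kHz mono WAV.
function buildFfmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb?: number): string[] {
  return ["-i", input, "-af", buildFilterChain(mode, meanVolumeDb), "-ar", "16000", "-ac", "1", output];
}
```

The returned array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`; keeping the arguments as an array avoids shell quoting issues with file names.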
API
| Endpoint | Method | Description |
|---|---|---|
| /api/jobs | POST | Create job ({ source, title, audioMode }) |
| /api/jobs | GET | List recent jobs |
| /api/jobs/[id] | GET | Poll job status |
| /api/jobs/[id] | DELETE | Cancel job |
| /api/jobs/[id]/stream | GET (SSE) | Live progress stream |
| /api/jobs/[id]/download/[format] | GET | Download SRT/TXT/MD/JSON |
| /api/jobs/[id]/reprocess | POST | Re-run post-processing on stored segments |
| /api/webhook/[jobId] | POST | Whisper completion webhook (called by whisper-rtx2080) |
| /api/push | GET | Get VAPID public key |
| /api/push | POST | Register push subscription |
| /share | POST | Web Share Target entry point |
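
As an illustration of the endpoints table above, the sketch below builds the request for creating a job and the URL for downloading a finished transcript. The paths and the `{ source, title, audioMode }` body come from the table; the JSON content type, the lowercase format segments, and the helper names are assumptions and may not match the server exactly.

```typescript
// Body fields are from the README; the allowed values mirror the audio preparation modes.
interface CreateJobBody {
  source: string;
  title: string;
  audioMode: "auto" | "standard" | "aggressive" | "none";
}

// Assumption: the [format] path segment is the lowercase file extension.
type OutputFormat = "srt" | "txt" | "md" | "json";

interface HttpRequest {
  url: string;
  method: "GET" | "POST" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

function trimTrailingSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

// Describes POST /api/jobs (assumes the endpoint accepts a JSON body).
function createJobRequest(baseUrl: string, job: CreateJobBody): HttpRequest {
  return {
    url: `${trimTrailingSlash(baseUrl)}/api/jobs`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

// Builds the URL for GET /api/jobs/[id]/download/[format].
function downloadUrl(baseUrl: string, jobId: string, format: OutputFormat): string {
  return `${trimTrailingSlash(baseUrl)}/api/jobs/${encodeURIComponent(jobId)}/download/${format}`;
}
```

A caller could pass the result to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. The shape of the response (for example, which field holds the new job id) is not documented here, so it is left out of the sketch.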
whisper-rtx2080 internals
The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
SSE progress events from the backend include { percent, chunk, total } relayed live to the browser.
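
On the browser side, each progress event can be validated and turned into a "chunk N of M" label. This is a minimal sketch: the `{ percent, chunk, total }` fields come from the description above, while the helper names and the assumption that `percent` is on a 0 to 100 scale are hypothetical.

```typescript
interface ChunkProgress {
  percent: number;
  chunk: number;
  total: number;
}

// Parses the data field of one SSE message; returns null for malformed payloads.
function parseProgress(data: string): ChunkProgress | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { percent, chunk, total } = raw as Record<string, unknown>;
  if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
    return null;
  }
  return { percent, chunk, total };
}

// Assumes percent is already in the 0-100 range.
function formatProgress(p: ChunkProgress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the browser, these helpers could be wired to `new EventSource("/api/jobs/<id>/stream")` by calling `parseProgress(event.data)` in its `onmessage` handler, assuming the server sends unnamed (default `message`) events.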