whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.
Features
- Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
- Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- Live progress — SSE stream showing chunk N of M + percentage
- Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
- 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
- Web Push notifications — get notified when the transcript is ready (works on mobile too)
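The post-processing feature listed above (collapse repeats, n-gram deduplication) can be illustrated with a minimal sketch. The `Segment` shape and the function names `collapseRepeats` and `dedupeNgrams` are assumptions made for illustration; the project's actual implementation may differ.

```typescript
// Hypothetical segment shape; the project's real type may differ.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Lowercase, drop common punctuation, and collapse whitespace so that
// trivially different repeats compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[.,!?;:]+/g, "").replace(/\s+/g, " ").trim();
}

// Merge runs of consecutive segments with the same normalized text into
// one segment that spans the whole run.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end; // extend the kept segment over the repeat
    } else {
      out.push({ ...seg });
    }
  }
  return out;
}

// Remove immediately repeated n-grams inside one text, longest first.
// Note: with n = 1 this also collapses legitimate doubles such as "had had".
function dedupeNgrams(text: string, maxN = 4): string {
  const words = text.split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    let i = 0;
    while (i + 2 * n <= words.length) {
      const a = words.slice(i, i + n).join(" ").toLowerCase();
      const b = words.slice(i + n, i + 2 * n).join(" ").toLowerCase();
      if (a === b) {
        words.splice(i + n, n); // drop the repeated copy and re-check at i
      } else {
        i++;
      }
    }
  }
  return words.join(" ");
}
```

Running `collapseRepeats` first and `dedupeNgrams` on each remaining segment is one plausible order; the README does not say which order the app uses.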
Requirements
- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)
Setup
1. Install dependencies
npm install
2. Generate VAPID keys (one-time)
npx web-push generate-vapid-keys
3. Create .env
cp .env.example .env
# Edit .env and fill in all values
Required env vars:
| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |
Important: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
4. Build and run
npm run build
npm start
Visit http://localhost:3000.
5. For development
npm run dev
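The `WEBHOOK_BASE_URL` note in step 3 is easy to get wrong, so a startup check can help. The sketch below builds the per-job webhook URL from the `/api/webhook/[jobId]` route and rejects loopback hosts. The function names `isLoopback` and `webhookUrlFor` are hypothetical and not taken from the project.

```typescript
// Hostnames that point back at the container itself when used inside Docker.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// True when the base URL's host is a loopback name or address.
function isLoopback(baseUrl: string): boolean {
  return LOOPBACK_HOSTS.has(new URL(baseUrl).hostname);
}

// Build the callback URL for one job, failing early on a loopback base URL.
function webhookUrlFor(baseUrl: string, jobId: string): string {
  if (isLoopback(baseUrl)) {
    throw new Error(
      `WEBHOOK_BASE_URL (${baseUrl}) is a loopback address; the container cannot reach it`
    );
  }
  return new URL(`/api/webhook/${encodeURIComponent(jobId)}`, baseUrl).toString();
}
```

For example, `webhookUrlFor("http://192.168.1.10:3000", "abc")` yields a URL ending in `/api/webhook/abc`, while a `http://localhost:3000` base throws before any job is submitted.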
Audio preparation modes
| Mode | Description |
|---|---|
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16kHz mono WAV only |
All modes trim leading silence to prevent Whisper hallucinations at file start.
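As a rough illustration of how the modes could map to FFmpeg `-af` filter chains, here is a minimal sketch. The filter names (`silenceremove`, `highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`) are real FFmpeg filters, but every threshold, the filter order, and the function names are assumptions rather than the project's actual pipeline.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed values; the real pipeline's thresholds are not documented here.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const QUIET_THRESHOLD_DB = -30; // below this mean volume, "auto" adds gain
const TARGET_MEAN_DB = -20;
const MAX_GAIN_DB = 20;
const STANDARD = ["highpass=f=80", "lowpass=f=8000", "loudnorm"];

// Build the -af filter chain for a mode. meanVolumeDb would come from a
// prior volumedetect pass and only matters for "auto".
function filterChain(mode: AudioMode, meanVolumeDb?: number): string {
  const filters: string[] = [TRIM_LEADING_SILENCE];
  switch (mode) {
    case "standard":
      filters.push(...STANDARD);
      break;
    case "aggressive":
      filters.push(...STANDARD, "afftdn", "agate");
      break;
    case "auto":
      if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_THRESHOLD_DB) {
        const gain = Math.min(MAX_GAIN_DB, TARGET_MEAN_DB - meanVolumeDb);
        filters.push(`volume=${gain}dB`);
      }
      filters.push("afftdn", "loudnorm");
      break;
    case "none":
      break; // only trim and resample
  }
  return filters.join(",");
}

// Full argument list for a 16 kHz mono WAV conversion.
function ffmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb?: number): string[] {
  return ["-y", "-i", input, "-af", filterChain(mode, meanVolumeDb), "-ar", "16000", "-ac", "1", output];
}
```

The resulting array could be passed to `child_process.spawn("ffmpeg", args)`; keeping the chain as a pure function makes each mode easy to unit test without running FFmpeg.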
API
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
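A client for the endpoints above might look like the following sketch. The request body fields come from the table; the helper names, the assumption that the body is sent as JSON, and the lowercase format segments in the download path are guesses, and the response shape is not documented here.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
type OutputFormat = "srt" | "txt" | "md" | "json"; // assumed path segments

// Body for POST /api/jobs; whether title and audioMode are optional is unknown.
interface CreateJobBody {
  source: string; // YouTube URL or file reference
  title: string;
  audioMode: AudioMode;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Prepare (but do not send) the job-creation request.
function createJobRequest(baseUrl: string, body: CreateJobBody): PreparedRequest {
  return {
    url: new URL("/api/jobs", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// URL for GET /api/jobs/[id]/download/[format].
function downloadUrl(baseUrl: string, jobId: string, format: OutputFormat): string {
  return new URL(`/api/jobs/${encodeURIComponent(jobId)}/download/${format}`, baseUrl).toString();
}

// Usage (not executed here):
//   const { url, init } = createJobRequest("http://localhost:3000", {
//     source: "https://www.youtube.com/watch?v=...", title: "Talk", audioMode: "auto",
//   });
//   const res = await fetch(url, init);
```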
whisper-rtx2080 internals
The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
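To make the chunking description concrete, here is a hedged sketch of snapping chunk boundaries to detected silences. It is only an interpretation: it treats the 30s snap window as ±15s around each 60s target, assumes silence timestamps are already known (for example from a −35dB silence detector), and uses a hypothetical function name `planCuts`. The real whisper-rtx2080 logic may differ.

```typescript
// Return cut points (seconds) for a file of the given duration.
// Each cut aims for start + targetSec and snaps to the nearest silence
// within windowSec / 2 of that target; otherwise it cuts at the target.
function planCuts(
  durationSec: number,
  silences: number[],
  targetSec = 60,
  windowSec = 30
): number[] {
  const cuts: number[] = [];
  let start = 0;
  while (durationSec - start > targetSec) {
    const ideal = start + targetSec;
    let best = ideal;
    let bestDist = Infinity;
    for (const s of silences) {
      const dist = Math.abs(s - ideal);
      if (s > start && s < durationSec && dist <= windowSec / 2 && dist < bestDist) {
        best = s;
        bestDist = dist;
      }
    }
    cuts.push(best);
    start = best; // the next chunk begins at the chosen cut
  }
  return cuts;
}
```

Because each cut lands at least `targetSec - windowSec / 2` seconds after the previous one, the loop always advances and terminates.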
SSE progress events from the backend include { percent, chunk, total } relayed live to the browser.
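On the browser side, the relayed events can be consumed with `EventSource`. The sketch below validates one event payload and formats it as "chunk N of M". It assumes each event's `data` field is a JSON object with exactly the `{ percent, chunk, total }` fields described above and that events arrive as default `message` events; both are assumptions, and the helper names are hypothetical.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse one SSE data payload; return null for anything malformed.
function parseProgress(data: string): Progress | null {
  try {
    const value: unknown = JSON.parse(data);
    if (typeof value !== "object" || value === null) return null;
    const { percent, chunk, total } = value as Record<string, unknown>;
    if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
      return null;
    }
    return { percent, chunk, total };
  } catch {
    return null; // not valid JSON
  }
}

// Human-readable label for the progress UI.
function formatProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}

// Browser usage (not executed here):
//   const es = new EventSource(`/api/jobs/${id}/stream`);
//   es.onmessage = (e) => {
//     const p = parseProgress(e.data);
//     if (p) label.textContent = formatProgress(p);
//   };
```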