whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.
Features
- Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
- Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- Live progress — SSE stream showing chunk N of M + percentage
- Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
- 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
- Web Push notifications — get notified when the transcript is ready (works on mobile too)
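
The post-processing step can be pictured with a minimal sketch like the one below. It is illustrative only and not the project's actual implementation: the `Segment` shape, the function names, and the punctuation set are assumptions. It merges consecutive segments with identical text, then drops immediately repeated n-grams inside a line.

```typescript
// Hypothetical segment shape; the stored segments may carry more fields.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Normalise so that case, common punctuation and spacing do not block a match.
function normalise(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"'()]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Merge runs of consecutive segments whose normalised text is identical.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalise(prev.text) === normalise(seg.text)) {
      prev.end = seg.end; // extend the kept segment instead of emitting a copy
    } else {
      out.push({ ...seg });
    }
  }
  return out;
}

// Drop n-grams that repeat immediately, e.g. "thank you thank you thank you".
// Caveat: n = 1 also collapses legitimate doubled words such as "had had".
function dedupeNgrams(text: string, maxN = 4): string {
  let words = text.split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    const kept: string[] = [];
    let i = 0;
    while (i < words.length) {
      const a = words.slice(i, i + n).join(" ").toLowerCase();
      const b = words.slice(i + n, i + 2 * n).join(" ").toLowerCase();
      if (i + 2 * n <= words.length && a === b) {
        i += n; // skip this copy; the next copy is checked on the next pass
      } else {
        kept.push(words[i]);
        i++;
      }
    }
    words = kept;
  }
  return words.join(" ");
}
```

Running `collapseRepeats` first and `dedupeNgrams` on each remaining segment's text is one plausible order; the real pipeline may differ.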
Requirements
- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)
Setup
1. Install dependencies
npm install
2. Generate VAPID keys (one-time)
npx web-push generate-vapid-keys
3. Create .env
cp .env.example .env
# Edit .env and fill in all values
Required env vars:
| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |
Important: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
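
A startup check along the following lines would catch a missing variable or a `localhost` webhook URL early. This is a hedged sketch, not code from the repository: `checkEnv` and its return shape are hypothetical, and `DATA_DIR` is treated as optional because the table gives it a default.

```typescript
// Variable names come from the table above; the checker itself is illustrative.
const REQUIRED = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

type RequiredKey = (typeof REQUIRED)[number];
type Config = Record<RequiredKey, string>;

interface CheckResult {
  config: Config;
  warnings: string[];
}

// Takes the environment as a plain object so it can be exercised without process.env.
function checkEnv(env: Record<string, string | undefined>): CheckResult {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing env vars: ${missing.join(", ")}`);
  }

  const config = {} as Config;
  for (const key of REQUIRED) {
    config[key] = env[key] as string;
  }

  const warnings: string[] = [];
  // new URL() throws on a malformed value, which is the desired failure here.
  const host = new URL(config.WEBHOOK_BASE_URL).hostname;
  if (host === "localhost" || host === "127.0.0.1") {
    warnings.push(
      "WEBHOOK_BASE_URL points at localhost; the whisper-rtx2080 container cannot reach it."
    );
  }
  return { config, warnings };
}
```

In a Node process this could be called once at boot as `checkEnv(process.env)`, logging any warnings before the server starts.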
4. Build and run
npm run build
npm start
Visit http://localhost:3000.
5. For development
npm run dev
Audio preparation modes
| Mode | Description |
|---|---|
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (afftdn) + noise gate (agate) |
| `none` | Convert to 16kHz mono WAV only |
All modes trim leading silence to prevent Whisper hallucinations at file start.
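
As a rough illustration of how the modes could map onto FFmpeg arguments, here is a sketch under stated assumptions. The filter names (`silenceremove`, `highpass`, `lowpass`, `afftdn`, `agate`, `loudnorm`, `volume`) are real FFmpeg filters, but the thresholds, loudness targets, gain limits, filter order, and function names are guesses, not values taken from this project. For `auto`, the mean volume is assumed to have been measured beforehand with a separate `volumedetect` pass.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// All numeric values below are illustrative assumptions.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const LOUDNORM = "loudnorm=I=-16:TP=-1.5:LRA=11"; // EBU R128 loudness normalisation
const QUIET_THRESHOLD_DB = -30; // below this mean volume, audio is boosted
const TARGET_MEAN_DB = -25;
const MAX_GAIN_DB = 15;

// Build the -af filter chain for one mode.
function buildFilters(mode: AudioMode, meanVolumeDb?: number): string[] {
  const filters = [TRIM_LEADING_SILENCE]; // every mode trims leading silence
  switch (mode) {
    case "none":
      break;
    case "standard":
      filters.push("highpass=f=80", "lowpass=f=8000", LOUDNORM);
      break;
    case "aggressive":
      filters.push("highpass=f=80", "lowpass=f=8000", "afftdn", "agate", LOUDNORM);
      break;
    case "auto":
      if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_THRESHOLD_DB) {
        const gain = Math.min(MAX_GAIN_DB, TARGET_MEAN_DB - meanVolumeDb);
        filters.push(`volume=${gain}dB`); // boost quiet audio
      }
      filters.push("afftdn", LOUDNORM); // denoise, then normalise
      break;
  }
  return filters;
}

// Full argument list: drop video, apply filters, write 16 kHz mono audio.
function buildFfmpegArgs(
  input: string,
  output: string,
  mode: AudioMode,
  meanVolumeDb?: number
): string[] {
  return [
    "-y", "-i", input, "-vn",
    "-af", buildFilters(mode, meanVolumeDb).join(","),
    "-ar", "16000", "-ac", "1",
    output,
  ];
}
```

The resulting array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`.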
API
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
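
A client might assemble requests against these endpoints roughly as follows. This sketch assumes the create-job body is sent as JSON and that download formats appear in the path as lowercase `srt`, `txt`, `md`, `json`; neither is confirmed here, and the helper names are hypothetical. The response shape is not documented above, so the sketch stops at building the request.

```typescript
// Body fields come from the API table; the accepted audioMode values mirror the modes table.
interface CreateJobBody {
  source: string;
  title: string;
  audioMode: "auto" | "standard" | "aggressive" | "none";
}

interface JsonRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Assumed lowercase path segments for the four output formats.
type DownloadFormat = "srt" | "txt" | "md" | "json";

// Remove trailing slashes so paths can be appended safely.
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Build the POST /api/jobs request (JSON encoding is an assumption).
function createJobRequest(baseUrl: string, job: CreateJobBody): JsonRequest {
  return {
    url: `${trimBase(baseUrl)}/api/jobs`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(job),
    },
  };
}

// URLs for one job, following the routes in the table.
function jobUrls(baseUrl: string, id: string) {
  const root = `${trimBase(baseUrl)}/api/jobs/${encodeURIComponent(id)}`;
  return {
    status: root,
    stream: `${root}/stream`,
    reprocess: `${root}/reprocess`,
    download: (format: DownloadFormat) => `${root}/download/${format}`,
  };
}
```

With the built-in `fetch`, the request can then be sent as `fetch(req.url, req.init)`.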
whisper-rtx2080 internals
The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
SSE progress events from the backend carry { percent, chunk, total } and are relayed live to the browser.
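
On the browser side, each event's data could be validated before it is rendered. The sketch below assumes the payload is a JSON object with numeric `percent`, `chunk`, and `total` fields, as the line above suggests; the function names are hypothetical and the event name used by the stream is not specified here.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse one SSE data payload; returns null for anything that is not a progress object.
function parseProgress(data: string): Progress | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null; // not JSON, e.g. a keep-alive comment or a different event type
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { percent, chunk, total } = raw as Record<string, unknown>;
  if (
    typeof percent !== "number" ||
    typeof chunk !== "number" ||
    typeof total !== "number"
  ) {
    return null;
  }
  return { percent, chunk, total };
}

// Render the "chunk N of M + percentage" label described under Features.
function formatProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In a page this would typically sit inside an `EventSource` message handler attached to `/api/jobs/[id]/stream`, passing `event.data` to `parseProgress`.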