Giancarmine Salucci dc65c028c1
fix: disable CSRF origin check to allow Web Share Target
SvelteKit's CSRF check runs before the handle hook and blocks POSTs
whose Origin header doesn't match the site origin. Web Share Target
POSTs from any external app (YouTube, Chrome share sheet, etc.) are
legitimately cross-origin.

checkOrigin: false is safe here — the app has no cookie-based session
auth, so there is no CSRF attack surface.

Also remove the ineffective hooks.server.ts approach.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 19:02:07 +02:00

whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.

Features

  • Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
  • Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
  • One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
  • Live progress — SSE stream showing chunk N of M + percentage
  • Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
  • 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
  • Web Push notifications — get notified when the transcript is ready (works on mobile too)
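
The post-processing step listed above (collapse repeats, n-gram deduplication) could look roughly like the sketch below. It is not the project's actual code: the `Segment` shape, the function names, and the default `maxN = 4` are all assumptions made for illustration.

```typescript
// Hypothetical segment shape; the real project may store more fields.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Compare text case-insensitively and ignore whitespace differences.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Merge runs of consecutive segments with the same normalized text into one
// segment that spans the whole run.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end; // extend the kept copy instead of adding a duplicate
    } else {
      out.push({ ...seg }); // copy so the input array is not mutated
    }
  }
  return out;
}

// True when the n words starting at a equal the n words starting at b.
function sameGram(words: string[], a: number, b: number, n: number): boolean {
  for (let k = 0; k < n; k++) {
    if (words[a + k].toLowerCase() !== words[b + k].toLowerCase()) return false;
  }
  return true;
}

// Drop immediately repeated n-grams ("thank you thank you" -> "thank you"),
// trying the longest n-grams first. maxN = 4 is an assumed default.
function dedupeNgrams(text: string, maxN = 4): string {
  let words = text.split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    const kept: string[] = [];
    let i = 0;
    while (i < words.length) {
      if (i + 2 * n <= words.length && sameGram(words, i, i + n, n)) {
        i += n; // skip this copy; the following identical n-gram is kept
      } else {
        kept.push(words[i]);
        i++;
      }
    }
    words = kept;
  }
  return words.join(" ");
}
```

Note that with `n = 1` this also collapses legitimate doubled words (for example "had had"), so a real implementation may want a higher minimum n-gram length or a repeat-count threshold.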

Requirements

  • Node.js 20+
  • FFmpeg in $PATH
  • yt-dlp in $PATH (for YouTube URLs)
  • Docker + whisper-rtx2080 running (or reachable at WHISPER_URL)

Setup

1. Install dependencies

npm install

2. Generate VAPID keys (one-time)

npx web-push generate-vapid-keys

3. Create .env

cp .env.example .env
# Edit .env and fill in all values

Required env vars:

| Variable | Example | Description |
| --- | --- | --- |
| WHISPER_URL | http://localhost:8080 | whisper-rtx2080 base URL |
| WEBHOOK_BASE_URL | http://192.168.1.x:3000 | Reachable from inside Docker |
| OUTPUT_DIR | /home/user/transcripts | Where to write output files |
| VAPID_PUBLIC_KEY | BNxx... | From npx web-push generate-vapid-keys |
| VAPID_PRIVATE_KEY | xxxx | From npx web-push generate-vapid-keys |
| VAPID_SUBJECT | mailto:you@example.com | Contact for push service |
| DATA_DIR | /home/user/.whisper-pwa | SQLite DB + tmp audio (default: ~/.whisper-pwa) |

Important: WEBHOOK_BASE_URL must be an IP address or hostname reachable from inside the whisper-rtx2080 Docker container (not localhost).
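
A startup check can catch the most common misconfiguration early. The sketch below is only an illustration, not part of the project: `checkEnv` and its messages are hypothetical, and it treats DATA_DIR as optional because the table gives it a default.

```typescript
// Variables listed in the table that have no documented default.
const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

// Hostnames that resolve to the container itself when used from inside Docker.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Returns a list of human-readable problems; an empty list means the
// environment looks usable. (Hypothetical helper, not the project's code.)
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }
  const webhook = env["WEBHOOK_BASE_URL"];
  if (webhook) {
    try {
      const host = new URL(webhook).hostname;
      if (LOOPBACK_HOSTS.has(host)) {
        problems.push(
          `WEBHOOK_BASE_URL points at ${host}, which the Docker container cannot reach`
        );
      }
    } catch {
      problems.push("WEBHOOK_BASE_URL is not a valid URL");
    }
  }
  return problems;
}
```

In a Node.js process this could be called once at startup as `checkEnv(process.env)`, logging each problem and exiting if the list is not empty.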

4. Build and run

npm run build
npm start

Visit http://localhost:3000.

5. For development

npm run dev

Audio preparation modes

| Mode | Description |
| --- | --- |
| auto (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| standard | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| aggressive | Standard + FFT denoiser (afftdn) + noise gate (agate) |
| none | Convert to 16kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.
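
To make the modes concrete, here is a minimal sketch of how each one might map to an FFmpeg `-af` filter chain. The filter names (`highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `silenceremove`, `volume`) are real FFmpeg filters, but every parameter value, the -30 dB "quiet" threshold, the filter order, and the function names are assumptions, not the project's actual settings.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed parameters; the project's real thresholds and targets may differ.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const LOUDNORM = "loudnorm=I=-16:TP=-1.5:LRA=11"; // EBU R128 loudness normalization

// meanVolumeDb is the mean_volume a prior volumedetect pass would report;
// it only matters for "auto".
function buildFilterChain(mode: AudioMode, meanVolumeDb = -20): string {
  const filters: string[] = [TRIM_LEADING_SILENCE]; // applied in every mode
  switch (mode) {
    case "standard":
      filters.push("highpass=f=80", "lowpass=f=8000", LOUDNORM);
      break;
    case "aggressive":
      // Denoise and gate before normalizing so noise is not boosted first.
      filters.push("highpass=f=80", "lowpass=f=8000", "afftdn", "agate", LOUDNORM);
      break;
    case "auto":
      if (meanVolumeDb < -30) filters.push("volume=10dB"); // boost quiet audio
      filters.push("afftdn", LOUDNORM);
      break;
    case "none":
      break; // conversion only, apart from the shared silence trim
  }
  return filters.join(",");
}

// Argument list for an ffmpeg invocation that writes 16 kHz mono WAV.
function ffmpegArgs(
  input: string,
  output: string,
  mode: AudioMode,
  meanVolumeDb?: number
): string[] {
  return [
    "-i", input,
    "-af", buildFilterChain(mode, meanVolumeDb),
    "-ar", "16000",
    "-ac", "1",
    output,
  ];
}
```

For example, `ffmpegArgs("in.mp4", "out.wav", "standard")` yields arguments that could be passed to a child process running `ffmpeg`.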

API

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/jobs | POST | Create job ({ source, title, audioMode }) |
| /api/jobs | GET | List recent jobs |
| /api/jobs/[id] | GET | Poll job status |
| /api/jobs/[id] | DELETE | Cancel job |
| /api/jobs/[id]/stream | GET (SSE) | Live progress stream |
| /api/jobs/[id]/download/[format] | GET | Download SRT/TXT/MD/JSON |
| /api/jobs/[id]/reprocess | POST | Re-run post-processing on stored segments |
| /api/webhook/[jobId] | POST | Whisper completion webhook (called by whisper-rtx2080) |
| /api/push | GET | Get VAPID public key |
| /api/push | POST | Register push subscription |
| /share | POST | Web Share Target entry point |
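
As a rough illustration of calling these endpoints, the sketch below builds request descriptions for creating a job and downloading a result. The paths and the `{ source, title, audioMode }` field names come from the table; the field types, the JSON request encoding, the lowercase format segment, and all helper names are assumptions.

```typescript
// Body of POST /api/jobs. Field names are from the table; types are assumed.
interface CreateJobBody {
  source: string; // assumed: a YouTube URL or a reference to an uploaded file
  title: string;
  audioMode: "auto" | "standard" | "aggressive" | "none";
}

// Minimal request description that can be handed to any HTTP client.
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Avoid double slashes when the base URL ends with "/".
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Assumes the endpoint accepts a JSON body.
function createJobRequest(baseUrl: string, job: CreateJobBody): HttpRequest {
  return {
    url: `${trimBase(baseUrl)}/api/jobs`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

// Assumes [format] is one of these lowercase names.
function downloadRequest(
  baseUrl: string,
  jobId: string,
  format: "srt" | "txt" | "md" | "json"
): HttpRequest {
  return {
    url: `${trimBase(baseUrl)}/api/jobs/${encodeURIComponent(jobId)}/download/${format}`,
    method: "GET",
    headers: {},
  };
}
```

With `fetch`, a request built this way could be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.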

whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at 35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.

SSE progress events from the backend carry { percent, chunk, total } and are relayed live to the browser.
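
The sketch below shows one way to decode such events from raw SSE text. It relies on the standard SSE wire format (events separated by blank lines, payload in `data:` lines) and on the `{ percent, chunk, total }` payload named above; that the payload is JSON on the default event type, and the names `Progress` and `parseProgressEvents`, are assumptions.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse a raw SSE text chunk into progress events. Comment lines
// (starting with ":") and malformed payloads are skipped.
function parseProgressEvents(raw: string): Progress[] {
  const events: Progress[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one optional leading space
      .join("\n");
    if (!data) continue;
    try {
      const p = JSON.parse(data);
      if (
        typeof p.percent === "number" &&
        typeof p.chunk === "number" &&
        typeof p.total === "number"
      ) {
        events.push({ percent: p.percent, chunk: p.chunk, total: p.total });
      }
    } catch {
      // Not JSON (or not an object): ignore this event.
    }
  }
  return events;
}
```

In the browser, `EventSource` does the framing itself, so only the JSON step would be needed: `new EventSource("/api/jobs/" + id + "/stream").onmessage = (e) => JSON.parse(e.data)`.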
