Giancarmine Salucci 95eea34011
ci: use auto-provided GITEA_TOKEN for registry login
Avoids needing to set custom REGISTRY_USERNAME/REGISTRY_TOKEN secrets.
The built-in secrets.GITEA_TOKEN has write:package access for pushing
to the Gitea container registry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 16:50:51 +02:00
2026-05-06 16:41:25 +02:00

whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.

Features

  • Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
  • Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
  • One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
  • Live progress — SSE stream showing chunk N of M + percentage
  • Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
  • 4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
  • Web Push notifications — get notified when the transcript is ready (works on mobile too)
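The post-processing step above can be illustrated with a minimal sketch. It assumes a hypothetical `Segment` shape and shows one plausible way to collapse consecutive repeated segments and drop immediately repeated word n-grams; the function names and the exact matching rules are illustrative assumptions, not the project's actual implementation.

```typescript
// Hypothetical segment shape; the real project's types may differ.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Normalise text so trivially different repeats compare equal.
function normalise(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Merge runs of consecutive segments with identical normalised text
// into a single segment spanning the whole run.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalise(prev.text) === normalise(seg.text)) {
      prev.end = seg.end; // extend the segment we already kept
    } else {
      out.push({ ...seg }); // copy so the input is not mutated
    }
  }
  return out;
}

// Drop word n-grams that immediately repeat the n words before them,
// e.g. "thank you thank you thank you" becomes "thank you".
function dedupeNgrams(text: string, n = 2): string {
  const words = text.split(/\s+/).filter(Boolean);
  const out: string[] = [];
  let i = 0;
  while (i < words.length) {
    const gram = words.slice(i, i + n).join(" ").toLowerCase();
    const last = out.slice(-n).join(" ").toLowerCase();
    const fullGram = words.length - i >= n;
    if (fullGram && out.length >= n && gram === last) {
      i += n; // skip the repeated n-gram
    } else {
      out.push(words[i]);
      i += 1;
    }
  }
  return out.join(" ");
}
```

Running `collapseRepeats` first and `dedupeNgrams` on each remaining segment's text keeps timestamps intact while shrinking looped output.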

Requirements

  • Node.js 20+
  • FFmpeg in $PATH
  • yt-dlp in $PATH (for YouTube URLs)
  • Docker + whisper-rtx2080 running (or reachable at WHISPER_URL)

Setup

1. Install dependencies

npm install

2. Generate VAPID keys (one-time)

npx web-push generate-vapid-keys

3. Create .env

cp .env.example .env
# Edit .env and fill in all values

Required env vars:

Variable           Example                  Description
WHISPER_URL        http://localhost:8080    whisper-rtx2080 base URL
WEBHOOK_BASE_URL   http://192.168.1.x:3000  Reachable from inside Docker
OUTPUT_DIR         /home/user/transcripts   Where to write output files
VAPID_PUBLIC_KEY   BNxx...                  From npx web-push generate-vapid-keys
VAPID_PRIVATE_KEY  xxxx                     From npx web-push generate-vapid-keys
VAPID_SUBJECT      mailto:you@example.com   Contact for push service
DATA_DIR           /home/user/.whisper-pwa  SQLite DB + tmp audio (default: ~/.whisper-pwa)
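As an illustration of how these variables might be checked at startup, here is a minimal sketch. It assumes `DATA_DIR` is optional because the table lists a default for it; `loadConfig` and its parameters are hypothetical names, and the project may validate its environment differently.

```typescript
// Variables from the table that have no documented default.
const REQUIRED = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

type RequiredKey = (typeof REQUIRED)[number];
type Config = Record<RequiredKey, string> & { DATA_DIR: string };

// `env` is typically process.env; `homeDir` is passed in so the function
// stays free of Node-specific imports and is easy to test.
function loadConfig(
  env: Record<string, string | undefined>,
  homeDir: string,
): Config {
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  // Assumption: fall back to ~/.whisper-pwa, the default shown in the table.
  const config = {
    DATA_DIR: env.DATA_DIR || `${homeDir}/.whisper-pwa`,
  } as Config;
  for (const key of REQUIRED) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In a Node process this could be called as `loadConfig(process.env, os.homedir())`, failing fast with a single message that lists every missing variable.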

Important: WEBHOOK_BASE_URL must be an IP address or hostname reachable from inside the whisper-rtx2080 Docker container (not localhost).
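To make the constraint concrete, the sketch below builds a per-job callback URL from WEBHOOK_BASE_URL using the `/api/webhook/[jobId]` route listed in the API section, and flags loopback hosts that a container could not reach. The helper names are hypothetical, and the project may construct the URL differently.

```typescript
// Build the callback URL the backend should call for a given job.
// Note: the absolute path replaces any path prefix present in baseUrl.
function webhookUrlFor(baseUrl: string, jobId: string): string {
  const url = new URL(`/api/webhook/${encodeURIComponent(jobId)}`, baseUrl);
  return url.toString();
}

// True when the host is a loopback name or address. Inside a container such
// a host refers to the container itself, so the webhook would never arrive.
function isLoopbackHost(baseUrl: string): boolean {
  const host = new URL(baseUrl).hostname;
  return (
    host === "localhost" ||
    host === "::1" ||
    host === "[::1]" ||
    host.startsWith("127.")
  );
}
```

A startup check such as `if (isLoopbackHost(base)) console.warn(...)` would surface the misconfiguration before the first job is submitted.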

4. Build and run

npm run build
npm start

Visit http://localhost:3000.

5. For development

npm run dev

Audio preparation modes

Mode            Description
auto (default)  volumedetect → boost quiet audio + denoise + EBU R128 loudnorm
standard        Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm
aggressive      Standard + FFT denoiser (afftdn) + noise gate (agate)
none            Convert to 16kHz mono WAV only

All modes trim leading silence to prevent Whisper hallucinations at file start.
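The modes above could map to FFmpeg audio filter chains roughly as in the following sketch. The filter names (`silenceremove`, `highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`) are real FFmpeg filters, but the thresholds, the filter order, the boost rule for auto mode, and the function names are assumptions made for illustration, not the project's actual settings.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed values, chosen only for this example.
const TRIM_LEADING_SILENCE =
  "silenceremove=start_periods=1:start_threshold=-50dB";
const QUIET_THRESHOLD_DB = -30; // below this mean volume, audio counts as quiet
const MAX_BOOST_DB = 20; // cap on the gain applied in auto mode

// meanVolumeDb is the mean_volume reported by a prior volumedetect pass
// (only used by auto mode).
function buildFilterChain(mode: AudioMode, meanVolumeDb?: number): string {
  const filters: string[] = [TRIM_LEADING_SILENCE]; // applied in every mode
  switch (mode) {
    case "none":
      break;
    case "standard":
      filters.push("highpass=f=80", "lowpass=f=8000", "loudnorm");
      break;
    case "aggressive":
      // Standard filters plus denoiser and gate, with loudnorm kept last.
      filters.push("highpass=f=80", "lowpass=f=8000", "afftdn", "agate", "loudnorm");
      break;
    case "auto":
      if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_THRESHOLD_DB) {
        const gain = Math.min(MAX_BOOST_DB, QUIET_THRESHOLD_DB - meanVolumeDb);
        filters.push(`volume=${gain}dB`);
      }
      filters.push("afftdn", "loudnorm");
      break;
  }
  return filters.join(",");
}

// Full argument list for converting to 16 kHz mono WAV.
function buildFfmpegArgs(
  input: string,
  output: string,
  mode: AudioMode,
  meanVolumeDb?: number,
): string[] {
  return [
    "-i", input,
    "-af", buildFilterChain(mode, meanVolumeDb),
    "-ar", "16000",
    "-ac", "1",
    output,
  ];
}
```

The returned array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`, which avoids shell quoting issues with filter strings.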

API

Endpoint                          Method     Description
/api/jobs                         POST       Create job ({ source, title, audioMode })
/api/jobs                         GET        List recent jobs
/api/jobs/[id]                    GET        Poll job status
/api/jobs/[id]                    DELETE     Cancel job
/api/jobs/[id]/stream             GET (SSE)  Live progress stream
/api/jobs/[id]/download/[format]  GET        Download SRT/TXT/MD/JSON
/api/jobs/[id]/reprocess          POST       Re-run post-processing on stored segments
/api/webhook/[jobId]              POST       Whisper completion webhook (called by whisper-rtx2080)
/api/push                         GET        Get VAPID public key
/api/push                         POST       Register push subscription
/share                            POST       Web Share Target entry point
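A client might call the job endpoints along the lines of this sketch. It assumes the create request is sent as JSON, that `source` is a string such as a YouTube URL, and that the `[format]` path segment is the lowercase format name; none of these details are confirmed by the table, and the helper names are hypothetical.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
type DownloadFormat = "srt" | "txt" | "md" | "json"; // assumed lowercase

interface CreateJobBody {
  source: string; // assumed: URL or reference to an uploaded file
  title: string;
  audioMode: AudioMode;
}

interface JsonRequest {
  path: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Describe the POST /api/jobs request without sending it, so the same
// helper works in the browser, on the server, and in tests.
function createJobRequest(body: CreateJobBody): JsonRequest {
  return {
    path: "/api/jobs",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Path for GET /api/jobs/[id]/download/[format].
function downloadPath(jobId: string, format: DownloadFormat): string {
  return `/api/jobs/${encodeURIComponent(jobId)}/download/${format}`;
}
```

The result can be handed straight to `fetch(request.path, request.init)`; the response shape is not documented here, so callers should inspect it rather than assume field names.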

whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at 35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
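One possible reading of "60s chunks with 30s snap window" is that each cut targets a 60 s chunk and snaps to the nearest detected silence within a 30 s window around that target. The sketch below illustrates that reading only; the backend's real algorithm, the symmetric window, and the names `chooseCutPoints` and `silencesSec` are assumptions.

```typescript
// Pick cut points (in seconds) for a recording of the given duration.
// silencesSec holds timestamps of detected silences, in any order.
function chooseCutPoints(
  durationSec: number,
  silencesSec: number[],
  chunkSec = 60,
  snapWindowSec = 30,
): number[] {
  const half = snapWindowSec / 2; // assumed: window is centred on the target
  const cuts: number[] = [];
  let target = chunkSec;
  while (target < durationSec) {
    let best = target; // fall back to a hard cut at the target
    let bestDist = Infinity;
    for (const s of silencesSec) {
      const dist = Math.abs(s - target);
      if (s < durationSec && dist <= half && dist < bestDist) {
        best = s;
        bestDist = dist;
      }
    }
    cuts.push(best);
    target = best + chunkSec; // next chunk is measured from the actual cut
  }
  return cuts;
}
```

Because every cut lands at most half a window before its target, each chunk is at least `chunkSec - snapWindowSec / 2` seconds long, so the loop always advances.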

SSE progress events from the backend carry { percent, chunk, total } and are relayed live to the browser.
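On the browser side, a progress payload could be validated and rendered as in this sketch. It assumes each SSE message's data is a JSON object with numeric `percent`, `chunk`, and `total` fields; the helper names and the display format are illustrative.

```typescript
interface JobProgress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse one SSE data payload; returns null for anything that is not a
// well-formed progress object.
function parseProgress(data: string): JobProgress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch (err) {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { percent, chunk, total } = value as Record<string, unknown>;
  if (
    typeof percent !== "number" ||
    typeof chunk !== "number" ||
    typeof total !== "number"
  ) {
    return null;
  }
  return { percent, chunk, total };
}

// Human-readable label in the "chunk N of M + percentage" style.
function formatProgress(p: JobProgress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

With the standard `EventSource` API, a page could open `new EventSource("/api/jobs/" + id + "/stream")` and pass each message's `data` string through `parseProgress` before updating the UI; whether the stream uses the default message event type is an assumption.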
