Initial commit: Tonemark PWA

Tonemark is a SvelteKit PWA for transcribing YouTube videos, audio and video files, and microphone recordings using a local Whisper backend.

Features:
- Dark glassmorphic UI with electric-lime accent (5 switchable themes)
- Rail nav (desktop) / tab bar (mobile) layout
- Drop zone, YouTube URL input, and live audio recording inputs
- Audio mode waveform cards (none / standard / aggressive / auto)
- Real-time transcription progress with animated waveform
- Job queue with SSE streaming updates
- Push notifications on job completion
- PWA with native SvelteKit service worker
- SRT / TXT / MD / JSON transcript downloads

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 16:41:25 +02:00
commit 13a96b6efa
68 changed files with 9712 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,103 @@
+# whisper-pwa
+
+A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.
+
+## Features
+
+- **Web Share Target** — share YouTube URLs, video or audio files directly from your phone or browser
+- **Smart audio preparation** — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
+- **One job → one webhook** — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
+- **Live progress** — SSE stream showing chunk N of M + percentage
+- **Post-processing** — collapse repeats, n-gram deduplication to clean up hallucinations
+- **4 output formats** — SRT, plain TXT, Markdown (with timestamps), JSON
+- **Web Push notifications** — get notified when the transcript is ready (works on mobile too)
+
+## Requirements
+
+- Node.js 20+
+- FFmpeg in `$PATH`
+- yt-dlp in `$PATH` (for YouTube URLs)
+- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)
+
+## Setup
+
+### 1. Install dependencies
+
+```bash
+npm install
+```
+
+### 2. Generate VAPID keys (one-time)
+
+```bash
+npx web-push generate-vapid-keys
+```
+
+### 3. Create `.env`
+
+```bash
+cp .env.example .env
+# Edit .env and fill in all values
+```
+
+Required env vars:
+
+| Variable | Example | Description |
+|---|---|---|
+| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
+| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
+| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
+| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
+| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
+| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
+| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |
+
+> **Important**: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
+
+### 4. Build and run
+
+```bash
+npm run build
+npm start
+```
+
+Visit `http://localhost:3000`.
+
+### 5. For development
+
+```bash
+npm run dev
+```
+
+## Audio preparation modes
+
+| Mode | Description |
+|---|---|
+| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
+| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
+| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
+| `none` | Convert to 16kHz mono WAV only |
+
+All modes trim leading silence to prevent Whisper hallucinations at file start.
+
+## API
+
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
+| `/api/jobs` | GET | List recent jobs |
+| `/api/jobs/[id]` | GET | Poll job status |
+| `/api/jobs/[id]` | DELETE | Cancel job |
+| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
+| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
+| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
+| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
+| `/api/push` | GET | Get VAPID public key |
+| `/api/push` | POST | Register push subscription |
+| `/share` | POST | Web Share Target entry point |
+
+## whisper-rtx2080 internals
+
+The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
+
+SSE progress events from the backend include `{ percent, chunk, total }` relayed live to the browser.
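
Reviewer note on the progress payload: the README describes the `{ percent, chunk, total }` events but not how a client turns them into the "chunk N of M + percentage" display. The following is a minimal sketch of that step, not code from this commit. The names `JobProgress`, `parseProgress`, and `formatProgress` are hypothetical, and the assumption that all three fields arrive as JSON numbers is not confirmed by the diff.

```typescript
// Assumed shape of one progress event. The field names come from the README;
// the numeric types are an assumption.
interface JobProgress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse the `data:` payload of a single SSE message.
// Returns null for anything that is not a well-formed progress object,
// so a malformed event cannot break the UI.
function parseProgress(data: string): JobProgress | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { percent, chunk, total } = raw as Record<string, unknown>;
  if (
    typeof percent !== "number" ||
    typeof chunk !== "number" ||
    typeof total !== "number"
  ) {
    return null;
  }
  return { percent, chunk, total };
}

// Render the label, clamping the rounded percentage to the 0..100 range.
function formatProgress(p: JobProgress): string {
  const pct = Math.min(100, Math.max(0, Math.round(p.percent)));
  return `chunk ${p.chunk} of ${p.total} (${pct}%)`;
}
```

In the browser, these helpers would typically be fed from `new EventSource("/api/jobs/<id>/stream")` by passing `event.data` from its `onmessage` handler to `parseProgress`. That wiring assumes the stream sends unnamed messages; named events would need `addEventListener` instead.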