Initial commit: Tonemark PWA

Tonemark is a SvelteKit PWA for transcribing YouTube videos, audio
and video files, and microphone recordings using a local Whisper backend.

Features:
- Dark glassmorphic UI with electric-lime accent (5 switchable themes)
- Rail nav (desktop) / tab bar (mobile) layout
- Drop zone, YouTube URL input, and live audio recording inputs
- Audio mode waveform cards (none / standard / aggressive / auto)
- Real-time transcription progress with animated waveform
- Job queue with SSE streaming updates
- Push notifications on job completion
- PWA with native SvelteKit service worker
- SRT / TXT / MD / JSON transcript downloads

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Giancarmine Salucci
2026-05-06 16:41:25 +02:00
commit 13a96b6efa
68 changed files with 9712 additions and 0 deletions

# whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.
## Features
- **Web Share Target** — share YouTube URLs, video or audio files directly from your phone or browser
- **Smart audio preparation** — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- **One job → one webhook** — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- **Live progress** — SSE stream showing chunk N of M and the overall percentage
- **Post-processing** — collapses repeated segments and removes repeated n-grams to clean up Whisper hallucinations
- **4 output formats** — SRT, plain TXT, Markdown (with timestamps), JSON
- **Web Push notifications** — get notified when the transcript is ready (works on mobile too)
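
The post-processing step is only named above. As a rough illustration of what "collapse repeats" and "n-gram deduplication" can mean, here is a minimal TypeScript sketch; the `Segment` shape, the function names, and the `maxN = 4` default are assumptions, not the app's actual code:

```typescript
// Hypothetical segment shape (seconds); the app's stored schema may differ.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Lowercase and collapse whitespace so trivially different repeats compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Merge runs of consecutive segments with the same normalized text,
// keeping the first segment's text and extending its end time.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const last = out[out.length - 1];
    if (last !== undefined && normalize(last.text) === normalize(seg.text)) {
      last.end = Math.max(last.end, seg.end);
    } else {
      out.push({ ...seg }); // copy so the input array is not mutated
    }
  }
  return out;
}

// Case-insensitive comparison of two word windows.
function sameWords(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((w, i) => w.toLowerCase() === b[i].toLowerCase());
}

// Drop n-grams that immediately repeat the previous n kept words,
// trying longer n-grams first (maxN down to 1).
function dedupeRepeatedNgrams(text: string, maxN = 4): string {
  let words = text.split(/\s+/).filter((w) => w.length > 0);
  for (let n = maxN; n >= 1; n--) {
    const kept: string[] = [];
    let i = 0;
    while (i < words.length) {
      const repeatsPrevious =
        kept.length >= n &&
        i + n <= words.length &&
        sameWords(kept.slice(kept.length - n), words.slice(i, i + n));
      if (repeatsPrevious) {
        i += n; // skip the repeated window
      } else {
        kept.push(words[i]);
        i += 1;
      }
    }
    words = kept;
  }
  return words.join(" ");
}
```

With these definitions, `dedupeRepeatedNgrams("the cat the cat the cat sat")` yields `"the cat sat"`. The single-word pass also collapses legitimate doublings such as "had had", which is one reason a real implementation might use a higher minimum n.
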
## Requirements
- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- whisper-rtx2080 running in Docker locally, or otherwise reachable at `WHISPER_URL`
## Setup
### 1. Install dependencies
```bash
npm install
```
### 2. Generate VAPID keys (one-time)
```bash
npx web-push generate-vapid-keys
```
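
Web Push expects the public key as a base64url-encoded uncompressed P-256 point (RFC 8292), which is why generated public keys start with `B`. A quick sanity check of the generated pair can be sketched as follows; `checkVapidKeys` and `tryDecode` are hypothetical helpers, and `urlBase64ToUint8Array` is the same conversion a browser client needs before passing the key as `applicationServerKey` to `pushManager.subscribe`:

```typescript
// Decode base64url (padding optional) into bytes. A browser client needs the
// VAPID public key in this form for pushManager.subscribe({ applicationServerKey }).
function urlBase64ToUint8Array(base64Url: string): Uint8Array {
  const padding = "=".repeat((4 - (base64Url.length % 4)) % 4);
  const base64 = (base64Url + padding).replace(/-/g, "+").replace(/_/g, "/");
  const binary = atob(base64); // throws on characters outside the base64 alphabet
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Decode without throwing; null means the string is not valid base64url.
function tryDecode(value: string): Uint8Array | null {
  try {
    return urlBase64ToUint8Array(value);
  } catch {
    return null;
  }
}

// Hypothetical sanity check: the public key must be a 65-byte uncompressed
// P-256 point (first byte 0x04, hence the leading "B"), the private key 32 bytes.
function checkVapidKeys(publicKey: string, privateKey: string): string[] {
  const problems: string[] = [];
  const pub = tryDecode(publicKey);
  if (pub === null || pub.length !== 65 || pub[0] !== 0x04) {
    problems.push("VAPID_PUBLIC_KEY is not a base64url uncompressed P-256 public key");
  }
  const priv = tryDecode(privateKey);
  if (priv === null || priv.length !== 32) {
    problems.push("VAPID_PRIVATE_KEY is not a base64url 32-byte private key");
  }
  return problems;
}
```

An empty result means both keys have the expected shape; it does not prove they belong to the same key pair.
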
### 3. Create `.env`
```bash
cp .env.example .env
# Edit .env and fill in all values
```
Environment variables (all required except `DATA_DIR`, which has a default):

| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Base URL of this app as seen from inside the whisper-rtx2080 container |
| `OUTPUT_DIR` | `/home/user/transcripts` | Directory where output files are written |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact address for the push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite database and temporary audio (default: `~/.whisper-pwa`) |

> **Important**: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
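
A minimal sketch of loading these variables at startup, assuming a hypothetical `loadConfig` helper (not necessarily how the app reads its configuration). It applies the documented `DATA_DIR` default and rejects a loopback `WEBHOOK_BASE_URL`, since `localhost` inside the container refers to the container itself:

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical typed view of the environment variables in the table above.
interface Config {
  whisperUrl: string;
  webhookBaseUrl: string;
  outputDir: string;
  vapidPublicKey: string;
  vapidPrivateKey: string;
  vapidSubject: string;
  dataDir: string;
}

const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

// Hostnames that point back at the container itself when used from inside Docker.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function loadConfig(env: Record<string, string | undefined>, home: string = homedir()): Config {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // new URL() throws on a malformed value, which is also worth failing on.
  const webhookHost = new URL(env.WEBHOOK_BASE_URL!).hostname;
  if (LOOPBACK_HOSTS.has(webhookHost)) {
    throw new Error("WEBHOOK_BASE_URL must be reachable from inside the whisper-rtx2080 container, not localhost");
  }

  return {
    whisperUrl: env.WHISPER_URL!,
    webhookBaseUrl: env.WEBHOOK_BASE_URL!,
    outputDir: env.OUTPUT_DIR!,
    vapidPublicKey: env.VAPID_PUBLIC_KEY!,
    vapidPrivateKey: env.VAPID_PRIVATE_KEY!,
    vapidSubject: env.VAPID_SUBJECT!,
    // Documented default: ~/.whisper-pwa
    dataDir: env.DATA_DIR || join(home, ".whisper-pwa"),
  };
}
```

Called as `loadConfig(process.env)` at startup, a check like this fails fast instead of surfacing later as a webhook that never arrives. `WHISPER_URL` may still be `localhost`, because the app itself calls the backend from the host.
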
### 4. Build and run
```bash
npm run build
npm start
```
Visit `http://localhost:3000`.
### 5. For development
```bash
npm run dev
```
## Audio preparation modes
| Mode | Description |
|---|---|
| `auto` (default) | Measures volume with `volumedetect`, then boosts quiet audio, denoises, and applies EBU R128 normalization (`loudnorm`) |
| `standard` | 80 Hz highpass and 8 kHz lowpass filters, then EBU R128 `loudnorm` |
| `aggressive` | `standard` plus an FFT denoiser (`afftdn`) and a noise gate (`agate`) |
| `none` | Conversion to 16 kHz mono WAV only |

All modes trim leading silence, which helps keep Whisper from hallucinating text at the start of the file.
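
The table names the filters but not their parameters. The following sketch assembles an FFmpeg `-af` chain per mode; every numeric parameter (the `-50dB` trim threshold, the `loudnorm` targets, the `-30 dB` boost threshold and `-20 dB` target in `auto`) and the filter order are assumptions for illustration, not the app's actual pipeline. It also assumes every mode outputs 16 kHz mono 16-bit PCM WAV, as stated for `none`:

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Shared first step: drop leading silence (threshold is an assumed value).
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
// EBU R128 loudness normalization; the I/TP/LRA targets are assumed values.
const LOUDNORM = "loudnorm=I=-16:TP=-1.5:LRA=11";
const BANDPASS = ["highpass=f=80", "lowpass=f=8000"];

// Read mean_volume from ffmpeg's volumedetect log (stderr), e.g.
// "[Parsed_volumedetect_0 @ 0x...] mean_volume: -27.3 dB".
function parseMeanVolume(stderr: string): number | null {
  const match = /mean_volume:\s*(-?\d+(?:\.\d+)?) dB/.exec(stderr);
  return match ? Number(match[1]) : null;
}

function buildAudioFilter(mode: AudioMode, meanVolumeDb: number | null = null): string {
  const chain: string[] = [TRIM_LEADING_SILENCE];
  switch (mode) {
    case "none":
      break; // format conversion only
    case "standard":
      chain.push(...BANDPASS, LOUDNORM);
      break;
    case "aggressive":
      chain.push(...BANDPASS, "afftdn", "agate", LOUDNORM);
      break;
    case "auto":
      // Boost quiet input toward a -20 dB mean before denoising
      // (the -30 dB threshold and -20 dB target are assumptions).
      if (meanVolumeDb !== null && meanVolumeDb < -30) {
        chain.push(`volume=${(-20 - meanVolumeDb).toFixed(1)}dB`);
      }
      chain.push("afftdn", LOUDNORM);
      break;
  }
  return chain.join(",");
}

// Arguments for the conversion pass: 16 kHz, mono, 16-bit PCM WAV.
function ffmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb: number | null = null): string[] {
  return ["-y", "-i", input, "-af", buildAudioFilter(mode, meanVolumeDb), "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output];
}
```

For `auto`, a first pass such as `ffmpeg -i input -af volumedetect -f null -` writes the statistics that `parseMeanVolume` reads; the result is then passed to `ffmpegArgs` for the conversion pass.
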
## API
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
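
`/api/jobs/[id]/download/srt` returns a SubRip file. For reference, SRT numbers each cue, writes a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and separates cues with a blank line. A minimal sketch of that rendering, assuming a hypothetical `{ start, end, text }` segment shape in seconds (the app's own serializer may differ):

```typescript
// Hypothetical segment shape (seconds).
interface Segment {
  start: number;
  end: number;
  text: string;
}

// SRT timestamps are HH:MM:SS,mmm, with a comma before the milliseconds.
function srtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// One numbered cue per segment; cues are separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

For example, `srtTimestamp(3661.5)` gives `01:01:01,500`.
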
## whisper-rtx2080 internals
The backend handles all audio chunking internally: it splits audio into 60 s chunks, using a 30 s snap window and silence detection at 35 dB to place the cuts. We submit one WAV file per job and receive one webhook once every chunk has been transcribed.

Progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser over SSE.
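
Browsers parse the stream natively through `EventSource`, but the wire format is simple enough to sketch for tests or non-browser clients. The following assumes each progress update arrives as a JSON object in a `data:` field (an assumption about `/api/jobs/[id]/stream`, not confirmed here); `parseSse` and `describeProgress` are hypothetical names:

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

interface SseEvent {
  event: string;
  data: string;
}

// Minimal parser for a complete event-stream text block: "data:" lines are
// joined with "\n", a blank line dispatches the event, ":" lines are comments.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = "message";
  let data: string[] = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      if (data.length > 0) events.push({ event, data: data.join("\n") });
      event = "message";
      data = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment, often used as keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "data") data.push(value);
    else if (field === "event") event = value;
  }
  return events;
}

// Format an assumed { percent, chunk, total } payload for display.
function describeProgress(data: string): string {
  const p = JSON.parse(data) as Progress;
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In a browser client, `new EventSource("/api/jobs/" + id + "/stream")` does the parsing itself, so only `describeProgress(event.data)` would be needed inside its `onmessage` handler.
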