# whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.

## Features

- **Web Share Target**: share YouTube URLs, video or audio files directly from your phone or browser
- **Smart audio preparation**: multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- **One job → one webhook**: whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- **Live progress**: SSE stream showing chunk N of M + percentage
- **Post-processing**: collapse repeats, n-gram deduplication to clean up hallucinations
- **4 output formats**: SRT, plain TXT, Markdown (with timestamps), JSON
- **Web Push notifications**: get notified when the transcript is ready (works on mobile too)
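
The post-processing step listed above (collapse repeats, n-gram deduplication) could look roughly like the sketch below. It is an illustration only: the `Segment` shape, the function names, and the default `maxN` are assumptions, not this project's actual code.

```typescript
// Hypothetical segment shape; the real field names are not given in this README.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Lowercase and squeeze whitespace so near-identical texts compare equal.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Collapse runs of consecutive segments with the same normalized text,
// keeping the first one and extending its end time over the run.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    if (prev !== undefined && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end;
    } else {
      out.push({ start: seg.start, end: seg.end, text: seg.text });
    }
  }
  return out;
}

// Join n words starting at index i into a lowercase comparison key.
function gramKey(words: string[], i: number, n: number): string {
  return words.slice(i, i + n).join(" ").toLowerCase();
}

// Remove immediately repeated word n-grams inside one text,
// trying longer n-grams first (maxN down to 1).
function dedupeNgrams(text: string, maxN: number = 4): string {
  let words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  for (let n = maxN; n >= 1; n--) {
    let kept: string[] = [];
    let i = 0;
    while (i < words.length) {
      const gram = gramKey(words, i, n);
      let j = i + n;
      // Skip every back-to-back repetition of the n-gram starting at i.
      while (j + n <= words.length && gramKey(words, j, n) === gram) {
        j += n;
      }
      if (j > i + n) {
        kept = kept.concat(words.slice(i, i + n)); // keep one copy of the run
        i = j;
      } else {
        kept = kept.concat(words.slice(i, i + 1)); // no repeat: keep one word
        i += 1;
      }
    }
    words = kept;
  }
  return words.join(" ");
}
```

Trying longer n-grams first means a repeated phrase is collapsed as a unit before single repeated words are considered.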
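
Of the output formats listed above, SRT has the strictest layout: a 1-based cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line. A minimal sketch of that conversion follows; the `Segment` shape and function names are assumptions for illustration, not the project's own exporter.

```typescript
// Hypothetical segment shape; the real field names are not given in this README.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp (comma before the milliseconds).
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Render segments as SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      String(i + 1) + "\n" +
      srtTimestamp(seg.start) + " --> " + srtTimestamp(seg.end) + "\n" +
      seg.text.trim() + "\n"
    )
    .join("\n");
}
```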
## Requirements

- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)

## Setup

### 1. Install dependencies

```bash
npm install
```

### 2. Generate VAPID keys (one-time)

```bash
npx web-push generate-vapid-keys
```

### 3. Create `.env`

```bash
cp .env.example .env
# Edit .env and fill in all values
```

Required env vars:

| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |

> **Important**: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).

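
A `localhost` value for `WEBHOOK_BASE_URL` is an easy mistake to make, so a startup check can help. The sketch below is one possible shape for such a check and is not taken from the project: `checkEnv` and `webhookUrl` are hypothetical helpers, and `DATA_DIR` is left out of the required list because the table gives it a default.

```typescript
type Env = Record<string, string | undefined>;

// Variables from the table above that have no documented default.
const REQUIRED_VARS: string[] = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT"
];

// Return a list of human-readable problems; an empty list means the env looks usable.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(name + " is not set");
    }
  }
  const base = env["WEBHOOK_BASE_URL"];
  // Inside the container, localhost is the container itself, not this app.
  if (base !== undefined && /^https?:\/\/(localhost|127\.0\.0\.1)(:\d+)?(\/|$)/i.test(base)) {
    problems.push("WEBHOOK_BASE_URL points at localhost, which is not reachable from inside Docker");
  }
  return problems;
}

// Build the per-job callback URL using the /api/webhook/[jobId] route.
function webhookUrl(baseUrl: string, jobId: string): string {
  return baseUrl.replace(/\/+$/, "") + "/api/webhook/" + encodeURIComponent(jobId);
}
```

In a Node process this could be called as `checkEnv(process.env)` before the server starts accepting jobs.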
### 4. Build and run

```bash
npm run build
npm start
```

Visit `http://localhost:3000`.

### 5. For development

```bash
npm run dev
```

## Audio preparation modes

| Mode | Description |
|---|---|
| `auto` (default) | `volumedetect` → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80 Hz + lowpass 8 kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16 kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.

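
To make the table concrete, here is a sketch of how each mode might map to an FFmpeg `-af` filter chain. The filter names (`silenceremove`, `highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`) are real FFmpeg filters, but every parameter value, the filter order, the quiet-audio threshold, and the function names are assumptions, since this README does not specify them.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed values: the README names the filters but not their parameters.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const LOUDNORM = "loudnorm=I=-16:TP=-1.5:LRA=11";
const QUIET_THRESHOLD_DB = -30; // hypothetical cut-off for "quiet" audio
const QUIET_BOOST = "volume=10dB"; // hypothetical gain for quiet input

// meanVolumeDb is the mean_volume reported by a prior volumedetect pass (auto mode only).
function buildFilters(mode: AudioMode, meanVolumeDb?: number): string[] {
  // Every mode trims leading silence first.
  const filters: string[] = [TRIM_LEADING_SILENCE];
  if (mode === "none") {
    return filters;
  }
  if (mode === "auto") {
    if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_THRESHOLD_DB) {
      filters.push(QUIET_BOOST);
    }
    filters.push("afftdn", LOUDNORM);
    return filters;
  }
  // standard and aggressive share the band-limiting filters.
  filters.push("highpass=f=80", "lowpass=f=8000");
  if (mode === "aggressive") {
    filters.push("afftdn", "agate"); // order relative to loudnorm is assumed
  }
  filters.push(LOUDNORM);
  return filters;
}

// Full argument list for an ffmpeg call that writes 16 kHz mono audio.
function buildFfmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb?: number): string[] {
  return [
    "-y", "-i", input,
    "-af", buildFilters(mode, meanVolumeDb).join(","),
    "-ar", "16000", "-ac", "1",
    output
  ];
}
```

The resulting array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`.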
## API

| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |

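
As a rough client-side illustration of the table above, the sketch below builds the request for creating a job and the URL for downloading a result. It assumes the POST body is JSON and that `[format]` is a lowercase file extension; neither is confirmed by this README, and the helper names are hypothetical.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
type DownloadFormat = "srt" | "txt" | "md" | "json"; // lowercase path segment is assumed

interface CreateJobBody {
  source: string; // assumed to be a YouTube URL or a file reference
  title: string;
  audioMode: AudioMode;
}

interface RequestSpec {
  url: string;
  method: "GET" | "POST" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

// Strip trailing slashes so paths can be appended safely.
function trimBase(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Describe the POST /api/jobs call (JSON encoding is an assumption).
function createJobRequest(baseUrl: string, job: CreateJobBody): RequestSpec {
  return {
    url: trimBase(baseUrl) + "/api/jobs",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job)
  };
}

// Build the GET /api/jobs/[id]/download/[format] URL.
function downloadUrl(baseUrl: string, jobId: string, format: DownloadFormat): string {
  return trimBase(baseUrl) + "/api/jobs/" + encodeURIComponent(jobId) + "/download/" + format;
}

// Usage with the Fetch API (not executed here):
//   const spec = createJobRequest("http://localhost:3000", { source, title, audioMode: "auto" });
//   const res = await fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body });
```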
## whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35 dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.

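
One way to picture silence-snapped chunking is sketched below. This is not the backend's code: it reads "30s snap window" as a 30 s span centered on each 60 s target (so a cut may move up to 15 s either way), which is an assumption, and it takes the detected silence positions as plain input rather than running any detection itself.

```typescript
// Plan cut points (in seconds) for a recording of the given duration.
// silences: positions of detected silences, e.g. from a -35 dB detector.
// chunk and snapWindow default to the figures quoted above; treating the
// window as centered on the target is an assumption.
function planCuts(duration: number, silences: number[], chunk: number = 60, snapWindow: number = 30): number[] {
  const half = snapWindow / 2;
  const cuts: number[] = [];
  let last = 0;
  // Stop once the remainder fits in one chunk plus the snap tolerance.
  while (duration - last > chunk + half) {
    const target = last + chunk;
    let best = target; // fall back to a hard cut if no silence is close enough
    let bestDist = Infinity;
    for (const s of silences) {
      const dist = Math.abs(s - target);
      if (dist <= half && dist < bestDist) {
        best = s;
        bestDist = dist;
      }
    }
    cuts.push(best);
    last = best;
  }
  return cuts;
}
```

Cutting at a silence rather than at a fixed offset avoids splitting a word across two chunks.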
SSE progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser.
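
On the browser side, each progress payload can be validated and turned into a "chunk N of M" label. The sketch below assumes the payload arrives as a JSON string in the SSE `data` field and that all three fields are numbers; the function names are hypothetical.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse one SSE data payload; return null if it is not a valid progress event.
function parseProgress(data: string): Progress | null {
  let raw: unknown;
  try {
    raw = JSON.parse(data);
  } catch (_err) {
    return null;
  }
  if (typeof raw !== "object" || raw === null) {
    return null;
  }
  const rec = raw as Record<string, unknown>;
  const percent = rec["percent"];
  const chunk = rec["chunk"];
  const total = rec["total"];
  if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
    return null;
  }
  return { percent: percent, chunk: chunk, total: total };
}

// Human-readable label for a progress bar.
function formatProgress(p: Progress): string {
  return "chunk " + p.chunk + " of " + p.total + " (" + Math.round(p.percent) + "%)";
}

// Usage in the browser (not executed here; assumes default "message" events):
//   const es = new EventSource("/api/jobs/" + jobId + "/stream");
//   es.onmessage = (e) => {
//     const p = parseProgress(e.data);
//     if (p !== null) label.textContent = formatProgress(p);
//   };
```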