tonemark/README.md
Giancarmine Salucci 13a96b6efa
Initial commit: Tonemark PWA
Tonemark is a SvelteKit PWA for transcribing YouTube videos, audio
and video files, and microphone recordings using a local Whisper backend.

Features:
- Dark glassmorphic UI with electric-lime accent (5 switchable themes)
- Rail nav (desktop) / tab bar (mobile) layout
- Drop zone, YouTube URL input, and live audio recording inputs
- Audio mode waveform cards (none / standard / aggressive / auto)
- Real-time transcription progress with animated waveform
- Job queue with SSE streaming updates
- Push notifications on job completion
- PWA with native SvelteKit service worker
- SRT / TXT / MD / JSON transcript downloads

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 16:41:25 +02:00

# whisper-pwa
A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local [whisper-rtx2080](https://git.sal.giize.com/mozempk/whisper-rtx2080) backend.
## Features
- **Web Share Target** — share YouTube URLs, video or audio files directly from your phone or browser
- **Smart audio preparation** — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
- **One job → one webhook** — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
- **Live progress** — SSE stream showing chunk N of M + percentage
- **Post-processing** — collapse repeats, n-gram deduplication to clean up hallucinations
- **4 output formats** — SRT, plain TXT, Markdown (with timestamps), JSON
- **Web Push notifications** — get notified when the transcript is ready (works on mobile too)
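
The post-processing step named in the list above (collapsing repeats and n-gram deduplication) can be illustrated roughly as follows. This is a minimal sketch, not the project's actual implementation: the `Segment` shape, the function names, and the `maxN` default are all assumptions made for illustration.

```typescript
// Hypothetical segment shape; the real project may store different fields.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Case- and whitespace-insensitive form used only for comparison.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Merge consecutive segments whose normalized text is identical,
// extending the first segment's end time over the repeats.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end;
    } else {
      out.push({ ...seg });
    }
  }
  return out;
}

// Drop word n-grams that immediately repeat the n words kept just before them.
// Longer n-grams are handled first, then shorter ones down to single words.
function dedupeNgrams(text: string, maxN = 4): string {
  let words = text.trim().split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    const kept: string[] = [];
    let i = 0;
    while (i < words.length) {
      const gram = words.slice(i, i + n);
      const tail = kept.slice(-n);
      const isRepeat =
        gram.length === n &&
        tail.length === n &&
        gram.join(" ").toLowerCase() === tail.join(" ").toLowerCase();
      if (isRepeat) {
        i += n; // skip the repeated n-gram entirely
      } else {
        kept.push(...words.slice(i, i + 1));
        i += 1;
      }
    }
    words = kept;
  }
  return words.join(" ");
}
```

Note that a pass like this also collapses legitimate repetition (for example "very very"), so a real pipeline would likely need a minimum repeat count or a length threshold.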
## Requirements
- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker + whisper-rtx2080 running (or reachable at `WHISPER_URL`)
## Setup
### 1. Install dependencies
```bash
npm install
```
### 2. Generate VAPID keys (one-time)
```bash
npx web-push generate-vapid-keys
```
### 3. Create `.env`
```bash
cp .env.example .env
# Edit .env and fill in all values
```
Environment variables (`DATA_DIR` is optional and falls back to its default):
| Variable | Example | Description |
|---|---|---|
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |
> **Important**: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
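
A startup check along the following lines could catch a missing variable or a `localhost` webhook URL early. It is a hedged sketch: `loadConfig` and its `homeDir` parameter are hypothetical helpers, not part of the project, and treating a `localhost` webhook URL as a hard error is a choice made here for illustration.

```typescript
// Variable names come from the table above; the loader itself is hypothetical.
const REQUIRED = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

type RequiredKey = (typeof REQUIRED)[number];
type Config = Record<RequiredKey, string> & { DATA_DIR: string };

function loadConfig(
  env: Record<string, string | undefined>,
  homeDir: string,
): Config {
  // Report every missing variable at once rather than failing one at a time.
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing env vars: ${missing.join(", ")}`);
  }

  // The webhook is called from inside the Docker container, where
  // localhost points at the container itself.
  const webhook = env.WEBHOOK_BASE_URL as string;
  if (/^https?:\/\/(localhost|127\.0\.0\.1)(:|\/|$)/i.test(webhook)) {
    throw new Error("WEBHOOK_BASE_URL must not be localhost");
  }

  // DATA_DIR falls back to the documented default, ~/.whisper-pwa.
  const config = {
    DATA_DIR: env.DATA_DIR || `${homeDir}/.whisper-pwa`,
  } as Config;
  for (const key of REQUIRED) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In a Node.js process this would typically be called once at startup with `process.env` and `os.homedir()`.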
### 4. Build and run
```bash
npm run build
npm start
```
Visit `http://localhost:3000`.
### 5. For development
```bash
npm run dev
```
## Audio preparation modes
| Mode | Description |
|---|---|
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80 Hz + lowpass 8 kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16 kHz mono WAV only |
All modes trim leading silence to prevent Whisper hallucinations at file start.
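
To make the table concrete, the modes could map to FFmpeg `-af` filter chains roughly as sketched below. The filter names (`highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `silenceremove`, `volume`) are real FFmpeg filters, but everything else is an assumption: the silence threshold, the quiet-audio cutoff and gain, the filter ordering, and the function names are illustrative and may differ from the project's pipeline.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Leading-silence trim applied in every mode; the -50dB threshold is assumed.
const TRIM_LEADING_SILENCE =
  "silenceremove=start_periods=1:start_threshold=-50dB";

// Highpass 80 Hz + lowpass 8 kHz + EBU R128 loudness normalization.
const STANDARD = ["highpass=f=80", "lowpass=f=8000", "loudnorm"];

// meanVolumeDb is the mean_volume reported by a prior `volumedetect` pass;
// it is only consulted in "auto" mode.
function buildFilterChain(mode: AudioMode, meanVolumeDb?: number): string {
  const filters: string[] = [TRIM_LEADING_SILENCE];
  switch (mode) {
    case "none":
      break; // format conversion only, handled by the output options
    case "standard":
      filters.push(...STANDARD);
      break;
    case "aggressive":
      // Ordering follows the table literally; the real order is unknown.
      filters.push(...STANDARD, "afftdn", "agate");
      break;
    case "auto":
      // Assumed rule: boost by 10 dB when the mean volume is below -30 dB.
      if (meanVolumeDb !== undefined && meanVolumeDb < -30) {
        filters.push("volume=10dB");
      }
      filters.push("afftdn", "loudnorm");
      break;
  }
  return filters.join(",");
}

// Full argument list for an ffmpeg invocation producing 16 kHz mono audio.
function buildFfmpegArgs(
  input: string,
  output: string,
  mode: AudioMode,
  meanVolumeDb?: number,
): string[] {
  return [
    "-i", input,
    "-af", buildFilterChain(mode, meanVolumeDb),
    "-ar", "16000", // 16 kHz sample rate
    "-ac", "1", // mono
    output,
  ];
}
```

The resulting array can be passed to `child_process.spawn("ffmpeg", args)`; for `auto`, the mean volume would first be measured with something like `ffmpeg -i input -af volumedetect -f null -`.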
## API
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
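
A client for the endpoints above might start from small helpers like these. This is a sketch under assumptions: the README only documents the body fields `{ source, title, audioMode }`, so the JSON content type, the meaning of `source` (shown here as a YouTube URL), the lowercase format names in the download path, and the helper names are all hypothetical.

```typescript
interface CreateJobBody {
  source: string; // assumed: a YouTube URL or an uploaded file reference
  title: string;
  audioMode: "auto" | "standard" | "aggressive" | "none";
}

// Assumed to be lowercase in the URL; the table only lists SRT/TXT/MD/JSON.
type DownloadFormat = "srt" | "txt" | "md" | "json";

// Build the per-job URLs from the route table.
function jobRoutes(baseUrl: string, id: string) {
  const root = `${baseUrl.replace(/\/+$/, "")}/api/jobs/${encodeURIComponent(id)}`;
  return {
    status: root, // GET polls, DELETE cancels
    stream: `${root}/stream`, // SSE progress
    reprocess: `${root}/reprocess`, // POST
    download: (format: DownloadFormat) => `${root}/download/${format}`,
  };
}

// Describe the POST /api/jobs request without sending it.
function createJobRequest(baseUrl: string, body: CreateJobBody) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/jobs`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The request object can be sent with `fetch(req.url, req.init)`. The shape of the response is not documented in this README, so it is left out of the sketch.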
## whisper-rtx2080 internals
The backend handles all audio chunking internally (60 s chunks with a 30 s snap window, silence-detected at 35 dB). We submit one WAV file per job and receive one webhook when all chunks have been transcribed.
SSE progress events from the backend include `{ percent, chunk, total }` and are relayed live to the browser.
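
On the browser side, each progress event's data could be validated and rendered with a helper like the one below. This is a minimal sketch: the field names come from the payload described above, while the assumptions that the payload is JSON, that `percent` is on a 0 to 100 scale, and the function names are illustrative.

```typescript
interface Progress {
  percent: number; // assumed to be on a 0-100 scale
  chunk: number;
  total: number;
}

// Parse the `data` field of one SSE event; returns null for anything
// that is not a well-formed progress payload.
function parseProgress(data: string): Progress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { percent, chunk, total } = value as Record<string, unknown>;
  if (
    typeof percent !== "number" ||
    typeof chunk !== "number" ||
    typeof total !== "number"
  ) {
    return null;
  }
  return { percent, chunk, total };
}

// Render the "chunk N of M + percentage" label.
function formatProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the browser this would typically be wired to `new EventSource("/api/jobs/<id>/stream")`, calling `parseProgress(event.data)` in the `onmessage` handler. Whether the stream uses unnamed `message` events or a named event type is not stated here.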