mozempk/tonemark

Giancarmine Salucci a76625d378

Build & Push Docker Image / test (push) Successful in 10s

Build & Push Docker Image / build-and-push (push) Successful in 44s

ci: use npm install instead of npm ci to avoid lock file version mismatch

Lock file was generated with npm 11 (Node 24), CI runs npm 10 (Node 22).
npm install avoids the strict sync check and matches the Dockerfile.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-09 15:54:32 +02:00

.gitea/workflows

ci: use npm install instead of npm ci to avoid lock file version mismatch

2026-05-09 15:54:32 +02:00

.vscode

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

src

feat: proxy POST /model/unload endpoint

2026-05-09 15:48:47 +02:00

static

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.env.example

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.gitignore

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.npmrc

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

backend.issue.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

Dockerfile

fix: install yt-dlp via pip instead of prebuilt binary

2026-05-06 19:17:18 +02:00

package-lock.json

chore: update package-lock.json to sync with package.json

2026-05-06 16:57:21 +02:00

package.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

svelte.config.js

fix: increase body size limit to 500MB for audio uploads

2026-05-06 19:32:28 +02:00

tsconfig.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vite.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vitest.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.

Features

- Web Share Target: share YouTube URLs, video files, or audio files directly from your phone or browser
- Smart audio preparation: multi-strategy FFmpeg pipeline (`auto`/`standard`/`aggressive`/`none`) applied before submission
- One job → one webhook: whisper-rtx2080 handles chunking internally; the app receives a single webhook when the job is done
- Live progress: SSE stream showing chunk N of M plus a percentage
- Post-processing: repeat collapsing and n-gram deduplication to clean up hallucinations
- 4 output formats: SRT, plain TXT, Markdown (with timestamps), JSON
- Web Push notifications: get notified when the transcript is ready (works on mobile too)
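
The "collapse repeats" step mentioned above could look roughly like the sketch below. It is an illustration only, not the project's actual post-processing code: the `Segment` shape, the `normalize` helper, and the `collapseRepeats` name are all hypothetical, and the real implementation may compare text differently.

```typescript
// Hypothetical segment shape; the project's stored segments may differ.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Compare text case-insensitively and ignore whitespace differences.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Merge runs of consecutive segments with the same normalized text into
// one segment that spans the whole run.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end; // extend the kept segment instead of repeating it
    } else {
      out.push({ ...seg }); // copy so the input array is not mutated
    }
  }
  return out;
}
```

Repeated identical lines are a common Whisper hallucination pattern on silence or music, which is why merging consecutive duplicates is a reasonable first pass before any n-gram based cleanup.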

Requirements

- Node.js 20+
- FFmpeg in `$PATH`
- yt-dlp in `$PATH` (for YouTube URLs)
- Docker with whisper-rtx2080 running (or a whisper-rtx2080 instance reachable at `WHISPER_URL`)

Setup

1. Install dependencies

npm install

2. Generate VAPID keys (one-time)

npx web-push generate-vapid-keys

3. Create `.env`

cp .env.example .env
# Edit .env and fill in all values

Required env vars:

| Variable | Example | Description |
| --- | --- | --- |
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Base URL of this app, reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB and temporary audio (default: `~/.whisper-pwa`) |

Important: `WEBHOOK_BASE_URL` must be an IP address or hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
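
A startup check along these lines could catch a missing variable or a loopback `WEBHOOK_BASE_URL` early. This is a minimal sketch, not part of the project: `checkEnv` is a hypothetical helper, and treating `DATA_DIR` as optional is an assumption based on the default listed in the table.

```typescript
// Variables the table above lists without a default.
const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
];

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns human-readable problems (empty when all is well).
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  const webhook = env["WEBHOOK_BASE_URL"];
  if (webhook !== undefined && webhook.trim() !== "") {
    try {
      const host = new URL(webhook).hostname;
      // Loopback addresses resolve to the container itself, not to this app.
      if (host === "localhost" || host === "127.0.0.1" || host === "[::1]") {
        problems.push("WEBHOOK_BASE_URL points at loopback and will not be reachable from Docker");
      }
    } catch {
      problems.push("WEBHOOK_BASE_URL is not a valid URL");
    }
  }
  return problems;
}
```

In a Node.js entry point this could be called as `checkEnv(process.env)`, logging each problem and exiting when the returned list is not empty.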

4. Build and run

npm run build
npm start

Visit http://localhost:3000.

5. For development

npm run dev

Audio preparation modes

| Mode | Description |
| --- | --- |
| `auto` (default) | `volumedetect` → boost quiet audio + denoise + EBU R128 `loudnorm` |
| `standard` | Highpass 80 Hz + lowpass 8 kHz + EBU R128 `loudnorm` |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16 kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.
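
To make the modes concrete, here is one way the table could map onto an FFmpeg `-af` filter chain. It is a sketch under assumptions, not the project's pipeline: the function names, the silence threshold, the "quiet" cutoff, the boost amount, and the filter order are all made up for illustration. Only the filter names (`silenceremove`, `highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`) are real FFmpeg filters.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed parameters; the README names the filters but not their settings.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const STANDARD_FILTERS = ["highpass=f=80", "lowpass=f=8000", "loudnorm"];
const QUIET_MEAN_DB = -30; // assumed cutoff for "quiet audio"

// meanVolumeDb is the mean_volume reported by a separate volumedetect pass
// (only consulted in auto mode).
function buildFilterChain(mode: AudioMode, meanVolumeDb?: number): string {
  const filters: string[] = [TRIM_LEADING_SILENCE]; // applied in every mode
  switch (mode) {
    case "none":
      break;
    case "standard":
      filters.push(...STANDARD_FILTERS);
      break;
    case "aggressive":
      filters.push(...STANDARD_FILTERS, "afftdn", "agate");
      break;
    case "auto":
      if (meanVolumeDb !== undefined && meanVolumeDb < QUIET_MEAN_DB) {
        filters.push("volume=10dB"); // assumed boost amount
      }
      filters.push("afftdn", "loudnorm");
      break;
  }
  return filters.join(",");
}

// Every mode ends with a 16 kHz mono WAV, as stated for the `none` mode.
function buildFfmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb?: number): string[] {
  return ["-i", input, "-af", buildFilterChain(mode, meanVolumeDb), "-ar", "16000", "-ac", "1", output];
}
```

The resulting array can be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`, which avoids shell quoting issues with file names.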

API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
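
A small client-side helper for the job endpoints might look like this. It is a sketch with several assumptions: that `POST /api/jobs` accepts a JSON body, that the `[format]` path segment is lowercase (`srt`, `txt`, `md`, `json`), and that job ids can be used directly in the path. The helper names are hypothetical and the response shapes are not documented here, so the sketch only builds requests.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
type DownloadFormat = "srt" | "txt" | "md" | "json"; // assumed lowercase path segments

interface CreateJobBody {
  source: string; // for example a YouTube URL
  title: string;
  audioMode: AudioMode;
}

interface JsonRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the request for POST /api/jobs (JSON encoding is an assumption).
function createJobRequest(baseUrl: string, body: CreateJobBody): JsonRequest {
  return {
    url: new URL("/api/jobs", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Build the per-job URLs listed in the table above.
function jobUrls(baseUrl: string, id: string) {
  const path = `/api/jobs/${encodeURIComponent(id)}`;
  return {
    status: new URL(path, baseUrl).toString(),
    stream: new URL(`${path}/stream`, baseUrl).toString(),
    download: (format: DownloadFormat) => new URL(`${path}/download/${format}`, baseUrl).toString(),
  };
}
```

Typical usage would be `const { url, init } = createJobRequest("http://localhost:3000", body)` followed by `await fetch(url, init)`, then reading the new job's id from whatever the response actually returns.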

whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.

SSE progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser.
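
On the browser side, each progress event could be decoded roughly as follows. This sketch assumes the event's `data` field is a JSON object with numeric `percent`, `chunk`, and `total` fields and that `percent` runs from 0 to 100; the `parseProgress` and `formatProgress` names are hypothetical.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Decode one SSE data payload; returns null for anything that is not a
// well-formed progress object.
function parseProgress(data: string): Progress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { percent, chunk, total } = value as Record<string, unknown>;
  if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
    return null;
  }
  return { percent, chunk, total };
}

// Render the "chunk N of M + percentage" label described under Features.
function formatProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the browser this would typically be wired to an `EventSource` opened on `/api/jobs/[id]/stream`, calling `parseProgress(event.data)` inside its `onmessage` handler and ignoring `null` results.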

Languages

TypeScript 63.8%

Svelte 33.3%

CSS 1.5%

Dockerfile 0.7%

HTML 0.4%

Other 0.3%
