
Giancarmine Salucci 04142b17a8

Build & Push Docker Image / build-and-push (push) Successful in 48s


feat: whisper-side cancellation + SSE-triggered retry

- Add cancelJob() to whisper.ts: sends DELETE /jobs/:id to the whisper
  server (best-effort, errors silently ignored)
- DELETE /api/jobs/[id] now calls cancelJob() when cancelling an active
  job that has a whisperJobId, stopping GPU use immediately
- Webhook handler guards against locally-cancelled jobs: returns ok early
  so whisper's late completion cannot overwrite cancelled status or send
  a phantom 'Transcript ready' notification
- Replace blind sleep(Retry-After + 1s) in submitJob() with
  waitForModelReady(): subscribes to /model/events SSE and proceeds as
  soon as state:ready arrives; falls back to the Retry-After timeout if
  SSE is unreachable or closes without model_ready
- Refactor retry tests to use URL-aware makeJobFetch() helper; add 7 new
  tests (3 SSE-triggered retry, 3 cancelJob, 1 webhook cancelled-guard)
  — 144/144 passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-09 00:40:40 +02:00

.gitea/workflows

ci: restore REGISTRY_USERNAME/REGISTRY_TOKEN secrets (now set on repo)

2026-05-06 16:54:58 +02:00

.vscode

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

src

feat: whisper-side cancellation + SSE-triggered retry

2026-05-09 00:40:40 +02:00

static

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.env.example

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.gitignore

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.npmrc

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

backend.issue.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

Dockerfile

fix: install yt-dlp via pip instead of prebuilt binary

2026-05-06 19:17:18 +02:00

package-lock.json

chore: update package-lock.json to sync with package.json

2026-05-06 16:57:21 +02:00

package.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

svelte.config.js

fix: increase body size limit to 500MB for audio uploads

2026-05-06 19:32:28 +02:00

tsconfig.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vite.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vitest.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.

Features

Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
Live progress — SSE stream showing chunk N of M + percentage
Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
Web Push notifications — get notified when the transcript is ready (works on mobile too)
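The post-processing feature above (collapsing repeats and n-gram deduplication) could be sketched roughly as follows. This is an illustration only, not the project's actual implementation: the `Segment` shape and the function names `collapseRepeats` and `dedupeNgrams` are hypothetical.

```typescript
// Hypothetical segment shape; the real project may store segments differently.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Case- and whitespace-insensitive form used only for comparison.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Merge runs of consecutive segments whose normalized text is identical,
// keeping the first segment's text and extending its end time.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const prev = out[out.length - 1];
    if (prev && normalize(prev.text) === normalize(seg.text)) {
      prev.end = seg.end;
    } else {
      out.push({ ...seg });
    }
  }
  return out;
}

// Drop an n-gram when it immediately repeats the n words just emitted,
// trying longer n-grams first (maxN down to 1).
function dedupeNgrams(text: string, maxN = 4): string {
  let words = text.split(/\s+/).filter(Boolean);
  for (let n = maxN; n >= 1; n--) {
    const result: string[] = [];
    let i = 0;
    while (i < words.length) {
      const candidate = words.slice(i, i + n);
      const tail = result.slice(-n);
      const repeated =
        candidate.length === n &&
        tail.length === n &&
        candidate.every((w, k) => w.toLowerCase() === tail[k]?.toLowerCase());
      if (repeated) {
        i += n; // skip the repeated n-gram
      } else {
        result.push(words[i] as string);
        i++;
      }
    }
    words = result;
  }
  return words.join(" ");
}
```

For example, `dedupeNgrams("thank you thank you thank you for watching")` yields `"thank you for watching"`, a typical cleanup for looped Whisper output.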

Requirements

Node.js 20+
FFmpeg in $PATH
yt-dlp in $PATH (for YouTube URLs)
Docker + whisper-rtx2080 running (or reachable at WHISPER_URL)

Setup

1. Install dependencies

npm install

2. Generate VAPID keys (one-time)

npx web-push generate-vapid-keys

3. Create `.env`

cp .env.example .env
# Edit .env and fill in all values

Required env vars:

| Variable | Example | Description |
| --- | --- | --- |
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |

Important: `WEBHOOK_BASE_URL` must be the IP/hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
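A startup check along the following lines can catch a missing variable or a loopback `WEBHOOK_BASE_URL` early. It is a minimal sketch, assuming the variables listed above; the `checkEnv` helper is hypothetical and not part of the project. `DATA_DIR` is skipped because it has a default.

```typescript
// Variables without a default, per the table above.
const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

// Returns a list of human-readable problems; an empty list means the env looks usable.
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }

  const webhook = env["WEBHOOK_BASE_URL"];
  if (webhook) {
    try {
      const host = new URL(webhook).hostname;
      // Loopback addresses resolve to the container itself, not to this app.
      if (host === "localhost" || host === "127.0.0.1" || host === "[::1]") {
        problems.push("WEBHOOK_BASE_URL points at localhost, which is unreachable from inside Docker");
      }
    } catch {
      problems.push("WEBHOOK_BASE_URL is not a valid URL");
    }
  }
  return problems;
}
```

In a Node process this could be called as `checkEnv(process.env)` before the server starts, logging each problem and exiting if the list is non-empty.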

4. Build and run

npm run build
npm start

Visit http://localhost:3000.

5. For development

npm run dev

Audio preparation modes

| Mode | Description |
| --- | --- |
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.
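As a rough illustration of how the modes above might map onto an FFmpeg `-af` filter chain, consider the sketch below. The filter names (`highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `silenceremove`, `volume`) are real FFmpeg filters, but the thresholds, the filter order, the boost rule for `auto`, and the helper names are assumptions rather than the project's actual pipeline.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed settings for trimming leading silence; the real threshold may differ.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";

// meanVolumeDb would come from a prior `volumedetect` pass (only used in auto mode).
function buildFilterChain(mode: AudioMode, meanVolumeDb = -20): string {
  switch (mode) {
    case "none":
      return TRIM_LEADING_SILENCE;
    case "standard":
      return [TRIM_LEADING_SILENCE, "highpass=f=80", "lowpass=f=8000", "loudnorm"].join(",");
    case "aggressive":
      // Assumption: denoise and gate run before loudness normalization.
      return [TRIM_LEADING_SILENCE, "highpass=f=80", "lowpass=f=8000", "afftdn", "agate", "loudnorm"].join(",");
    case "auto": {
      const filters = [TRIM_LEADING_SILENCE];
      // Assumption: audio quieter than -30 dB mean is boosted toward -20 dB, capped at +20 dB.
      if (meanVolumeDb < -30) {
        filters.push(`volume=${Math.min(-20 - meanVolumeDb, 20)}dB`);
      }
      filters.push("afftdn", "loudnorm");
      return filters.join(",");
    }
  }
}

// Full argument list for converting any input to 16 kHz mono WAV.
function buildFfmpegArgs(input: string, output: string, mode: AudioMode, meanVolumeDb?: number): string[] {
  return ["-y", "-i", input, "-af", buildFilterChain(mode, meanVolumeDb), "-ac", "1", "-ar", "16000", output];
}
```

The resulting array can be handed to `child_process.spawn("ffmpeg", args)`; passing arguments as an array avoids shell quoting issues with file names.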

API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
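A client for the endpoints above might prepare its requests as in this sketch. It assumes that `POST /api/jobs` accepts a JSON body and that the download `[format]` segment is a lowercase extension; neither detail is confirmed here, and the helper names are hypothetical.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";
// Assumption: format path segments are lowercase.
type DownloadFormat = "srt" | "txt" | "md" | "json";

interface CreateJobBody {
  source: string; // e.g. a YouTube URL
  title: string;
  audioMode: AudioMode;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for creating a job (assumes a JSON body).
function prepareCreateJob(baseUrl: string, body: CreateJobBody): PreparedRequest {
  return {
    url: new URL("/api/jobs", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Build the download URL for a finished job.
function downloadUrl(baseUrl: string, jobId: string, format: DownloadFormat): string {
  return new URL(`/api/jobs/${encodeURIComponent(jobId)}/download/${format}`, baseUrl).toString();
}
```

The prepared request can then be sent with `fetch(url, init)`. The shape of the response is not documented above, so the sketch leaves parsing to the caller.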

whisper-rtx2080 internals

The backend handles all audio chunking internally (60s chunks with 30s snap window, silence-detected at −35dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.
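To make the chunking description concrete, here is one possible reading of "60s chunks with 30s snap window": aim for a cut every 60 seconds, but move each cut to the nearest detected silence if one lies within the window. The sketch assumes the window is centered on the target (±15 s) and that silence positions are already known; whisper-rtx2080 may work differently, and `planCuts` is a hypothetical name.

```typescript
// duration: total audio length in seconds
// silences: timestamps (seconds) of detected silences, e.g. below -35 dB
// Returns the cut points between chunks, in ascending order.
function planCuts(duration: number, silences: number[], chunk = 60, window = 30): number[] {
  const cuts: number[] = [];
  let start = 0;
  // Keep cutting while the remaining audio is longer than one chunk.
  while (duration - start > chunk) {
    const target = start + chunk;
    let best = target; // fall back to a hard cut when no silence is nearby
    let bestDist = Infinity;
    for (const s of silences) {
      const dist = Math.abs(s - target);
      if (dist <= window / 2 && dist < bestDist) {
        best = s;
        bestDist = dist;
      }
    }
    cuts.push(best);
    start = best; // always advances by at least chunk - window / 2
  }
  return cuts;
}
```

With `duration = 150` and silences at 55, 118, and 130 seconds, this returns `[55, 118]`: the first cut snaps from 60 to 55, and the second from 115 to 118.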

SSE progress events from the backend carry `{ percent, chunk, total }` and are relayed live to the browser.
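On the browser side, handling these progress events might look like the sketch below. It assumes each event's `data` field is a JSON object with numeric `percent`, `chunk`, and `total`, and that `percent` is on a 0 to 100 scale; the helper names are hypothetical.

```typescript
interface Progress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse the `data` payload of one SSE event; returns null for anything malformed.
function parseProgress(data: string): Progress | null {
  try {
    const value: unknown = JSON.parse(data);
    if (typeof value !== "object" || value === null) return null;
    const { percent, chunk, total } = value as Record<string, unknown>;
    if (typeof percent !== "number" || typeof chunk !== "number" || typeof total !== "number") {
      return null;
    }
    return { percent, chunk, total };
  } catch {
    return null; // not valid JSON
  }
}

// Human-readable label such as "chunk 3 of 10 (30%)".
function describeProgress(p: Progress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the browser, an `EventSource` opened on `/api/jobs/[id]/stream` could pass `event.data` from its `onmessage` handler to `parseProgress`, assuming the stream uses default (unnamed) message events.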

Languages

TypeScript 63.8%

Svelte 33.3%

CSS 1.5%

Dockerfile 0.7%

HTML 0.4%

Other 0.3%
