
Giancarmine Salucci 08adff1562

Build & Push Docker Image / build-and-push (push) Successful in 41s


fix: bypass CSRF for Web Share Target POST

SvelteKit's CSRF guard rejects POST requests whose Origin header doesn't
match the site's own origin. Web Share Target POSTs legitimately arrive
from external origins (e.g. youtube.com, OS share sheet). Strip the
Origin header in a handle hook for /share POST only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-06 18:58:39 +02:00

.gitea/workflows

ci: restore REGISTRY_USERNAME/REGISTRY_TOKEN secrets (now set on repo)

2026-05-06 16:54:58 +02:00

.vscode

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

src

fix: bypass CSRF for Web Share Target POST

2026-05-06 18:58:39 +02:00

static

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.env.example

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.gitignore

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

.npmrc

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

backend.issue.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

Dockerfile

fix: ffmpeg/yt-dlp/tz in image, UID 1000, reactive accent store

2026-05-06 17:35:39 +02:00

package-lock.json

chore: update package-lock.json to sync with package.json

2026-05-06 16:57:21 +02:00

package.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

svelte.config.js

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

tsconfig.json

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vite.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

vitest.config.ts

Initial commit: Tonemark PWA

2026-05-06 16:41:25 +02:00

README.md

whisper-pwa

A SvelteKit PWA that transcribes YouTube videos and audio/video files using a local whisper-rtx2080 backend.

Features

Web Share Target — share YouTube URLs, video or audio files directly from your phone or browser
Smart audio preparation — multi-strategy FFmpeg pipeline (auto/standard/aggressive/none) before submission
One job → one webhook — whisper-rtx2080 handles internal chunking; we receive a single webhook when done
Live progress — SSE stream showing chunk N of M + percentage
Post-processing — collapse repeats, n-gram deduplication to clean up hallucinations
4 output formats — SRT, plain TXT, Markdown (with timestamps), JSON
Web Push notifications — get notified when the transcript is ready (works on mobile too)
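The post-processing feature listed above (collapsing repeats and n-gram deduplication) can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `Segment` shape, the function names `collapseRepeats` and `dedupeNgrams`, and the default n-gram size of 2 are all assumptions.

```typescript
// Hypothetical segment shape; the real stored segments may differ.
interface Segment {
  start: number;
  end: number;
  text: string;
}

const normalize = (s: string): string => s.trim().toLowerCase();

// Merge consecutive segments whose text is identical after normalization,
// extending the end time of the first one.
function collapseRepeats(segments: Segment[]): Segment[] {
  const out: Segment[] = [];
  for (const seg of segments) {
    const last = out[out.length - 1];
    if (last && normalize(last.text) === normalize(seg.text)) {
      last.end = seg.end;
    } else {
      out.push({ ...seg });
    }
  }
  return out;
}

// Drop an n-gram when it immediately repeats the n words kept just before it.
function dedupeNgrams(text: string, n: number = 2): string {
  if (n < 1) return text;
  const tokens = text.split(/\s+/).filter(Boolean);
  const out: string[] = [];
  let i = 0;
  while (i < tokens.length) {
    const candidate = tokens.slice(i, i + n);
    const previous = out.slice(-n);
    const isRepeat =
      candidate.length === n &&
      previous.length === n &&
      candidate.every((tok, k) => normalize(tok) === normalize(previous[k]));
    if (isRepeat) {
      i += n; // skip the repeated n-gram entirely
    } else {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out.join(" ");
}
```

A looping hallucination such as `thank you thank you thank you for watching` would reduce to `thank you for watching` with `n = 2` under this sketch.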

Requirements

Node.js 20+
FFmpeg in $PATH
yt-dlp in $PATH (for YouTube URLs)
Docker + whisper-rtx2080 running (or reachable at WHISPER_URL)

Setup

1. Install dependencies

npm install

2. Generate VAPID keys (one-time)

npx web-push generate-vapid-keys

3. Create `.env`

cp .env.example .env
# Edit .env and fill in all values

Required env vars:

| Variable | Example | Description |
| --- | --- | --- |
| `WHISPER_URL` | `http://localhost:8080` | whisper-rtx2080 base URL |
| `WEBHOOK_BASE_URL` | `http://192.168.1.x:3000` | Reachable from inside Docker |
| `OUTPUT_DIR` | `/home/user/transcripts` | Where to write output files |
| `VAPID_PUBLIC_KEY` | `BNxx...` | From `npx web-push generate-vapid-keys` |
| `VAPID_PRIVATE_KEY` | `xxxx` | From `npx web-push generate-vapid-keys` |
| `VAPID_SUBJECT` | `mailto:you@example.com` | Contact for push service |
| `DATA_DIR` | `/home/user/.whisper-pwa` | SQLite DB + tmp audio (default: `~/.whisper-pwa`) |

Important: `WEBHOOK_BASE_URL` must be an IP address or hostname reachable from inside the whisper-rtx2080 Docker container (not `localhost`).
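A startup check along the following lines could catch a missing variable or a `localhost` webhook URL early. It is a minimal sketch, not part of the project: `checkEnv` is a hypothetical helper, and treating `DATA_DIR` as optional is an assumption based on its documented default.

```typescript
// Variables from the table above that have no documented default.
const REQUIRED_VARS = [
  "WHISPER_URL",
  "WEBHOOK_BASE_URL",
  "OUTPUT_DIR",
  "VAPID_PUBLIC_KEY",
  "VAPID_PRIVATE_KEY",
  "VAPID_SUBJECT",
] as const;

// Hypothetical helper: returns human-readable problems (empty array if none).
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }

  // The webhook is called from inside the Docker container, where
  // localhost refers to the container itself rather than this app.
  const webhook = env["WEBHOOK_BASE_URL"];
  if (webhook && /^https?:\/\/(localhost|127\.0\.0\.1)(:|\/|$)/i.test(webhook)) {
    problems.push("WEBHOOK_BASE_URL points at localhost");
  }

  return problems;
}
```

In a Node process this could be called as `checkEnv(process.env)` before the server starts, logging the returned problems and exiting if the list is not empty.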

4. Build and run

npm run build
npm start

Visit http://localhost:3000.

5. For development

npm run dev

Audio preparation modes

| Mode | Description |
| --- | --- |
| `auto` (default) | volumedetect → boost quiet audio + denoise + EBU R128 loudnorm |
| `standard` | Highpass 80Hz + lowpass 8kHz + EBU R128 loudnorm |
| `aggressive` | Standard + FFT denoiser (`afftdn`) + noise gate (`agate`) |
| `none` | Convert to 16kHz mono WAV only |

All modes trim leading silence to prevent Whisper hallucinations at file start.
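The modes above could map to FFmpeg `-af` filter chains roughly as sketched here. The filter names (`silenceremove`, `highpass`, `lowpass`, `loudnorm`, `afftdn`, `agate`, `volume`) are real FFmpeg filters, but everything else is an assumption for illustration: the parameter values, the filter order, the −30 dB "quiet" threshold, the 10 dB boost, and the function names.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

// Assumed silence-trim settings; the project's real thresholds are not documented here.
const TRIM_LEADING_SILENCE = "silenceremove=start_periods=1:start_threshold=-50dB";
const STANDARD_FILTERS = ["highpass=f=80", "lowpass=f=8000", "loudnorm"];

// meanVolumeDb stands in for the mean_volume reported by a prior volumedetect pass.
function buildFilterChain(mode: AudioMode, meanVolumeDb: number = -20): string {
  const filters: string[] = [TRIM_LEADING_SILENCE]; // applied in every mode
  switch (mode) {
    case "standard":
      filters.push(...STANDARD_FILTERS);
      break;
    case "aggressive":
      filters.push(...STANDARD_FILTERS, "afftdn", "agate");
      break;
    case "auto":
      if (meanVolumeDb < -30) filters.push("volume=10dB"); // boost quiet audio
      filters.push("afftdn", "loudnorm");
      break;
    case "none":
      break;
  }
  return filters.join(",");
}

// Argument list for an ffmpeg invocation producing 16 kHz mono output.
function buildFfmpegArgs(
  input: string,
  output: string,
  mode: AudioMode,
  meanVolumeDb?: number,
): string[] {
  return ["-i", input, "-af", buildFilterChain(mode, meanVolumeDb), "-ar", "16000", "-ac", "1", output];
}
```

Building the chain as a pure function keeps the mode logic testable without running FFmpeg; the resulting array can be handed to a process spawner.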

API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/jobs` | POST | Create job (`{ source, title, audioMode }`) |
| `/api/jobs` | GET | List recent jobs |
| `/api/jobs/[id]` | GET | Poll job status |
| `/api/jobs/[id]` | DELETE | Cancel job |
| `/api/jobs/[id]/stream` | GET (SSE) | Live progress stream |
| `/api/jobs/[id]/download/[format]` | GET | Download SRT/TXT/MD/JSON |
| `/api/jobs/[id]/reprocess` | POST | Re-run post-processing on stored segments |
| `/api/webhook/[jobId]` | POST | Whisper completion webhook (called by whisper-rtx2080) |
| `/api/push` | GET | Get VAPID public key |
| `/api/push` | POST | Register push subscription |
| `/share` | POST | Web Share Target entry point |
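As a rough client-side illustration of the endpoints above, the sketch below creates a job and builds a download URL. Several details are assumptions: that `POST /api/jobs` accepts a JSON body, that the `[format]` path segment is a lowercase extension, and the response shape (left as `unknown`). `createJob`, `downloadUrl`, and `FetchLike` are hypothetical names.

```typescript
type AudioMode = "auto" | "standard" | "aggressive" | "none";

interface CreateJobBody {
  source: string; // e.g. a YouTube URL
  title: string;
  audioMode: AudioMode;
}

// Minimal subset of the fetch signature, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const trimSlash = (baseUrl: string): string => baseUrl.replace(/\/+$/, "");

async function createJob(
  baseUrl: string,
  body: CreateJobBody,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(`${trimSlash(baseUrl)}/api/jobs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST /api/jobs failed with status ${res.status}`);
  return res.json();
}

function downloadUrl(
  baseUrl: string,
  id: string,
  format: "srt" | "txt" | "md" | "json",
): string {
  return `${trimSlash(baseUrl)}/api/jobs/${encodeURIComponent(id)}/download/${format}`;
}
```

In a browser or on Node.js 18+, the global `fetch` should be usable as `fetchImpl`.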

whisper-rtx2080 internals

The backend handles all audio chunking internally (60 s chunks with a 30 s snap window; silence is detected at −35 dB). We submit one WAV file per job and receive one webhook when all chunks are transcribed.

SSE progress events from the backend include `{ percent, chunk, total }` and are relayed live to the browser.
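On the browser side, handling these progress events could look something like the sketch below. It assumes each SSE message carries a JSON object with numeric `percent`, `chunk`, and `total` fields; `JobProgress`, `parseProgress`, and `formatProgress` are hypothetical names, not the project's own.

```typescript
interface JobProgress {
  percent: number;
  chunk: number;
  total: number;
}

// Parse the data field of one SSE message; returns null for anything malformed.
function parseProgress(data: string): JobProgress | null {
  let value: unknown;
  try {
    value = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  if (
    typeof v["percent"] === "number" &&
    typeof v["chunk"] === "number" &&
    typeof v["total"] === "number"
  ) {
    return { percent: v["percent"], chunk: v["chunk"], total: v["total"] };
  }
  return null;
}

// Render the "chunk N of M + percentage" label described under Features.
function formatProgress(p: JobProgress): string {
  return `chunk ${p.chunk} of ${p.total} (${Math.round(p.percent)}%)`;
}
```

In the page, an `EventSource` opened on `/api/jobs/[id]/stream` could pass each message's `data` string to `parseProgress` from its `onmessage` handler, assuming the stream uses unnamed `message` events.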

Languages

TypeScript 63.8%

Svelte 33.3%

CSS 1.5%

Dockerfile 0.7%

HTML 0.4%

Other 0.3%
