Giancarmine Salucci 56d3aec3e2 fix(RECIPE-0006): complete iteration 1 - unit tests for Instagram caption extraction

- Exported cleanText() and extractFromDOM() for unit testing
- Fixed metadata prefix regex to handle optional quotes
- Created comprehensive unit tests with mocked Playwright Page (15 tests, 12ms)
- All 275 tests passing

2026-02-17 11:03:33 +01:00

Findings & Research Documentation

Last Updated: 2026-02-15T00:00:00.000Z
JIRA: RECIPE-0001
Status: Initialized

Purpose

This document tracks research findings, analysis results, and technical discoveries made during development. Each agent (Planner, Developer, Reviewer) appends findings as they work through the pipeline.

Initial Codebase Analysis

Language & Framework

Primary Language: TypeScript 5.9.3
Framework: SvelteKit 2.48.5 with Svelte 5.43.8
Runtime: Node.js 22+
Package Manager: npm

Project Type

Progressive Web Application (PWA) for extracting recipes from Instagram posts and uploading them to Tandoor Recipe Manager.

Architecture Style

Hexagonal Architecture (Ports and Adapters):

Domain logic in src/lib/server/
External system adapters: Instagram, Tandoor, LLM, Browser
Clear separation between client and server code

Key Technical Components

Queue Management System: In-memory FIFO queue with async processing
Three-Phase Pipeline: Extraction → Parsing → Uploading
Real-Time Updates: Server-Sent Events (SSE) for progress tracking
Push Notifications: Web Push API for background notifications
PWA Features: Service worker, manifest, install prompts

Design Patterns Identified

Singleton: QueueManager, QueueProcessor, PushNotificationService
Factory: createLLM(), createBrowserContext(), initializeBrowser()
Observer: Queue subscription system, SSE streaming
Adapter: Instagram, Tandoor, LLM, Browser adapters
Strategy: Multiple extraction methods with fallback

Dependencies Overview

Production (6 dependencies):

Browser automation: playwright
LLM integration: openai
Utilities: uuid, date-fns, zod

Development (26+ dependencies):

Framework: @sveltejs/kit, svelte, vite
Testing: vitest, @vitest/browser-playwright
Styling: tailwindcss
Tooling: typescript, eslint, prettier

File Structure

52 total TypeScript/JavaScript files
├── 39 TypeScript files (.ts)
├── 10+ Svelte components (.svelte)
├── 3 JavaScript config files (.js)
└── Multiple test files (.spec.ts)

Code Quality Indicators

Strict TypeScript: strict: true enabled
Comprehensive Testing: 138 tests across unit, integration, and browser tests
Linting: ESLint with TypeScript and Svelte plugins
Formatting: Prettier with Svelte and Tailwind plugins
Type Safety: Zod schemas for runtime validation

Environment Configuration

Environment variables (required unless marked optional or given a default):

OPENAI_API_KEY - LLM access
TANDOOR_URL - Recipe manager URL (optional)
TANDOOR_TOKEN - API authentication (optional)
QUEUE_CONCURRENCY - Processing limit (default: 2)
QUEUE_MAX_RETRIES - Retry attempts (default: 3)

Deployment Setup

Docker: Dockerfile with Node.js 22 Alpine + Chromium
HTTPS: Local SSL certificates for PWA features
Production: Node.js adapter for SvelteKit

Notable Features

Multi-Method Extraction: 4-strategy cascade with intelligent fallback
Progress Tracking: Real-time callbacks throughout extraction pipeline
Thumbnail Validation: HTTP status code checking for image URLs
Retry Logic: Configurable retry attempts for failed extractions
Scheduler: Background task execution with authentication
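
The multi-method extraction cascade with fallback and progress callbacks could look roughly like the sketch below. It is an illustration only, not the project's actual code: `ExtractionStrategy`, `extractWithFallback`, and the callback shape are hypothetical names introduced here.

```typescript
// Hypothetical types: the real extraction code is not shown in this document.
type ExtractionStrategy = {
  name: string;
  run: (url: string) => Promise<string | null>;
};

// Try each strategy in order and return the first non-empty result.
// Failures are collected so the final error explains what was attempted.
async function extractWithFallback(
  url: string,
  strategies: ExtractionStrategy[],
  onProgress: (message: string) => void = () => {}
): Promise<{ method: string; caption: string }> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    onProgress(`Trying ${strategy.name}`);
    try {
      const caption = await strategy.run(url);
      if (caption && caption.trim().length > 0) {
        return { method: strategy.name, caption };
      }
      failures.push(`${strategy.name}: empty result`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${strategy.name}: ${reason}`);
    }
  }
  throw new Error(`All extraction strategies failed (${failures.join('; ')})`);
}
```

Ordering the array from cheapest to most expensive strategy keeps the common case fast while still allowing a slower method to rescue a failed extraction.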

Technical Debt & Opportunities

Identified Issues

Deprecated Endpoints: /api/extract returns 410 Gone (migration helper)
In-Memory Queue: No persistence - items lost on server restart
Single Instance: Queue state not shared across multiple server instances

Potential Improvements

Queue Persistence: Redis or database-backed queue for durability
Horizontal Scaling: Shared queue state for multi-instance deployments
Rate Limiting: Instagram request throttling to avoid blocks
Caching: Extracted content caching to reduce redundant processing

Research Findings

This section will be populated by the Planner agent during task analysis.

[Planner] Research Notes - RECIPE-0001 (2026-02-15)

Task: Fix model loading issue and frontend error display

Issue 1: Model Loading - "400 No models loaded"

Research Date: 2026-02-15
Source: Stack trace analysis, OpenAI SDK documentation, LM Studio/LiteLLM API patterns

Problem Analysis:

Error occurs at detectRecipe() in src/lib/server/parser.ts
OpenAI-compatible APIs (LM Studio, LiteLLM, Ollama, etc.) often require models to be explicitly loaded
Current implementation assumes model is already loaded
Error message contains provider-specific instructions ("use the 'lms load' command")

OpenAI-Compatible Model Loading Patterns:

LM Studio: Uses /v1/models endpoint to list available models
- Loaded models appear in response with "id": "model-name"
- No programmatic loading endpoint (manual load in UI)
LiteLLM: Uses /v1/models to list loaded models
- Models must be configured in server startup
- No dynamic loading endpoint
Ollama: Uses /api/tags for the model list and /api/pull to download models
- Native API has a different structure (no /v1 prefix)
Generic OpenAI-compatible: Most follow OpenAI's /v1/models endpoint
- No standard for dynamic model loading
- Usually require pre-configuration

Solution Approach:

Check if model exists via client.models.list()
If model not found/loaded, provide clear user-facing error
Remove provider-specific error messages
Add notification when model check succeeds
Consider future enhancement: detect provider type and attempt auto-load if supported
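
A minimal sketch of the availability check described above, written against a small structural interface rather than the SDK itself so it stays provider-agnostic. The names `ModelListingClient`, `ModelCheck`, and `checkModelAvailable` are hypothetical, and the assumption that `models.list()` resolves to an object with a `data` array of `{ id }` entries should be verified against the installed SDK version.

```typescript
// Minimal structural type for the part of an OpenAI-compatible client used here.
// Assumption: `models.list()` resolves to an object with a `data` array of `{ id }`.
interface ModelListingClient {
  models: { list(): Promise<{ data: Array<{ id: string }> }> };
}

type ModelCheck =
  | { available: true }
  | { available: false; message: string };

// Returns a provider-neutral, user-facing message instead of passing through
// provider-specific instructions such as "use the 'lms load' command".
async function checkModelAvailable(
  client: ModelListingClient,
  modelId: string
): Promise<ModelCheck> {
  try {
    const page = await client.models.list();
    const ids = page.data.map((m) => m.id);
    if (ids.includes(modelId)) return { available: true };
    return {
      available: false,
      message: `Model "${modelId}" is not loaded on the LLM server. Load it and retry.`
    };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { available: false, message: `Could not reach the LLM server: ${reason}` };
  }
}
```

Returning a result object instead of throwing lets the queue processor decide whether to notify the user, retry, or mark the item as failed.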

Files Affected:

src/lib/server/llm.ts - Add model availability check
src/lib/server/parser.ts - Handle model not loaded error
src/lib/server/queue/QueueProcessor.ts - User notification

Issue 2: Frontend Error Display - "[object Object]"

Research Date: 2026-02-15
Source: Code analysis of QueueItemCard.svelte, types.ts, QueueManager.ts

Problem Analysis:

Error structure is an object: { phase, message, recoverable, timestamp }
Frontend displays {item.error} directly (line 205 of QueueItemCard.svelte)
Svelte renders object.toString() → "[object Object]"

Current Implementation:

// types.ts - Error is an object
error?: {
  phase: ProcessingPhase;
  message: string;
  recoverable: boolean;
  timestamp: string;
}

// QueueItemCard.svelte line 205 - Displays object directly
<div class="text-sm text-red-700 mt-1">{item.error}</div>

Solution: Change to: {item.error?.message || item.error}

Handles object error (gets .message)
Handles legacy string errors (fallback)
Type-safe with optional chaining
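
The same fallback can be expressed as a small helper, sketched here under the assumption that the phase names mirror the three pipeline phases. `formatQueueError` is a hypothetical name, not an existing function in the codebase.

```typescript
// Assumed phase values, based on the Extraction → Parsing → Uploading pipeline.
type ProcessingPhase = 'extraction' | 'parsing' | 'uploading';

interface QueueError {
  phase: ProcessingPhase;
  message: string;
  recoverable: boolean;
  timestamp: string;
}

// Accepts the structured error, a legacy string error, or nothing,
// and always returns a displayable string.
function formatQueueError(error: QueueError | string | null | undefined): string {
  if (error == null) return '';
  if (typeof error === 'string') return error;
  return error.message || `Unknown error during ${error.phase}`;
}
```

Unlike the inline `||` fallback, this never hands an object to the template, so an error object with an empty message cannot render as "[object Object]".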

Files Affected:

src/routes/components/QueueItemCard.svelte - Display error.message

Dependencies & Constraints (from ARCHITECTURE.md)

Using openai@^4.20.0 SDK
Environment: OPENAI_BASE_URL, OPENAI_API_KEY, LLM_MODEL
Current config example: http://192.168.1.10:1234/v1 (LM Studio)
Must maintain OpenAI-compatible API contract
No assumption about specific provider implementation

Code Style Requirements (from CODE_STYLE.md)

Use SvelteKit $env/dynamic/private for env vars (already correct)
Error handling: try-catch with descriptive messages
Console logging: [Component] Message format
Type safety: TypeScript strict mode enabled

[Developer] Implementation Notes

[Reviewer] Review Notes

API Endpoint Catalog

Active Endpoints

Queue Management

POST /api/queue - Enqueue Instagram URL for processing
GET /api/queue - List queue items (supports filtering, pagination)
GET /api/queue/stream - SSE stream for real-time updates
GET /api/queue/{id} - Get specific queue item details
DELETE /api/queue/{id} - Remove item from queue
POST /api/queue/{id}/retry - Retry failed extraction

Push Notifications

POST /api/notifications/subscribe - Subscribe to push notifications
DELETE /api/notifications/subscribe - Unsubscribe from notifications
GET /api/notifications/vapid-key - Get VAPID public key

Health & Status

GET /api/health - Application health check
GET /api/llm-health - LLM service availability check

Tandoor Integration

POST /api/tandoor - Upload recipe to Tandoor
GET /api/tandoor-config - Get Tandoor configuration status

Legacy/Deprecated

POST /api/extract - ⚠️ Deprecated (returns 410 Gone)

Known Constraints

Browser Automation

Requires Chromium/Chrome installation
Headless mode used in production
Cookie handling for authenticated Instagram content

LLM Integration

Requires OpenAI-compatible API endpoint
Configurable model selection
Structured output using Zod schemas

Tandoor Integration

Optional feature (disabled without credentials)
Requires Tandoor API token
Supports ingredient partitioning across steps

SSL Requirements

HTTPS required for Service Worker registration
Local development uses self-signed certificates
Certificates managed via external Caddy CA

Testing Coverage

Test Distribution

Unit Tests: Core logic validation
Integration Tests: Multi-component workflows
API Tests: Endpoint behavior verification
Browser Tests: Svelte component rendering

Test Files

queue-manager.spec.ts
queue-processor.spec.ts
queue-api.spec.ts
queue-sse.spec.ts
scheduler.spec.ts
instagram-url-validation.spec.ts
thumbnail-validation.spec.ts
extraction-url-validation.integration.spec.ts
page.svelte.spec.ts

Mock Strategy

Environment variables mocked via vi.mock('$env/dynamic/private')
External services mocked at module level
Browser automation mocked for unit tests

Documentation Inventory

Existing Documentation

README.md - Project overview and setup
docs/API.md - API endpoint specifications
docs/MIGRATION.md - Migration guides
docs/SVELTEKIT_SSR_GUIDE.md - SSR implementation notes
docs/TESTING.md - Testing guide and mocking patterns
docs/Tandoor (2.3.6).yaml - OpenAPI spec for Tandoor

Plan Documentation

docs/plans/ contains 20+ implementation plans:

Execution plans for completed features
Technical specifications
Story breakdowns with acceptance criteria

Outcome Documentation

docs/outcomes/ contains 20+ outcome reports:

Implementation summaries
Changes made
Testing results
Lessons learned

Agent Pipeline Notes

Build Commands

Build: npm run build
Test: npm test (alias for npm run test:unit -- --run)
Dev: npm run dev
Lint: npm run lint
Format: npm run format

Development Workflow

Make changes in src/
Run tests: npm test
Verify build: npm run build
Test locally: npm run dev

Continuous Integration

ESLint checks code quality
Prettier enforces formatting
TypeScript checks type safety
Vitest runs test suite

Next Steps

This document will be updated by subsequent agents:

Planner: Append research findings and analysis
Developer: Document implementation discoveries
Reviewer: Record review observations and recommendations

[Planner] Research Notes - RECIPE-0002 (2026-02-16)

Task: Complete PWA implementation (installability, push notifications, share target)

PWA Documentation Research

Research Date: 2026-02-16
Sources: MDN Web Docs, web.dev, W3C specifications

Progressive Web Apps (PWA) - Key Requirements:

Web App Manifest (manifest.json)
- Required members: name or short_name, icons (192x192 PNG minimum), start_url, display
- Share target support via share_target member (method, action, params)
- Icons should include 192x192 and 512x512 sizes for optimal display
- Browser compatibility: Chrome/Edge (full), Firefox/Safari (limited for share_target)
Service Worker
- Must be registered to enable offline functionality
- Lifecycle: install → activate → fetch events
- Required for push notifications
- Must be served over HTTPS (or localhost)
HTTPS Requirement
- Mandatory for service worker registration
- Required for push notifications and other secure contexts
- Local development: http://localhost is treated as secure
Installability Criteria (from MDN/web.dev):
- Valid manifest with required members
- Service worker registered with fetch event handler
- Served over HTTPS
- At least one 192x192 PNG or SVG icon
- Display mode set (fullscreen, standalone, minimal-ui)

Push Notifications (Web Push API):

Requires service worker to receive push events
VAPID authentication (application server keys) required for Chrome
Subscription process: permission → subscribe → store subscription → send push
Push service (browser vendor controlled) routes messages
Notification permissions: default, granted, denied
Best practice: request permission after user interaction
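
During the subscribe step, the VAPID public key usually arrives as a base64url string and has to be converted to bytes before it is handed to the browser. The helper below is a common sketch of that conversion; the function name is a convention, not something taken from this codebase.

```typescript
// Convert a base64url-encoded VAPID public key into raw bytes.
function urlBase64ToUint8Array(base64Url: string): Uint8Array {
  // base64url omits padding and uses '-' and '_' instead of '+' and '/'.
  const padding = '='.repeat((4 - (base64Url.length % 4)) % 4);
  const base64 = (base64Url + padding).replace(/-/g, '+').replace(/_/g, '/');
  const raw = atob(base64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);
  }
  return bytes;
}
```

The resulting array would be passed as `applicationServerKey` to `pushManager.subscribe()` together with `userVisibleOnly: true`.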

Web Share Target API:

Registers PWA as share destination
Configuration via manifest share_target member
Supports GET or POST methods
params define query string mapping (title, text, url)
Files can be shared via POST with multipart/form-data
Currently Chrome/Edge only (experimental)
App must be installed to appear in share sheet

Current Implementation Analysis

Research Date: 2026-02-16
Files Analyzed: manifest.json, service-worker.ts, app.html, svelte.config.js, PWAInstallManager.ts, PushNotificationManager.ts

Manifest Analysis (static/manifest.json):

✅ Has all required PWA members (name, short_name, start_url, display, scope, theme_color, background_color)
✅ Share target configured correctly (GET /share with title/text/url params)
⚠️ Icons reference /favicon.png but file does NOT exist in static folder
⚠️ Uses same icon path for both 192x192 and 512x512 sizes
ℹ️ Missing optional but recommended members: description, screenshots, categories

Service Worker Analysis (src/service-worker.ts):

✅ Native SvelteKit service worker (migrated from vite-pwa plugin)
✅ Install event: caches all build assets and static files
✅ Activate event: cleans up old caches
✅ Fetch event: cache-first for assets, network-first with cache fallback for others
✅ Push event handler: processes push messages, shows notifications with actions
✅ Notification click handler: opens/focuses app, handles action buttons
✅ Notification close handler: tracks dismissals
✅ Background sync handler: supports retry operations
✅ Message handler: supports service worker communication
✅ Global error handlers present
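
The two fetch strategies named above differ only in which source is consulted first. The sketch below shows that difference using simplified stand-ins for the Cache API and `fetch()`; `AssetStore`, `Fetcher`, and `memoryStore` are hypothetical and exist only so the logic can run outside a service worker.

```typescript
// Simplified stand-ins for the Cache API and fetch().
interface AssetStore<T> {
  get(key: string): Promise<T | undefined>;
  put(key: string, value: T): Promise<void>;
}
type Fetcher<T> = (key: string) => Promise<T>;

// Cache-first: serve from cache when present, otherwise fetch and store.
async function cacheFirst<T>(key: string, store: AssetStore<T>, fetcher: Fetcher<T>): Promise<T> {
  const cached = await store.get(key);
  if (cached !== undefined) return cached;
  const fresh = await fetcher(key);
  await store.put(key, fresh);
  return fresh;
}

// Network-first: prefer a fresh response, fall back to cache when the network fails.
async function networkFirst<T>(key: string, store: AssetStore<T>, fetcher: Fetcher<T>): Promise<T> {
  try {
    const fresh = await fetcher(key);
    await store.put(key, fresh);
    return fresh;
  } catch (err) {
    const cached = await store.get(key);
    if (cached !== undefined) return cached;
    throw err;
  }
}

// In-memory store used only for demonstration.
function memoryStore<T>(): AssetStore<T> {
  const map = new Map<string, T>();
  return {
    get: async (key) => map.get(key),
    put: async (key, value) => {
      map.set(key, value);
    }
  };
}
```

Cache-first suits versioned build assets that never change under the same URL, while network-first suits pages and API responses where freshness matters more than speed.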

Service Worker Registration (svelte.config.js):

✅ serviceWorker.register: true enabled
✅ SvelteKit handles registration automatically

Manifest Link (src/app.html):

✅ <link rel="manifest" href="/manifest.json"> present in head

Client-Side Managers:

✅ PushNotificationManager.ts: Full implementation with permission, subscribe, unsubscribe
✅ PWAInstallManager.ts: beforeinstallprompt handling, install prompt triggering
✅ Both are SSR-safe with browser guards

Share Target (/share route):

✅ Route exists at src/routes/share/+page.svelte
✅ Parses query params (text, url) from share target
✅ Extracts Instagram URLs from shared text
✅ Auto-processes URLs on mount
✅ Enqueues items and redirects to dashboard

Icons/Assets Issue:

⚠️ CRITICAL: manifest.json references /favicon.png but file doesn't exist
✅ src/lib/assets/favicon.svg exists (used in layout)
⚠️ No PNG icons in static/ folder
⚠️ Service worker references /favicon.png for notifications

Push Notifications Infrastructure:

✅ VAPID keys configured in queueConfig.push (uses env vars or defaults)
✅ Server endpoint: /api/notifications/vapid-key (GET)
✅ Server endpoint: /api/notifications/subscribe (POST/DELETE)
✅ PushNotificationService stores subscriptions in-memory
ℹ️ Note: Subscriptions are not persisted (lost on restart)

What Works Already:

PWA Structure: Complete Native SvelteKit PWA implementation
Service Worker: Fully functional with caching, push, notifications
Push Notifications: Client and server infrastructure in place
Share Target: Configured in manifest and /share route working
Install Prompts: PWAInstallManager ready to trigger install
HTTPS: App served at https://localhost:5173/

What Needs Attention:

Icons: Create PNG icons (192x192, 512x512) from existing SVG
Icon Verification: Ensure icons are properly sized and optimized
Installability Testing: Verify all criteria met via chrome://pwa-internals
Push Notification Testing: Verify VAPID key generation and push flow
Share Target Testing: Test share from external apps (Instagram)
Manifest Enhancement: Add description, categories for better discoverability

Dependencies & Constraints (from ARCHITECTURE.md, CODE_STYLE.md):

Using native SvelteKit PWA (no plugins needed)
Service worker: $service-worker module provides build, files, version
Environment: uses $env/dynamic/private for server configs
HTTPS required (already configured at https://localhost:5173/)
TypeScript strict mode enabled
All file paths must use SvelteKit path aliases ($lib, $service-worker)

Code Style Requirements (from CODE_STYLE.md):

File Naming: manifest.json, service-worker.ts, lowercase for utilities
Type annotations required for public APIs
SSR-safe code: all browser API usage must be guarded with browser check
Error handling: try-catch with descriptive messages
Comments: JSDoc for public APIs, inline for complex logic

[Planner] Research Notes - RECIPE-0003 (2026-02-16)

Task: Update application icon and configure Docker deployment

PWA Icon Generation - icon-source.png

Research Date: 2026-02-16
Source: Project analysis, PWA best practices, sharp documentation

Icon Source File:

Location: static/icon-source.png
Size: 672KB PNG file
Format: PNG with transparency (confirmed via file analysis)
Destination sizes: 192x192 (favicon.png), 512x512 (icon-512.png)

PWA Icon Requirements: From RECIPE-0002 research and W3C Web App Manifest specification:

Minimum Size: 192x192 pixels (required for PWA installability)
Recommended Size: 512x512 pixels (for splash screens, high-DPI displays)
Format: PNG with transparency support
Purpose: "any maskable" for optimal Android compatibility
Location: static/ directory (served at root path)

Sharp Library Configuration:

Version: 0.34.5 (already in dependencies)
Method: resize() with fit: 'contain' to preserve aspect ratio
Background: transparent (rgba 0,0,0,0)
Format: PNG with optimization
Quality: Default compression for web delivery

Implementation Pattern:

await sharp('static/icon-source.png')
  .resize(192, 192, {
    fit: 'contain',
    background: { r: 0, g: 0, b: 0, alpha: 0 }
  })
  .png()
  .toFile('static/favicon.png');

Rationale:

fit: 'contain' preserves aspect ratio without cropping
Transparent background maintains icon transparency
PNG is the most widely supported icon format for installability (the Web App Manifest spec itself does not mandate PNG)
Same approach for both 192x192 and 512x512 variants

Docker Volume Configuration

Research Date: 2026-02-16
Source: Codebase analysis, Dockerfile, scheduler.ts, extraction.ts

Volume Requirements Analysis: From code analysis, only one persistent volume is required:

1. /app/secrets - Instagram Authentication Storage

Purpose: Persist Instagram session cookies across container restarts
File: auth.json (Playwright storage state)
Usage:
- scheduler.ts: Checks /app/secrets/auth.json for Docker deployments
- extraction.ts: Loads authentication from /app/secrets/auth.json
- gen-auth.js: Browser automation saves session to secrets/auth.json
Rationale: Prevents re-login on every container restart
Docker Path: /app/secrets
Host Path: ./secrets (relative to docker-compose.yml)

Volumes NOT Required:

Database: Queue uses in-memory storage (QueueManager.ts)
Cache: Service worker cache is ephemeral
Uploads: No file upload functionality
Logs: Console logs to stdout/stderr (Docker logging)
Build artifacts: Built into image at build time

VOLUME Directive:

VOLUME ["/app/secrets"]

docker-compose.yml Volume Mount:

volumes:
  - ./secrets:/app/secrets

Environment Variable Inventory

Research Date: 2026-02-16
Source: queue/config.ts, llm.ts, tandoor-config.ts, scheduler.ts

Comprehensive Variable List:

LLM Configuration (REQUIRED):

OPENAI_BASE_URL - OpenAI-compatible API endpoint
OPENAI_API_KEY - API authentication key
LLM_MODEL - Model identifier (default: gpt-4o)

Queue Configuration (OPTIONAL):

QUEUE_CONCURRENCY - Parallel processing limit (default: 2)
QUEUE_MAX_RETRIES - Retry attempts (default: 3)

Tandoor Integration (OPTIONAL):

TANDOOR_ENABLED - Enable Tandoor upload (default: false)
TANDOOR_SERVER_URL - Tandoor base URL
TANDOOR_SPACE - Space ID (default: 1)
TANDOOR_TOKEN - API token

Push Notifications (OPTIONAL):

VAPID_PUBLIC_KEY - Web Push public key (has default)
VAPID_PRIVATE_KEY - Web Push private key (has default)

Authentication Scheduler (OPTIONAL):

AUTH_SCHEDULER_ENABLED - Enable auto-renewal (default: false)
AUTH_SCHEDULER_INTERVAL_MINUTES - Renewal interval (default: 720)

Runtime Configuration:

NODE_ENV - Environment mode (production/development)
PORT - SvelteKit port (default: 3000)
DISPLAY - X11 display for Playwright (set to :99 in docker-compose.yml)

Default Values: All variables have sensible defaults except:

OPENAI_BASE_URL (required)
OPENAI_API_KEY (required)
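
The required-versus-default split above can be enforced at startup with a small loader. This is a sketch under stated assumptions: `AppConfig`, `loadConfig`, and the helper names are hypothetical, only a subset of the variables is covered, and treating `TANDOOR_ENABLED` as enabled only for the literal string "true" is a guess at the parsing rule.

```typescript
interface AppConfig {
  openaiBaseUrl: string;
  openaiApiKey: string;
  llmModel: string;
  queueConcurrency: number;
  queueMaxRetries: number;
  tandoorEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Fail fast with a clear message when a required variable is absent.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Parse an optional integer variable, falling back to the documented default.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): AppConfig {
  return {
    openaiBaseUrl: requireVar(env, 'OPENAI_BASE_URL'),
    openaiApiKey: requireVar(env, 'OPENAI_API_KEY'),
    llmModel: env.LLM_MODEL || 'gpt-4o',
    queueConcurrency: intVar(env, 'QUEUE_CONCURRENCY', 2),
    queueMaxRetries: intVar(env, 'QUEUE_MAX_RETRIES', 3),
    // Assumption: only the literal string "true" enables Tandoor.
    tandoorEnabled: env.TANDOOR_ENABLED === 'true'
  };
}
```

Taking the environment as a plain object keeps the loader independent of `$env/dynamic/private`, which makes it straightforward to unit test.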

VAPID Keys: Current defaults in queue/config.ts:

Public: BNextdcB_fQ0BVvyGioM5L8Tf9vKQjs-WnF-rUbnU8MdWIZQYfggIHxBnW21I-lq_0HykLCdMpYj8d5joavWdxQ
Private: JwxI_KcsBcehYcTOufMcbVWJjCq1QbH5FJmSyQuG680
Note: These should be regenerated for production deployments

Variable Access Pattern:

Server-side only: Uses $env/dynamic/private from SvelteKit
No client-side environment variable exposure
Runtime configuration (no build-time substitution)

Docker Health Check Configuration

Research Date: 2026-02-16
Source: routes/api/health/+server.ts analysis

Health Check Endpoint:

Path: /api/health
Method: GET
Response: 200 OK with JSON body
Implementation: src/routes/api/health/+server.ts

Health Check Response:

{
  "status": "ok",
  "timestamp": "2026-02-16T..."
}
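
A handler producing that response might look like the sketch below. It is not the contents of the actual +server.ts file; `buildHealthPayload` and `handleHealthRequest` are hypothetical names, and the standard `Response` class is used so the example has no framework dependency.

```typescript
// Build the payload separately so it can be tested without an HTTP server.
function buildHealthPayload(now: Date = new Date()): { status: 'ok'; timestamp: string } {
  return { status: 'ok', timestamp: now.toISOString() };
}

// In SvelteKit, a function like this would be exported as `GET`
// from src/routes/api/health/+server.ts.
function handleHealthRequest(): Response {
  return new Response(JSON.stringify(buildHealthPayload()), {
    status: 200,
    headers: { 'content-type': 'application/json' }
  });
}
```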

Docker Health Check Configuration:

healthcheck:
  test: ["CMD", "node", "-e", "fetch('http://localhost:3000/api/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s

Rationale:

interval: 30s - Balance between responsiveness and overhead
timeout: 10s - Sufficient for app initialization
retries: 3 - Allow transient failures
start_period: 40s - Accounts for Playwright browser initialization
Uses internal fetch to avoid curl dependency

Docker Deployment Constraints

Research Date: 2026-02-16
Source: Dockerfile, app.server.ts, browser.ts

Current Dockerfile Analysis:

Base: node:22-alpine (minimal, production-ready)
Chromium: Installed via apk (headless browser for Instagram extraction)
Fonts: liberation-fonts, noto, noto-cjk (text rendering)
Build: npm ci + npm run build
Runtime: Node.js ESM import
Port: 3000 (EXPOSE)
Environment: NODE_ENV=production

Browser Initialization: From app.server.ts:

initializeBrowser() called on server start
Graceful shutdown handlers (SIGTERM, SIGINT)
Critical for extraction.ts Playwright usage

Security Options:

seccomp=unconfined - Required for Chromium sandbox
--no-sandbox in browser.ts launch args
Necessary for containerized Chromium

No Changes Required: Current Dockerfile is production-ready, only needs VOLUME addition.

[Planner] Research Notes - RECIPE-0003 Iteration 1 (2026-02-16)

Task: Fix Docker deployment issues (Alpine packages, Playwright installation)

Alpine Linux Font Packages

Research Date: 2026-02-16
Source: https://wiki.alpinelinux.org/wiki/Fonts, Alpine package database

Incorrect Package Names in Current Dockerfile:

liberation-fonts → No such package (ERROR)
noto → No such package (ERROR)
noto-cjk → No such package (ERROR)

Correct Alpine Font Package Names:

font-liberation → Correct (already in Dockerfile)
font-noto → Correct name for Noto fonts
font-noto-cjk → Correct name for Noto CJK (Chinese, Japanese, Korean) fonts

Rationale:

Alpine Linux uses font-* prefix for all font packages
Common mistake: using Debian/Ubuntu package names which differ from Alpine
These fonts are essential for rendering text in Instagram content extraction

Recommended Font Installation:

RUN apk add --no-cache \
    chromium \
    font-liberation \
    font-noto \
    font-noto-cjk

Playwright on Alpine Linux

Research Date: 2026-02-16
Source: https://playwright.dev/docs/docker, Playwright GitHub issues

Official Playwright + Alpine Status:

Not officially supported: Browser builds require glibc, Alpine uses musl
Firefox/WebKit: Cannot run on Alpine (glibc dependency)
Chromium: Can work using system chromium package

Problem Analysis:

Current Dockerfile installs system chromium via apk add chromium
Playwright's chromium.launch() expects Playwright's own Chromium binary
Playwright's Chromium is built for glibc environments (Ubuntu/Debian)
npx playwright install chromium will download glibc binary that won't run on Alpine

Solution: Configure Playwright to Use System Chromium

Approach A - Use System Chromium (Recommended):

// src/lib/server/browser.ts
browser = await chromium.launch({
  executablePath: '/usr/bin/chromium-browser',
  headless: true,
  args: [...]
});

Environment Variable Approach:

ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium-browser

Approach B - Switch to Debian Base:

FROM node:22-bookworm
RUN npx -y playwright@1.56.1 install --with-deps chromium

Recommendation:

Use Approach A (system chromium with executablePath)
Minimal changes to existing Alpine setup
System chromium is already installed and working
Avoids full base image migration

Chromium System Dependencies: When using system chromium on Alpine, these packages are auto-installed as dependencies:

ca-certificates, mesa-gbm, wayland-libs-server, libxkbcommon
ffmpeg-libs, gtk+3.0, libexif, libevent, nss, etc. (64 total dependencies)

Playwright Version Compatibility

Research Date: 2026-02-16
Source: package.json analysis

Current Version: playwright@1.56.1 (production dependency)
Chromium Version: Bundled with Playwright 1.56.1

System Chromium Compatibility:

Alpine edge: chromium 145.0.7632.75 (as of 2026-02-15)
Playwright 1.56.1 expects: Chromium ~133.x
Version mismatch: the gap spans several major Chromium versions, so compatibility is likely but not guaranteed
System chromium is newer and should work, but verify with the extraction tests before relying on it

executablePath Configuration:

Path on Alpine: /usr/bin/chromium-browser
Must be set in browser.ts or via environment variable
No additional Playwright installation needed when using system browser
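
One way to keep the path configurable is to have the application read the environment variable itself and pass the result to `chromium.launch()`. The sketch below does that without assuming Playwright honors the variable on its own; `ChromiumLaunchOptions` and `resolveLaunchOptions` are hypothetical names, and only the `headless`, `executablePath`, and `args` launch fields are used.

```typescript
// Subset of launch options relevant to this setup.
interface ChromiumLaunchOptions {
  headless: boolean;
  executablePath?: string;
  args: string[];
}

// Use the system browser when a path is configured; otherwise leave
// `executablePath` unset so the bundled browser is used (e.g. local dev).
function resolveLaunchOptions(env: Record<string, string | undefined>): ChromiumLaunchOptions {
  const executablePath = env.PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH || undefined;
  return {
    headless: true,
    ...(executablePath ? { executablePath } : {}),
    // --no-sandbox mirrors the launch args noted for containerized Chromium.
    args: ['--no-sandbox']
  };
}
```

With this shape, the Docker image sets the variable to /usr/bin/chromium-browser while local development on glibc systems keeps using the browser Playwright downloads.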

Docker Compose Configuration for Playwright

Research Date: 2026-02-16
Source: resolution_context.yaml, docker-compose.yml analysis

Current Configuration Analysis:

environment:
  - DISPLAY=:99  # X11 display (not needed for headless)
security_opt:
  - seccomp=unconfined  # Required for Chromium sandbox

Issues:

DISPLAY=:99 set but no X11 server (Xvfb) running
Headless mode doesn't need DISPLAY
docker-compose.yml has DISPLAY but it's unused

Recommendation:

Keep DISPLAY=:99 as harmless fallback (no changes needed)
seccomp=unconfined is necessary for Chromium sandbox (keep as-is)
No additional configuration needed for Playwright

[Planner] Node.js Versions and npm Lockfile Compatibility - RECIPE-0003 Iteration 2 (2026-02-16)

Research Date: 2026-02-16T17:00:00.000Z
Source: Node.js Release Schedule, npm documentation (v10 & v11), Docker Hub

Problem Analysis

Docker build fails at npm ci with error: "package-lock.json and package.json are out of sync"

Root Cause: package.json updated to Tailwind v4, but package-lock.json still contains Tailwind v3 dependencies (@csstools/*)
Secondary Issue: npm version mismatch - local (npm 11.6.2) vs Docker (npm 10.9.4)

Node.js LTS Status Research

Source: https://github.com/nodejs/release, https://nodejs.org/en/about/previous-releases

Currently Supported Versions:

Node.js 20 (Iron): Maintenance LTS - EOL 2026-04-30
Node.js 22 (Jod): Maintenance LTS - EOL 2027-04-30 ← Current Dockerfile
Node.js 24 (Krypton): Active LTS - EOL 2028-04-30 ← Best choice
Node.js 25: Current (not LTS) - EOL 2026-06-01

LTS Phase Definitions:

Current: Latest features, 6-month cycle for odd versions
Active LTS: Audited features and updates (12 months for even versions since v12)
Maintenance: Critical fixes only (18 months)

Conclusion: Node.js 24 is Active LTS (until Oct 2026) providing better support than Node.js 22 (already in Maintenance).

npm Lockfile Version Compatibility

Source: https://docs.npmjs.com/cli/v10/configuring-npm/package-lock-json, https://docs.npmjs.com/cli/v11/configuring-npm/package-lock-json

Lockfile Version History:

lockfileVersion: 1 - npm v5-v6
lockfileVersion: 2 - npm v7-v8 (backwards compatible with v1)
lockfileVersion: 3 - npm v9+ (backwards compatible with v7)

npm Version Bundled with Node.js:

node:22-alpine → npm 10.9.4 (uses lockfileVersion: 3)
node:24-alpine → npm 11.x (uses lockfileVersion: 3)
Local environment → npm 11.6.2 (uses lockfileVersion: 3)

Compatibility Analysis:

Current package-lock.json has "lockfileVersion": 3 ✓
npm 10 and npm 11 both support lockfileVersion: 3 ✓
The issue is NOT version incompatibility but stale dependency data

npm ci Strict Behavior: npm ci performs strict validation:

Requires exact match between package.json and package-lock.json
Does not update lockfile automatically (unlike npm install)
Fails if dependencies are missing or mismatched
This is intentional for reproducible builds in CI/CD
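
A rough pre-flight check for this kind of drift can compare the dependency ranges in package.json with the root entry recorded in a lockfileVersion 2 or 3 lockfile. This is a simplified sketch, not what npm ci actually runs: npm also validates the resolved tree, and `findLockfileDrift` is a hypothetical helper.

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface PackageLock {
  lockfileVersion: number;
  // The "" key holds a copy of the root package's dependency ranges.
  packages?: Record<string, PackageJson & { version?: string }>;
}

// Report direct dependencies whose ranges differ between the two files.
function findLockfileDrift(pkg: PackageJson, lock: PackageLock): string[] {
  const root: PackageJson = lock.packages?.[''] ?? {};
  const problems: string[] = [];
  for (const field of ['dependencies', 'devDependencies'] as const) {
    const wanted = pkg[field] ?? {};
    const locked = root[field] ?? {};
    for (const [name, range] of Object.entries(wanted)) {
      if (!(name in locked)) {
        problems.push(`${field}: ${name} missing from lockfile`);
      } else if (locked[name] !== range) {
        problems.push(`${field}: ${name} wants ${range}, lockfile has ${locked[name]}`);
      }
    }
    for (const name of Object.keys(locked)) {
      if (!(name in wanted)) problems.push(`${field}: ${name} only in lockfile`);
    }
  }
  return problems;
}
```

An empty result does not guarantee that npm ci will pass, but a non-empty one is a quick signal that the lockfile needs regenerating before the Docker build.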

Tailwind CSS v3 → v4 Migration Impact

Source: package.json analysis, package-lock.json inspection

Current State:

// package.json (Tailwind v4)
"@tailwindcss/vite": "^4.1.17",
"tailwindcss": "^4.1.17"

// package-lock.json (still has Tailwind v3 transitive deps)
"@csstools/css-parser-algorithms": "3.0.5",
"@csstools/css-tokenizer": "3.0.4"

Why This Happened:

package.json was updated to Tailwind v4
package-lock.json was NOT regenerated afterward
Tailwind v4 has different dependency tree than v3 (no @csstools/*)
npm ci detects mismatch and fails

Solution Options Analysis

Option A: Regenerate with Docker node:22-alpine (Review's RECOMMENDED)

docker run --rm -v "$PWD":/app -w /app node:22-alpine sh -c "rm package-lock.json && npm install"

✓ Ensures exact npm version match with deployment
✗ Stays on Maintenance LTS (Node 22)
✗ Doesn't align with local development (node 24)

Option B: Update to node:24-alpine

FROM node:24-alpine

rm package-lock.json && npm install

✓ Uses Active LTS (better support)
✓ Aligns Docker with local development
✗ Changes base image (minimal risk)

Option C: Hybrid (BEST SOLUTION)

Update Dockerfile to node:24-alpine
Regenerate package-lock.json locally (npm 11.x matches node:24)

✓ Active LTS with longer support window
✓ Perfect alignment between local dev and Docker
✓ Single lockfile regeneration
✓ Future-proof (Active LTS until Oct 2026)

Chosen Approach: Option C

Implementation Details

Files to Modify:

Dockerfile - Change FROM node:22-alpine → node:24-alpine
package-lock.json - Regenerate to sync with package.json

Verification Steps:

npm install - Regenerate lockfile
npm run build - Verify local build
npm test - Verify all tests pass
docker build - Verify Docker build succeeds
docker compose up - Verify runtime

No Code Changes Needed:

All application code remains unchanged
.env.example already complete (no new variables)
docker-compose.yml does not need changes (node version transparent)

[Planner] Research Notes - RECIPE-0004 (2026-02-16)

Task: Fix .dockerignore, favicon.ico, push notifications, e2e tests, and logging serialization

.dockerignore Research

Research Date: 2026-02-16
Source: Project analysis, .gitignore comparison, Docker best practices

Current State:

No .dockerignore file exists in project root
.gitignore exists and excludes: node_modules, build outputs, env files, SSL certs, symlinks, prompts/

Docker Build Context Issues: Without .dockerignore, Docker sends entire workspace to build context including:

node_modules/ (if exists locally) - causes conflicts with npm ci in Dockerfile
build/ outputs - unnecessary
.git/ directory - large, unused in container
prompts/ directory - development artifacts
.env files - should use environment variables instead

Recommended .dockerignore Content: Based on .gitignore and Docker best practices:

node_modules
.git
build
.output
.vercel
.netlify
.wrangler
.svelte-kit
.DS_Store
Thumbs.db
.env
.env.*
!.env.example
.ssl/
vite.config.*.timestamp-*
debug_page.txt
prompts/
*.md
!README.md
.github/
.vscode/
*.log
coverage/
.vitest/

Rationale:

Exclude development dependencies and build artifacts
Keep README.md for documentation
Exclude version control metadata
Reduce build context size significantly
Prevent conflicts with Dockerfile's npm ci

Favicon 404 Error Research

Research Date: 2026-02-16
Source: Static folder analysis, browser behavior, PWA specifications

Files Present:

static/favicon.png (192x192 PNG) ✓ exists
static/icon-512.png (512x512 PNG) ✓ exists
static/icon-source.png (source file) ✓ exists
static/manifest.json references both PNG files ✓

404 Source:

Browsers automatically request /favicon.ico (legacy format)
SvelteKit serves from static/ folder
No favicon.ico file exists → 404 error

Solution Options:

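Option A - Create favicon.ico (Recommended): Use Sharp to generate a 32x32 icon from the PNG source. Sharp has no ICO encoder, so the script writes PNG-encoded data under the .ico filename, which current browsers accept for favicons:

// New script: scripts/gen-favicon-ico.js
import sharp from 'sharp';

// .png() forces PNG output; the .ico extension alone would not select a format
await sharp('static/icon-source.png')
  .resize(32, 32)
  .png()
  .toFile('static/favicon.ico');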

Option B - SvelteKit Hook Redirect: Add server hook to redirect /favicon.ico → /favicon.png

More complex
Adds runtime overhead
Not recommended

Chosen Approach: Option A (generate favicon.ico during build)
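
The Sharp call in Option A emits PNG-encoded bytes even though the file is named favicon.ico. If a true ICO container is ever needed, those bytes can be wrapped by hand, since the ICO layout is a 6-byte header, one 16-byte directory entry per image, then the image data. A minimal sketch, assuming a single square image of at most 256 px; `wrapPngAsIco` is a hypothetical helper, not part of Sharp or this project:

```typescript
// Hypothetical helper: wraps already-encoded PNG bytes in a single-image ICO container.
export function wrapPngAsIco(png: Uint8Array, size: number): Uint8Array {
  if (!Number.isInteger(size) || size < 1 || size > 256) {
    throw new RangeError('ICO directory entries support sizes from 1 to 256 px');
  }
  const HEADER_BYTES = 6;
  const ENTRY_BYTES = 16;
  const out = new Uint8Array(HEADER_BYTES + ENTRY_BYTES + png.length);
  const view = new DataView(out.buffer);

  // ICONDIR header (all multi-byte fields are little-endian)
  view.setUint16(0, 0, true); // reserved, must be 0
  view.setUint16(2, 1, true); // resource type: 1 = icon
  view.setUint16(4, 1, true); // number of images in the file

  // ICONDIRENTRY for the single image
  const dimension = size === 256 ? 0 : size; // a stored 0 means 256
  view.setUint8(6, dimension);  // width
  view.setUint8(7, dimension);  // height
  view.setUint8(8, 0);          // palette size (0 = no palette)
  view.setUint8(9, 0);          // reserved
  view.setUint16(10, 1, true);  // color planes
  view.setUint16(12, 32, true); // bits per pixel
  view.setUint32(14, png.length, true);                 // image data length
  view.setUint32(18, HEADER_BYTES + ENTRY_BYTES, true); // image data offset

  out.set(png, HEADER_BYTES + ENTRY_BYTES);
  return out;
}
```

In a build script, the input would be the result of Sharp's `.png().toBuffer()` and the returned bytes would be written to static/favicon.ico.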

Push Notifications Implementation Research

Research Date: 2026-02-16
Source: PushNotificationService.ts, web-push library docs, Web Push Protocol RFC 8030

Current Implementation Analysis:

Client-Side (Complete):

PushNotificationManager.ts - Full implementation ✓
- Permission request ✓
- VAPID key fetch ✓
- pushManager.subscribe() ✓
- Server subscription registration ✓
service-worker.ts - Push event handler ✓
NotificationSettings.svelte - UI toggle ✓

Server-Side (Mock Only):

// Current PushNotificationService.ts line 106-125
private async sendToSubscription(subscription: PushSubscription, data: any): Promise<void> {
  // In production, use web-push library:
  // [COMMENTED OUT CODE]
  
  // For development, we'll log the notification
  console.log(`[PushService] Would send push notification:`, {
    endpoint: subscription.endpoint,
    data: data
  });
  
  await new Promise(resolve => setTimeout(resolve, 100)); // Simulate
}

Problem: Push notifications are logged but never actually sent to the browser.

Web Push Library Integration:

1. Install Dependency:

// package.json
{
  "dependencies": {
    "web-push": "^3.6.7"
  }
}

2. Implementation Pattern:

import webpush from 'web-push';

// On init
webpush.setVapidDetails(
  'mailto:your-email@example.com',
  vapidPublicKey,
  vapidPrivateKey
);

// In sendToSubscription
await webpush.sendNotification(
  subscription,
  JSON.stringify(payload),
  {
    TTL: 60 * 60 * 24 // 24 hours
  }
);

3. Configuration Requirements:

VAPID keys already configured in queueConfig.push
Default keys present (should regenerate for production)
Email contact required by spec (add env var)
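
The configuration requirements above could be checked once at startup, before the values reach webpush.setVapidDetails(). A minimal sketch, assuming the keys arrive as environment variables; VAPID_EMAIL is the variable proposed in these notes, while VAPID_PUBLIC_KEY, VAPID_PRIVATE_KEY and the `buildVapidConfig` helper are hypothetical names used only for illustration:

```typescript
export interface VapidConfig {
  subject: string;
  publicKey: string;
  privateKey: string;
}

// Hypothetical helper: validates the environment and normalizes the contact into a subject URL.
export function buildVapidConfig(env: Record<string, string | undefined>): VapidConfig {
  const contact = env.VAPID_EMAIL?.trim();
  const publicKey = env.VAPID_PUBLIC_KEY?.trim();   // assumed variable name
  const privateKey = env.VAPID_PRIVATE_KEY?.trim(); // assumed variable name

  if (!contact || !publicKey || !privateKey) {
    throw new Error('VAPID_EMAIL, VAPID_PUBLIC_KEY and VAPID_PRIVATE_KEY must all be set');
  }

  // The VAPID subject must be a URL: keep mailto:/https: values, prefix bare addresses.
  const isUrl = contact.startsWith('mailto:') || contact.startsWith('https://');
  const subject = isUrl ? contact : `mailto:${contact}`;

  return { subject, publicKey, privateKey };
}
```

The returned fields map directly onto the three arguments of webpush.setVapidDetails() shown in the implementation pattern.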

Files to Modify:

package.json - add web-push dependency
src/lib/server/notifications/PushNotificationService.ts - implement actual sending
src/lib/server/queue/config.ts - add VAPID_EMAIL env var

Manual Push Notification Test Button Research

Research Date: 2026-02-16
Source: NotificationSettings.svelte, PushNotificationService API

Current UI:

Only has enable/disable toggle
No manual trigger for testing different notification types

Test Button Requirements:

Trigger different notification types:
- Success notification (recipe completed)
- Error notification (parsing failed)
- Progress notification (extraction in progress)
Send to own subscription only
Debug output showing notification payload

Implementation Approach:

Frontend Component: Add to NotificationSettings.svelte:

<button onclick={() => testNotification('success')}>Test Success</button>
<button onclick={() => testNotification('error')}>Test Error</button>
<button onclick={() => testNotification('progress')}>Test Progress</button>

async function testNotification(type: 'success' | 'error' | 'progress') {
  await fetch('/api/notifications/test', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type })
  });
}

Backend Endpoint: New file: src/routes/api/notifications/test/+server.ts

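import { json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';
// pushNotificationService is the existing singleton; its import is omitted here

export const POST: RequestHandler = async ({ request }) => {
  const { type } = await request.json();
  
  const payloads = {
    success: { /* ... */ },
    error: { /* ... */ },
    progress: { /* ... */ }
  };

  // Reject unknown types; checking own keys keeps 'toString' and similar from slipping through
  if (typeof type !== 'string' || !Object.hasOwn(payloads, type)) {
    return json({ success: false }, { status: 400 });
  }
  
  await pushNotificationService.sendNotification(payloads[type as keyof typeof payloads]);
  return json({ success: true });
};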

Playwright E2E Push Notification Testing Research

Research Date: 2026-02-16
Source: Playwright API docs (BrowserContext.grantPermissions), existing test patterns

Playwright Push Notification Testing Pattern:

Key Methods:

context.grantPermissions(['notifications']) - Grant permission without prompt
page.evaluate() - Access PushManager in browser context
page.waitForEvent() - Wait for service worker events

Test Structure:

// New file: src/tests/push-notifications.e2e.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Push Notifications E2E', () => {
  test('should subscribe to push notifications', async ({ browser }) => {
    const context = await browser.newContext();
    await context.grantPermissions(['notifications']);
    
    const page = await context.newPage();
    await page.goto('http://localhost:5173');
    
    // Click notification toggle
    await page.getByRole('button', { name: /enable notifications/i }).click();
    
    // Verify subscription created
    const subscription = await page.evaluate(async () => {
      const reg = await navigator.serviceWorker.ready;
      return await reg.pushManager.getSubscription();
    });
    
    expect(subscription).toBeTruthy();
    expect(subscription?.endpoint).toBeDefined();
    
    await context.close();
  });
});

Test Coverage:

Permission grant flow
Subscription creation via PushManager
Server registration (POST /api/notifications/subscribe)
Manual test notification trigger
Subscription persistence in localStorage
Unsubscribe flow

Vitest Configuration: Current project uses Vitest with @vitest/browser-playwright:

Already configured for browser tests
Playwright already installed (playwright@^1.56.1)
Pattern: *.e2e.spec.ts for e2e tests vs *.spec.ts for unit tests

Logging Serialization Research

Research Date: 2026-02-16
Source: Codebase grep analysis, Node.js console behavior, error object structure

Problem Analysis:

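Root Cause: Objects surface as [object Object] whenever they are coerced to strings (concatenation, template literals, or a log pipeline that stringifies its arguments). The current calls pass raw objects and leave the formatting to whatever consumes them:

// Current pattern (WRONG)
console.error('[Label]', error);  // Can surface as: [Label] [object Object]
console.log('[Label]', data);     // Can surface as: [Label] [object Object]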

Affected Files (25 matches found):

src/lib/server/extraction.ts - 12 occurrences
src/lib/server/parser.ts - 4 occurrences
src/lib/server/queue/QueueProcessor.ts - 3 occurrences
src/lib/server/notifications/PushNotificationService.ts - 1 occurrence
src/lib/server/api/errorHandler.ts - 1 occurrence
src/lib/server/llm.ts - 2 occurrences
src/lib/server/scheduler.ts - 1 occurrence
Others: QueueManager.ts, tandoor.ts

Solution Patterns:

1. Error Objects:

// GOOD - Extract relevant properties
console.error('[Label]', error.message, error.stack);
console.error('[Label] Error:', {
  message: error.message,
  stack: error.stack,
  name: error.name
});

2. Complex Objects:

// GOOD - JSON.stringify with formatting
console.log('[Label] Data:', JSON.stringify(data, null, 2));

// GOOD - Specific properties
console.log('[Label] Response:', {
  status: response.status,
  statusText: response.statusText,
  body: responseBody
});

3. Utility Function: Create src/lib/server/utils/logger.ts:

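import { inspect } from 'node:util';

export function serializeError(error: unknown): string {
  try {
    if (error instanceof Error) {
      return JSON.stringify({
        name: error.name,
        message: error.message,
        stack: error.stack,
        ...error // own enumerable extras, e.g. `code`
      }, null, 2);
    }
    // JSON.stringify returns undefined for undefined/functions, so fall back to inspect()
    return JSON.stringify(error, null, 2) ?? inspect(error);
  } catch {
    // Circular references and BigInt values make JSON.stringify throw
    return inspect(error);
  }
}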

console.error('[Label]', serializeError(error));

Testing Impact:

Logs are visible in Docker deployments (stdout/stderr)
JSON format easier for log aggregation tools
Stack traces preserved for debugging
Human-readable in console

[Planner] Research Notes - RECIPE-0004 Iteration 1 (2026-02-17)

Task: Fix TypeScript type error - NodeJS.Timer should be NodeJS.Timeout in scheduler.ts

Node.js Timer Types Research

Research Date: 2026-02-17
Source: Node.js v25.6.1 Official Documentation (https://nodejs.org/docs/latest/api/timers.html)

Problem Analysis: TypeScript compile error in src/lib/server/scheduler.ts:180:

Argument of type 'Timer' is not assignable to parameter of type 'Timeout'
Type 'Timer' is missing the following properties from type 'Timeout': 
  close, _onTimeout, [Symbol.dispose]

Root Cause: The SchedulerState interface incorrectly uses the NodeJS.Timer type for intervalId, whereas setInterval() returns a NodeJS.Timeout and clearInterval() expects a NodeJS.Timeout parameter.

Official Node.js API Documentation:

Class: Timeout

Returned by setInterval() and setTimeout()
Can be passed to clearInterval() or clearTimeout()
Has methods: ref(), unref(), hasRef(), close(), refresh(), [Symbol.toPrimitive](), [Symbol.dispose]()
TypeScript type: NodeJS.Timeout

API Signatures:

// setInterval returns Timeout
function setInterval(
  callback: Function, 
  delay?: number, 
  ...args: any[]
): NodeJS.Timeout;

// clearInterval expects Timeout
function clearInterval(
  timeout: NodeJS.Timeout | string | number
): void;

NodeJS.Timer Type:

Deprecated/incorrect type for timer return values
Missing required properties: close, _onTimeout, [Symbol.dispose]
Should NOT be used for setInterval()/setTimeout() return types
Causes TypeScript strict mode errors when passed to clearInterval()

Codebase Analysis:

grep -r "NodeJS.Timer" src/
  src/lib/server/scheduler.ts:13    intervalId: NodeJS.Timer | null;
  src/tests/fixtures.ts:151         let timers: NodeJS.Timer[] = [];

grep -r "NodeJS.Timeout" src/
  src/routes/api/queue/stream/+server.ts:54    let keepAliveInterval: NodeJS.Timeout | null = null;

Findings:

Incorrect usage (2 occurrences):
- src/lib/server/scheduler.ts:13 — SchedulerState interface
- src/tests/fixtures.ts:151 — Timer array in test helper
Correct usage (1 occurrence):
- src/routes/api/queue/stream/+server.ts:54 — keepAliveInterval type

Solution: Change all NodeJS.Timer to NodeJS.Timeout to align with Node.js official API contracts and TypeScript type definitions.
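
One way to keep such a field correct without naming the type at all is to derive it from setInterval itself. The sketch below is illustrative only: `SchedulerState` and `intervalId` come from the notes above, but the interface is reduced to that single field, and `IntervalHandle`, `startScheduler` and `stopScheduler` are hypothetical names rather than the project's actual functions.

```typescript
// Derived handle type: NodeJS.Timeout under @types/node, number under DOM typings.
type IntervalHandle = ReturnType<typeof setInterval>;

// Simplified stand-in for the real SchedulerState interface.
interface SchedulerState {
  intervalId: IntervalHandle | null;
}

export function stopScheduler(state: SchedulerState): void {
  if (state.intervalId !== null) {
    clearInterval(state.intervalId); // accepts exactly what setInterval returned
    state.intervalId = null;
  }
}

export function startScheduler(state: SchedulerState, tick: () => void, everyMs: number): void {
  stopScheduler(state); // never leak a previously started interval
  state.intervalId = setInterval(tick, everyMs);
}
```

Writing NodeJS.Timeout explicitly, as the solution above does, is equally valid in server-only code; the derived type mainly helps in files that may be type-checked against both Node and browser typings.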

Files to Modify:

src/lib/server/scheduler.ts:13 — Type in SchedulerState interface
src/tests/fixtures.ts:151 — Type in createTimerSpy helper

Impact:

Type-only change, no runtime behavior modification
Fixes TypeScript strict mode compile error
Aligns codebase with Node.js standard types
Existing tests (260 total) already provide 100% coverage

References:

Node.js Timers Documentation: https://nodejs.org/docs/latest/api/timers.html#class-timeout
TypeScript @types/node package: Official Node.js type definitions
Related Error: RECIPE-0004 iteration 0 review_report.yaml

Document Version: 1.7
Last Updated by: Planner Agent (RECIPE-0005 Iteration 0)
Next Update: Developer Agent

[Planner] Research Notes - RECIPE-0005 (2026-02-17)

Task: Fix Playwright Docker dependencies and create LMStudio integration for E2E testing

Playwright Alpine Linux Docker Integration - RECIPE-0005

Research Date: 2026-02-17
Source: FINDINGS.md (RECIPE-0003), Dockerfile analysis, browser.ts, Playwright documentation

Problem Analysis:

Container fails with: "Executable doesn't exist at /root/.cache/ms-playwright/chromium_headless_shell-1208/"
Alpine Linux uses musl libc, whereas Playwright's bundled browsers require glibc
The current Dockerfile installs system Chromium via apk add chromium, but browser.ts doesn't specify an executable path
Playwright therefore defaults to searching for its own bundled browser binary, which is not present

Solution (Already Researched in RECIPE-0003): Configure Playwright to use system chromium installed by Alpine APK:

// src/lib/server/browser.ts - initializeBrowser()
browser = await chromium.launch({
  executablePath: '/usr/bin/chromium-browser',  // System chromium path
  headless: true,
  args: [
    '--disable-blink-features=AutomationControlled',
    '--disable-dev-shm-usage',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-gpu'
  ]
});

Files to Modify:

src/lib/server/browser.ts - Add executablePath: '/usr/bin/chromium-browser' to launch options

No Changes Needed:

Dockerfile already has chromium and fonts installed correctly
No need for npx playwright install (would fail on Alpine anyway)
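
A hardcoded Alpine path would also apply outside the container, where the system binary may live elsewhere or Playwright's bundled browser may be preferred. One possible refinement, sketched here under that assumption, is to set executablePath only when an environment variable provides it. CHROMIUM_EXECUTABLE_PATH, `LaunchOptionsSketch` and `buildLaunchOptions` are hypothetical names, not existing project configuration:

```typescript
// Subset of Playwright's launch options used in these notes.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
  executablePath?: string;
}

// Hypothetical helper: builds the options object passed to chromium.launch().
export function buildLaunchOptions(env: Record<string, string | undefined>): LaunchOptionsSketch {
  const options: LaunchOptionsSketch = {
    headless: true,
    args: [
      '--disable-blink-features=AutomationControlled',
      '--disable-dev-shm-usage',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-gpu'
    ]
  };

  // Only override the binary when the environment asks for it (e.g. in the Docker image).
  const systemChromium = env.CHROMIUM_EXECUTABLE_PATH?.trim(); // assumed variable name
  if (systemChromium) {
    options.executablePath = systemChromium;
  }
  return options;
}
```

With this shape, the compose file would set the variable to /usr/bin/chromium-browser, and local runs without it would fall back to Playwright's default browser lookup.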

LMStudio Docker Networking - RECIPE-0005

Research Date: 2026-02-17
Source: Docker networking documentation, LMStudio API patterns, OpenAI-compatible endpoints

Problem:

LMStudio runs on host at http://localhost:1234
Docker containers have isolated networking - localhost inside container != host localhost
Container needs to access host services

Docker Networking Solutions:

Option A - network_mode: host (Recommended for LMStudio):

services:
  app:
    network_mode: host

Container shares host network stack
localhost:1234 inside container = host's localhost:1234
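Trade-off: Loses container network isolation, and ports: mappings are ignored; host networking is native to Linux hosts only (Docker Desktop gained opt-in support in version 4.34)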
Best for: Local development/testing with host services

Option B - extra_hosts (Alternative):

services:
  app:
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OPENAI_BASE_URL=http://host.docker.internal:1234/v1

Works on Docker Desktop (Mac/Windows) and Linux with Docker 20.10+
Maintains container network isolation
Trade-off: Requires changing OPENAI_BASE_URL from localhost

Chosen Approach: network_mode: host

Rationale: Simplest for local LMStudio integration, no URL changes needed
Tool mandate specifies "http://localhost:1234" must work
Matches requirement for local development/testing setup

LMStudio + Gemma 3 Configuration - RECIPE-0005

Research Date: 2026-02-17
Source: .env.example, llm.ts, prompt.yaml tool mandates

Current Configuration:

OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=your-api-key-here
LLM_MODEL=google/gemma-3-4b

LMStudio API Compatibility:

LMStudio provides OpenAI-compatible endpoint at /v1
Uses same API client: openai@^4.20.0
Model identifiers match LMStudio's loaded model names
API key can be any non-empty value (LMStudio doesn't validate in local mode)

Model Availability Check: From prior research (RECIPE-0001), llm.ts already implements:

checkModelAvailability(model: string) - verifies model loaded via client.models.list()
Returns available models if specified model not found
User must manually load model in LMStudio UI before running container
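
The availability check described above reduces to a membership test over the model IDs returned by client.models.list(). The following is a sketch of that decision logic only, not the actual code in llm.ts; `ModelCheckResult` and `checkModelInList` are hypothetical names, and the list of loaded IDs is passed in so the function needs no network access:

```typescript
export interface ModelCheckResult {
  available: boolean;
  // Populated only when the requested model is missing, to help the user pick one.
  alternatives: string[];
}

// Hypothetical helper: decides whether the requested model is among the loaded ones.
export function checkModelInList(requested: string, loadedIds: readonly string[]): ModelCheckResult {
  const available = loadedIds.includes(requested);
  return {
    available,
    alternatives: available ? [] : [...loadedIds]
  };
}
```

An empty alternatives list together with available: false would correspond to the "No models loaded" situation recorded for RECIPE-0001.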

No Code Changes Needed:

LLM integration already OpenAI-compatible
Model check already implemented
Only need environment variable configuration

Docker Compose Complete Configuration - RECIPE-0005

Research Date: 2026-02-17
Source: docker-compose.yml, .env.example, queueConfig, tandoorConfig

Required Changes:

Add network_mode: host for LMStudio access
Update LLM_MODEL default to google/gemma-3-4b
Update .env.example defaults to match tool mandates

Current docker-compose.yml:

Already has all environment variables configured
Already has ./secrets:/app/secrets volume mount
Already has healthcheck configured
Already has seccomp=unconfined for Chromium

Port Mapping with network_mode: host:

ports: section ignored when using network_mode: host
App will bind directly to host port 3000
No conflicts expected (LMStudio uses 1234, app uses 3000, Tandoor external)

End-to-End Testing Strategy - RECIPE-0005

Research Date: 2026-02-17
Source: Test URL from prompt, queue system architecture

Test URL: https://www.instagram.com/reel/DP6oN7JCEo8/?utm_source=ig_web_button_share_sheet

Testing Workflow:

Build Docker image: docker-compose build
Start container: docker-compose up
Verify LMStudio loaded Gemma 3 model: http://localhost:1234/v1/models
Verify app health: http://localhost:3000/api/health
Verify LLM health: http://localhost:3000/api/llm-health
Enqueue test URL: POST http://localhost:3000/api/queue
Monitor progress: GET http://localhost:3000/api/queue/stream
Verify extraction succeeds with Gemma 3
Check Tandoor upload (if configured)

Success Indicators:

Chromium launches without "Executable doesn't exist" error
LLM health check passes
Extraction phase completes successfully
Recipe parsing succeeds with Gemma 3
All existing tests pass (npm test)

Files Summary - RECIPE-0005

Modified Files:

src/lib/server/browser.ts - Add executablePath for Alpine chromium
docker-compose.yml - Add network_mode: host, update LLM_MODEL default
.env.example - Update LLM_MODEL default to google/gemma-3-4b

No Changes:

Dockerfile - Already correct (chromium + fonts installed)
src/lib/server/llm.ts - Already OpenAI-compatible
src/lib/server/queue/config.ts - Already reads env vars correctly
Test files - All existing tests should pass

Testing:

Manual E2E test with provided Instagram URL
Verify in Docker container with LMStudio
All unit tests must pass

Dependencies:

User must have LMStudio running on host at localhost:1234
User must manually load google/gemma-3-4b model in LMStudio
Secrets volume must exist for Instagram auth (optional)

[Planner] Research Notes - RECIPE-0006 Iteration 1 (2026-02-17)

Task: Transform E2E test to unit test with mocked fixtures and fix extraction logic iteratively

Problem Analysis

Research Date: 2026-02-17T10:00:00.000Z
Source: review_report.yaml, extraction.ts analysis, test fixtures

Iteration 0 Failure:

E2E test created but never executed during development
User manually ran test and it FAILED
Current output: "16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: "La cacio e pepe..."
Expected output: Full recipe starting with "La cacio e pepe infallibile di Luciano Monosilio 🍝"

Root Cause Analysis:

DOM selectors failing: Lines 331-341 of extraction.ts try several selectors, but none match Instagram's current structure
Fallback to og:description: Lines 348-357 extract from <meta property="og:description">, whose content carries a metadata prefix
Regex cleanup insufficient: Line 356 tries to strip that prefix with the regex ^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s+ but the cleaned text still contains residue

Current extractFromDOM() Flow:

1. Try selectors: article h1, article span[dir="auto"], article div[role="button"] + span, article span:not([aria-label])
   → All fail (return null or < 100 chars)
2. Fallback to og:description meta tag
   → Returns: "16K likes, 325 comments - username on date: caption..."
3. Apply metadata cleanup regex
   → Regex doesn't match properly (or matches but leaves quotes)
4. Pass to cleanText()
   → cleanText() removes hashtags but metadata prefix remains
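
The selection logic in steps 1 to 3 can be expressed as a pure function, which makes the fallback order easy to test without a browser. This is a sketch of the flow as described here, not the body of extractFromDOM(): `pickCaption` is a hypothetical name, the 100-character threshold is taken from step 1, and the prefix cleanup is injected as a parameter so the sketch does not commit to a particular regex.

```typescript
const MIN_CAPTION_LENGTH = 100; // selector results shorter than this count as failures

// Hypothetical helper mirroring the flow: selectors first, og:description as fallback.
export function pickCaption(
  selectorResults: ReadonlyArray<string | null>,
  ogDescription: string | null,
  stripPrefix: (text: string) => string
): string | null {
  // Step 1: first selector result that is long enough wins.
  for (const candidate of selectorResults) {
    const text = candidate?.trim();
    if (text && text.length >= MIN_CAPTION_LENGTH) {
      return text;
    }
  }

  // Step 2: fall back to the og:description meta tag, if any.
  if (!ogDescription) {
    return null;
  }

  // Step 3: remove the "likes, comments - user on date:" metadata prefix.
  return stripPrefix(ogDescription).trim();
}
```

The result would then be handed to cleanText() as in step 4.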

Vitest Unit Testing for Playwright Mocking

Research Date: 2026-02-17T10:00:00.000Z
Source: TESTING.md, existing tests (queue-processor.spec.ts, scheduler.spec.ts)

Mocking Strategy: From TESTING.md and existing test patterns, Vitest provides module-level mocking:

// Mock entire module BEFORE imports
vi.mock('$lib/server/extraction', () => ({
  extractTextAndThumbnail: vi.fn().mockResolvedValue({
    bodyText: 'Mocked text',
    thumbnail: 'https://example.com/thumb.jpg'
  })
}));

For Unit Testing extractFromDOM():

Cannot mock the entire extraction.ts module (we're testing functions inside it)
Need to test internal functions directly (extractFromDOM, cleanText are not exported)
Options:
1. Export functions for testing (add export to extractFromDOM and cleanText)
2. Mock Playwright Page.evaluate() (mock the browser automation layer)
3. Integration test with mocked browser context

Chosen Approach: Export Internal Functions

Cleanest separation of concerns
Allows direct unit testing without browser overhead
Follows existing pattern (extractTextAndThumbnail is already exported)
Test Runtime: < 10ms (vs 30s for E2E test)

Test Structure:

// Unit test with fixtures (fixture strings are abbreviated here)
import { describe, it, expect, vi } from 'vitest';
import { extractFromDOM, cleanText } from '$lib/server/extraction';

describe('Instagram Caption Extraction Unit Tests', () => {
  it('should clean metadata prefix from og:description', async () => {
    const input = '16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: "La cacio e pepe...';
    const expected = 'La cacio e pepe infallibile di Luciano Monosilio...';
    
    // Create mock page that returns problematic og:description
    const mockPage = {
      evaluate: vi.fn().mockResolvedValue(input)
    };
    
    const result = await extractFromDOM(mockPage as any);
    expect(result.bodyText).toBe(expected);
  });
});

Metadata Prefix Regex Analysis

Research Date: 2026-02-17T10:00:00.000Z
Source: extraction.ts line 356, test fixtures

Current Regex (Line 356):

const cleanedContent = content.replace(/^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s+/, '');

Test Against Actual Input:

Input:    '16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: "La cacio e pepe...'
Pattern:  '^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s+'
          ^----- Should match "16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: "

Issue: Pattern matches but leaves opening quote " after the colon.

Problems Identified:

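Pattern doesn't consume the quote that follows the colon
Date pattern [^:]+ stops at the first colon: it matches "October 17, 2025" here, but the whole pattern would fail if the date portion ever contained a colon (for example a time of day)
Pattern requires at least one whitespace character after the colon (\s+), while the actual format is colon, space, then an opening quote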

Improved Regex:

// Match: "X likes, Y comments - username on date: " (with optional quote)
/^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s*["']?/

Breakdown:

^\d+K? - Matches "16K" or "16" (K is optional)
\s+likes,\s+\d+\s+comments - Matches " likes, 325 comments"
\s+-\s+[\w.]+ - Matches " - chef.antonio.la.cava" (alphanumeric + dots)
\s+on\s+[^:]+: - Matches " on October 17, 2025:" (anything before colon)
\s* - Optional whitespace after colon
["']? - Optional quote character (single or double)
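
The difference between the two patterns can be checked directly against the fixture string. A small sketch, with both regexes copied from this section; `stripMetadataPrefix`, `ORIGINAL_PREFIX` and `IMPROVED_PREFIX` are hypothetical names introduced for the comparison:

```typescript
// Regex currently at extraction.ts line 356 (requires whitespace after the colon).
const ORIGINAL_PREFIX = /^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s+/;

// Improved regex: optional whitespace, then an optional opening quote.
const IMPROVED_PREFIX = /^\d+K?\s+likes,\s+\d+\s+comments\s+-\s+[\w.]+\s+on\s+[^:]+:\s*["']?/;

// Hypothetical helper: returns the caption with the metadata prefix removed.
export function stripMetadataPrefix(content: string): string {
  return content.replace(IMPROVED_PREFIX, '');
}

const fixture =
  '16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: "La cacio e pepe...';

// The original pattern matches but leaves the opening quote in place.
const withOriginal = fixture.replace(ORIGINAL_PREFIX, '');

// The improved pattern also consumes the quote.
const withImproved = stripMetadataPrefix(fixture);

console.log({ withOriginal, withImproved });
```

Both patterns only accept like counts written as digits with an optional K suffix, so values such as "1,234" or "1.5K" leave the prefix untouched; whether Instagram emits those forms in og:description is not established in these notes.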

This should strip the whole prefix, including the opening quote when present:

"16K likes, 325 comments - chef.antonio.la.cava on October 17, 2025: " → (empty)

Files to Modify - RECIPE-0006 Iteration 1

Primary Changes:

src/lib/server/extraction.ts
- Export extractFromDOM for unit testing
- Export cleanText for unit testing
- Fix metadata prefix regex in extractFromDOM() (line 356)
src/tests/instagram-caption-extraction.unit.spec.ts (NEW)
- Replace E2E test with unit test
- Mock page.evaluate() to return test fixtures
- Test both problematic and expected outputs
- Runtime < 100ms
src/tests/instagram-caption-extraction.e2e.spec.ts (MODIFY)
- Mark as .skip or remove (replaced by unit test)
- Keep file for future real-world validation (optional)

Dependencies:

Vitest mocking (vi.fn(), mockResolvedValue)
Test fixtures from context_compact.yaml
No external libraries needed

Parallelization:

All changes are independent
Unit test can be written in parallel with extraction.ts fix
Test validates fix iteratively

Document Version: 1.8
Last Updated by: Planner Agent (RECIPE-0006 Iteration 1)
Next Update: Developer Agent
