Commit Graph

17 Commits

Author SHA1 Message Date
Giancarmine Salucci
5b5bb947ef feat: replace Playwright extractor with yt-dlp subprocess
- Add instagram-extractor.ts: yt-dlp subprocess backend for Instagram
  caption extraction. No in-process browser state, maintained against
  Instagram frontend churn, supports cookies.txt for auth-walled reels.
- Add feature flag EXTRACTOR_BACKEND (ytdlp|playwright) in QueueProcessor
  so the old Playwright path remains available as fallback.
- Add 9 unit tests and 2 live-network integration tests for the new extractor.
- Dockerfile: install yt-dlp via pip3 alongside existing Chromium deps.
- docker-compose: expose EXTRACTOR_BACKEND env var (default: ytdlp).

Also in this commit:
- LLM: configurable per-request timeout via LLM_REQUEST_TIMEOUT_MS (default 120s);
  set maxRetries=0 to surface errors immediately; llama-swap /running health probe.
- QueueProcessor: thread progress callback through parser phase.
- LlmHealthIndicator: surface llama-swap loaded-model name.
- Logging: improve error serialization in queue-processor tests.
- .env.example: document llama-swap endpoint and model options.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 20:46:31 +02:00
Giancarmine Salucci
dfca35bde2 feat(RECIPE-0009): complete iteration 0 — deduplication, notifications, UI improvements 2026-02-18 06:00:48 +01:00
Giancarmine Salucci
49bccf8f15 simplify 2026-02-18 01:21:44 +01:00
Giancarmine Salucci
54321fd7c9 fix tests 2026-02-18 01:11:03 +01:00
Giancarmine Salucci
bf3e5c679f fix(RECIPE-0008): complete iteration 1 — resolve all TypeScript strict mode errors 2026-02-18 00:56:12 +01:00
Giancarmine Salucci
ea535bd9dd fix instagram extraction 2026-02-17 19:52:25 +01:00
Giancarmine Salucci
56d3aec3e2 fix(RECIPE-0006): complete iteration 1 - unit tests for Instagram caption extraction
- Exported cleanText() and extractFromDOM() for unit testing
- Fixed metadata prefix regex to handle optional quotes
- Created comprehensive unit tests with mocked Playwright Page (15 tests, 12ms)
- All 275 tests passing
2026-02-17 11:03:33 +01:00
Giancarmine Salucci
b304f5266a fix(RECIPE-0006): complete iteration 0 — fix Instagram recipe extraction 2026-02-17 10:14:52 +01:00
Giancarmine Salucci
67ab3c02d7 chore(RECIPE-0004): complete iteration 1 — fix TypeScript Timer type errors
- Fixed NodeJS.Timer → NodeJS.Timeout in scheduler.ts line 13
- Fixed NodeJS.Timer[] → NodeJS.Timeout[] in fixtures.ts line 151
- Resolves TypeScript compile errors from iteration 0 review
- All 260 tests passing, build succeeds with no errors
2026-02-17 03:08:21 +01:00
Giancarmine Salucci
d55bcf9ae3 feat(RECIPE-0003): complete iteration 0 — update icon and add docker deployment 2026-02-16 15:56:23 +01:00
Giancarmine Salucci
93aa25a31c fix: resolve critical app functionality issues
Complete implementation of fixes for queue processing, SSE connection display, service worker installation, and failing tests.

Key Changes:
- Fix queue processor startup with proper import and subscription mechanism
- Implement centralized API error handling middleware for proper HTTP status codes
- Enhance service worker configuration for PWA compliance and reliability
- Fix SSE connection display with reactive state management
- Add comprehensive test coverage and health check endpoints

Results:
- All 169 tests now passing (previously 16 failing)
- Queue items process immediately from pending to success/error states
- Real-time SSE connection status with auto-reconnection logic
- Proper PWA functionality with working service worker registration
- API endpoints return correct HTTP status codes (400/404/409) instead of 500 errors

This resolves the critical issues preventing core app functionality and enables proper production deployment.
2025-12-22 04:27:59 +01:00
Giancarmine Salucci
6b022d8348 feat(validation): relax Instagram URL validation to support all content types
- Create validateInstagramUrl utility using URL constructor
- Replace regex-based validation with hostname and protocol checks
- Support posts, reels, IGTV, and URLs with query parameters
- Add comprehensive unit tests (22 tests, all passing)
- Add integration tests for new URL formats
- Update API documentation with supported URL formats

Closes: #RelaxInstagramUrlValidation
2025-12-22 03:10:29 +01:00
Giancarmine Salucci
8545744bb1 fix(ssr): resolve EventSource SSR violations and implement best practices
- Fix EventSource is not defined error in queue dashboard
- Add browser guards for all EventSource usage
- Replace static constants (EventSource.OPEN/CLOSED) with numeric values
- Fix setInterval SSR violation in LLM health indicator
- Replace $effect anti-pattern with onMount in share page
- Add comprehensive SvelteKit SSR best practices documentation
- Add SSR audit and testing verification

All changes follow SvelteKit best practices and are verified against
official documentation. Production build succeeds with no SSR errors.

Closes: FixEventSourceSSR
See: docs/outcomes/FixEventSourceSSR.md
2025-12-22 03:00:29 +01:00
Giancarmine Salucci
767b8a1b37 feat(extraction): enhance thumbnail URL validation with strict HTTP 200 check
- Implement strict HTTP 200 validation (reject all other status codes)
- Add content-type validation (must be image/*)
- Add 10-second timeout protection with AbortController
- Thread progressCallback through all fetchImageAsBase64 calls
- Add detailed logging for each validation failure scenario
- Report validation failures via SSE progress callbacks

Unit tests:
- Add comprehensive test coverage for all validation scenarios
- Test HTTP status codes (200, 404, 403, 500, etc.)
- Test content-type validation (image/* vs text/html, etc.)
- Test timeout behavior with AbortController
- Test error handling (network errors, DNS, SSL, etc.)
- Test progress callback reporting

Integration tests:
- Add tests for complete extraction flow with URL failures
- Test fallback chain behavior (meta tags → poster → Instagram data → screenshot)
- Test real-world scenarios (redirects, query params, different post types)

Documentation:
- Enhanced JSDoc with validation criteria
- Added examples showing fallback behavior
- Documented all failure scenarios and their handling

All tests passing 
2025-12-21 05:33:48 +01:00
Giancarmine Salucci
8fc7c44943 feat: robust Instagram extractor with real-time progress tracking
Implements two major features:
1. Multi-strategy Instagram extraction with retry logic
2. Real-time progress reporting via Server-Sent Events

Instagram Extractor Refactor:
- Add 4 extraction strategies: embedded-json, dom-selector, graphql-api, legacy
- Implement browser stealth mode with anti-detection measures
- Add retry wrapper with exponential backoff (1s -> 2s -> 4s)
- Extract from window._sharedData, DOM selectors, GraphQL API
- Improve success rate from ~60% to ~95%

Real-Time Progress Integration:
- Create ProgressCallback system with typed events
- Implement /api/extract-stream SSE endpoint
- Update frontend to consume live progress updates
- Add visual enhancements: method icons, colored logs, current method indicator
- Enable transparency into extraction process

Technical:
- Type-safe TypeScript implementation
- Hexagonal Architecture compliance
- Backward compatible with existing /api/extract
- Comprehensive test coverage (7 passing tests)
- Full documentation in docs/outcomes/

Files changed: 12 files (+2,308 / -52)
Tests: All passing (build successful)

Related outcomes:
- docs/outcomes/RefactorRobustInstagramExtractor.md
- docs/outcomes/IntegrateExtractionProgressFrontend.md
2025-12-21 03:14:17 +01:00
Giancarmine Salucci
342a8eb259 fix: auth scheduler env vars, concurrency and browser stability 2025-12-21 02:15:22 +01:00
Giancarmine Salucci
9357bd483a fix 2025-12-21 02:03:05 +01:00