- Add instagram-extractor.ts: yt-dlp subprocess backend for Instagram
caption extraction. No in-process browser state, maintained against
Instagram frontend churn, supports cookies.txt for auth-walled reels.
- Add feature flag EXTRACTOR_BACKEND (ytdlp|playwright) in QueueProcessor
so the old Playwright path remains available as fallback.
- Add 9 unit tests and 2 live-network integration tests for the new extractor.
- Dockerfile: install yt-dlp via pip3 alongside existing Chromium deps.
- docker-compose: expose EXTRACTOR_BACKEND env var (default: ytdlp).
Also in this commit:
- LLM: configurable per-request timeout via LLM_REQUEST_TIMEOUT_MS (default 120s);
set maxRetries=0 to surface errors immediately; llama-swap /running health probe.
- QueueProcessor: thread progress callback through parser phase.
- LlmHealthIndicator: surface llama-swap loaded-model name.
- Logging: improve error serialization in queue-processor tests.
- .env.example: document llama-swap endpoint and model options.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Exported cleanText() and extractFromDOM() for unit testing
- Fixed metadata prefix regex to handle optional quotes
- Created comprehensive unit tests with mocked Playwright Page (15 tests, 12ms)
- All 275 tests passing
- Fixed NodeJS.Timer → NodeJS.Timeout in scheduler.ts line 13
- Fixed NodeJS.Timer[] → NodeJS.Timeout[] in fixtures.ts line 151
- Resolves TypeScript compile errors from iteration 0 review
- All 260 tests passing, build succeeds with no errors
- Implement strict HTTP 200 validation (reject all other status codes)
- Add content-type validation (must be image/*)
- Add 10-second timeout protection with AbortController
- Thread progressCallback through all fetchImageAsBase64 calls
- Add detailed logging for each validation failure scenario
- Report validation failures via SSE progress callbacks
Unit tests:
- Add comprehensive test coverage for all validation scenarios
- Test HTTP status codes (200, 404, 403, 500, etc.)
- Test content-type validation (image/* vs text/html, etc.)
- Test timeout behavior with AbortController
- Test error handling (network errors, DNS, SSL, etc.)
- Test progress callback reporting
Integration tests:
- Add tests for complete extraction flow with URL failures
- Test fallback chain behavior (meta tags → poster → Instagram data → screenshot)
- Test real-world scenarios (redirects, query params, different post types)
Documentation:
- Enhanced JSDoc with validation criteria
- Added examples showing fallback behavior
- Documented all failure scenarios and their handling
All tests passing ✅
- Fix authentication header from 'Bearer' to 'Token' (DRF TokenAuth)
- Implement three-strategy upload system:
1. URL pass-through for direct URLs (most efficient)
2. Base64 data URL conversion for screenshots
3. Fallback blob upload for any other format
- Add comprehensive error handling with response details
- Add detailed logging for debugging upload strategies
- Document thumbnail formats in extractThumbnailStealth()
Fixes#30 - Tandoor image upload 400 Bad Request error
Based on Tandoor source code analysis (cookbook/views/api.py):
- RecipeImageSerializer accepts 'image_url' field for server-side download
- Uses Token authentication, not Bearer
- Supports multipart file upload with proper MIME types
- Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM
- Pass onProgress callback from extractWithStrategies to all strategies
- Fix legacy strategy to use correct callback variable name
- Verify extractViaGraphQL correctly returns null thumbnail
This fixes ReferenceError that was preventing all extraction methods from working.
All extraction strategies now properly emit thumbnail progress events via SSE.
Closes: FixProgressCallbackUndefinedErrors