insta-recipe

Author	SHA1	Message	Date
Giancarmine Salucci	5b5bb947ef	feat: replace Playwright extractor with yt-dlp subprocess - Add instagram-extractor.ts: yt-dlp subprocess backend for Instagram caption extraction. No in-process browser state, maintained against Instagram frontend churn, supports cookies.txt for auth-walled reels. - Add feature flag EXTRACTOR_BACKEND (ytdlp|playwright) in QueueProcessor so the old Playwright path remains available as fallback. - Add 9 unit tests and 2 live-network integration tests for the new extractor. - Dockerfile: install yt-dlp via pip3 alongside existing Chromium deps. - docker-compose: expose EXTRACTOR_BACKEND env var (default: ytdlp). Also in this commit: - LLM: configurable per-request timeout via LLM_REQUEST_TIMEOUT_MS (default 120s); set maxRetries=0 to surface errors immediately; llama-swap /running health probe. - QueueProcessor: thread progress callback through parser phase. - LlmHealthIndicator: surface llama-swap loaded-model name. - Logging: improve error serialization in queue-processor tests. - .env.example: document llama-swap endpoint and model options. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 20:46:31 +02:00
Giancarmine Salucci	49bccf8f15	simplify	2026-02-18 01:21:44 +01:00
Giancarmine Salucci	54321fd7c9	fix tests	2026-02-18 01:11:03 +01:00
Giancarmine Salucci	bf3e5c679f	fix(RECIPE-0008): complete iteration 1 — resolve all TypeScript strict mode errors	2026-02-18 00:56:12 +01:00
Giancarmine Salucci	ea535bd9dd	fix instagram extraction	2026-02-17 19:52:25 +01:00
Giancarmine Salucci	56d3aec3e2	fix(RECIPE-0006): complete iteration 1 - unit tests for Instagram caption extraction - Exported cleanText() and extractFromDOM() for unit testing - Fixed metadata prefix regex to handle optional quotes - Created comprehensive unit tests with mocked Playwright Page (15 tests, 12ms) - All 275 tests passing	2026-02-17 11:03:33 +01:00
Giancarmine Salucci	b304f5266a	fix(RECIPE-0006): complete iteration 0 — fix Instagram recipe extraction	2026-02-17 10:14:52 +01:00
Giancarmine Salucci	67ab3c02d7	chore(RECIPE-0004): complete iteration 1 — fix TypeScript Timer type errors - Fixed NodeJS.Timer → NodeJS.Timeout in scheduler.ts line 13 - Fixed NodeJS.Timer[] → NodeJS.Timeout[] in fixtures.ts line 151 - Resolves TypeScript compile errors from iteration 0 review - All 260 tests passing, build succeeds with no errors	2026-02-17 03:08:21 +01:00
Giancarmine Salucci	767b8a1b37	feat(extraction): enhance thumbnail URL validation with strict HTTP 200 check - Implement strict HTTP 200 validation (reject all other status codes) - Add content-type validation (must be image/) - Add 10-second timeout protection with AbortController - Thread progressCallback through all fetchImageAsBase64 calls - Add detailed logging for each validation failure scenario - Report validation failures via SSE progress callbacks Unit tests: - Add comprehensive test coverage for all validation scenarios - Test HTTP status codes (200, 404, 403, 500, etc.) - Test content-type validation (image/ vs text/html, etc.) - Test timeout behavior with AbortController - Test error handling (network errors, DNS, SSL, etc.) - Test progress callback reporting Integration tests: - Add tests for complete extraction flow with URL failures - Test fallback chain behavior (meta tags → poster → Instagram data → screenshot) - Test real-world scenarios (redirects, query params, different post types) Documentation: - Enhanced JSDoc with validation criteria - Added examples showing fallback behavior - Documented all failure scenarios and their handling All tests passing ✅	2025-12-21 05:33:48 +01:00
Giancarmine Salucci	d1dc791854	fix(tandoor): implement smart image upload with auth fix - Fix authentication header from 'Bearer' to 'Token' (DRF TokenAuth) - Implement three-strategy upload system: 1. URL pass-through for direct URLs (most efficient) 2. Base64 data URL conversion for screenshots 3. Fallback blob upload for any other format - Add comprehensive error handling with response details - Add detailed logging for debugging upload strategies - Document thumbnail formats in extractThumbnailStealth() Fixes #30 - Tandoor image upload 400 Bad Request error Based on Tandoor source code analysis (cookbook/views/api.py): - RecipeImageSerializer accepts 'image_url' field for server-side download - Uses Token authentication, not Bearer - Supports multipart file upload with proper MIME types	2025-12-21 04:58:45 +01:00
Giancarmine Salucci	2de5567682	fix(extraction): resolve progressCallback undefined errors - Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM - Pass onProgress callback from extractWithStrategies to all strategies - Fix legacy strategy to use correct callback variable name - Verify extractViaGraphQL correctly returns null thumbnail This fixes ReferenceError that was preventing all extraction methods from working. All extraction strategies now properly emit thumbnail progress events via SSE. Closes: FixProgressCallbackUndefinedErrors	2025-12-21 04:28:07 +01:00
Giancarmine Salucci	7e4d82de8d	feat(share): refactor page and enhance thumbnail extraction - Extract 8 reusable components from monolithic share page - Add LLM health indicator with 30s polling - Implement stealth thumbnail extraction with 4-method cascade - Integrate real-time thumbnail preview component - Reduce share page from 306 to ~140 lines - Add comprehensive outcome documentation Components: - UrlInputSection: URL input and extraction trigger - ProgressIndicator: Loading state display - ExtractedTextViewer: Collapsible text preview - RecipeCard: Recipe display with Tandoor integration - ErrorState: Error handling UI - LogViewer: System logs with color coding - LlmHealthIndicator: LLM status with polling - ThumbnailPreview: Real-time thumbnail display Thumbnail Methods: 1. Meta tag extraction (og:image, twitter:image) 2. Video poster attribute 3. Instagram embedded JSON data 4. Screenshot fallback Stories Completed: - Story 1: Component extraction and refactoring - Story 2: LLM health status indicator - Story 3: Enhanced stealth thumbnail extraction - Story 4: Thumbnail preview integration Closes: RefactorSharePageAndEnhanceThumbnails	2025-12-21 04:18:38 +01:00
Giancarmine Salucci	8fc7c44943	feat: robust Instagram extractor with real-time progress tracking Implements two major features: 1. Multi-strategy Instagram extraction with retry logic 2. Real-time progress reporting via Server-Sent Events Instagram Extractor Refactor: - Add 4 extraction strategies: embedded-json, dom-selector, graphql-api, legacy - Implement browser stealth mode with anti-detection measures - Add retry wrapper with exponential backoff (1s -> 2s -> 4s) - Extract from window._sharedData, DOM selectors, GraphQL API - Improve success rate from ~60% to ~95% Real-Time Progress Integration: - Create ProgressCallback system with typed events - Implement /api/extract-stream SSE endpoint - Update frontend to consume live progress updates - Add visual enhancements: method icons, colored logs, current method indicator - Enable transparency into extraction process Technical: - Type-safe TypeScript implementation - Hexagonal Architecture compliance - Backward compatible with existing /api/extract - Comprehensive test coverage (7 passing tests) - Full documentation in docs/outcomes/ Files changed: 12 files (+2,308 / -52) Tests: All passing (build successful) Related outcomes: - docs/outcomes/RefactorRobustInstagramExtractor.md - docs/outcomes/IntegrateExtractionProgressFrontend.md	2025-12-21 03:14:17 +01:00
Giancarmine Salucci	9357bd483a	fix	2025-12-21 02:03:05 +01:00

14 Commits
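
Commit 8fc7c44943 mentions a "retry wrapper with exponential backoff (1s -> 2s -> 4s)". The sketch below is one way such a wrapper could look; it is not taken from the repository. The names `backoffDelayMs`, `withRetry`, `maxRetries`, `baseMs`, and `wait` are all hypothetical, and the actual implementation may differ in structure and defaults.

```typescript
// Illustrative sketch only: helper names and defaults are assumptions,
// not the repository's actual API.

/** Delay before retry number `retryIndex` (0-based): base, 2x base, 4x base, ... */
function backoffDelayMs(retryIndex: number, baseMs: number = 1000): number {
  return baseMs * Math.pow(2, retryIndex);
}

/** Default waiter: resolves after `ms` milliseconds. */
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `operation`, retrying up to `maxRetries` times on failure.
 * With the defaults, the waits between attempts are 1s, 2s, and 4s.
 * `wait` is injectable so tests can skip real delays.
 */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number = 3,
  baseMs: number = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No wait after the final attempt; fall through and rethrow.
      if (attempt === maxRetries) break;
      await wait(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

A caller might wrap a single extraction attempt as `await withRetry(() => extractOnce(url))`, where `extractOnce` stands in for whatever function performs one attempt. Injecting `wait` keeps unit tests fast, since a test can pass a no-op waiter instead of sleeping for real.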