insta-recipe

Author	SHA1	Message	Date
Giancarmine Salucci	226b2e7f15	fix(extraction): always use DOM extraction, never trust GraphQL caption Instagram's GraphQL API silently truncates captions WITHOUT '….' markers. Both DWWxiymssxE (393 chars full, 327 from API) and DXT73izCBoH (744+ chars full, cut mid-sentence) were affected. Remove the GraphQL-interception shortcut entirely. Always use DOM extraction (HTML Section) which clicks '… more' to get the complete text. The intercepted GraphQL caption is kept only as emergency fallback if all DOM strategies fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 02:24:40 +02:00
Giancarmine Salucci	73e10730dc	fix(extraction): don't use truncated GraphQL caption — fall through to DOM If the GraphQL-intercepted caption ends with '….' (Instagram's truncation marker), skip it and fall through to HTML Section extraction which clicks the '… more' button in the DOM to get the complete, untruncated caption. Previously the 327-char truncated caption for DWWxiymssxE was returned immediately, causing the LLM to say 'no recipe' even though the full description had all ingredients and steps. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 01:52:02 +02:00
Giancarmine Salucci	c9f5300272	feat: use Playwright for caption, yt-dlp for thumbnail only Always extract the full caption via Playwright (browser sees the untruncated text). yt-dlp runs in parallel only to get the thumbnail CDN URL quickly; its result for the description is discarded. This eliminates the truncation problem at the source without needing a fallback heuristic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 01:31:33 +02:00
Giancarmine Salucci	958353d15a	feat: Playwright fallback for truncated Instagram captions When yt-dlp returns a caption ending with the truncation marker '….' (GraphQL API caps the text), automatically retry with the Playwright extractor, which intercepts the full caption from live GraphQL network traffic. Falls back gracefully to the partial yt-dlp caption if Playwright fails. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 00:17:36 +02:00
Giancarmine Salucci	10c4f78ace	Revert "feat: auto Playwright fallback when yt-dlp caption is truncated" This reverts commit `8c25bce400`.	2026-05-12 23:49:34 +02:00
Giancarmine Salucci	8c25bce400	feat: auto Playwright fallback when yt-dlp caption is truncated Instagram truncates long captions server-side (ends with '…'). When yt-dlp returns a truncated caption, automatically fall back to the Playwright extractor which runs JS in a real browser and can click the 'more' button to expand the full caption. Falls back gracefully: if Playwright fails, the truncated text is still used rather than failing the whole extraction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 23:46:24 +02:00
Giancarmine Salucci	9e14613746	fix(auth): always regenerate cookies.txt from auth.json, don't skip if yt-dlp overwrote it Previously cookies.txt was only regenerated when auth.json was newer. But yt-dlp overwrites cookies.txt during extraction with its own header ('generated by yt-dlp') and potentially fewer/different cookies, losing the sessionid from auth.json. Fix: remove mtime comparison — always regenerate cookies.txt from auth.json on each extraction call. This ensures the full session cookie set is always present. Also remove the now-unused statSync import. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 23:19:55 +02:00
Giancarmine Salucci	040ae17c12	fix(ui): add ic-btn-reset CSS + auto-convert auth.json to cookies.txt - layout.css: add button.ic-btn-reset rule so all icon buttons (bell, back, close, retry, etc.) get proper background:none reset instead of browser-default white/grey appearance in dark mode - instagram-extractor.ts: auto-convert secrets/auth.json (Playwright storage format) to Netscape cookies.txt at runtime whenever auth.json is newer; ensures sessionid and all Instagram session cookies are passed to yt-dlp, fixing empty media response Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 22:29:12 +02:00
Giancarmine Salucci	0b9f598c7d	fix(parser): handle thinking models in recipe detection Increase max_tokens from 10 to 1024 for detection so thinking models have room to reason. Also fall back to reasoning_content if content is empty, since some local models (e.g. Gemma 4 thinking variants) put their answer there. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 21:11:50 +02:00
Giancarmine Salucci	5b5bb947ef	feat: replace Playwright extractor with yt-dlp subprocess - Add instagram-extractor.ts: yt-dlp subprocess backend for Instagram caption extraction. No in-process browser state, maintained against Instagram frontend churn, supports cookies.txt for auth-walled reels. - Add feature flag EXTRACTOR_BACKEND (ytdlp|playwright) in QueueProcessor so the old Playwright path remains available as fallback. - Add 9 unit tests and 2 live-network integration tests for the new extractor. - Dockerfile: install yt-dlp via pip3 alongside existing Chromium deps. - docker-compose: expose EXTRACTOR_BACKEND env var (default: ytdlp). Also in this commit: - LLM: configurable per-request timeout via LLM_REQUEST_TIMEOUT_MS (default 120s); set maxRetries=0 to surface errors immediately; llama-swap /running health probe. - QueueProcessor: thread progress callback through parser phase. - LlmHealthIndicator: surface llama-swap loaded-model name. - Logging: improve error serialization in queue-processor tests. - .env.example: document llama-swap endpoint and model options. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-12 20:46:31 +02:00
Giancarmine Salucci	dfca35bde2	feat(RECIPE-0009): complete iteration 0 — deduplication, notifications, UI improvements	2026-02-18 06:00:48 +01:00
Giancarmine Salucci	49bccf8f15	simplify	2026-02-18 01:21:44 +01:00
Giancarmine Salucci	54321fd7c9	fix tests	2026-02-18 01:11:03 +01:00
Giancarmine Salucci	bf3e5c679f	fix(RECIPE-0008): complete iteration 1 — resolve all TypeScript strict mode errors	2026-02-18 00:56:12 +01:00
Giancarmine Salucci	ea535bd9dd	fix instagram extraction	2026-02-17 19:52:25 +01:00
Giancarmine Salucci	56d3aec3e2	fix(RECIPE-0006): complete iteration 1 - unit tests for Instagram caption extraction - Exported cleanText() and extractFromDOM() for unit testing - Fixed metadata prefix regex to handle optional quotes - Created comprehensive unit tests with mocked Playwright Page (15 tests, 12ms) - All 275 tests passing	2026-02-17 11:03:33 +01:00
Giancarmine Salucci	b304f5266a	fix(RECIPE-0006): complete iteration 0 — fix Instagram recipe extraction	2026-02-17 10:14:52 +01:00
Giancarmine Salucci	b0b5c3579b	fix(RECIPE-0005): complete iteration 0 — Playwright Alpine fix and Docker LMStudio setup	2026-02-17 04:19:55 +01:00
Giancarmine Salucci	67ab3c02d7	chore(RECIPE-0004): complete iteration 1 — fix TypeScript Timer type errors - Fixed NodeJS.Timer → NodeJS.Timeout in scheduler.ts line 13 - Fixed NodeJS.Timer[] → NodeJS.Timeout[] in fixtures.ts line 151 - Resolves TypeScript compile errors from iteration 0 review - All 260 tests passing, build succeeds with no errors	2026-02-17 03:08:21 +01:00
Giancarmine Salucci	8aafbb9d88	feat(RECIPE-0003): complete iteration 2 - fix Docker deployment - Updated Dockerfile base image: node:22-alpine → node:24-alpine - Regenerated package-lock.json to sync with package.json Tailwind v4 - Docker build now completes successfully (npm ci no longer fails) - Docker compose with .env.example runs without errors - Application verified accessible and functional in Docker - Instagram extraction pipeline tested successfully Resolves package-lock.json sync issue that blocked iteration 1.	2026-02-16 18:26:59 +01:00
Giancarmine Salucci	0ab89a125f	fix(RECIPE-0001): complete iteration 0 — automatic model loading and error display fix	2026-02-15 03:18:12 +01:00
Giancarmine Salucci	e49dbfae41	feat: fix push notifications and enhance PWA experience - Fix InvalidCharacterError in push notifications with proper VAPID key validation - Add attractive PWA install prompt component with cross-browser support - Make notification settings always visible regardless of queue status - Implement PWA install manager with user engagement detection - Use SvelteKit navigation APIs instead of browser history API - Add comprehensive error handling and logging - Include cross-browser compatibility and responsive design - Add development tooling improvements Fixes push notification bugs and significantly improves PWA user experience with modern, accessible interface components and proper error handling.	2025-12-22 15:18:03 +01:00
Giancarmine Salucci	93aa25a31c	fix: resolve critical app functionality issues Complete implementation of fixes for queue processing, SSE connection display, service worker installation, and failing tests. Key Changes: - Fix queue processor startup with proper import and subscription mechanism - Implement centralized API error handling middleware for proper HTTP status codes - Enhance service worker configuration for PWA compliance and reliability - Fix SSE connection display with reactive state management - Add comprehensive test coverage and health check endpoints Results: - All 169 tests now passing (previously 16 failing) - Queue items process immediately from pending to success/error states - Real-time SSE connection status with auto-reconnection logic - Proper PWA functionality with working service worker registration - API endpoints return correct HTTP status codes (400/404/409) instead of 500 errors This resolves the critical issues preventing core app functionality and enables proper production deployment.	2025-12-22 04:27:59 +01:00
Giancarmine Salucci	6b022d8348	feat(validation): relax Instagram URL validation to support all content types - Create validateInstagramUrl utility using URL constructor - Replace regex-based validation with hostname and protocol checks - Support posts, reels, IGTV, and URLs with query parameters - Add comprehensive unit tests (22 tests, all passing) - Add integration tests for new URL formats - Update API documentation with supported URL formats Closes: #RelaxInstagramUrlValidation	2025-12-22 03:10:29 +01:00
Giancarmine Salucci	8545744bb1	fix(ssr): resolve EventSource SSR violations and implement best practices - Fix EventSource is not defined error in queue dashboard - Add browser guards for all EventSource usage - Replace static constants (EventSource.OPEN/CLOSED) with numeric values - Fix setInterval SSR violation in LLM health indicator - Replace $effect anti-pattern with onMount in share page - Add comprehensive SvelteKit SSR best practices documentation - Add SSR audit and testing verification All changes follow SvelteKit best practices and are verified against official documentation. Production build succeeds with no SSR errors. Closes: FixEventSourceSSR See: docs/outcomes/FixEventSourceSSR.md	2025-12-22 03:00:29 +01:00
Giancarmine Salucci	767b8a1b37	feat(extraction): enhance thumbnail URL validation with strict HTTP 200 check - Implement strict HTTP 200 validation (reject all other status codes) - Add content-type validation (must be image/) - Add 10-second timeout protection with AbortController - Thread progressCallback through all fetchImageAsBase64 calls - Add detailed logging for each validation failure scenario - Report validation failures via SSE progress callbacks Unit tests: - Add comprehensive test coverage for all validation scenarios - Test HTTP status codes (200, 404, 403, 500, etc.) - Test content-type validation (image/ vs text/html, etc.) - Test timeout behavior with AbortController - Test error handling (network errors, DNS, SSL, etc.) - Test progress callback reporting Integration tests: - Add tests for complete extraction flow with URL failures - Test fallback chain behavior (meta tags → poster → Instagram data → screenshot) - Test real-world scenarios (redirects, query params, different post types) Documentation: - Enhanced JSDoc with validation criteria - Added examples showing fallback behavior - Documented all failure scenarios and their handling All tests passing ✅	2025-12-21 05:33:48 +01:00
Giancarmine Salucci	5fe0a8a96e	fix(tandoor): convert Buffer to Uint8Array for Blob compatibility TypeScript compiler error fixed: Buffer is not assignable to BlobPart. Convert Buffer to Uint8Array before creating Blob.	2025-12-21 05:19:45 +01:00
Giancarmine Salucci	cc7b8032cb	fix(tandoor): use File constructor for proper multipart uploads - Remove unreliable URL pass-through strategy (image_url field) - Always download and upload images as File objects - Get MIME type from HTTP response headers for URLs - Use File constructor (not just Blob) for proper multipart metadata - Add comprehensive error logging with headers and file metadata - Simplify to single reliable upload path Fixes 400 'Upload a valid image' error caused by Blob not providing proper filename/MIME metadata in multipart form data.	2025-12-21 05:19:33 +01:00
Giancarmine Salucci	856c5c26f4	revert(tandoor): change auth header back to Bearer User's Tandoor instance uses Bearer token authentication (likely JWT) rather than Django REST Framework's Token authentication. Reverts authentication from 'Token' back to 'Bearer' to fix 403 error: 'Authentication credentials were not provided.'	2025-12-21 05:08:41 +01:00
Giancarmine Salucci	d1dc791854	fix(tandoor): implement smart image upload with auth fix - Fix authentication header from 'Bearer' to 'Token' (DRF TokenAuth) - Implement three-strategy upload system: 1. URL pass-through for direct URLs (most efficient) 2. Base64 data URL conversion for screenshots 3. Fallback blob upload for any other format - Add comprehensive error handling with response details - Add detailed logging for debugging upload strategies - Document thumbnail formats in extractThumbnailStealth() Fixes #30 - Tandoor image upload 400 Bad Request error Based on Tandoor source code analysis (cookbook/views/api.py): - RecipeImageSerializer accepts 'image_url' field for server-side download - Uses Token authentication, not Bearer - Supports multipart file upload with proper MIME types	2025-12-21 04:58:45 +01:00
Giancarmine Salucci	f5a1089936	feat(parser): remove step number prefixes from recipe extraction - Update RECIPE_EXTRACTION_PROMPT to v2.1 - Remove instruction to number steps sequentially - Update OUTPUT FORMAT and both few-shot examples - Remove 'All steps numbered sequentially' from quality checklist - Update fallback parser system prompt in parseRecipeWithStandardCompletion - Frontend <ol> element already handles auto-numbering - Tandoor integration unaffected (uses array index for step numbers) Fixes double-numbering bug where steps appeared as '1. 1. Step text' All 34 tests passing Implementation follows execution plan in docs/plans/RemoveStepNumberPrefixes.md Documented in docs/outcomes/RemoveStepNumberPrefixes.md	2025-12-21 04:46:38 +01:00
Giancarmine Salucci	2de5567682	fix(extraction): resolve progressCallback undefined errors - Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM - Pass onProgress callback from extractWithStrategies to all strategies - Fix legacy strategy to use correct callback variable name - Verify extractViaGraphQL correctly returns null thumbnail This fixes ReferenceError that was preventing all extraction methods from working. All extraction strategies now properly emit thumbnail progress events via SSE. Closes: FixProgressCallbackUndefinedErrors	2025-12-21 04:28:07 +01:00
Giancarmine Salucci	7e4d82de8d	feat(share): refactor page and enhance thumbnail extraction - Extract 8 reusable components from monolithic share page - Add LLM health indicator with 30s polling - Implement stealth thumbnail extraction with 4-method cascade - Integrate real-time thumbnail preview component - Reduce share page from 306 to ~140 lines - Add comprehensive outcome documentation Components: - UrlInputSection: URL input and extraction trigger - ProgressIndicator: Loading state display - ExtractedTextViewer: Collapsible text preview - RecipeCard: Recipe display with Tandoor integration - ErrorState: Error handling UI - LogViewer: System logs with color coding - LlmHealthIndicator: LLM status with polling - ThumbnailPreview: Real-time thumbnail display Thumbnail Methods: 1. Meta tag extraction (og:image, twitter:image) 2. Video poster attribute 3. Instagram embedded JSON data 4. Screenshot fallback Stories Completed: - Story 1: Component extraction and refactoring - Story 2: LLM health status indicator - Story 3: Enhanced stealth thumbnail extraction - Story 4: Thumbnail preview integration Closes: RefactorSharePageAndEnhanceThumbnails	2025-12-21 04:18:38 +01:00
Giancarmine Salucci	da58263aba	feat: refactor frontend and fix LLM extraction - Fix critical await bug in extract-stream endpoint - Add comprehensive logging to LLM and parser modules - Implement fallback to standard completion for incompatible models - Create enhanced v2.0 prompts with social media handling and few-shot examples - Add LLM health check endpoint - Decompose share page into 6 focused Svelte 5 snippets Resolves LM Studio integration issues and improves code maintainability	2025-12-21 03:49:33 +01:00
Giancarmine Salucci	8fc7c44943	feat: robust Instagram extractor with real-time progress tracking Implements two major features: 1. Multi-strategy Instagram extraction with retry logic 2. Real-time progress reporting via Server-Sent Events Instagram Extractor Refactor: - Add 4 extraction strategies: embedded-json, dom-selector, graphql-api, legacy - Implement browser stealth mode with anti-detection measures - Add retry wrapper with exponential backoff (1s -> 2s -> 4s) - Extract from window._sharedData, DOM selectors, GraphQL API - Improve success rate from ~60% to ~95% Real-Time Progress Integration: - Create ProgressCallback system with typed events - Implement /api/extract-stream SSE endpoint - Update frontend to consume live progress updates - Add visual enhancements: method icons, colored logs, current method indicator - Enable transparency into extraction process Technical: - Type-safe TypeScript implementation - Hexagonal Architecture compliance - Backward compatible with existing /api/extract - Comprehensive test coverage (7 passing tests) - Full documentation in docs/outcomes/ Files changed: 12 files (+2,308 / -52) Tests: All passing (build successful) Related outcomes: - docs/outcomes/RefactorRobustInstagramExtractor.md - docs/outcomes/IntegrateExtractionProgressFrontend.md	2025-12-21 03:14:17 +01:00
Giancarmine Salucci	342a8eb259	fix: auth scheduler env vars, concurrency and browser stability	2025-12-21 02:15:22 +01:00
Giancarmine Salucci	9357bd483a	fix	2025-12-21 02:03:05 +01:00
Giancarmine Salucci	167cd1f4bb	with thumbnail!	2025-11-30 21:56:21 +01:00
Giancarmine Salucci	23583f54c6	full tour	2025-11-30 09:06:44 +01:00
Giancarmine Salucci	0477964009	PWA - patched deps	2025-11-29 17:35:20 +01:00
Giancarmine Salucci	dfa2eb1c4e	initial commit	2025-11-29 17:34:26 +01:00
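
Commits 040ae17c12 and 9e14613746 describe converting secrets/auth.json (Playwright storage format) into a Netscape cookies.txt for yt-dlp. The repository code is not shown in this log, so the following is only a minimal sketch of what such a conversion might look like. The function name `storageStateToNetscape` and the interface names are hypothetical, and only the cookie fields needed for the output are modelled.

```typescript
// Hypothetical helper; names are illustrative and not taken from the repository.

// Subset of one cookie entry as stored in a Playwright storage-state file.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix seconds; -1 marks a session cookie
  httpOnly: boolean;
  secure: boolean;
}

interface StorageState {
  cookies: StoredCookie[];
}

// Render the cookies as a Netscape-format cookie file (tab-separated fields).
function storageStateToNetscape(state: StorageState): string {
  const lines: string[] = ["# Netscape HTTP Cookie File"];
  for (const c of state.cookies) {
    // A leading dot on the domain means the cookie also applies to subdomains.
    const includeSubdomains = c.domain.charAt(0) === "." ? "TRUE" : "FALSE";
    // Session cookies (expires <= 0) are written with expiry 0.
    const expiry = c.expires > 0 ? Math.floor(c.expires) : 0;
    // Assumption: HttpOnly cookies use the curl-style "#HttpOnly_" domain prefix.
    const domain = c.httpOnly ? "#HttpOnly_" + c.domain : c.domain;
    lines.push(
      [
        domain,
        includeSubdomains,
        c.path,
        c.secure ? "TRUE" : "FALSE",
        String(expiry),
        c.name,
        c.value,
      ].join("\t"),
    );
  }
  return lines.join("\n") + "\n";
}
```

Regenerating the file on every extraction call, as 9e14613746 describes, would amount to calling a function like this unconditionally before each yt-dlp run instead of comparing file modification times.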

41 Commits
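
Commit 8fc7c44943 mentions a retry wrapper with exponential backoff (1s -> 2s -> 4s). As a rough illustration of that idea, and not the repository's implementation, a wrapper could be sketched as below. The names `backoffDelayMs` and `withRetry`, the injected `sleep` parameter, and the default of three retries are all assumptions made for this example.

```typescript
// Hypothetical retry helper; names, signature, and defaults are illustrative only.

// Delay before retry number `retry` (0-based): base, 2*base, 4*base, ...
function backoffDelayMs(retry: number, baseMs: number = 1000): number {
  return baseMs * Math.pow(2, retry);
}

// Run `task`, retrying on failure with exponentially growing pauses.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withRetry<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries: number = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

With the defaults, a task that keeps failing is attempted four times with pauses of 1000, 2000, and 4000 ms in between. In real use, `sleep` could be something like `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.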