# Execution Plan: Integrate Extraction Progress with Frontend

**OUTCOME_NAME:** IntegrateExtractionProgressFrontend

**Created:** 21 December 2025

**Problem Statement:** The new multi-strategy Instagram extractor logs progress only to the server console. Users cannot see which extraction method is being attempted, whether a retry is in progress, or why extraction might be slow. Progress reporting needs to be integrated with the frontend log component for full visibility.

**Workflow exception:** This is a continuation of the previous feature, so do not create a dedicated branch. Continue working on the current one.

---

## Current State Analysis

### Existing Flow

1. User shares an Instagram URL to the PWA (`share/+page.svelte`)
2. Frontend calls `/api/extract` via POST
3. Backend calls `extractTextAndThumbnail()` synchronously
4. Extraction tries 4 strategies with retry logic (logged only to the server console)
5. Frontend receives only the final result or error
6. LLM parses the recipe
7. Recipe is displayed and optionally sent to Tandoor

### Current Logging Locations

**Server Side (Not Visible to User):**
- `[Extractor] Trying method: embedded-json`
- `[Extractor] Success with method: dom-selector`
- `[Retry] Attempt 2/3 failed. Retrying in 2000ms...`

**Frontend Side (Visible in Logs Component):**
- `'Sending to server... ' + targetUrl`
- `'Recipe extraction successful'`
- `'Error: ...'`

### Gap

No real-time visibility into:
- Which extraction strategy is currently running
- Why extraction is taking time (multiple strategies, retries)
- Which method ultimately succeeded
- Detailed error information per strategy

---

## Solution Architecture

### Approach: Server-Sent Events (SSE)

**Why SSE:**
- ✅ Native browser support (`EventSource` API, which only issues GET requests)
- ✅ One-way server→client streaming (a natural fit for progress)
- ✅ Automatic reconnection (when using `EventSource`)
- ✅ Simple text-based protocol
- ✅ Works with SvelteKit via a `ReadableStream` response body

**Architecture:**

```
┌─────────────────────────────────────────────────┐
│ Frontend (Primary Adapter)                      │
│ share/+page.svelte - EventSource listener       │
└─────────────────┬───────────────────────────────┘
                  │ SSE Connection
                  │
┌─────────────────┴───────────────────────────────┐
│ API Endpoint (Adapter Layer)                    │
│ /api/extract-stream - ReadableStream            │
└─────────────────┬───────────────────────────────┘
                  │ Progress Callback
                  │
┌─────────────────┴───────────────────────────────┐
│ Extraction Core (Domain Logic)                  │
│ extraction.ts - Multi-strategy extractor        │
│ + Progress Callback Support                     │
└─────────────────────────────────────────────────┘
```

Following **Hexagonal Architecture:**
- Core extraction logic remains pure (domain)
- Progress callback is a port (interface)
- SSE endpoint is an adapter (delivery mechanism)
- Frontend is the primary adapter (UI)

---

## Story Breakdown

### Story 1: Add Progress Callback System to Extraction

**Description:** Enhance `extraction.ts` to accept an optional progress callback and emit events at key points without breaking existing functionality.
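To make the callback contract concrete, here is a minimal, self-contained sketch of the pattern. It is not the real `extraction.ts` code: `runStrategies`, `Strategy`, and `ProgressEventSketch` are hypothetical names, the strategies are fakes (no Playwright), and event types are simplified (success is reported as `complete` here). It shows the two properties this story relies on: the callback is optional (`onProgress?.(...)`), and a test can collect the emitted events to check their order.

```typescript
// Hypothetical, simplified model of the strategy loop in extraction.ts.
// Names (ProgressEventSketch, Strategy, runStrategies) are illustrative only.
type EventType = 'method' | 'complete' | 'error';

interface ProgressEventSketch {
  type: EventType;
  message: string;
  method?: string;
}

type OnProgress = (event: ProgressEventSketch) => void;

interface Strategy {
  name: string;
  // Resolves to extracted text, or null when the strategy finds nothing
  run: () => Promise<string | null>;
}

// Tries each strategy in order. Reporting is optional: callers that pass
// no callback get exactly the same return value.
async function runStrategies(
  strategies: Strategy[],
  onProgress?: OnProgress
): Promise<{ method: string; text: string } | null> {
  for (const s of strategies) {
    onProgress?.({ type: 'method', message: `Trying ${s.name}`, method: s.name });
    try {
      const text = await s.run();
      if (text) {
        onProgress?.({ type: 'complete', message: `Success with ${s.name}`, method: s.name });
        return { method: s.name, text };
      }
      onProgress?.({ type: 'method', message: `${s.name} returned no data`, method: s.name });
    } catch (err) {
      const msg = err instanceof Error ? err.message : 'Unknown error';
      onProgress?.({ type: 'method', message: `${s.name} failed: ${msg}`, method: s.name });
    }
  }
  onProgress?.({ type: 'error', message: 'All extraction methods failed' });
  return null;
}

// Usage: collect events the way a unit test (or the SSE adapter) would
const strategies: Strategy[] = [
  { name: 'embedded-json', run: async () => null },
  { name: 'dom-selector', run: async () => { throw new Error('selector not found'); } },
  { name: 'legacy', run: async () => 'Recipe text' }
];

const events: ProgressEventSketch[] = [];
runStrategies(strategies, (e) => events.push(e)).then((result) => {
  console.log(result?.method); // legacy
  console.log(events.map((e) => e.message));
});
```

Calling `runStrategies(strategies)` without a callback returns the same result, which is the backward-compatibility property the acceptance criteria ask for; the real implementation additionally emits `status` and `retry` events from the page-loading and retry layers.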
**Acceptance Criteria:**
- [ ] Define `ProgressCallback` type and `ProgressEvent` interface
- [ ] Add optional `onProgress` parameter to `extractTextAndThumbnail()`
- [ ] Call callback when trying each extraction method
- [ ] Call callback on method success/failure
- [ ] Call callback on retry attempts
- [ ] Call callback on final success/error
- [ ] All existing console.logs preserved
- [ ] Backward compatible (works without callback)

**Technical Implementation:**

```typescript
// src/lib/server/extraction.ts
// Existing imports (fs, path, Page, BrowserContext, ...) and helpers
// (resolveAuthPath, createBrowserContext, sleep, ...) are unchanged.

export type ProgressEventType = 'status' | 'method' | 'retry' | 'error' | 'complete';

export interface ProgressEvent {
  type: ProgressEventType;
  message: string;
  method?: ExtractionMethod;
  attemptNumber?: number;
  maxAttempts?: number;
  data?: Record<string, unknown>;
  timestamp?: string;
}

export type ProgressCallback = (event: ProgressEvent) => void;

// Result shape inferred from how callers use it (bodyText + thumbnail);
// reuse the file's existing type if one is already defined.
type ExtractedContent = { bodyText: string; thumbnail: string | null };

// Updated signature: onProgress is optional, so existing callers keep working
export async function extractTextAndThumbnail(
  url: string,
  onProgress?: ProgressCallback
): Promise<ExtractedContent> {
  return withRetry(
    async () => {
      const authPath = resolveAuthPath();
      const context = await createBrowserContext(authPath);
      const page = await context.newPage();

      try {
        page.setDefaultTimeout(30000);

        onProgress?.({
          type: 'status',
          message: 'Loading Instagram page...',
          timestamp: new Date().toISOString()
        });
        await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

        onProgress?.({
          type: 'status',
          message: 'Page loaded, starting extraction...',
          timestamp: new Date().toISOString()
        });
        // Randomized 1-3 s pause before extracting
        await page.waitForTimeout(1000 + Math.random() * 2000);

        const result = await extractWithStrategies(url, page, context, onProgress);
        if (!result.success || !result.data) {
          throw new Error(result.error || 'Extraction failed');
        }

        fs.writeFileSync(
          path.resolve('debug_page.txt'),
          `Method: ${result.method}\n\n${result.data.bodyText}`
        );

        // Emit 'complete' only after the last step that can throw
        onProgress?.({
          type: 'complete',
          message: `Extraction successful using ${result.method} method`,
          method: result.method,
          timestamp: new Date().toISOString()
        });

        return result.data;
      } finally {
        await page.close();
        await context.close();
      }
    },
    DEFAULT_RETRY_CONFIG,
    onProgress // Forward to the retry wrapper so retries are reported
  );
}

// Updated to accept and report through the callback
async function withRetry<T>(
  fn: () => Promise<T>,
  config: RetryConfig = DEFAULT_RETRY_CONFIG,
  onProgress?: ProgressCallback
): Promise<T> {
  let lastError: Error | null = null;
  let delay = config.initialDelayMs;

  for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;

      if (isNonRetriableError(error)) {
        // Report terminal failures too, not only exhausted retries
        onProgress?.({
          type: 'error',
          message: `Non-retriable error: ${error instanceof Error ? error.message : String(error)}`,
          attemptNumber: attempt,
          maxAttempts: config.maxAttempts,
          timestamp: new Date().toISOString()
        });
        throw error;
      }

      if (attempt < config.maxAttempts) {
        const message = `Attempt ${attempt}/${config.maxAttempts} failed. Retrying in ${delay}ms...`;
        console.warn(`[Retry] ${message}`, error);
        onProgress?.({
          type: 'retry',
          message,
          attemptNumber: attempt,
          maxAttempts: config.maxAttempts,
          data: { delayMs: delay },
          timestamp: new Date().toISOString()
        });
        await sleep(delay);
        delay = Math.min(delay * config.backoffMultiplier, config.maxDelayMs);
      }
    }
  }

  onProgress?.({
    type: 'error',
    message: 'Max retry attempts exceeded',
    attemptNumber: config.maxAttempts,
    maxAttempts: config.maxAttempts,
    timestamp: new Date().toISOString()
  });
  throw lastError || new Error('Max retry attempts exceeded');
}

// Updated to report each strategy attempt and its outcome
async function extractWithStrategies(
  url: string,
  page: Page,
  context: BrowserContext,
  onProgress?: ProgressCallback
): Promise<{ success: boolean; method?: ExtractionMethod; data?: ExtractedContent; error?: string }> {
  const strategies: Array<{
    name: ExtractionMethod;
    fn: () => Promise<ExtractedContent | null>;
  }> = [
    { name: 'embedded-json', fn: () => extractFromEmbeddedJSON(page) },
    { name: 'dom-selector', fn: () => extractFromDOM(page) },
    { name: 'graphql-api', fn: () => extractViaGraphQL(url, context) },
    {
      name: 'legacy',
      fn: async () => {
        const text = await extractCleanTextLegacy(page);
        const thumbnail = await extractThumbnail(page);
        return { bodyText: text, thumbnail };
      }
    }
  ];

  for (const strategy of strategies) {
    const displayName = getMethodDisplayName(strategy.name);
    try {
      console.log(`[Extractor] Trying method: ${strategy.name}`);
      onProgress?.({
        type: 'method',
        message: `Trying extraction method: ${displayName}`,
        method: strategy.name,
        timestamp: new Date().toISOString()
      });

      const result = await strategy.fn();
      if (result && result.bodyText) {
        console.log(`[Extractor] Success with method: ${strategy.name}`);
        onProgress?.({
          type: 'method',
          message: `✓ Success with ${displayName}`,
          method: strategy.name,
          data: { success: true },
          timestamp: new Date().toISOString()
        });
        return { success: true, method: strategy.name, data: result };
      }

      onProgress?.({
        type: 'method',
        message: `✗ ${displayName} returned no data, trying next...`,
        method: strategy.name,
        data: { success: false },
        timestamp: new Date().toISOString()
      });
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : 'Unknown error';
      console.warn(`[Extractor] Method ${strategy.name} failed:`, error);
      onProgress?.({
        type: 'method',
        message: `✗ ${displayName} failed: ${errorMessage}`,
        method: strategy.name,
        data: { success: false, error: errorMessage },
        timestamp: new Date().toISOString()
      });
    }
  }

  return { success: false, error: 'All extraction methods failed' };
}

// Human-readable names for progress messages
function getMethodDisplayName(method: ExtractionMethod): string {
  const names: Record<ExtractionMethod, string> = {
    'embedded-json': 'Embedded JSON Extractor',
    'dom-selector': 'DOM Selector Extractor',
    'graphql-api': 'GraphQL API Extractor',
    'legacy': 'Legacy Text Extractor'
  };
  return names[method] ?? method;
}
```

**Dependencies:**
- None (enhances existing code)

**Risk Assessment:**
- Low risk: additive, backward-compatible changes

**Testing Strategy:**
- Unit test callback invocations
- Test with and without a callback
- Verify that all event types are emitted

---

### Story 2: Create Server-Sent Events Extraction Endpoint

**Description:** Create a new `/api/extract-stream` endpoint that uses SSE to stream progress events from the extraction process.
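The SSE frame format is small enough to show end to end. The sketch below is illustrative, not part of the planned code: `formatSSE` and `SSEParser` are hypothetical helpers, and the parser handles only the `event:` and `data:` fields with `\n` line endings (no `id:`, `retry:`, comments, or `\r\n`). It matters for a POST endpoint because `EventSource` only issues GET requests, so a `fetch`-based client has to split frames itself, and network chunks can end anywhere inside a frame. `JSON.stringify` never emits a raw newline, so each payload fits on a single `data:` line.

```typescript
// Hypothetical helpers illustrating SSE framing for /api/extract-stream.

interface SSEMessage {
  event: string; // "message" when no event: field is present
  data: string;  // multiple data: lines are joined with "\n"
}

// Server side: one frame = "event: <name>\ndata: <json>\n\n"
function formatSSE(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Client side: incremental parser that tolerates frames split across chunks
class SSEParser {
  private buffer = '';

  // Feed decoded text; returns every message completed by this chunk
  push(chunk: string): SSEMessage[] {
    this.buffer += chunk;
    const messages: SSEMessage[] = [];
    let boundary: number;
    // A blank line ("\n\n") terminates each frame
    while ((boundary = this.buffer.indexOf('\n\n')) !== -1) {
      const frame = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let event = 'message';
      const data: string[] = [];
      for (const line of frame.split('\n')) {
        if (line.startsWith('event:')) event = line.slice(6).trim();
        else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
      }
      // Frames without data are not dispatched
      if (data.length > 0) messages.push({ event, data: data.join('\n') });
    }
    return messages;
  }
}

// Usage: a frame split across two chunks is reassembled
const wire =
  formatSSE('progress', { type: 'status', message: 'Loading Instagram page...' }) +
  formatSSE('complete', { type: 'complete', message: 'Recipe extraction complete!' });

const parser = new SSEParser();
const first = parser.push(wire.slice(0, 30)); // partial frame: nothing yet
const rest = parser.push(wire.slice(30));     // both frames now complete
console.log(first.length, rest.map((m) => m.event));
```

In the browser, a POST client would read `response.body` with `getReader()`, decode each chunk with `TextDecoder.decode(value, { stream: true })`, and pass the text to `push`; with a GET endpoint, `EventSource` performs this parsing natively.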
**Acceptance Criteria:**
- [ ] New endpoint at `/api/extract-stream`
- [ ] Accepts URL via query parameter (GET) or POST body
- [ ] Returns a `ReadableStream` with SSE formatting
- [ ] Streams progress events from extraction
- [ ] Sends final result as JSON in an SSE event
- [ ] Handles errors gracefully
- [ ] Closes stream on completion or error

**Technical Implementation:**

```typescript
// src/routes/api/extract-stream/+server.ts
import type { RequestHandler } from './$types';
import { extractTextAndThumbnail, type ProgressEvent } from '$lib/server/extraction';
import { extractRecipe } from '$lib/server/parser';

// Builds the SSE response for one pipeline run. Shared by POST (fetch client)
// and GET (EventSource, which can only issue GET requests).
function createExtractionStream(url: string): Response {
  console.log('[SSE] Processing URL:', url);

  // Set once the stream is closed or the client disconnects
  let closed = false;

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      const encoder = new TextEncoder();

      // Send one SSE frame: "event: <name>\ndata: <json>\n\n"
      const sendEvent = (event: string, data: unknown) => {
        if (closed) return; // enqueue() throws once the stream is closed or cancelled
        const message = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
        controller.enqueue(encoder.encode(message));
      };

      // Close exactly once
      const finish = () => {
        if (closed) return;
        closed = true;
        controller.close();
      };

      try {
        sendEvent('progress', {
          type: 'status',
          message: 'Starting extraction pipeline...',
          timestamp: new Date().toISOString()
        });

        // Step 1: Extract, streaming each progress event to the client
        let bodyText = '';
        let thumbnail: string | null = null;
        try {
          const result = await extractTextAndThumbnail(url, (progress: ProgressEvent) => {
            sendEvent('progress', progress);
          });
          bodyText = result.bodyText;
          thumbnail = result.thumbnail;

          sendEvent('progress', {
            type: 'status',
            message: 'Text extracted, parsing recipe with AI...',
            timestamp: new Date().toISOString()
          });
        } catch (error) {
          const errorMessage = error instanceof Error ? error.message : 'Unknown error';
          sendEvent('error', {
            type: 'error',
            message: `Extraction failed: ${errorMessage}`,
            timestamp: new Date().toISOString()
          });
          finish();
          return;
        }

        // Step 2: Parse recipe
        let recipe: any = null;
        try {
          recipe = await extractRecipe(bodyText);
          if (!recipe) {
            sendEvent('error', {
              type: 'error',
              message: 'No recipe found in extracted text',
              bodyText,
              timestamp: new Date().toISOString()
            });
            finish();
            return;
          }

          sendEvent('progress', {
            type: 'status',
            message: 'Recipe parsed successfully, enriching metadata...',
            timestamp: new Date().toISOString()
          });
        } catch (error) {
          const errorMessage = error instanceof Error ? error.message : 'Unknown error';
          sendEvent('error', {
            type: 'error',
            message: `Recipe parsing failed: ${errorMessage}`,
            bodyText,
            timestamp: new Date().toISOString()
          });
          finish();
          return;
        }

        // Step 3: Enrich recipe with source link and thumbnail
        if (recipe.description) {
          recipe.description += `\n\nLink: ${url}`;
        } else {
          recipe.description = `Link: ${url}`;
        }
        if (thumbnail) {
          recipe.image = thumbnail;
        }

        // Send final result
        sendEvent('complete', {
          type: 'complete',
          message: 'Recipe extraction complete!',
          recipe,
          bodyText,
          timestamp: new Date().toISOString()
        });
        finish();
      } catch (error) {
        const errorMessage = error instanceof Error ? error.message : 'Unknown error';
        console.error('[SSE] Pipeline error:', errorMessage);
        sendEvent('error', {
          type: 'error',
          message: `Pipeline error: ${errorMessage}`,
          timestamp: new Date().toISOString()
        });
        finish();
      }
    },

    // Invoked when the consumer cancels the stream (e.g. the client disconnects)
    cancel() {
      closed = true;
    }
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
      'X-Accel-Buffering': 'no' // Disable nginx buffering
    }
  });
}

// POST: URL in the JSON body (fetch + stream reader on the client)
export const POST: RequestHandler = async ({ request }) => {
  const { url } = await request.json();
  if (typeof url !== 'string' || !url) {
    return new Response('Missing "url" in request body', { status: 400 });
  }
  return createExtractionStream(url);
};

// GET: URL as ?url=... (required for EventSource)
export const GET: RequestHandler = ({ url: requestUrl }) => {
  const target = requestUrl.searchParams.get('url');
  if (!target) {
    return new Response('Missing "url" query parameter', { status: 400 });
  }
  return createExtractionStream(target);
};
```

**Dependencies:**
- None (uses the Web Streams API)

**Risk Assessment:**
- Medium risk: SSE requires careful stream management
- Mitigation: proper error handling and stream closure

**Testing Strategy:**
- Test with `curl -N` (disables output buffering) to verify the SSE format
- Test connection closure on error
- Test with slow network conditions

---

### Story 3: Update Frontend to Use SSE

**Description:** Modify `share/+page.svelte` to consume the SSE stream for real-time progress updates instead of making a single POST request.

**Acceptance Criteria:**
- [ ] Connect to `/api/extract-stream`, either with `EventSource` (GET with a `url` query parameter) or with `fetch` plus a stream reader (POST)
- [ ] Listen for `progress`, `error`, and `complete` events
- [ ] Update the logs array in real time
- [ ] Display extraction method attempts
- [ ] Show retry information with a visual indicator
- [ ] Handle the final result (recipe display)
- [ ] Handle errors gracefully, distinguishing server-sent `error` events (which carry `data`) from connection errors
- [ ] Close the connection on `complete` and on server-sent `error` (an open `EventSource` reconnects automatically when the stream ends, which would restart the extraction)

**Technical Implementation:**

```svelte

InstaChef PWA

{#if targetUrl}
  {targetUrl}
  {#if status === 'idle'}
  {/if}
{:else}
  No URL detected. Open this app via Instagram Share Menu.
  Debug: Text={sharedText} URL={sharedUrl}
{/if}

{#if status === 'extracting'}
  {currentMethod ? `Trying: ${currentMethod}` : 'Extracting...'}
{/if}

{#if bodyText}
  📝 View Extracted Text
  {bodyText}
{/if}

{#if recipe}
  {recipe.name}
  {recipe.description}
  Servings: {recipe.servings}

  Ingredients
  <ul>
    {#each recipe.ingredients as ing}
      <li>{ing.amount} {ing.unit} {ing.item}</li>
    {/each}
  </ul>

  Steps
  <ol>
    {#each recipe.steps as step}
      <li>{step}</li>
    {/each}
  </ol>

  {#if tandoorEnabled}
    Tandoor Integration
    {#if tandoorError}
      Error: {tandoorError}
    {/if}
  {/if}
{/if}

{#if status === 'error' && bodyText}
  Extraction Error - Raw Text Available
  📝 View Extracted Text
  {bodyText}
{/if}

System Logs
{#each logs as l}
  > {l}
{/each}
```

**Dependencies:**
- None (uses standard Web APIs)

**Risk Assessment:**
- Medium risk: with the `fetch`/POST variant, SSE frames must be parsed manually in the browser
- Mitigation: robust error handling and tested parsing logic

**Testing Strategy:**
- Test with real Instagram URLs
- Test connection interruption
- Test error scenarios
- Verify the log display updates in real time

---

### Story 4: Add Visual Enhancements

**Description:** Enhance the UI to better visualize the extraction process with method-specific indicators and an improved status display.

**Acceptance Criteria:**
- [ ] Method icons/badges for each extraction strategy
- [ ] Progress bar or step indicator
- [ ] Retry countdown timer
- [ ] Color-coded log messages
- [ ] Collapsible log sections

**Technical Implementation:**

```svelte
{#if status === 'extracting' && currentMethod}
  {getMethodIcon(currentMethod)}
  {getMethodDisplayName(currentMethod)}
  Attempting extraction...
{/if}

System Logs
{#each logs as l}
  {@const formatted = formatLog(l)}
  {formatted.icon} {formatted.text}
{/each}

{#if status === 'extracting'}
  Processing...
{/if}
```

**Dependencies:**
- None (pure Svelte/CSS). `getMethodDisplayName` is defined in `$lib/server/extraction.ts`, and SvelteKit does not allow client code to import from `$lib/server`, so the display-name map (along with `getMethodIcon` and `formatLog`) must live in a shared or client-side module.

**Risk Assessment:**
- Low risk: UI enhancements only

**Testing Strategy:**
- Visual regression testing
- Test on mobile devices
- Verify accessibility

---

### Story 5: End-to-End Integration Testing

**Description:** Verify that the complete pipeline works with real Instagram URLs and that all extraction methods are properly reported.

**Acceptance Criteria:**
- [ ] Test with Instagram posts requiring each extraction method
- [ ] Verify each of the 4 strategies is logged when attempted (later strategies run only if earlier ones fail)
- [ ] Verify retry logic shows in the frontend
- [ ] Verify successful extraction completes the full pipeline
- [ ] Verify Tandoor integration still works
- [ ] Verify error handling at each stage
- [ ] Document test URLs and results

**Testing Strategy:**

**Test Cases:**

1. **Embedded JSON Success**
   - URL: Recent Instagram post
   - Expected: Method 1 succeeds immediately
   - Verify: Logs show "Trying extraction method: Embedded JSON Extractor" → "✓ Success with Embedded JSON Extractor"
2. **DOM Selector Fallback**
   - URL: Post where embedded JSON fails
   - Expected: Method 1 fails, Method 2 succeeds
   - Verify: Logs show both attempts and the DOM selector success
3. **Multiple Retries**
   - Simulate network issues
   - Expected: Retry logic kicks in
   - Verify: Logs show "Attempt 1/3 failed. Retrying in ...", "Attempt 2/3 failed. Retrying in ...", and "Max retry attempts exceeded" if the final attempt also fails
4. **Complete Failure**
   - URL: Invalid Instagram link
   - Expected: All methods fail gracefully
   - Verify: Error message shown, no crashes
5. **Full Pipeline**
   - URL: Valid recipe post
   - Expected: Extract → Parse → Display → Tandoor import
   - Verify: All steps logged, recipe displays, Tandoor import succeeds

**Manual Testing Checklist:**
- [ ] Progress updates appear in real time
- [ ] Method indicators update correctly
- [ ] Retry messages show with delays
- [ ] Final recipe displays properly
- [ ] Logs are readable and informative
- [ ] No console errors
- [ ] Mobile responsive
- [ ] PWA share target still works

---

## Implementation Order

1. **Story 1** - Progress Callback System (Foundation)
2. **Story 2** - SSE Extraction Endpoint (Backend)
3. **Story 3** - Frontend SSE Integration (Frontend)
4. **Story 4** - Visual Enhancements (Polish)
5. **Story 5** - E2E Testing (Validation)

---

## Architecture Compliance

### Hexagonal Architecture Verification

✅ **Core Domain Preserved:**
- Extraction logic remains in the domain layer
- Progress callback is a port (interface)
- No business logic in adapters

✅ **Clean Adapter Separation:**
- SSE endpoint is a delivery adapter
- Frontend is the primary adapter
- Extraction strategies are secondary adapters

✅ **Dependency Inversion:**
- Core defines the callback port
- Adapters implement or use the port
- No core dependency on SSE or the frontend

---

## Success Metrics

| Metric | Target | How to Measure |
|--------|--------|----------------|
| Real-time visibility | 100% | All extraction steps visible in logs |
| Method identification | 100% | User knows which method worked |
| Retry transparency | 100% | Retry attempts shown with timing |
| Error clarity | 90%+ | Errors explain what failed and why |
| Full pipeline completion | 95%+ | Extract → Parse → Display → Tandoor |

---

## Rollback Plan

1. Keep the original `/api/extract` endpoint functional
2. Frontend can fall back to the POST endpoint if SSE fails
3. Add feature flag: `USE_SSE_EXTRACTION=true/false`
4. No database changes required

---

## Documentation Updates

- [ ] Update README with the SSE extraction feature
- [ ] Document event types and payload structure
- [ ] Add troubleshooting for SSE connection issues
- [ ] Document testing procedures

---

## Risks and Mitigations

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| SSE connection issues | High | Low | Fall back to the original POST endpoint |
| Browser SSE limitations | Medium | Low | Maintain a tested browser compatibility list |
| Long extraction timeout | Medium | Medium | Show progress to keep the user informed |
| Stream buffering in proxies | Medium | Low | Send the `X-Accel-Buffering: no` header |

---

## Future Enhancements

- [ ] WebSocket for bi-directional communication
- [ ] Pause/resume extraction
- [ ] Multiple URL batch processing
- [ ] Export logs to file
- [ ] Performance metrics dashboard

---

## Conclusion

This plan integrates the new multi-strategy Instagram extractor with the frontend through Server-Sent Events, giving users real-time visibility into the extraction process. The implementation keeps to Hexagonal Architecture principles while significantly improving the user experience.

**Next Step:** Proceed with implementation using `@dev IntegrateExtractionProgressFrontend`