insta-recipe/docs/plans/IntegrateExtractionProgressFrontend.md
Giancarmine Salucci 8fc7c44943 feat: robust Instagram extractor with real-time progress tracking
Implements two major features:
1. Multi-strategy Instagram extraction with retry logic
2. Real-time progress reporting via Server-Sent Events

Instagram Extractor Refactor:
- Add 4 extraction strategies: embedded-json, dom-selector, graphql-api, legacy
- Implement browser stealth mode with anti-detection measures
- Add retry wrapper with exponential backoff (1s -> 2s -> 4s)
- Extract from window._sharedData, DOM selectors, GraphQL API
- Improve success rate from ~60% to ~95%

Real-Time Progress Integration:
- Create ProgressCallback system with typed events
- Implement /api/extract-stream SSE endpoint
- Update frontend to consume live progress updates
- Add visual enhancements: method icons, colored logs, current method indicator
- Enable transparency into extraction process

Technical:
- Type-safe TypeScript implementation
- Hexagonal Architecture compliance
- Backward compatible with existing /api/extract
- Comprehensive test coverage (7 passing tests)
- Full documentation in docs/outcomes/

Files changed: 12 files (+2,308 / -52)
Tests: All passing (build successful)

Related outcomes:
- docs/outcomes/RefactorRobustInstagramExtractor.md
- docs/outcomes/IntegrateExtractionProgressFrontend.md
2025-12-21 03:14:17 +01:00


Execution Plan: Integrate Extraction Progress with Frontend

OUTCOME_NAME: IntegrateExtractionProgressFrontend

Created: 21 December 2025

Problem Statement: The new multi-strategy Instagram extractor logs progress to the server console only. Users cannot see which extraction method is being attempted, whether a retry is in progress, or why extraction is slow. Progress reporting needs to be integrated with the frontend log component for full visibility.

Workflow exception: this is a continuation of the previous feature, so do not create a dedicated branch. Continue working on the current one.

Current State Analysis

Existing Flow

  1. User shares Instagram URL to PWA (share/+page.svelte)
  2. Frontend calls /api/extract via POST
  3. Backend awaits extractTextAndThumbnail() and responds only once it finishes
  4. Extraction tries 4 strategies with retry logic (progress is logged only to the server console)
  5. Frontend receives only final result or error
  6. LLM parses recipe
  7. Recipe displayed, optionally sent to Tandoor

Current Logging Locations

Server Side (Not Visible to User):

  • [Extractor] Trying method: embedded-json
  • [Extractor] Success with method: dom-selector
  • [Retry] Attempt 2/3 failed. Retrying in 2000ms...

Frontend Side (Visible in Logs Component):

  • 'Sending to server... ' + targetUrl
  • 'Recipe extraction successful'
  • 'Error: ...'

Gap

No real-time visibility into:

  • Which extraction strategy is currently running
  • Why extraction is taking time (multiple strategies, retries)
  • Which method ultimately succeeded
  • Detailed error information per strategy

Solution Architecture

Approach: Server-Sent Events (SSE)

Why SSE:

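  • Native browser support (EventSource API for GET requests; a POST response can be read as a stream with fetch)
  • One-way server→client streaming, which is a good fit for progress reporting
  • Automatic reconnection (EventSource only; a fetch-based reader must handle reconnection itself)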
  • Simple text-based protocol
  • Works with SvelteKit ReadableStream
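For reference, an SSE stream is a sequence of frames. Each frame is a set of `field: value` lines terminated by a blank line, and a payload that contains newlines must be split into one `data:` line per line. The following is a minimal encoder sketch. `formatSSE` is a hypothetical helper and is not part of the codebase.

```typescript
/**
 * Encode one Server-Sent Events frame (hypothetical helper).
 * Objects are JSON-encoded; every line of the payload becomes its own `data:` line,
 * and the trailing blank line ends the frame. `event` must not contain newlines.
 */
export function formatSSE(event: string, data: unknown): string {
	const payload = typeof data === 'string' ? data : (JSON.stringify(data) ?? '');
	const dataLines = payload
		.split(/\r\n|\r|\n/)
		.map((line) => `data: ${line}`)
		.join('\n');
	return `event: ${event}\n${dataLines}\n\n`;
}

// Example: the first progress frame of an extraction
export const frame = formatSSE('progress', { type: 'status', message: 'Loading Instagram page...' });
// frame === 'event: progress\ndata: {"type":"status","message":"Loading Instagram page..."}\n\n'
```

`JSON.stringify` escapes newlines inside strings, so JSON payloads always fit on a single `data:` line. The multi-line branch only matters for raw text payloads.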

Architecture:

┌─────────────────────────────────────────────────┐
│           Frontend (Primary Adapter)            │
│  share/+page.svelte - EventSource listener      │
└─────────────────┬───────────────────────────────┘
                  │ SSE Connection
                  │
┌─────────────────┴───────────────────────────────┐
│        API Endpoint (Adapter Layer)             │
│  /api/extract-stream - ReadableStream           │
└─────────────────┬───────────────────────────────┘
                  │ Progress Callback
                  │
┌─────────────────┴───────────────────────────────┐
│     Extraction Core (Domain Logic)              │
│  extraction.ts - Multi-strategy extractor       │
│  + Progress Callback Support                    │
└─────────────────────────────────────────────────┘

Following Hexagonal Architecture:

  • Core extraction logic remains pure (domain)
  • Progress callback is a port (interface)
  • SSE endpoint is an adapter (delivery mechanism)
  • Frontend is primary adapter (UI)
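To make the dependency direction concrete, here is a minimal sketch in which the core knows only the progress port, and interchangeable adapters plug into it. `runWithFallback`, `consoleAdapter` and `collectingAdapter` are hypothetical names used for illustration. The event type is a trimmed copy of the `ProgressEvent` interface defined in Story 1.

```typescript
// Port: the only progress-reporting contract the core depends on.
export type ProgressEvent = { type: 'status' | 'method' | 'error' | 'complete'; message: string };
export type ProgressCallback = (event: ProgressEvent) => void;

// Core (domain): tries each strategy in order and reports only through the port.
export async function runWithFallback<T>(
	strategies: Array<{ name: string; run: () => Promise<T | null> }>,
	onProgress?: ProgressCallback
): Promise<T> {
	for (const strategy of strategies) {
		onProgress?.({ type: 'method', message: `Trying ${strategy.name}` });
		try {
			const result = await strategy.run();
			if (result !== null) {
				onProgress?.({ type: 'complete', message: `Success with ${strategy.name}` });
				return result;
			}
		} catch (err) {
			onProgress?.({ type: 'error', message: `${strategy.name} failed: ${String(err)}` });
		}
	}
	throw new Error('All strategies failed');
}

// Adapters: the core is unaware of which one (console, SSE, test sink) is attached.
export const consoleAdapter: ProgressCallback = (e) => console.log(`[${e.type}] ${e.message}`);

export function collectingAdapter(): { events: ProgressEvent[]; callback: ProgressCallback } {
	const events: ProgressEvent[] = [];
	return { events, callback: (e) => events.push(e) };
}
```

An SSE endpoint would be just another adapter, one that serializes each event into a frame. Nothing in `runWithFallback` changes when it is swapped in.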

Story Breakdown

Story 1: Add Progress Callback System to Extraction

Description: Enhance extraction.ts to accept optional progress callback and emit events at key points without breaking existing functionality.

Acceptance Criteria:

  • Define ProgressCallback type and ProgressEvent interface
  • Add optional onProgress parameter to extractTextAndThumbnail()
  • Call callback when trying each extraction method
  • Call callback on method success/failure
  • Call callback on retry attempts
  • Call callback on final success/error
  • All existing console.logs preserved
  • Backward compatible (works without callback)

Technical Implementation:

// src/lib/server/extraction.ts

export type ProgressEventType = 'status' | 'method' | 'retry' | 'error' | 'complete';

export interface ProgressEvent {
	type: ProgressEventType;
	message: string;
	method?: ExtractionMethod;
	attemptNumber?: number;
	maxAttempts?: number;
	data?: Record<string, unknown>;
	timestamp?: string;
}

export type ProgressCallback = (event: ProgressEvent) => void;

// Update function signature
export async function extractTextAndThumbnail(
	url: string,
	onProgress?: ProgressCallback
): Promise<ExtractedContent> {
	return withRetry(
		async () => {
			const authPath = resolveAuthPath();
			const context = await createBrowserContext(authPath);
			const page = await context.newPage();

			try {
				page.setDefaultTimeout(30000);

				onProgress?.({
					type: 'status',
					message: 'Loading Instagram page...',
					timestamp: new Date().toISOString()
				});

				await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

				onProgress?.({
					type: 'status',
					message: 'Page loaded, starting extraction...',
					timestamp: new Date().toISOString()
				});

				await page.waitForTimeout(1000 + Math.random() * 2000);

				const result = await extractWithStrategies(url, page, context, onProgress);

				if (!result.success || !result.data) {
					throw new Error(result.error || 'Extraction failed');
				}

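				// Write the debug dump before reporting success, so a failed write
				// cannot throw (and trigger a retry) after 'complete' was emitted
				fs.writeFileSync(
					path.resolve('debug_page.txt'),
					`Method: ${result.method}\n\n${result.data.bodyText}`
				);

				onProgress?.({
					type: 'complete',
					message: `Extraction successful using ${result.method} method`,
					method: result.method,
					timestamp: new Date().toISOString()
				});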

				return result.data;
			} finally {
				await page.close();
				await context.close();
			}
		},
		DEFAULT_RETRY_CONFIG,
		onProgress // Pass to retry wrapper
	);
}

// Update withRetry to accept and use callback
async function withRetry<T>(
	fn: () => Promise<T>,
	config: RetryConfig = DEFAULT_RETRY_CONFIG,
	onProgress?: ProgressCallback
): Promise<T> {
	let lastError: Error | null = null;
	let delay = config.initialDelayMs;

	for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
		try {
			return await fn();
		} catch (error) {
			lastError = error as Error;

			if (isNonRetriableError(error)) {
				throw error;
			}

			if (attempt < config.maxAttempts) {
				const message = `Attempt ${attempt}/${config.maxAttempts} failed. Retrying in ${delay}ms...`;
				console.warn(`[Retry] ${message}`, error);
				
				onProgress?.({
					type: 'retry',
					message,
					attemptNumber: attempt,
					maxAttempts: config.maxAttempts,
					data: { delayMs: delay },
					timestamp: new Date().toISOString()
				});

				await sleep(delay);
				delay = Math.min(delay * config.backoffMultiplier, config.maxDelayMs);
			}
		}
	}

	onProgress?.({
		type: 'error',
		message: 'Max retry attempts exceeded',
		attemptNumber: config.maxAttempts,
		maxAttempts: config.maxAttempts,
		timestamp: new Date().toISOString()
	});

	throw lastError || new Error('Max retry attempts exceeded');
}

// Update extractWithStrategies
async function extractWithStrategies(
	url: string,
	page: Page,
	context: BrowserContext,
	onProgress?: ProgressCallback
): Promise<ExtractionResult> {
	const strategies: Array<{
		name: ExtractionMethod;
		fn: () => Promise<ExtractedContent | null>;
	}> = [
		{
			name: 'embedded-json',
			fn: () => extractFromEmbeddedJSON(page)
		},
		{
			name: 'dom-selector',
			fn: () => extractFromDOM(page)
		},
		{
			name: 'graphql-api',
			fn: () => extractViaGraphQL(url, context)
		},
		{
			name: 'legacy',
			fn: async () => {
				const text = await extractCleanTextLegacy(page);
				const thumbnail = await extractThumbnail(page);
				return { bodyText: text, thumbnail };
			}
		}
	];

	for (const strategy of strategies) {
		try {
			console.log(`[Extractor] Trying method: ${strategy.name}`);
			
			onProgress?.({
				type: 'method',
				message: `Trying extraction method: ${getMethodDisplayName(strategy.name)}`,
				method: strategy.name,
				timestamp: new Date().toISOString()
			});

			const result = await strategy.fn();

			if (result && result.bodyText) {
				console.log(`[Extractor] Success with method: ${strategy.name}`);
				
				onProgress?.({
					type: 'method',
					message: `✓ Success with ${getMethodDisplayName(strategy.name)}`,
					method: strategy.name,
					data: { success: true },
					timestamp: new Date().toISOString()
				});

				return {
					success: true,
					method: strategy.name,
					data: result
				};
			}
			
			onProgress?.({
				type: 'method',
				message: `✗ ${getMethodDisplayName(strategy.name)} returned no data, trying next...`,
				method: strategy.name,
				data: { success: false },
				timestamp: new Date().toISOString()
			});
		} catch (error) {
			console.warn(`[Extractor] Method ${strategy.name} failed:`, error);
			
			onProgress?.({
				type: 'method',
				message: `✗ ${getMethodDisplayName(strategy.name)} failed: ${error instanceof Error ? error.message : 'Unknown error'}`,
				method: strategy.name,
				data: { success: false, error: error instanceof Error ? error.message : 'Unknown' },
				timestamp: new Date().toISOString()
			});
		}
	}

	return {
		success: false,
		error: 'All extraction methods failed'
	};
}

// Helper for display names
function getMethodDisplayName(method: ExtractionMethod): string {
	const names: Record<ExtractionMethod, string> = {
		'embedded-json': 'Embedded JSON Extractor',
		'dom-selector': 'DOM Selector Extractor',
		'graphql-api': 'GraphQL API Extractor',
		'legacy': 'Legacy Text Extractor'
	};
	return names[method] || method;
}
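One caveat applies to the code above: `onProgress` is invoked inside the strategy loop's `try` block and inside the function wrapped by `withRetry`. If a listener throws, for example when an SSE handler writes to a stream the client has already closed, that exception would be treated as a strategy failure and could trigger retries. The following is a sketch of a wrapper that stamps the timestamp in one place and isolates listener errors. `createProgressEmitter` is a hypothetical helper, and the types are trimmed copies of those defined above.

```typescript
export type ProgressEventType = 'status' | 'method' | 'retry' | 'error' | 'complete';

export interface ProgressEvent {
	type: ProgressEventType;
	message: string;
	method?: string;
	data?: Record<string, unknown>;
	timestamp?: string;
}

export type ProgressCallback = (event: ProgressEvent) => void;

/**
 * Wrap an optional listener so that every event gets a timestamp and a
 * throwing listener can never interrupt the extraction it is observing.
 */
export function createProgressEmitter(onProgress?: ProgressCallback): ProgressCallback {
	return (event) => {
		if (!onProgress) return;
		try {
			onProgress({ ...event, timestamp: event.timestamp ?? new Date().toISOString() });
		} catch (err) {
			console.warn('[Progress] Listener threw; ignoring so extraction can continue:', err);
		}
	};
}
```

With this wrapper, `extractTextAndThumbnail` could create `const emit = createProgressEmitter(onProgress)` once and pass `emit` down in place of `onProgress`. The repeated `timestamp: new Date().toISOString()` lines would then become unnecessary.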

Dependencies:

  • None (enhances existing code)

Risk Assessment:

  • Low risk - Additive changes, backward compatible

Testing Strategy:

  • Unit test callback invocations
  • Test with and without callback
  • Verify all event types are emitted
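The first testing bullet can be covered without a browser by testing the retry wrapper in isolation. The following sketch assumes an injectable `sleep` parameter, which is not present in the code above, so that the test does not wait for real backoff delays. The `isNonRetriableError` check and console logging are omitted, and the config values are illustrative only.

```typescript
export interface RetryConfig {
	maxAttempts: number;
	initialDelayMs: number;
	backoffMultiplier: number;
	maxDelayMs: number;
}

export type RetryEvent = { type: 'retry' | 'error'; attemptNumber: number; delayMs?: number };

// Trimmed copy of withRetry: same control flow, but `sleep` is injected for tests.
export async function withRetry<T>(
	fn: () => Promise<T>,
	config: RetryConfig,
	onProgress: (e: RetryEvent) => void,
	sleep: (ms: number) => Promise<void>
): Promise<T> {
	let lastError: unknown = null;
	let delay = config.initialDelayMs;
	for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
		try {
			return await fn();
		} catch (error) {
			lastError = error;
			if (attempt < config.maxAttempts) {
				onProgress({ type: 'retry', attemptNumber: attempt, delayMs: delay });
				await sleep(delay);
				delay = Math.min(delay * config.backoffMultiplier, config.maxDelayMs);
			}
		}
	}
	onProgress({ type: 'error', attemptNumber: config.maxAttempts });
	throw lastError ?? new Error('Max retry attempts exceeded');
}

// Always-failing task with 3 attempts: expect two 'retry' events, then one 'error'.
export async function collectRetryEvents(): Promise<RetryEvent[]> {
	const events: RetryEvent[] = [];
	const config = { maxAttempts: 3, initialDelayMs: 1000, backoffMultiplier: 2, maxDelayMs: 8000 };
	await withRetry(
		async () => { throw new Error('network'); },
		config,
		(e) => events.push(e),
		async () => {} // no real waiting in tests
	).catch(() => undefined); // the final rejection is expected
	return events;
}
```

The same pattern can test the run without a callback: passing a no-op listener must leave the result and the thrown error unchanged.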

Story 2: Create Server-Sent Events Extraction Endpoint

Description: Create new /api/extract-stream endpoint that uses SSE to stream progress events from the extraction process.

Acceptance Criteria:

  • New endpoint at /api/extract-stream
  • Accepts the URL in a JSON POST body ({ "url": "..." }); a query-parameter GET variant is only needed if the client switches to EventSource
  • Returns ReadableStream with SSE formatting
  • Streams progress events from extraction
  • Sends final result as JSON in SSE event
  • Handles errors gracefully
  • Closes stream on completion or error

Technical Implementation:

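// src/routes/api/extract-stream/+server.ts

import { json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';
import { extractTextAndThumbnail, type ProgressEvent } from '$lib/server/extraction';
import { extractRecipe } from '$lib/server/parser';

export const POST: RequestHandler = async ({ request }) => {
	// Validate input before opening the stream, so bad requests get a plain 400
	const body = await request.json().catch(() => null);
	const url = typeof body?.url === 'string' ? body.url : null;

	if (!url) {
		return json({ error: 'Request body must be JSON with a "url" string' }, { status: 400 });
	}

	console.log('[SSE] Processing URL:', url);

	// Create a ReadableStream for SSE
	const stream = new ReadableStream({
		async start(controller) {
			const encoder = new TextEncoder();

			// Helper to send one SSE frame. Writes after the client disconnected are
			// ignored, so a closed stream cannot surface as an extraction failure.
			const sendEvent = (event: string, data: unknown) => {
				const message = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
				try {
					controller.enqueue(encoder.encode(message));
				} catch {
					// Stream already closed or cancelled by the client
				}
			};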

			try {
				sendEvent('progress', {
					type: 'status',
					message: 'Starting extraction pipeline...',
					timestamp: new Date().toISOString()
				});

				// Step 1: Extract with progress callbacks
				let bodyText = '';
				let thumbnail: string | null = null;

				try {
					const result = await extractTextAndThumbnail(url, (progress: ProgressEvent) => {
						// Stream each progress event to client
						sendEvent('progress', progress);
					});

					bodyText = result.bodyText;
					thumbnail = result.thumbnail;

					sendEvent('progress', {
						type: 'status',
						message: 'Text extracted, parsing recipe with AI...',
						timestamp: new Date().toISOString()
					});
				} catch (error) {
					const errorMessage = error instanceof Error ? error.message : 'Unknown error';
					sendEvent('error', {
						type: 'error',
						message: `Extraction failed: ${errorMessage}`,
						timestamp: new Date().toISOString()
					});
					controller.close();
					return;
				}

				// Step 2: Parse recipe
				let recipe: any = null;
				try {
					recipe = await extractRecipe(bodyText);

					if (!recipe) {
						sendEvent('error', {
							type: 'error',
							message: 'No recipe found in extracted text',
							bodyText,
							timestamp: new Date().toISOString()
						});
						controller.close();
						return;
					}

					sendEvent('progress', {
						type: 'status',
						message: 'Recipe parsed successfully, enriching metadata...',
						timestamp: new Date().toISOString()
					});
				} catch (error) {
					const errorMessage = error instanceof Error ? error.message : 'Unknown error';
					sendEvent('error', {
						type: 'error',
						message: `Recipe parsing failed: ${errorMessage}`,
						bodyText,
						timestamp: new Date().toISOString()
					});
					controller.close();
					return;
				}

				// Step 3: Enrich recipe
				if (recipe.description) {
					recipe.description += `\n\nLink: ${url}`;
				} else {
					recipe.description = `Link: ${url}`;
				}

				if (thumbnail) {
					recipe.image = thumbnail;
				}

				// Send final result
				sendEvent('complete', {
					type: 'complete',
					message: 'Recipe extraction complete!',
					recipe,
					bodyText,
					timestamp: new Date().toISOString()
				});

				controller.close();
			} catch (error) {
				const errorMessage = error instanceof Error ? error.message : 'Unknown error';
				console.error('[SSE] Pipeline error:', errorMessage);

				sendEvent('error', {
					type: 'error',
					message: `Pipeline error: ${errorMessage}`,
					timestamp: new Date().toISOString()
				});

				controller.close();
			}
		}
	});

	return new Response(stream, {
		headers: {
			'Content-Type': 'text/event-stream',
			'Cache-Control': 'no-cache',
			'Connection': 'keep-alive',
			'X-Accel-Buffering': 'no' // Disable nginx buffering
		}
	});
}
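A single extraction can stay silent for a long time, because of the 30 s page-load timeout, the randomized wait, and backoff between retries. During that silence, an intermediary may close an idle connection. A common mitigation is a periodic SSE comment frame, that is, a line starting with `:`. EventSource ignores comment frames, and so does a parser that only acts on frames containing both `event:` and `data:`. The following sketch uses a hypothetical helper `startHeartbeat`, and its 15 s default interval is an assumption, not a measured value.

```typescript
/**
 * Periodically enqueue an SSE comment frame so proxies see traffic during long,
 * silent extraction phases. Returns a function that stops the heartbeat.
 */
export function startHeartbeat(
	controller: ReadableStreamDefaultController<Uint8Array>,
	intervalMs = 15_000
): () => void {
	const encoder = new TextEncoder();
	const timer = setInterval(() => {
		try {
			controller.enqueue(encoder.encode(': keep-alive\n\n'));
		} catch {
			clearInterval(timer); // stream already closed or cancelled
		}
	}, intervalMs);
	return () => clearInterval(timer);
}
```

In the endpoint, this would be started at the top of `start(controller)` and stopped before every `controller.close()`, or once in a `finally` block, so that no timer outlives the request.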

Dependencies:

  • None (uses Web Streams API)

Risk Assessment:

  • Medium risk - SSE requires careful stream management
  • Mitigation: Proper error handling and stream closure

Testing Strategy:

  • Test with curl to verify SSE format
  • Test connection closure on error
  • Test with slow network conditions

Story 3: Update Frontend to Use SSE

Description: Modify share/+page.svelte to show real-time progress updates instead of waiting on a single request/response. Because EventSource only supports GET, the page POSTs the URL with fetch and parses the SSE frames from the streamed response body.

Acceptance Criteria:

  • POST to /api/extract-stream with fetch and read the response body as a stream
  • Listen for 'progress', 'error', 'complete' events
  • Update logs array in real-time
  • Display extraction method attempts
  • Show retry information with visual indicator
  • Handle final result (recipe display)
  • Handle errors gracefully
  • Abort the stream on retry and on page unmount

Technical Implementation:

<!-- src/routes/share/+page.svelte -->
<script lang="ts">
    import { page } from '$app/stores';
    
    let status = $state('idle');
    let logs = $state<string[]>([]);
    let recipe = $state<any>(null);
    let bodyText = $state<string>('');
    let tandoorEnabled = $state(false);
    let tandoorImporting = $state(false);
    let tandoorError = $state<string | null>(null);
    let currentMethod = $state<string | null>(null);
    // Cancels the in-flight extraction stream on retry or unmount (never rendered, so not $state)
    let abortController: AbortController | null = null;
    
    // URL param parsing for Share Target
    let sharedText = $derived($page.url.searchParams.get('text') || '');
    let sharedUrl = $derived($page.url.searchParams.get('url') || '');

    function extractUrl(text: string) {
        const match = text.match(/(https?:\/\/[^\s]+)/);
        return match ? match[0] : null;
    }

    let targetUrl = $derived(sharedUrl || extractUrl(sharedText));

    $effect.pre(() => {
        loadTandoorConfig();
    });

    // Cleanup on unmount: abort any stream that is still open
    $effect(() => {
        return () => {
            abortController?.abort();
        };
    });

    async function loadTandoorConfig() {
        try {
            const res = await fetch('/api/tandoor-config');
            const config = await res.json();
            tandoorEnabled = config.enabled;
            logs = [...logs, `Tandoor integration ${config.enabled ? 'enabled' : 'disabled'}`];
        } catch(e) {
            logs = [...logs, 'Failed to load Tandoor config'];
        }
    }

    async function process() {
        const url = targetUrl;
        if (!url) return;
        
        status = 'extracting';
        recipe = null;
        bodyText = '';
        currentMethod = null;
        logs = [...logs, `Starting extraction for: ${url}`];

        // EventSource only supports GET, so the URL is POSTed with fetch and the
        // SSE frames are parsed from the response body by processWithSSE().
        abortController?.abort();
        abortController = new AbortController();
        await processWithSSE(url, abortController.signal);
    }

    async function processWithSSE(url: string, signal: AbortSignal) {
        try {
            const response = await fetch('/api/extract-stream', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ url }),
                signal
            });

            // A 4xx/5xx reply is JSON, not an event stream
            if (!response.ok || !response.body) {
                throw new Error(`Server responded with HTTP ${response.status}`);
            }

            const reader = response.body.getReader();
            const decoder = new TextDecoder();
            let buffer = '';

            while (true) {
                const { done, value } = await reader.read();
                
                if (done) {
                    break;
                }

                buffer += decoder.decode(value, { stream: true });
                
                // SSE frames end with a blank line; keep the trailing partial frame in the buffer
                const frames = buffer.split('\n\n');
                buffer = frames.pop() || '';

                for (const frame of frames) {
                    if (!frame.trim()) continue;

                    // The endpoint sends exactly one `event:` and one `data:` line per frame
                    const eventMatch = frame.match(/^event: (.+)$/m);
                    const dataMatch = frame.match(/^data: (.+)$/m);

                    if (eventMatch && dataMatch) {
                        handleSSEEvent(eventMatch[1], JSON.parse(dataMatch[1]));
                    }
                }
            }

            // The server always finishes with a 'complete' or 'error' event
            if (status === 'extracting') {
                throw new Error('Stream closed before a final result arrived');
            }
        } catch (error) {
            if (signal.aborted) return; // Cancelled by retry or unmount, not a failure
            logs = [...logs, `Stream Error: ${error instanceof Error ? error.message : 'Unknown error'}`];
            status = 'error';
            currentMethod = null;
        }
    }

    function handleSSEEvent(eventType: string, data: any) {
        console.log('[SSE Event]', eventType, data);

        if (eventType === 'progress') {
            // Handle progress updates; every progress type gets a log line
            if (data.type === 'method') {
                currentMethod = data.method;
                logs = [...logs, `📡 ${data.message}`];
            } else if (data.type === 'retry') {
                logs = [...logs, `🔄 ${data.message}`];
            } else if (data.type === 'complete') {
                // Extractor-level success (which method worked); the pipeline continues
                logs = [...logs, `📡 ${data.message}`];
            } else if (data.type === 'error') {
                // Extractor-level failure, e.g. "Max retry attempts exceeded"
                logs = [...logs, `❌ ${data.message}`];
            } else {
                logs = [...logs, `⏳ ${data.message}`];
            }
        } else if (eventType === 'complete') {
            // Extraction complete
            recipe = data.recipe;
            bodyText = data.bodyText || '';
            status = 'done';
            currentMethod = null;
            logs = [...logs, `✅ ${data.message}`];
        } else if (eventType === 'error') {
            // Error occurred
            status = 'error';
            currentMethod = null;
            bodyText = data.bodyText || '';
            logs = [...logs, `❌ ${data.message}`];
        }
    }

    async function retry() {
        recipe = null;
        bodyText = '';
        status = 'idle';
        logs = [...logs, '═══ Retrying extraction ═══'];
        await process();
    }

    async function importToTandoor() {
        if (!recipe) return;
        
        tandoorImporting = true;
        tandoorError = null;
        logs = [...logs, 'Importing recipe to Tandoor...'];

        try {
            const res = await fetch('/api/tandoor', {
                method: 'POST',
                body: JSON.stringify({ recipe }),
                headers: { 'Content-Type': 'application/json' }
            });

            const data = await res.json();

            if (data.success) {
                logs = [...logs, `✓ Recipe imported successfully (ID: ${data.recipeId})`];
                tandoorError = null;
            } else {
                logs = [...logs, `✗ Import failed: ${data.error}`];
                tandoorError = data.error;
            }
        } catch(e) {
            const errorMsg = e instanceof Error ? e.message : 'Unknown error';
            logs = [...logs, `✗ Network error: ${errorMsg}`];
            tandoorError = errorMsg;
        } finally {
            tandoorImporting = false;
        }
    }
</script>

<div class="p-8 max-w-lg mx-auto space-y-4">
    <h1 class="text-2xl font-bold">InstaChef PWA</h1>
    
    {#if targetUrl}
        <div class="bg-gray-100 p-2 rounded break-all text-sm border">{targetUrl}</div>
        
        {#if status === 'idle'}
            <button onclick={process} class="bg-blue-600 text-white px-4 py-2 rounded shadow hover:bg-blue-700 w-full">
                Extract Recipe
            </button>
        {/if}
    {:else}
        <p class="text-gray-500">No URL detected. Open this app via Instagram Share Menu.</p>
        <div class="text-xs text-gray-400">Debug: Text={sharedText} URL={sharedUrl}</div>
    {/if}

    {#if status === 'extracting'}
        <div class="bg-blue-50 p-4 rounded border border-blue-200">
            <div class="flex items-center space-x-3">
                <div class="animate-spin h-5 w-5 border-2 border-blue-600 border-t-transparent rounded-full"></div>
                <div class="text-blue-700 font-medium">
                    {currentMethod ? `Trying: ${currentMethod}` : 'Extracting...'}
                </div>
            </div>
        </div>
    {/if}

    {#if bodyText}
        <details class="border rounded p-2 bg-white text-sm">
            <summary class="cursor-pointer font-semibold">📝 View Extracted Text</summary>
            <div class="mt-2 pt-2 border-t whitespace-pre-wrap break-words max-h-48 overflow-y-auto text-xs">
                {bodyText}
            </div>
        </details>
    {/if}

    {#if recipe}
        <div class="border rounded p-4 bg-green-50 space-y-2">
            <h2 class="font-bold text-xl">{recipe.name}</h2>
            <p class="text-sm">{recipe.description}</p>
            <p class="text-muted"><strong>Servings:</strong> {recipe.servings}</p>

            <h3 class="font-bold mt-2">Ingredients</h3>
            <ul class="list-disc pl-5 text-sm">
                {#each recipe.ingredients as ing}
                    <li>{ing.amount} {ing.unit} {ing.item}</li>
                {/each}
            </ul>
            
            <h3 class="font-bold mt-2">Steps</h3>
            <ol class="list-decimal pl-5 text-sm">
                {#each recipe.steps as step}
                    <li>{step}</li>
                {/each}
            </ol>

            {#if tandoorEnabled}
                <div class="mt-4 pt-4 border-t space-y-2">
                    <h3 class="font-bold">Tandoor Integration</h3>
                    {#if tandoorError}
                        <div class="bg-red-100 text-red-800 p-2 rounded text-sm">
                            Error: {tandoorError}
                        </div>
                    {/if}
                    <button 
                        onclick={importToTandoor}
                        disabled={tandoorImporting}
                        class="bg-orange-600 text-white px-4 py-2 rounded shadow hover:bg-orange-700 w-full disabled:bg-gray-400 disabled:cursor-not-allowed"
                    >
                        {tandoorImporting ? 'Importing...' : 'Import to Tandoor'}
                    </button>
                </div>
            {/if}

            <button 
                onclick={retry}
                class="bg-blue-500 text-white px-4 py-2 rounded shadow hover:bg-blue-600 w-full mt-2"
            >
                🔄 Retry Extraction
            </button>
        </div>
    {/if}

    {#if status === 'error' && bodyText}
        <div class="border rounded p-4 bg-yellow-50 space-y-2">
            <h3 class="font-bold text-lg">Extraction Error - Raw Text Available</h3>
            <details class="border rounded p-2 bg-white text-sm">
                <summary class="cursor-pointer font-semibold">📝 View Extracted Text</summary>
                <div class="mt-2 pt-2 border-t whitespace-pre-wrap break-words max-h-48 overflow-y-auto text-xs">
                    {bodyText}
                </div>
            </details>
            <button 
                onclick={retry}
                class="bg-blue-500 text-white px-4 py-2 rounded shadow hover:bg-blue-600 w-full mt-2"
            >
                🔄 Retry Extraction
            </button>
        </div>
    {/if}

    <div class="font-mono text-xs bg-slate-900 text-green-400 p-4 rounded min-h-[100px] max-h-[300px] overflow-y-auto mt-8">
        <div class="opacity-50 border-b border-slate-700 mb-2 sticky top-0 bg-slate-900">System Logs</div>
        {#each logs as l}
            <div class="py-0.5">> {l}</div>
        {/each}
    </div>
</div>
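The frame parsing in `processWithSSE` assumes `\n` line endings and a single `data:` line per frame. This holds for the Story 2 endpoint because `JSON.stringify` never emits raw newlines. If the format may evolve, a more general incremental parser can be swapped in. The sketch below follows the SSE field rules: LF or CRLF line endings, multiple `data:` lines joined with `\n`, `:` comment lines skipped, and `message` as the default event name. It ignores the `id` and `retry` fields and does not handle lone-CR line endings. The class name `SSEParser` is hypothetical.

```typescript
export interface ParsedSSEEvent {
	event: string;
	data: string;
}

/** Incremental SSE parser: feed decoded text chunks, get back completed events. */
export class SSEParser {
	private buffer = '';
	private eventName = '';
	private dataLines: string[] = [];

	push(chunk: string): ParsedSSEEvent[] {
		this.buffer += chunk;
		const events: ParsedSSEEvent[] = [];
		const lines = this.buffer.split('\n');
		this.buffer = lines.pop() ?? ''; // keep the incomplete last line for the next chunk
		for (const raw of lines) {
			const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw; // tolerate CRLF
			if (line === '') {
				// A blank line dispatches the pending event, if it collected any data
				if (this.dataLines.length > 0) {
					events.push({ event: this.eventName || 'message', data: this.dataLines.join('\n') });
				}
				this.eventName = '';
				this.dataLines = [];
				continue;
			}
			if (line.startsWith(':')) continue; // comment, e.g. a heartbeat
			const colon = line.indexOf(':');
			const field = colon === -1 ? line : line.slice(0, colon);
			let value = colon === -1 ? '' : line.slice(colon + 1);
			if (value.startsWith(' ')) value = value.slice(1);
			if (field === 'event') this.eventName = value;
			else if (field === 'data') this.dataLines.push(value);
		}
		return events;
	}
}
```

In the read loop, the manual split-and-match block would become `for (const evt of parser.push(decoder.decode(value, { stream: true }))) handleSSEEvent(evt.event, JSON.parse(evt.data));`.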

Dependencies:

  • None (uses standard Web APIs)

Risk Assessment:

  • Medium risk - Manual SSE parsing in browser
  • Mitigation: Robust error handling, tested parsing logic

Testing Strategy:

  • Test with real Instagram URLs
  • Test connection interruption
  • Test error scenarios
  • Verify log display updates in real-time

Story 4: Add Visual Enhancements

Description: Enhance the UI to better visualize the extraction process with method-specific indicators and improved status display.

Acceptance Criteria:

  • Method icons/badges for each extraction strategy
  • Progress bar or step indicator
  • Retry countdown timer
  • Color-coded log messages
  • Collapsible log sections
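The step indicator can be derived from the fixed order in which `extractWithStrategies` tries the strategies, using the `method` field of each progress event. This is a sketch with the hypothetical helper name `strategyStep`. It reports a position ("method 2 of 4"), not an estimate of remaining time. A retry re-runs the whole sequence, so the indicator returns to step 1.

```typescript
// Order in which extractWithStrategies tries the strategies (Story 1).
const STRATEGY_ORDER = ['embedded-json', 'dom-selector', 'graphql-api', 'legacy'] as const;
type StrategyName = (typeof STRATEGY_ORDER)[number];

export interface StepProgress {
	step: number; // 1-based position of the current strategy
	total: number; // number of strategies
	percent: number; // share of strategies started, 0-100
}

/** Map the `method` of a progress event to a step indicator; null for unknown methods. */
export function strategyStep(method: string): StepProgress | null {
	const index = STRATEGY_ORDER.indexOf(method as StrategyName);
	if (index === -1) return null;
	const total = STRATEGY_ORDER.length;
	return { step: index + 1, total, percent: Math.round(((index + 1) / total) * 100) };
}
```

In the page, `strategyStep(currentMethod)` could drive a `<progress>` element or a "Step 2 of 4" label next to the method indicator.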

Technical Implementation:

<!-- Enhanced UI components in share/+page.svelte -->

<!-- Method indicator component -->
{#if status === 'extracting' && currentMethod}
<div class="bg-blue-50 p-4 rounded border border-blue-200">
    <div class="flex items-center justify-between">
        <div class="flex items-center space-x-3">
            <div class="relative">
                <div class="animate-spin h-8 w-8 border-4 border-blue-600 border-t-transparent rounded-full"></div>
                <div class="absolute inset-0 flex items-center justify-center text-xs">
                    {getMethodIcon(currentMethod)}
                </div>
            </div>
            <div>
                <div class="text-blue-900 font-semibold">{getMethodDisplayName(currentMethod)}</div>
                <div class="text-blue-600 text-sm">Attempting extraction...</div>
            </div>
        </div>
    </div>
</div>
{/if}

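<!-- These helpers belong in the page's existing <script lang="ts"> block: a component has only one instance script -->
<script lang="ts">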
function getMethodIcon(method: string): string {
    const icons: Record<string, string> = {
        'embedded-json': '📦',
        'dom-selector': '🎯',
        'graphql-api': '🔌',
        'legacy': '📄'
    };
    return icons[method] || '⚙️';
}

function getMethodDisplayName(method: string): string {
    const names: Record<string, string> = {
        'embedded-json': 'Embedded JSON',
        'dom-selector': 'DOM Selector',
        'graphql-api': 'GraphQL API',
        'legacy': 'Legacy Method'
    };
    return names[method] || method;
}

// Enhanced log formatting
function formatLog(log: string): { icon: string; text: string; class: string } {
    if (log.includes('✅')) {
        return { icon: '✅', text: log.replace('✅', ''), class: 'text-green-600' };
    } else if (log.includes('❌')) {
        return { icon: '❌', text: log.replace('❌', ''), class: 'text-red-600' };
    } else if (log.includes('🔄')) {
        return { icon: '🔄', text: log.replace('🔄', ''), class: 'text-yellow-600' };
    } else if (log.includes('📡')) {
        return { icon: '📡', text: log.replace('📡', ''), class: 'text-blue-600' };
    } else if (log.includes('⏳')) {
        return { icon: '⏳', text: log.replace('⏳', ''), class: 'text-gray-600' };
    }
    return { icon: '>', text: log, class: 'text-green-400' };
}
</script>

<!-- Enhanced logs display -->
<div class="font-mono text-xs bg-slate-900 p-4 rounded min-h-[100px] max-h-[300px] overflow-y-auto mt-8">
    <div class="opacity-50 border-b border-slate-700 mb-2 sticky top-0 bg-slate-900 pb-1">
        System Logs
    </div>
    {#each logs as l}
        {@const formatted = formatLog(l)}
        <div class="py-0.5 flex items-start space-x-2">
            <span class={formatted.class}>{formatted.icon}</span>
            <span class={formatted.class}>{formatted.text}</span>
        </div>
    {/each}
    {#if status === 'extracting'}
        <div class="py-0.5 flex items-start space-x-2 animate-pulse">
            <span>⏳</span>
            <span>Processing...</span>
        </div>
    {/if}
</div>
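The "Retry countdown timer" criterion is not covered by the markup above. Each `retry` progress event already carries `data.delayMs` (Story 1), so a countdown can start when the event arrives. The following sketch uses the hypothetical names `secondsRemaining` and `startRetryCountdown`. Network latency means the countdown slightly lags the server's actual sleep.

```typescript
/** Whole seconds left until `deadlineMs`, rounded up and never negative. */
export function secondsRemaining(deadlineMs: number, nowMs: number): number {
	return Math.max(0, Math.ceil((deadlineMs - nowMs) / 1000));
}

/**
 * Count down a retry delay, calling `onTick` whenever the whole-second value
 * changes and stopping at zero. Returns a cancel function.
 */
export function startRetryCountdown(
	delayMs: number,
	onTick: (seconds: number) => void,
	now: () => number = Date.now
): () => void {
	const deadline = now() + delayMs;
	let last = secondsRemaining(deadline, now());
	onTick(last);
	const timer = setInterval(() => {
		const remaining = secondsRemaining(deadline, now());
		if (remaining !== last) {
			last = remaining;
			onTick(remaining);
		}
		if (remaining === 0) clearInterval(timer);
	}, 250);
	return () => clearInterval(timer);
}
```

In `handleSSEEvent`, the `retry` branch could call `stopCountdown?.(); stopCountdown = startRetryCountdown(data.data?.delayMs ?? 0, (s) => (retrySeconds = s));`, where `retrySeconds` is a hypothetical `$state<number>` rendered next to the retry message.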

Dependencies:

  • None (pure Svelte/CSS)

Risk Assessment:

  • Low risk - UI enhancements only

Testing Strategy:

  • Visual regression testing
  • Test on mobile devices
  • Verify accessibility

Story 5: End-to-End Integration Testing

Description: Verify the complete pipeline works with real Instagram URLs and all extraction methods are properly reported.

Acceptance Criteria:

  • Test with Instagram posts requiring each extraction method
  • Verify all 4 strategies are attempted and logged
  • Verify retry logic shows in frontend
  • Verify successful extraction completes full pipeline
  • Verify Tandoor integration still works
  • Verify error handling at each stage
  • Document test URLs and results

Testing Strategy:

Test Cases:

  1. Embedded JSON Success

    • URL: Recent Instagram post
    • Expected: Method 1 succeeds immediately
    • Verify: Logs show "Trying extraction method: Embedded JSON Extractor" → "✓ Success with Embedded JSON Extractor"
  2. DOM Selector Fallback

    • URL: Post where embedded JSON fails
    • Expected: Method 1 fails, Method 2 succeeds
    • Verify: Logs show attempts and DOM selector success
  3. Multiple Retries

    • Simulate network issues
    • Expected: Retry logic kicks in
    • Verify: Logs show "Attempt 1/3 failed. Retrying in <delay>ms...", "Attempt 2/3 failed...", etc.
  4. Complete Failure

    • URL: Invalid Instagram link
    • Expected: All methods fail gracefully
    • Verify: Error message shown, no crashes
  5. Full Pipeline

    • URL: Valid recipe post
    • Expected: Extract → Parse → Display → Tandoor import
    • Verify: All steps logged, recipe displays, Tandoor succeeds
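The "Verify: Logs show …" checks above can be made repeatable by asserting that the expected messages appear in order, while allowing unrelated lines in between. The following is a sketch with the hypothetical helper `logsContainInOrder`. The sample log lines mirror the message formats produced by Stories 1 and 3, and the URL is a placeholder.

```typescript
/**
 * True if every pattern matches some log line, in the given order
 * (other lines may appear in between). Avoid the /g flag: it makes RegExp.test stateful.
 */
export function logsContainInOrder(logs: readonly string[], patterns: readonly RegExp[]): boolean {
	let next = 0;
	for (const line of logs) {
		if (next < patterns.length && patterns[next].test(line)) next++;
	}
	return next === patterns.length;
}

// Test case 2 (DOM Selector Fallback), with placeholder log lines
export const sampleLogs = [
	'Starting extraction for: https://www.instagram.com/p/example/',
	'📡 Trying extraction method: Embedded JSON Extractor',
	'📡 ✗ Embedded JSON Extractor returned no data, trying next...',
	'📡 Trying extraction method: DOM Selector Extractor',
	'📡 ✓ Success with DOM Selector Extractor',
	'✅ Recipe extraction complete!'
];
```

A test would then assert `logsContainInOrder(sampleLogs, [/Trying extraction method: Embedded JSON/, /✗ Embedded JSON/, /Success with DOM Selector/, /Recipe extraction complete/])`.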

Manual Testing Checklist:

  • Progress updates appear in real-time
  • Method indicators update correctly
  • Retry messages show with delays
  • Final recipe displays properly
  • Logs are readable and informative
  • No console errors
  • Mobile responsive
  • PWA share target still works

Implementation Order

  1. Story 1 - Progress Callback System (Foundation)
  2. Story 2 - SSE Extraction Endpoint (Backend)
  3. Story 3 - Frontend SSE Integration (Frontend)
  4. Story 4 - Visual Enhancements (Polish)
  5. Story 5 - E2E Testing (Validation)

Architecture Compliance

Hexagonal Architecture Verification

Core Domain Preserved:

  • Extraction logic remains in domain layer
  • Progress callback is a port (interface)
  • No business logic in adapters

Clean Adapter Separation:

  • SSE endpoint is delivery adapter
  • Frontend is primary adapter
  • Extraction strategies are secondary adapters

Dependency Inversion:

  • Core defines callback port
  • Adapters implement/use port
  • No core dependency on SSE or frontend

Success Metrics

| Metric | Target | How to Measure |
|---|---|---|
| Real-time visibility | 100% | All extraction steps visible in logs |
| Method identification | 100% | User knows which method worked |
| Retry transparency | 100% | Retry attempts shown with timing |
| Error clarity | 90%+ | Errors explain what failed and why |
| Full pipeline completion | 95%+ | Extract → Parse → Display → Tandoor |

Rollback Plan

  1. Keep original /api/extract endpoint functional
  2. Frontend can fall back to the original POST /api/extract endpoint if the stream fails
  3. Add feature flag: USE_SSE_EXTRACTION=true/false
  4. No database changes required

Documentation Updates

  • Update README with SSE extraction feature
  • Document event types and payload structure
  • Add troubleshooting for SSE connection issues
  • Document testing procedures
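As a starting point for documenting the event types and payload structure, the payloads sent by the Story 2 endpoint can be written down as a discriminated union. This is a sketch: the type names and the `toStreamEvent` guard are hypothetical, and `recipe` is left loosely typed because its schema lives in the parser module.

```typescript
/** `event: progress`: a serialized ProgressEvent from the extractor (Story 1). */
export interface ProgressPayload {
	type: 'status' | 'method' | 'retry' | 'error' | 'complete';
	message: string;
	method?: 'embedded-json' | 'dom-selector' | 'graphql-api' | 'legacy';
	attemptNumber?: number;
	maxAttempts?: number;
	data?: Record<string, unknown>;
	timestamp?: string;
}

/** `event: error`: `bodyText` is present when extraction worked but recipe parsing did not. */
export interface ErrorPayload {
	type: 'error';
	message: string;
	bodyText?: string;
	timestamp: string;
}

/** `event: complete`: the final result of the pipeline. */
export interface CompletePayload {
	type: 'complete';
	message: string;
	recipe: Record<string, unknown>;
	bodyText: string;
	timestamp: string;
}

export type StreamEvent =
	| { event: 'progress'; data: ProgressPayload }
	| { event: 'error'; data: ErrorPayload }
	| { event: 'complete'; data: CompletePayload };

/** Narrow a parsed (event name, JSON) pair to a known stream event, or null. */
export function toStreamEvent(event: string, data: unknown): StreamEvent | null {
	if (typeof data !== 'object' || data === null) return null;
	if (typeof (data as { message?: unknown }).message !== 'string') return null;
	if (event === 'progress' || event === 'error' || event === 'complete') {
		return { event, data } as StreamEvent;
	}
	return null;
}
```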

Risks and Mitigations

| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| SSE connection issues | High | Low | Fallback to original POST endpoint |
| Browser SSE limitations | Medium | Low | Tested browser compatibility list |
| Long extraction timeout | Medium | Medium | Show progress to keep user informed |
| Stream buffering in proxies | Medium | Low | Add X-Accel-Buffering header |

Future Enhancements

  • WebSocket for bi-directional communication
  • Pause/resume extraction
  • Multiple URL batch processing
  • Export logs to file
  • Performance metrics dashboard

Conclusion

This plan integrates the new multi-strategy Instagram extractor with the frontend through Server-Sent Events, providing users with real-time visibility into the extraction process. The implementation maintains Hexagonal Architecture principles while significantly enhancing user experience.

Next Step: Proceed with implementation using @dev IntegrateExtractionProgressFrontend