fix(extraction): resolve progressCallback undefined errors

- Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM
- Pass onProgress callback from extractWithStrategies to all strategies
- Fix legacy strategy to use correct callback variable name
- Verify extractViaGraphQL correctly returns null thumbnail

This fixes ReferenceError that was preventing all extraction methods from working.
All extraction strategies now properly emit thumbnail progress events via SSE.

Closes: FixProgressCallbackUndefinedErrors
This commit is contained in:
Giancarmine Salucci
2025-12-21 04:28:07 +01:00
parent 7e4d82de8d
commit 2de5567682
3 changed files with 497 additions and 5 deletions

View File

@@ -0,0 +1,256 @@
# Execution Plan: Fix ProgressCallback Undefined Errors
## Overview
**Outcome Name:** FixProgressCallbackUndefinedErrors
**Created:** 2025-12-21
**Status:** Ready for Implementation
## Problem Analysis
The extraction system is failing with `ReferenceError: progressCallback is not defined` at multiple locations in the extraction pipeline. The root cause is that several extraction methods are calling `extractThumbnailStealth(page, progressCallback)` with a `progressCallback` variable that doesn't exist in their scope.
### Affected Locations
1. **Line 224** - `extractFromEmbeddedJSON()` function
2. **Line 239** - `extractFromEmbeddedJSON()` function
3. **Line 346** - `extractFromDOM()` function
4. **Line 459** - Legacy extraction strategy in `extractWithStrategies()`
### Current Function Signatures
```typescript
// These functions don't have progressCallback parameter
async function extractFromEmbeddedJSON(page: Page): Promise<ExtractedContent | null>
async function extractFromDOM(page: Page): Promise<ExtractedContent | null>
// But they call this function with progressCallback
async function extractThumbnailStealth(
page: Page,
progressCallback?: ProgressCallback
): Promise<string | null>
// extractWithStrategies has the callback but doesn't pass it down
async function extractWithStrategies(
url: string,
page: Page,
context: BrowserContext,
onProgress?: ProgressCallback
): Promise<ExtractionResult>
```
## Root Cause
The `extractWithStrategies()` function receives an `onProgress` callback parameter but does NOT pass it to the individual extraction method functions (`extractFromEmbeddedJSON`, `extractFromDOM`). These functions then try to use an undefined `progressCallback` variable when calling `extractThumbnailStealth()`.
## Stories
### Story 1: Add ProgressCallback Parameter to extractFromEmbeddedJSON
**Priority:** High
**Acceptance Criteria:**
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
- All calls to `extractThumbnailStealth()` within the function use the parameter
- No references to undefined variables
**Implementation Steps:**
1. Update function signature:
```typescript
async function extractFromEmbeddedJSON(
page: Page,
progressCallback?: ProgressCallback
): Promise<ExtractedContent | null>
```
2. Ensure both calls to `extractThumbnailStealth` (lines ~224 and ~239) use the parameter correctly (already doing this, just need to receive it)
**Technical Notes:**
- The function already correctly passes `progressCallback` to `extractThumbnailStealth`
- Just needs to receive it as a parameter instead of referencing undefined variable
---
### Story 2: Add ProgressCallback Parameter to extractFromDOM
**Priority:** High
**Acceptance Criteria:**
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
- Call to `extractThumbnailStealth()` at line ~346 uses the parameter
- No references to undefined variables
**Implementation Steps:**
1. Update function signature:
```typescript
async function extractFromDOM(
page: Page,
progressCallback?: ProgressCallback
): Promise<ExtractedContent | null>
```
2. The call to `extractThumbnailStealth` at line ~346 already uses the parameter correctly
**Technical Notes:**
- Single call to `extractThumbnailStealth` in this function
- Already written to use `progressCallback`, just needs to receive it
---
### Story 3: Update extractWithStrategies to Pass Callback to Strategy Functions
**Priority:** High
**Acceptance Criteria:**
- All strategy functions receive the `onProgress` callback
- Legacy strategy inline function updated to use parameter correctly
- No undefined variable references in any strategy
**Implementation Steps:**
1. Update the strategies array to pass `onProgress` to each function:
```typescript
const strategies: Array<{
name: ExtractionMethod;
fn: () => Promise<ExtractedContent | null>;
}> = [
{
name: 'embedded-json',
fn: () => extractFromEmbeddedJSON(page, onProgress)
},
{
name: 'dom-selector',
fn: () => extractFromDOM(page, onProgress)
},
{
name: 'graphql-api',
fn: () => extractViaGraphQL(url, context)
},
{
name: 'legacy',
fn: async () => {
const text = await extractCleanTextLegacy(page);
const thumbnail = await extractThumbnailStealth(page, onProgress);
return { bodyText: text, thumbnail };
}
}
];
```
2. Verify the legacy strategy's inline function uses `onProgress` instead of `progressCallback`
**Technical Notes:**
- The legacy strategy is defined inline at line ~459
- Currently references undefined `progressCallback`
- Should use `onProgress` from the parent function's scope
- GraphQL strategy doesn't extract thumbnails so doesn't need the callback
---
### Story 4: Verify extractViaGraphQL Doesn't Need Callback
**Priority:** Medium
**Acceptance Criteria:**
- Confirm `extractViaGraphQL` doesn't extract thumbnails
- Ensure it doesn't have undefined variable references
- Document if thumbnail extraction should be added in the future
**Implementation Steps:**
1. Review `extractViaGraphQL` function (lines ~360-410)
2. Confirm it only extracts text, not thumbnails
3. Add code comment if thumbnail extraction could be beneficial
**Technical Notes:**
- Currently returns `ExtractedContent` but likely with `thumbnail: null`
- May need enhancement in future to extract thumbnail via GraphQL data
- Not urgent for this fix since it doesn't reference `progressCallback`
---
### Story 5: Verify All Other Functions Using extractThumbnailStealth
**Priority:** Low
**Acceptance Criteria:**
- Search entire codebase for `extractThumbnailStealth` calls
- Ensure all callers properly pass or omit the optional parameter
- No other undefined variable references exist
**Implementation Steps:**
1. Use grep/search to find all calls to `extractThumbnailStealth`
2. Verify each call either:
- Passes a valid `ProgressCallback` parameter, or
- Intentionally omits it (relying on optional parameter)
3. Document any findings
**Technical Notes:**
- The parameter is optional, so omitting it is valid
- But passing an undefined variable is not valid
- Found 5 calls total in extraction.ts (4 problematic, 1 direct call)
---
## Dependencies
```mermaid
graph TD
A[Story 1: Fix extractFromEmbeddedJSON] --> C[Story 3: Update extractWithStrategies]
B[Story 2: Fix extractFromDOM] --> C
C --> D[Story 5: Verify All Callers]
E[Story 4: Verify GraphQL] --> D
```
**Critical Path:** Stories 1, 2, 3 must be completed together as they form a cohesive parameter-passing chain.
## Testing Strategy
### Unit Testing
- Test each extraction function with and without `progressCallback` parameter
- Verify callback is invoked when thumbnail is extracted
- Verify no errors when callback is omitted
### Integration Testing
- Test full extraction flow with SSE endpoint
- Verify thumbnail progress events are emitted correctly
- Test all extraction methods (embedded-json, dom-selector, graphql, legacy)
### Manual Testing
1. Start dev server
2. Attempt to extract from Instagram URL
3. Verify no `ReferenceError: progressCallback is not defined` errors
4. Verify thumbnail progress events appear in SSE stream
5. Test retry logic still works correctly
## Risk Assessment
**Risk Level:** Low
**Impact:** High (currently breaks all extractions)
**Complexity:** Low (parameter passing fix)
### Potential Issues
1. **Breaking existing callers** - Mitigated by making parameter optional
2. **Missing other undefined references** - Mitigated by Story 5 verification
3. **Callback not propagating** - Mitigated by following the call chain
### Rollback Strategy
- Changes are additive (adding optional parameters)
- No breaking changes to function signatures
- Easy to revert if issues arise
## Implementation Checklist
- [ ] Story 1: Update `extractFromEmbeddedJSON` signature
- [ ] Story 2: Update `extractFromDOM` signature
- [ ] Story 3: Update `extractWithStrategies` to pass callbacks
- [ ] Story 4: Review and document `extractViaGraphQL`
- [ ] Story 5: Verify all `extractThumbnailStealth` callers
- [ ] Run extraction tests
- [ ] Manual testing with dev server
- [ ] Commit with descriptive message
## Success Criteria
1. ✅ No `ReferenceError: progressCallback is not defined` errors
2. ✅ All extraction methods work correctly
3. ✅ Thumbnail progress events are emitted via SSE
4. ✅ Retry logic continues to function
5. ✅ No regression in existing functionality
## Technical References
- **File:** [src/lib/server/extraction.ts](src/lib/server/extraction.ts)
- **Type Definition:** `ProgressCallback = (event: ProgressEvent) => void` (line 22)
- **SSE Integration:** [src/routes/api/extract-stream/+server.ts](src/routes/api/extract-stream/+server.ts)
## Notes
This is a straightforward parameter-passing bug introduced during a recent refactor where thumbnail extraction was enhanced with progress callbacks. The extraction functions were updated to call `extractThumbnailStealth` with a callback, but weren't updated to receive that callback as a parameter.
The fix maintains backward compatibility by making the parameter optional, allowing the functions to work with or without progress tracking.