- Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM - Pass onProgress callback from extractWithStrategies to all strategies - Fix legacy strategy to use correct callback variable name - Verify extractViaGraphQL correctly returns null thumbnail This fixes ReferenceError that was preventing all extraction methods from working. All extraction strategies now properly emit thumbnail progress events via SSE. Closes: FixProgressCallbackUndefinedErrors
257 lines
9.2 KiB
Markdown
257 lines
9.2 KiB
Markdown
# Execution Plan: Fix ProgressCallback Undefined Errors
|
|
|
|
## Overview
|
|
**Outcome Name:** FixProgressCallbackUndefinedErrors
|
|
**Created:** 2025-12-21
|
|
**Status:** Ready for Implementation
|
|
|
|
## Problem Analysis
|
|
|
|
The extraction system is failing with `ReferenceError: progressCallback is not defined` at multiple locations in the extraction pipeline. The root cause is that several extraction methods are calling `extractThumbnailStealth(page, progressCallback)` with a `progressCallback` variable that doesn't exist in their scope.
|
|
|
|
### Affected Locations
|
|
1. **Line 224** - `extractFromEmbeddedJSON()` function
|
|
2. **Line 239** - `extractFromEmbeddedJSON()` function
|
|
3. **Line 346** - `extractFromDOM()` function
|
|
4. **Line 459** - Legacy extraction strategy in `extractWithStrategies()`
|
|
|
|
### Current Function Signatures
|
|
```typescript
|
|
// These functions don't have progressCallback parameter
|
|
async function extractFromEmbeddedJSON(page: Page): Promise<ExtractedContent | null>
|
|
async function extractFromDOM(page: Page): Promise<ExtractedContent | null>
|
|
|
|
// But they call this function with progressCallback
|
|
async function extractThumbnailStealth(
|
|
page: Page,
|
|
progressCallback?: ProgressCallback
|
|
): Promise<string | null>
|
|
|
|
// extractWithStrategies has the callback but doesn't pass it down
|
|
async function extractWithStrategies(
|
|
url: string,
|
|
page: Page,
|
|
context: BrowserContext,
|
|
onProgress?: ProgressCallback
|
|
): Promise<ExtractionResult>
|
|
```
|
|
|
|
## Root Cause
|
|
|
|
The `extractWithStrategies()` function receives an `onProgress` callback parameter but does NOT pass it to the individual extraction method functions (`extractFromEmbeddedJSON`, `extractFromDOM`). These functions then try to use an undefined `progressCallback` variable when calling `extractThumbnailStealth()`.
|
|
|
|
## Stories
|
|
|
|
### Story 1: Add ProgressCallback Parameter to extractFromEmbeddedJSON
|
|
**Priority:** High
|
|
**Acceptance Criteria:**
|
|
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
|
|
- All calls to `extractThumbnailStealth()` within the function use the parameter
|
|
- No references to undefined variables
|
|
|
|
**Implementation Steps:**
|
|
1. Update function signature:
|
|
```typescript
|
|
async function extractFromEmbeddedJSON(
|
|
page: Page,
|
|
progressCallback?: ProgressCallback
|
|
): Promise<ExtractedContent | null>
|
|
```
|
|
|
|
2. Ensure both calls to `extractThumbnailStealth` (lines ~224 and ~239) use the parameter correctly (already doing this, just need to receive it)
|
|
|
|
**Technical Notes:**
|
|
- The function already correctly passes `progressCallback` to `extractThumbnailStealth`
|
|
- Just needs to receive it as a parameter instead of referencing undefined variable
|
|
|
|
---
|
|
|
|
### Story 2: Add ProgressCallback Parameter to extractFromDOM
|
|
**Priority:** High
|
|
**Acceptance Criteria:**
|
|
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
|
|
- Call to `extractThumbnailStealth()` at line ~346 uses the parameter
|
|
- No references to undefined variables
|
|
|
|
**Implementation Steps:**
|
|
1. Update function signature:
|
|
```typescript
|
|
async function extractFromDOM(
|
|
page: Page,
|
|
progressCallback?: ProgressCallback
|
|
): Promise<ExtractedContent | null>
|
|
```
|
|
|
|
2. The call to `extractThumbnailStealth` at line ~346 already uses the parameter correctly
|
|
|
|
**Technical Notes:**
|
|
- Single call to `extractThumbnailStealth` in this function
|
|
- Already written to use `progressCallback`, just needs to receive it
|
|
|
|
---
|
|
|
|
### Story 3: Update extractWithStrategies to Pass Callback to Strategy Functions
|
|
**Priority:** High
|
|
**Acceptance Criteria:**
|
|
- All strategy functions receive the `onProgress` callback
|
|
- Legacy strategy inline function updated to use parameter correctly
|
|
- No undefined variable references in any strategy
|
|
|
|
**Implementation Steps:**
|
|
1. Update the strategies array to pass `onProgress` to each function:
|
|
```typescript
|
|
const strategies: Array<{
|
|
name: ExtractionMethod;
|
|
fn: () => Promise<ExtractedContent | null>;
|
|
}> = [
|
|
{
|
|
name: 'embedded-json',
|
|
fn: () => extractFromEmbeddedJSON(page, onProgress)
|
|
},
|
|
{
|
|
name: 'dom-selector',
|
|
fn: () => extractFromDOM(page, onProgress)
|
|
},
|
|
{
|
|
name: 'graphql-api',
|
|
fn: () => extractViaGraphQL(url, context)
|
|
},
|
|
{
|
|
name: 'legacy',
|
|
fn: async () => {
|
|
const text = await extractCleanTextLegacy(page);
|
|
const thumbnail = await extractThumbnailStealth(page, onProgress);
|
|
return { bodyText: text, thumbnail };
|
|
}
|
|
}
|
|
];
|
|
```
|
|
|
|
2. Verify the legacy strategy's inline function uses `onProgress` instead of `progressCallback`
|
|
|
|
**Technical Notes:**
|
|
- The legacy strategy is defined inline at line ~459
|
|
- Currently references undefined `progressCallback`
|
|
- Should use `onProgress` from the parent function's scope
|
|
- GraphQL strategy doesn't extract thumbnails so doesn't need the callback
|
|
|
|
---
|
|
|
|
### Story 4: Verify extractViaGraphQL Doesn't Need Callback
|
|
**Priority:** Medium
|
|
**Acceptance Criteria:**
|
|
- Confirm `extractViaGraphQL` doesn't extract thumbnails
|
|
- Ensure it doesn't have undefined variable references
|
|
- Document if thumbnail extraction should be added in the future
|
|
|
|
**Implementation Steps:**
|
|
1. Review `extractViaGraphQL` function (lines ~360-410)
|
|
2. Confirm it only extracts text, not thumbnails
|
|
3. Add code comment if thumbnail extraction could be beneficial
|
|
|
|
**Technical Notes:**
|
|
- Currently returns `ExtractedContent` but likely with `thumbnail: null`
|
|
- May need enhancement in future to extract thumbnail via GraphQL data
|
|
- Not urgent for this fix since it doesn't reference `progressCallback`
|
|
|
|
---
|
|
|
|
### Story 5: Verify All Other Functions Using extractThumbnailStealth
|
|
**Priority:** Low
|
|
**Acceptance Criteria:**
|
|
- Search entire codebase for `extractThumbnailStealth` calls
|
|
- Ensure all callers properly pass or omit the optional parameter
|
|
- No other undefined variable references exist
|
|
|
|
**Implementation Steps:**
|
|
1. Use grep/search to find all calls to `extractThumbnailStealth`
|
|
2. Verify each call either:
|
|
- Passes a valid `ProgressCallback` parameter, or
|
|
- Intentionally omits it (relying on optional parameter)
|
|
3. Document any findings
|
|
|
|
**Technical Notes:**
|
|
- The parameter is optional, so omitting it is valid
|
|
- But passing an undefined variable is not valid
|
|
- Found 5 calls total in extraction.ts (4 problematic, 1 direct call)
|
|
|
|
---
|
|
|
|
## Dependencies
|
|
|
|
```mermaid
|
|
graph TD
|
|
A[Story 1: Fix extractFromEmbeddedJSON] --> C[Story 3: Update extractWithStrategies]
|
|
B[Story 2: Fix extractFromDOM] --> C
|
|
C --> D[Story 5: Verify All Callers]
|
|
E[Story 4: Verify GraphQL] --> D
|
|
```
|
|
|
|
**Critical Path:** Stories 1, 2, 3 must be completed together as they form a cohesive parameter-passing chain.
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Testing
|
|
- Test each extraction function with and without `progressCallback` parameter
|
|
- Verify callback is invoked when thumbnail is extracted
|
|
- Verify no errors when callback is omitted
|
|
|
|
### Integration Testing
|
|
- Test full extraction flow with SSE endpoint
|
|
- Verify thumbnail progress events are emitted correctly
|
|
- Test all extraction methods (embedded-json, dom-selector, graphql, legacy)
|
|
|
|
### Manual Testing
|
|
1. Start dev server
|
|
2. Attempt to extract from Instagram URL
|
|
3. Verify no `ReferenceError: progressCallback is not defined` errors
|
|
4. Verify thumbnail progress events appear in SSE stream
|
|
5. Test retry logic still works correctly
|
|
|
|
## Risk Assessment
|
|
|
|
**Risk Level:** Low
|
|
**Impact:** High (currently breaks all extractions)
|
|
**Complexity:** Low (parameter passing fix)
|
|
|
|
### Potential Issues
|
|
1. **Breaking existing callers** - Mitigated by making parameter optional
|
|
2. **Missing other undefined references** - Mitigated by Story 5 verification
|
|
3. **Callback not propagating** - Mitigated by following the call chain
|
|
|
|
### Rollback Strategy
|
|
- Changes are additive (adding optional parameters)
|
|
- No breaking changes to function signatures
|
|
- Easy to revert if issues arise
|
|
|
|
## Implementation Checklist
|
|
|
|
- [ ] Story 1: Update `extractFromEmbeddedJSON` signature
|
|
- [ ] Story 2: Update `extractFromDOM` signature
|
|
- [ ] Story 3: Update `extractWithStrategies` to pass callbacks
|
|
- [ ] Story 4: Review and document `extractViaGraphQL`
|
|
- [ ] Story 5: Verify all `extractThumbnailStealth` callers
|
|
- [ ] Run extraction tests
|
|
- [ ] Manual testing with dev server
|
|
- [ ] Commit with descriptive message
|
|
|
|
## Success Criteria
|
|
|
|
1. ✅ No `ReferenceError: progressCallback is not defined` errors
|
|
2. ✅ All extraction methods work correctly
|
|
3. ✅ Thumbnail progress events are emitted via SSE
|
|
4. ✅ Retry logic continues to function
|
|
5. ✅ No regression in existing functionality
|
|
|
|
## Technical References
|
|
|
|
- **File:** [src/lib/server/extraction.ts](src/lib/server/extraction.ts)
|
|
- **Type Definition:** `ProgressCallback = (event: ProgressEvent) => void` (line 22)
|
|
- **SSE Integration:** [src/routes/api/extract-stream/+server.ts](src/routes/api/extract-stream/+server.ts)
|
|
|
|
## Notes
|
|
|
|
This is a straightforward parameter-passing bug introduced during a recent refactor where thumbnail extraction was enhanced with progress callbacks. The extraction functions were updated to call `extractThumbnailStealth` with a callback, but weren't updated to receive that callback as a parameter.
|
|
|
|
The fix maintains backward compatibility by making the parameter optional, allowing the functions to work with or without progress tracking.
|