fix(extraction): resolve progressCallback undefined errors
- Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM - Pass onProgress callback from extractWithStrategies to all strategies - Fix legacy strategy to use correct callback variable name - Verify extractViaGraphQL correctly returns null thumbnail This fixes ReferenceError that was preventing all extraction methods from working. All extraction strategies now properly emit thumbnail progress events via SSE. Closes: FixProgressCallbackUndefinedErrors
This commit is contained in:
256
docs/plans/FixProgressCallbackUndefinedErrors.md
Normal file
256
docs/plans/FixProgressCallbackUndefinedErrors.md
Normal file
@@ -0,0 +1,256 @@
|
||||
# Execution Plan: Fix ProgressCallback Undefined Errors
|
||||
|
||||
## Overview
|
||||
**Outcome Name:** FixProgressCallbackUndefinedErrors
|
||||
**Created:** 2025-12-21
|
||||
**Status:** Ready for Implementation
|
||||
|
||||
## Problem Analysis
|
||||
|
||||
The extraction system is failing with `ReferenceError: progressCallback is not defined` at multiple locations in the extraction pipeline. The root cause is that several extraction methods are calling `extractThumbnailStealth(page, progressCallback)` with a `progressCallback` variable that doesn't exist in their scope.
|
||||
|
||||
### Affected Locations
|
||||
1. **Line 224** - `extractFromEmbeddedJSON()` function
|
||||
2. **Line 239** - `extractFromEmbeddedJSON()` function
|
||||
3. **Line 346** - `extractFromDOM()` function
|
||||
4. **Line 459** - Legacy extraction strategy in `extractWithStrategies()`
|
||||
|
||||
### Current Function Signatures
|
||||
```typescript
|
||||
// These functions don't have progressCallback parameter
|
||||
async function extractFromEmbeddedJSON(page: Page): Promise<ExtractedContent | null>
|
||||
async function extractFromDOM(page: Page): Promise<ExtractedContent | null>
|
||||
|
||||
// But they call this function with progressCallback
|
||||
async function extractThumbnailStealth(
|
||||
page: Page,
|
||||
progressCallback?: ProgressCallback
|
||||
): Promise<string | null>
|
||||
|
||||
// extractWithStrategies has the callback but doesn't pass it down
|
||||
async function extractWithStrategies(
|
||||
url: string,
|
||||
page: Page,
|
||||
context: BrowserContext,
|
||||
onProgress?: ProgressCallback
|
||||
): Promise<ExtractionResult>
|
||||
```
|
||||
|
||||
## Root Cause
|
||||
|
||||
The `extractWithStrategies()` function receives an `onProgress` callback parameter but does NOT pass it to the individual extraction method functions (`extractFromEmbeddedJSON`, `extractFromDOM`). These functions then try to use an undefined `progressCallback` variable when calling `extractThumbnailStealth()`.
|
||||
|
||||
## Stories
|
||||
|
||||
### Story 1: Add ProgressCallback Parameter to extractFromEmbeddedJSON
|
||||
**Priority:** High
|
||||
**Acceptance Criteria:**
|
||||
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
|
||||
- All calls to `extractThumbnailStealth()` within the function use the parameter
|
||||
- No references to undefined variables
|
||||
|
||||
**Implementation Steps:**
|
||||
1. Update function signature:
|
||||
```typescript
|
||||
async function extractFromEmbeddedJSON(
|
||||
page: Page,
|
||||
progressCallback?: ProgressCallback
|
||||
): Promise<ExtractedContent | null>
|
||||
```
|
||||
|
||||
2. Ensure both calls to `extractThumbnailStealth` (lines ~224 and ~239) use the parameter correctly (already doing this, just need to receive it)
|
||||
|
||||
**Technical Notes:**
|
||||
- The function already correctly passes `progressCallback` to `extractThumbnailStealth`
|
||||
- Just needs to receive it as a parameter instead of referencing undefined variable
|
||||
|
||||
---
|
||||
|
||||
### Story 2: Add ProgressCallback Parameter to extractFromDOM
|
||||
**Priority:** High
|
||||
**Acceptance Criteria:**
|
||||
- Function signature updated to accept optional `progressCallback?: ProgressCallback`
|
||||
- Call to `extractThumbnailStealth()` at line ~346 uses the parameter
|
||||
- No references to undefined variables
|
||||
|
||||
**Implementation Steps:**
|
||||
1. Update function signature:
|
||||
```typescript
|
||||
async function extractFromDOM(
|
||||
page: Page,
|
||||
progressCallback?: ProgressCallback
|
||||
): Promise<ExtractedContent | null>
|
||||
```
|
||||
|
||||
2. The call to `extractThumbnailStealth` at line ~346 already uses the parameter correctly
|
||||
|
||||
**Technical Notes:**
|
||||
- Single call to `extractThumbnailStealth` in this function
|
||||
- Already written to use `progressCallback`, just needs to receive it
|
||||
|
||||
---
|
||||
|
||||
### Story 3: Update extractWithStrategies to Pass Callback to Strategy Functions
|
||||
**Priority:** High
|
||||
**Acceptance Criteria:**
|
||||
- All strategy functions receive the `onProgress` callback
|
||||
- Legacy strategy inline function updated to use parameter correctly
|
||||
- No undefined variable references in any strategy
|
||||
|
||||
**Implementation Steps:**
|
||||
1. Update the strategies array to pass `onProgress` to each function:
|
||||
```typescript
|
||||
const strategies: Array<{
|
||||
name: ExtractionMethod;
|
||||
fn: () => Promise<ExtractedContent | null>;
|
||||
}> = [
|
||||
{
|
||||
name: 'embedded-json',
|
||||
fn: () => extractFromEmbeddedJSON(page, onProgress)
|
||||
},
|
||||
{
|
||||
name: 'dom-selector',
|
||||
fn: () => extractFromDOM(page, onProgress)
|
||||
},
|
||||
{
|
||||
name: 'graphql-api',
|
||||
fn: () => extractViaGraphQL(url, context)
|
||||
},
|
||||
{
|
||||
name: 'legacy',
|
||||
fn: async () => {
|
||||
const text = await extractCleanTextLegacy(page);
|
||||
const thumbnail = await extractThumbnailStealth(page, onProgress);
|
||||
return { bodyText: text, thumbnail };
|
||||
}
|
||||
}
|
||||
];
|
||||
```
|
||||
|
||||
2. Verify the legacy strategy's inline function uses `onProgress` instead of `progressCallback`
|
||||
|
||||
**Technical Notes:**
|
||||
- The legacy strategy is defined inline at line ~459
|
||||
- Currently references undefined `progressCallback`
|
||||
- Should use `onProgress` from the parent function's scope
|
||||
- GraphQL strategy doesn't extract thumbnails so doesn't need the callback
|
||||
|
||||
---
|
||||
|
||||
### Story 4: Verify extractViaGraphQL Doesn't Need Callback
|
||||
**Priority:** Medium
|
||||
**Acceptance Criteria:**
|
||||
- Confirm `extractViaGraphQL` doesn't extract thumbnails
|
||||
- Ensure it doesn't have undefined variable references
|
||||
- Document if thumbnail extraction should be added in the future
|
||||
|
||||
**Implementation Steps:**
|
||||
1. Review `extractViaGraphQL` function (lines ~360-410)
|
||||
2. Confirm it only extracts text, not thumbnails
|
||||
3. Add code comment if thumbnail extraction could be beneficial
|
||||
|
||||
**Technical Notes:**
|
||||
- Currently returns `ExtractedContent` but likely with `thumbnail: null`
|
||||
- May need enhancement in future to extract thumbnail via GraphQL data
|
||||
- Not urgent for this fix since it doesn't reference `progressCallback`
|
||||
|
||||
---
|
||||
|
||||
### Story 5: Verify All Other Functions Using extractThumbnailStealth
|
||||
**Priority:** Low
|
||||
**Acceptance Criteria:**
|
||||
- Search entire codebase for `extractThumbnailStealth` calls
|
||||
- Ensure all callers properly pass or omit the optional parameter
|
||||
- No other undefined variable references exist
|
||||
|
||||
**Implementation Steps:**
|
||||
1. Use grep/search to find all calls to `extractThumbnailStealth`
|
||||
2. Verify each call either:
|
||||
- Passes a valid `ProgressCallback` parameter, or
|
||||
- Intentionally omits it (relying on optional parameter)
|
||||
3. Document any findings
|
||||
|
||||
**Technical Notes:**
|
||||
- The parameter is optional, so omitting it is valid
|
||||
- But passing an undefined variable is not valid
|
||||
- Found 5 calls total in extraction.ts (4 problematic, 1 direct call)
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
A[Story 1: Fix extractFromEmbeddedJSON] --> C[Story 3: Update extractWithStrategies]
|
||||
B[Story 2: Fix extractFromDOM] --> C
|
||||
C --> D[Story 5: Verify All Callers]
|
||||
E[Story 4: Verify GraphQL] --> D
|
||||
```
|
||||
|
||||
**Critical Path:** Stories 1, 2, 3 must be completed together as they form a cohesive parameter-passing chain.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Testing
|
||||
- Test each extraction function with and without `progressCallback` parameter
|
||||
- Verify callback is invoked when thumbnail is extracted
|
||||
- Verify no errors when callback is omitted
|
||||
|
||||
### Integration Testing
|
||||
- Test full extraction flow with SSE endpoint
|
||||
- Verify thumbnail progress events are emitted correctly
|
||||
- Test all extraction methods (embedded-json, dom-selector, graphql, legacy)
|
||||
|
||||
### Manual Testing
|
||||
1. Start dev server
|
||||
2. Attempt to extract from Instagram URL
|
||||
3. Verify no `ReferenceError: progressCallback is not defined` errors
|
||||
4. Verify thumbnail progress events appear in SSE stream
|
||||
5. Test retry logic still works correctly
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
**Risk Level:** Low
|
||||
**Impact:** High (currently breaks all extractions)
|
||||
**Complexity:** Low (parameter passing fix)
|
||||
|
||||
### Potential Issues
|
||||
1. **Breaking existing callers** - Mitigated by making parameter optional
|
||||
2. **Missing other undefined references** - Mitigated by Story 5 verification
|
||||
3. **Callback not propagating** - Mitigated by following the call chain
|
||||
|
||||
### Rollback Strategy
|
||||
- Changes are additive (adding optional parameters)
|
||||
- No breaking changes to function signatures
|
||||
- Easy to revert if issues arise
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
- [ ] Story 1: Update `extractFromEmbeddedJSON` signature
|
||||
- [ ] Story 2: Update `extractFromDOM` signature
|
||||
- [ ] Story 3: Update `extractWithStrategies` to pass callbacks
|
||||
- [ ] Story 4: Review and document `extractViaGraphQL`
|
||||
- [ ] Story 5: Verify all `extractThumbnailStealth` callers
|
||||
- [ ] Run extraction tests
|
||||
- [ ] Manual testing with dev server
|
||||
- [ ] Commit with descriptive message
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. ✅ No `ReferenceError: progressCallback is not defined` errors
|
||||
2. ✅ All extraction methods work correctly
|
||||
3. ✅ Thumbnail progress events are emitted via SSE
|
||||
4. ✅ Retry logic continues to function
|
||||
5. ✅ No regression in existing functionality
|
||||
|
||||
## Technical References
|
||||
|
||||
- **File:** [src/lib/server/extraction.ts](src/lib/server/extraction.ts)
|
||||
- **Type Definition:** `ProgressCallback = (event: ProgressEvent) => void` (line 22)
|
||||
- **SSE Integration:** [src/routes/api/extract-stream/+server.ts](src/routes/api/extract-stream/+server.ts)
|
||||
|
||||
## Notes
|
||||
|
||||
This is a straightforward parameter-passing bug introduced during a recent refactor where thumbnail extraction was enhanced with progress callbacks. The extraction functions were updated to call `extractThumbnailStealth` with a callback, but weren't updated to receive that callback as a parameter.
|
||||
|
||||
The fix maintains backward compatibility by making the parameter optional, allowing the functions to work with or without progress tracking.
|
||||
Reference in New Issue
Block a user