Files
insta-recipe/docs/plans/FixProgressCallbackUndefinedErrors.md
Giancarmine Salucci 2de5567682 fix(extraction): resolve progressCallback undefined errors
- Add progressCallback parameter to extractFromEmbeddedJSON and extractFromDOM
- Pass onProgress callback from extractWithStrategies to all strategies
- Fix legacy strategy to use correct callback variable name
- Verify extractViaGraphQL correctly returns null thumbnail

This fixes ReferenceError that was preventing all extraction methods from working.
All extraction strategies now properly emit thumbnail progress events via SSE.

Closes: FixProgressCallbackUndefinedErrors
2025-12-21 04:28:07 +01:00

9.2 KiB

Execution Plan: Fix ProgressCallback Undefined Errors

Overview

Outcome Name: FixProgressCallbackUndefinedErrors
Created: 2025-12-21
Status: Ready for Implementation

Problem Analysis

The extraction system is failing with ReferenceError: progressCallback is not defined at multiple locations in the extraction pipeline. The root cause is that several extraction methods are calling extractThumbnailStealth(page, progressCallback) with a progressCallback variable that doesn't exist in their scope.

Affected Locations

  1. Line 224 - extractFromEmbeddedJSON() function
  2. Line 239 - extractFromEmbeddedJSON() function
  3. Line 346 - extractFromDOM() function
  4. Line 459 - Legacy extraction strategy in extractWithStrategies()

Current Function Signatures

// These functions don't have progressCallback parameter
async function extractFromEmbeddedJSON(page: Page): Promise<ExtractedContent | null>
async function extractFromDOM(page: Page): Promise<ExtractedContent | null>

// But they call this function with progressCallback
async function extractThumbnailStealth(
  page: Page,
  progressCallback?: ProgressCallback
): Promise<string | null>

// extractWithStrategies has the callback but doesn't pass it down
async function extractWithStrategies(
  url: string,
  page: Page,
  context: BrowserContext,
  onProgress?: ProgressCallback
): Promise<ExtractionResult>

Root Cause

The extractWithStrategies() function receives an onProgress callback parameter but does NOT pass it to the individual extraction method functions (extractFromEmbeddedJSON, extractFromDOM). These functions then try to use an undefined progressCallback variable when calling extractThumbnailStealth().

Stories

Story 1: Add ProgressCallback Parameter to extractFromEmbeddedJSON

Priority: High
Acceptance Criteria:

  • Function signature updated to accept optional progressCallback?: ProgressCallback
  • All calls to extractThumbnailStealth() within the function use the parameter
  • No references to undefined variables

Implementation Steps:

  1. Update function signature:

    async function extractFromEmbeddedJSON(
      page: Page,
      progressCallback?: ProgressCallback
    ): Promise<ExtractedContent | null>
    
  2. Ensure both calls to extractThumbnailStealth (lines ~224 and ~239) use the parameter correctly (already doing this, just need to receive it)

Technical Notes:

  • The function already correctly passes progressCallback to extractThumbnailStealth
  • Just needs to receive it as a parameter instead of referencing undefined variable

Story 2: Add ProgressCallback Parameter to extractFromDOM

Priority: High
Acceptance Criteria:

  • Function signature updated to accept optional progressCallback?: ProgressCallback
  • Call to extractThumbnailStealth() at line ~346 uses the parameter
  • No references to undefined variables

Implementation Steps:

  1. Update function signature:

    async function extractFromDOM(
      page: Page,
      progressCallback?: ProgressCallback
    ): Promise<ExtractedContent | null>
    
  2. The call to extractThumbnailStealth at line ~346 already uses the parameter correctly

Technical Notes:

  • Single call to extractThumbnailStealth in this function
  • Already written to use progressCallback, just needs to receive it

Story 3: Update extractWithStrategies to Pass Callback to Strategy Functions

Priority: High
Acceptance Criteria:

  • All strategy functions receive the onProgress callback
  • Legacy strategy inline function updated to use parameter correctly
  • No undefined variable references in any strategy

Implementation Steps:

  1. Update the strategies array to pass onProgress to each function:

    const strategies: Array<{
      name: ExtractionMethod;
      fn: () => Promise<ExtractedContent | null>;
    }> = [
      {
        name: 'embedded-json',
        fn: () => extractFromEmbeddedJSON(page, onProgress)
      },
      {
        name: 'dom-selector',
        fn: () => extractFromDOM(page, onProgress)
      },
      {
        name: 'graphql-api',
        fn: () => extractViaGraphQL(url, context)
      },
      {
        name: 'legacy',
        fn: async () => {
          const text = await extractCleanTextLegacy(page);
          const thumbnail = await extractThumbnailStealth(page, onProgress);
          return { bodyText: text, thumbnail };
        }
      }
    ];
    
  2. Verify the legacy strategy's inline function uses onProgress instead of progressCallback

Technical Notes:

  • The legacy strategy is defined inline at line ~459
  • Currently references undefined progressCallback
  • Should use onProgress from the parent function's scope
  • GraphQL strategy doesn't extract thumbnails so doesn't need the callback

Story 4: Verify extractViaGraphQL Doesn't Need Callback

Priority: Medium
Acceptance Criteria:

  • Confirm extractViaGraphQL doesn't extract thumbnails
  • Ensure it doesn't have undefined variable references
  • Document if thumbnail extraction should be added in the future

Implementation Steps:

  1. Review extractViaGraphQL function (lines ~360-410)
  2. Confirm it only extracts text, not thumbnails
  3. Add code comment if thumbnail extraction could be beneficial

Technical Notes:

  • Currently returns ExtractedContent but likely with thumbnail: null
  • May need enhancement in future to extract thumbnail via GraphQL data
  • Not urgent for this fix since it doesn't reference progressCallback

Story 5: Verify All Other Functions Using extractThumbnailStealth

Priority: Low
Acceptance Criteria:

  • Search entire codebase for extractThumbnailStealth calls
  • Ensure all callers properly pass or omit the optional parameter
  • No other undefined variable references exist

Implementation Steps:

  1. Use grep/search to find all calls to extractThumbnailStealth
  2. Verify each call either:
    • Passes a valid ProgressCallback parameter, or
    • Intentionally omits it (relying on optional parameter)
  3. Document any findings

Technical Notes:

  • The parameter is optional, so omitting it is valid
  • But passing an undefined variable is not valid
  • Found 5 calls total in extraction.ts (4 problematic, 1 direct call)

Dependencies

graph TD
    A[Story 1: Fix extractFromEmbeddedJSON] --> C[Story 3: Update extractWithStrategies]
    B[Story 2: Fix extractFromDOM] --> C
    C --> D[Story 5: Verify All Callers]
    E[Story 4: Verify GraphQL] --> D

Critical Path: Stories 1, 2, 3 must be completed together as they form a cohesive parameter-passing chain.

Testing Strategy

Unit Testing

  • Test each extraction function with and without progressCallback parameter
  • Verify callback is invoked when thumbnail is extracted
  • Verify no errors when callback is omitted

Integration Testing

  • Test full extraction flow with SSE endpoint
  • Verify thumbnail progress events are emitted correctly
  • Test all extraction methods (embedded-json, dom-selector, graphql, legacy)

Manual Testing

  1. Start dev server
  2. Attempt to extract from Instagram URL
  3. Verify no ReferenceError: progressCallback is not defined errors
  4. Verify thumbnail progress events appear in SSE stream
  5. Test retry logic still works correctly

Risk Assessment

Risk Level: Low
Impact: High (currently breaks all extractions)
Complexity: Low (parameter passing fix)

Potential Issues

  1. Breaking existing callers - Mitigated by making parameter optional
  2. Missing other undefined references - Mitigated by Story 5 verification
  3. Callback not propagating - Mitigated by following the call chain

Rollback Strategy

  • Changes are additive (adding optional parameters)
  • No breaking changes to function signatures
  • Easy to revert if issues arise

Implementation Checklist

  • Story 1: Update extractFromEmbeddedJSON signature
  • Story 2: Update extractFromDOM signature
  • Story 3: Update extractWithStrategies to pass callbacks
  • Story 4: Review and document extractViaGraphQL
  • Story 5: Verify all extractThumbnailStealth callers
  • Run extraction tests
  • Manual testing with dev server
  • Commit with descriptive message

Success Criteria

  1. No ReferenceError: progressCallback is not defined errors
  2. All extraction methods work correctly
  3. Thumbnail progress events are emitted via SSE
  4. Retry logic continues to function
  5. No regression in existing functionality

Technical References

Notes

This is a straightforward parameter-passing bug introduced during a recent refactor where thumbnail extraction was enhanced with progress callbacks. The extraction functions were updated to call extractThumbnailStealth with a callback, but weren't updated to receive that callback as a parameter.

The fix maintains backward compatibility by making the parameter optional, allowing the functions to work with or without progress tracking.