insta-recipe/docs/ARCHITECTURE.md
Giancarmine Salucci 49bccf8f15 simplify
2026-02-18 01:21:44 +01:00


# Architecture Documentation
**Last Updated:** 2026-02-15T00:00:00.000Z
**JIRA:** RECIPE-0001
---
## Overview
**Project Name:** InstaRecipe
**Type:** Progressive Web Application (PWA)
**Primary Language:** TypeScript
**Framework:** SvelteKit 2.x with Svelte 5
**Runtime:** Node.js 22+
### Purpose
A modern web application that extracts recipes from Instagram posts and saves them to Tandoor Recipe Manager using an async queue-based processing system.
---
## Project Structure
```
insta-recipe/
├── src/ # Source code
│ ├── lib/ # Library code
│ │ ├── client/ # Client-side modules
│ │ │ ├── PushNotificationManager.ts
│ │ │ ├── PWAInstallManager.ts
│ │ │ └── ServiceWorkerMessageHandler.ts
│ │ ├── server/ # Server-side modules
│ │ │ ├── api/ # API utilities (errors, handlers)
│ │ │ ├── browser.ts # Playwright browser management
│ │ │ ├── extraction.ts # Instagram content extraction
│ │ │ ├── llm.ts # LLM integration (OpenAI)
│ │ │ ├── notifications/ # Push notification service
│ │ │ ├── parser.ts # Recipe parsing with LLM
│ │ │ ├── prompts/ # LLM prompts
│ │ │ ├── queue/ # Queue management system
│ │ │ │ ├── QueueManager.ts
│ │ │ │ ├── QueueProcessor.ts
│ │ │ │ ├── config.ts
│ │ │ │ └── types.ts
│ │ │ ├── scheduler.ts # Background task scheduler
│ │ │ ├── tandoor.ts # Tandoor API integration
│ │ │ ├── tandoor-config.ts
│ │ │ └── validation/ # Input validation
│ │ ├── assets/ # Static assets
│ │ └── index.ts
│ ├── routes/ # SvelteKit routes
│ │ ├── api/ # API endpoints
│ │ │ ├── extract/ # Legacy extraction endpoint (deprecated)
│ │ │ ├── health/ # Health check
│ │ │ ├── llm-health/ # LLM health check
│ │ │ ├── notifications/ # Push notification endpoints
│ │ │ ├── queue/ # Queue management API
│ │ │ │ ├── [id]/ # Individual queue item operations
│ │ │ │ └── stream/ # SSE for real-time updates
│ │ │ ├── tandoor/ # Tandoor integration
│ │ │ ├── tandoor-config/
│ │ │ └── thumbnail/
│ │ ├── components/ # Shared components
│ │ ├── share/ # Share target page
│ │ │ └── components/ # Share-specific components
│ │ ├── +layout.svelte # Root layout
│ │ └── +page.svelte # Queue dashboard (home)
│ ├── tests/ # Test files
│ ├── app.d.ts # Type definitions
│ ├── app.html # HTML template
│ ├── app.server.ts # Server initialization
│ ├── hooks.server.ts # SvelteKit server hooks
│ └── service-worker.ts # Service worker for PWA
├── build/ # Build output
├── docs/ # Documentation
│ ├── plans/ # Implementation plans
│ └── outcomes/ # Implementation outcomes
├── scripts/ # Utility scripts
├── static/ # Static files
├── .ssl/ # SSL certificates (local dev)
├── docker-compose.yml # Docker configuration
├── Dockerfile # Container image
├── package.json # Dependencies
├── svelte.config.js # SvelteKit configuration
├── tsconfig.json # TypeScript configuration
└── vite.config.ts # Vite configuration
```
---
## Key Directories
### `/src/lib/server/`
Server-side business logic following Hexagonal Architecture principles. Contains domain logic, adapters for external systems (Instagram, Tandoor, LLM), and port definitions.
### `/src/lib/client/`
Client-side utilities for PWA features (push notifications, install prompts, service worker messaging).
### `/src/routes/api/`
RESTful API endpoints implemented as SvelteKit server routes. Each directory contains `+server.ts` files exporting HTTP verb handlers.
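
As a rough illustration of the `+server.ts` convention, the sketch below shows a route file exporting `GET` and `POST` handlers. The route, the `MinimalEvent` type, and the `jsonResponse` helper are hypothetical and not taken from the project; real handlers receive SvelteKit's full `RequestEvent` and would typically use the `json` helper from `@sveltejs/kit`.

```typescript
// Hypothetical +server.ts for an illustrative route; not copied from the project.
// Only the `request` field of SvelteKit's RequestEvent is modelled here.
type MinimalEvent = { request: Request };

// Builds a JSON response using only web-standard APIs.
function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Handles GET requests to this route.
export function GET(): Response {
  return jsonResponse({ status: "ok" });
}

// Handles POST requests; rejects bodies without a string `url` field.
export async function POST({ request }: MinimalEvent): Promise<Response> {
  // Invalid JSON is treated the same as a missing body.
  const payload: unknown = await request.json().catch(() => null);
  const url =
    typeof payload === "object" && payload !== null
      ? (payload as { url?: unknown }).url
      : undefined;
  if (typeof url !== "string") {
    return jsonResponse({ error: "url must be a string" }, 400);
  }
  return jsonResponse({ queued: url }, 202);
}
```

SvelteKit dispatches on the exported function name, so adding a `DELETE` export to the same file would be enough to accept that verb.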
### `/src/routes/share/`
Share target page allowing users to share Instagram URLs directly from their browser or mobile apps.
### `/src/lib/server/queue/`
Queue management system with in-memory storage, processor workers, and type definitions.
### `/docs/`
Comprehensive documentation including plans, outcomes, API specs, and migration guides.
---
## Design Patterns
### Singleton Pattern
Used for shared service instances:
- `QueueManager` (`queueManager` exported instance)
- `QueueProcessor` (`queueProcessor` exported instance)
- `PushNotificationService` (`pushNotificationService` exported instance)
- `ServiceWorkerMessageHandler` (`serviceWorkerMessageHandler` exported instance)
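
The exported-instance form of the singleton can be sketched as follows. The class body is a hypothetical stand-in (the real `QueueManager` has more state and methods); only the pattern of constructing one instance at module load and exporting it reflects the list above.

```typescript
// Hypothetical, trimmed-down stand-in for QueueManager.
class QueueManager {
  private readonly items: string[] = [];

  // Adds a URL and returns the new queue length.
  enqueue(url: string): number {
    this.items.push(url);
    return this.items.length;
  }

  size(): number {
    return this.items.length;
  }
}

// One instance is created when the module is first evaluated.
// ES modules are evaluated once and cached, so every importer shares this object.
export const queueManager = new QueueManager();
```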
### Factory Pattern
Used for creating configured instances:
- `createLLM()` - Creates OpenAI client with environment configuration
- `createBrowserContext()` - Creates Playwright browser context with options
- `initializeBrowser()` - Initializes Chromium browser instance
### Observer Pattern
Implemented in QueueManager for real-time updates:
- Subscribers receive notifications on queue item changes
- Server-Sent Events (SSE) stream queue updates to clients
- Push notifications alert users when queue items complete
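
A minimal sketch of the subscribe/notify mechanism, assuming a callback-based API. `ObservableQueue`, `QueueUpdate`, and the `notify` method are illustrative names; only `subscribe(callback)` appears elsewhere in this document.

```typescript
// Assumed shape of an update; the real type lives in queue/types.ts.
type QueueUpdate = { id: string; status: string };
type UpdateListener = (update: QueueUpdate) => void;

class ObservableQueue {
  private readonly listeners = new Set<UpdateListener>();

  // Registers a listener and returns a function that removes it again.
  subscribe(listener: UpdateListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Called whenever an item changes; fans the update out to all listeners.
  notify(update: QueueUpdate): void {
    this.listeners.forEach((listener) => listener(update));
  }
}
```

Returning the unsubscribe function from `subscribe` lets an SSE handler detach its listener when the client disconnects, which avoids leaking callbacks.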
### Adapter Pattern (Hexagonal Architecture)
External systems accessed via adapters:
- **Instagram Adapter**: `extraction.ts` - Extracts content via Playwright
- **LLM Adapter**: `llm.ts`, `parser.ts` - Recipe parsing via OpenAI
- **Tandoor Adapter**: `tandoor.ts` - Recipe management system integration
- **Browser Adapter**: `browser.ts` - Playwright browser automation
### Strategy Pattern
Multiple extraction strategies with fallback:
1. Embedded JSON extraction
2. DOM selector extraction
3. GraphQL API extraction
4. Legacy extraction method
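
The fallback chain could look roughly like the sketch below, which tries each strategy in order and returns the first non-empty result. The `ExtractionStrategy` type, the `extractWithFallback` function, and the convention of returning `null` for "no content" are assumptions made for illustration.

```typescript
type ExtractionResult = { text: string; method: string };

// A strategy resolves to extracted text, or null when it finds nothing.
type ExtractionStrategy = {
  label: string;
  run: (html: string) => Promise<string | null>;
};

async function extractWithFallback(
  html: string,
  strategies: ExtractionStrategy[],
): Promise<ExtractionResult> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      const text = await strategy.run(html);
      if (text !== null && text.trim() !== "") {
        return { text, method: strategy.label };
      }
      failures.push(`${strategy.label}: no content`);
    } catch (error) {
      // A thrown error is recorded and the next strategy is tried.
      failures.push(`${strategy.label}: ${String(error)}`);
    }
  }
  throw new Error(`All extraction strategies failed (${failures.join("; ")})`);
}
```

Collecting the per-strategy failures makes the final error message useful for debugging when every method fails.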
---
## Key Components
### Queue Management System
**Location**: `src/lib/server/queue/`
Three-phase processing pipeline:
1. **Extraction Phase**: Extract text and thumbnail from Instagram
2. **Parsing Phase**: Parse recipe using LLM
3. **Uploading Phase**: Upload to Tandoor (if enabled)
**Components**:
- `QueueManager`: In-memory FIFO queue with CRUD operations
- `QueueProcessor`: Worker that processes items with configurable concurrency
- `types.ts`: Comprehensive type definitions for queue items and updates
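
To make the three phases concrete, here is a simplified, synchronous sketch of how an item might move through them. The status names, the `PipelineItem` and `PipelinePhases` types, and `runPipeline` are hypothetical; the real processor is asynchronous and its types are defined in `types.ts`.

```typescript
// Assumed status values; the project's actual names may differ.
type QueueStatus = "pending" | "extracting" | "parsing" | "uploading" | "completed" | "failed";

interface PipelineItem {
  id: string;
  url: string;
  status: QueueStatus;
  error?: string;
}

interface PipelinePhases {
  extract: (url: string) => string;
  parse: (text: string) => { title: string };
  // Left undefined when the Tandoor upload is disabled.
  upload?: (recipe: { title: string }) => void;
}

// Runs the phases in order and returns the statuses the item passed through.
function runPipeline(item: PipelineItem, phases: PipelinePhases): QueueStatus[] {
  const visited: QueueStatus[] = [];
  const moveTo = (next: QueueStatus): void => {
    item.status = next;
    visited.push(next);
  };
  try {
    moveTo("extracting");
    const text = phases.extract(item.url);
    moveTo("parsing");
    const recipe = phases.parse(text);
    if (phases.upload) {
      moveTo("uploading");
      phases.upload(recipe);
    }
    moveTo("completed");
  } catch (error) {
    // Any phase failure marks the item as failed and records the reason.
    item.error = error instanceof Error ? error.message : String(error);
    moveTo("failed");
  }
  return visited;
}
```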
### API Layer
**Location**: `src/routes/api/`
RESTful endpoints for:
- Queue operations (`POST /api/queue`, `GET /api/queue`, `GET /api/queue/[id]`)
- Real-time updates (`GET /api/queue/stream` - SSE)
- Push notifications (`POST /api/notifications/subscribe`)
- Health checks (`GET /api/health`, `GET /api/llm-health`)
### Client-Side Services
**Location**: `src/lib/client/`
- **PushNotificationManager**: Manages Web Push API subscriptions
- **PWAInstallManager**: Handles PWA install prompts
- **ServiceWorkerMessageHandler**: Processes service worker messages
### Instagram Extraction
**Location**: `src/lib/server/extraction.ts`
Multi-method extraction that falls back to the next method when one fails. Features:
- Progress callbacks for real-time feedback
- Retry logic with configurable attempts
- Thumbnail extraction and validation
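
Retry with a configurable attempt count can be sketched with a small generic helper. `withRetry` and its parameters are illustrative names, not the project's API, and the fixed delay between attempts is an assumption.

```typescript
// Runs `task` up to `maxAttempts` times and resolves with the first success.
// Rejects with the last error if every attempt fails.
async function withRetry<T>(
  task: (attempt: number) => Promise<T>,
  maxAttempts: number,
  delayMs = 0,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts && delayMs > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

The attempt number is passed to the task so a caller could report progress (for example "attempt 2 of 3") through a progress callback.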
### LLM Integration
**Location**: `src/lib/server/parser.ts`, `src/lib/server/llm.ts`
- Recipe detection endpoint
- Structured extraction using OpenAI with Zod schemas
- Configurable model and temperature settings
---
## Dependencies
### Production Dependencies
- **@types/uuid** (^10.0.0) - UUID type definitions
- **date-fns** (^4.1.0) - Date utility library
- **openai** (^4.20.0) - OpenAI API client
- **playwright** (^1.56.1) - Browser automation
- **uuid** (^13.0.0) - Unique ID generation
- **zod** (^3.23.0) - Schema validation
### Development Dependencies
- **@sveltejs/kit** (^2.48.5) - SvelteKit framework
- **@sveltejs/adapter-node** (^5.4.0) - Node.js adapter
- **svelte** (^5.43.8) - Svelte 5 framework
- **typescript** (^5.9.3) - TypeScript compiler
- **vite** (^6.0.0) - Build tool
- **vitest** (^4.0.10) - Testing framework
- **@vitest/browser-playwright** (^4.0.10) - Browser testing
- **tailwindcss** (^4.1.17) - CSS framework
- **eslint** (^9.39.1) - Linting
- **prettier** (^3.6.2) - Code formatting
- **typescript-eslint** (^8.47.0) - TypeScript ESLint
---
## Module Organization
### SvelteKit Path Aliases
- `$lib` → `src/lib/`
- `$lib/*` → `src/lib/*`
- `$app/*` → SvelteKit app imports
- `$env/dynamic/private` → Environment variables (server-side)
### Directory Structure Conventions
- **Server-only code**: `src/lib/server/` (not bundled to client)
- **Client-only code**: `src/lib/client/` (not executed on server)
- **Shared code**: `src/lib/` (available to both)
- **Routes**: `src/routes/` (file-based routing)
- **Tests**: Colocated with source files (`*.spec.ts`, `*.test.ts`)
---
## Data Flow
### Recipe Extraction Flow
```
User submits URL
POST /api/queue
QueueManager.enqueue()
QueueProcessor picks up item
Phase 1: extractTextAndThumbnail()
Phase 2: extractRecipe() (LLM)
Phase 3: uploadRecipeWithIngredientsDTO() (Tandoor)
Push notification sent
SSE updates notify client
```
### Real-time Updates Flow
```
Client connects to GET /api/queue/stream (SSE)
QueueManager.subscribe(callback)
Queue item changes trigger callback
SSE sends event to client
Client updates UI reactively
```
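
On the wire, each SSE message is a block of `field: value` lines terminated by a blank line. The helper below is a minimal sketch of serializing a queue update that way; `formatSseEvent` and the event name used in the usage note are assumptions, not names from the project.

```typescript
// Serializes one Server-Sent Events message.
// JSON.stringify escapes newlines, so the payload always fits on a single data line.
function formatSseEvent(eventName: string, payload: Record<string, unknown>): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

A stream handler would write the result of `formatSseEvent("queue-update", { id: "1", status: "parsing" })` to a response served with the `text/event-stream` content type.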
### Push Notification Flow
```
Client requests permission
POST /api/notifications/subscribe (with subscription)
PushNotificationService stores subscription
Queue item completes
PushNotificationService.sendNotification()
Service worker receives push event
Notification displayed to user
```
---
## Build System
### Build Command
```bash
npm run build
```
Generates production-ready build in `build/` directory using:
- Vite for bundling
- `@sveltejs/adapter-node` for Node.js deployment
- TypeScript compilation
- SvelteKit prerendering and optimization
### Test Command
```bash
npm test
```
Runs test suite using Vitest with two projects:
1. **Server tests**: Node environment for server-side code
2. **Client tests**: Playwright browser for Svelte components
### Development Server
```bash
npm run dev
```
Starts Vite dev server with:
- HTTPS enabled (certificates in `.ssl/`)
- Hot module replacement
- TypeScript checking
- File watching
### Linting & Formatting
```bash
npm run lint # ESLint + Prettier check
npm run format # Prettier write
```
---
## Deployment
### Docker Deployment
Dockerfile includes:
- Node.js 22 Alpine base image
- Playwright Chromium installation
- Production build
- Port 3000 exposure
Run with:
```bash
docker-compose up
```
### Environment Variables
Configuration (only `OPENAI_API_KEY` is required; the others are optional or have defaults):
- `OPENAI_API_KEY` - LLM API access
- `TANDOOR_URL` - Tandoor instance URL (optional)
- `TANDOOR_TOKEN` - Tandoor API token (optional)
- `QUEUE_CONCURRENCY` - Concurrent processing limit (default: 2)
- `QUEUE_MAX_RETRIES` - Failed item retry limit (default: 3)
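
The queue defaults listed above could be applied with a loader along these lines. This is a sketch, not the contents of `queue/config.ts`: `loadQueueConfig`, `readInt`, and the minimum values (at least 1 worker, at least 0 retries) are assumptions.

```typescript
interface QueueConfig {
  concurrency: number;
  maxRetries: number;
}

// Parses an integer setting, falling back when it is missing, invalid, or below `min`.
function readInt(raw: string | undefined, fallback: number, min: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed >= min ? parsed : fallback;
}

// `env` would typically be process.env or SvelteKit's $env/dynamic/private.
function loadQueueConfig(env: Record<string, string | undefined>): QueueConfig {
  return {
    concurrency: readInt(env["QUEUE_CONCURRENCY"], 2, 1),
    maxRetries: readInt(env["QUEUE_MAX_RETRIES"], 3, 0),
  };
}
```

Passing the environment in as an argument, rather than reading a global, keeps the function easy to test.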
---
## Testing Architecture
### Test Categories
1. **Unit Tests**: Individual function testing
2. **Integration Tests**: Multi-component workflows
3. **API Tests**: Endpoint behavior validation
4. **Browser Tests**: Svelte component rendering
### Test Coverage
138 tests covering:
- Queue management operations
- Instagram URL validation
- SSE streaming
- API endpoints
- Scheduler functionality
- Notification service
### Test Configuration
- **Server tests**: Node environment with mocked dependencies
- **Client tests**: Playwright Chromium browser with Svelte testing library
---
## Security Considerations
### SSL/TLS
- Development uses local SSL certificates signed by an external Caddy CA
- Certificates stored in `.ssl/` (git-ignored)
- Required for PWA features (Service Worker, Push API)
### Authentication
- Basic auth for scheduled tasks (username/password from environment)
- Tandoor integration uses bearer token authentication
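
For reference, the two schemes differ only in how the `Authorization` header is built and read. The helpers below are a sketch under assumptions: `bearerHeader` and `parseBasicAuth` are hypothetical names, and the document does not say which side of the scheduled-task request performs the Basic auth check.

```typescript
// Builds the header for token-based requests, e.g. to the Tandoor API.
function bearerHeader(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

// Decodes an incoming "Basic <base64(user:pass)>" header; returns null if malformed.
// Note: atob decodes to Latin-1, so non-ASCII credentials would need extra handling.
function parseBasicAuth(header: string | null): { username: string; password: string } | null {
  if (header === null || !header.startsWith("Basic ")) {
    return null;
  }
  let decoded: string;
  try {
    decoded = atob(header.slice("Basic ".length));
  } catch {
    return null;
  }
  // The password may itself contain colons, so split on the first one only.
  const separator = decoded.indexOf(":");
  if (separator === -1) {
    return null;
  }
  return {
    username: decoded.slice(0, separator),
    password: decoded.slice(separator + 1),
  };
}
```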
### Input Validation
- Instagram URL validation with regex patterns
- Zod schema validation for API payloads
- Error handling with custom error classes
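
A URL check in this spirit might combine the built-in `URL` parser with a path regex, as sketched below. The accepted hosts, the HTTPS-only rule, and the `/p/`, `/reel/`, `/tv/` path pattern are assumptions for illustration; the project's actual patterns are not shown in this document.

```typescript
// Assumed pattern: a post, reel, or TV path followed by a shortcode.
const INSTAGRAM_PATH = /^\/(p|reel|tv)\/[A-Za-z0-9_-]+\/?$/;

function isInstagramPostUrl(input: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    // Not a syntactically valid absolute URL.
    return false;
  }
  const hostOk =
    parsed.hostname === "instagram.com" || parsed.hostname === "www.instagram.com";
  // Query strings (e.g. share tracking parameters) are ignored by matching only the pathname.
  return parsed.protocol === "https:" && hostOk && INSTAGRAM_PATH.test(parsed.pathname);
}
```

Checking the parsed hostname instead of matching the whole string with one regex avoids accepting look-alike URLs such as `https://instagram.com.example.org/p/abc/`.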
---
**Document Version:** 1.0
**Generated by:** Initializer Agent
**Next Review:** As needed for architectural changes