Giancarmine Salucci 49bccf8f15 simplify
2026-02-18 01:21:44 +01:00


# InstaRecipe - Async Instagram Recipe Extractor
A modern web application that extracts recipes from Instagram posts and saves them to Tandoor Recipe Manager using an async queue-based processing system.
## 🚀 Features
### Core Functionality
- **Async Queue Processing**: Fire-and-forget recipe extraction with background processing
- **Real-time Updates**: Server-Sent Events for live progress tracking
- **Push Notifications**: Background notifications when recipes complete
- **Instagram Integration**: Extract recipes from Instagram posts and stories
- **Tandoor Integration**: Automatic upload to Tandoor Recipe Manager
- **PWA Support**: Installable Progressive Web App with offline capabilities
### User Experience
- **Queue Dashboard**: Monitor all recipe extractions in real-time
- **Share Integration**: Browser share target for easy URL submission
- **Responsive Design**: Works on desktop, tablet, and mobile
- **Error Recovery**: Retry failed extractions with one click
- **Progress Tracking**: Visual progress through extraction phases
### Technical Architecture
- **SvelteKit Frontend**: Modern reactive UI with TypeScript
- **Hexagonal Architecture**: Clean separation of concerns
- **In-Memory Queue**: High-performance processing with configurable concurrency
- **Three-Phase Pipeline**: Extraction → Parsing → Uploading
- **Comprehensive Testing**: 138 tests covering all major components
## 📋 API Endpoints
### Queue Management
- `POST /api/queue` - Enqueue Instagram URL for processing
- `GET /api/queue` - List queue items with filtering and pagination
- `GET /api/queue/{id}` - Get specific queue item details
- `POST /api/queue/{id}/retry` - Retry failed item
- `GET /api/queue/stream` - Server-Sent Events for real-time updates
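As a rough illustration of the enqueue call, the sketch below builds a `POST /api/queue` request. The JSON body shape (`{ url }`), the `buildEnqueueRequest` helper, and the base URL are assumptions for illustration only; check the route handler for the actual contract.

```typescript
// Hypothetical helper: the `{ url }` body shape is an assumption,
// not a documented contract of this API.
interface EnqueueRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildEnqueueRequest(baseUrl: string, instagramUrl: string): EnqueueRequest {
  // `new URL` throws a TypeError for anything that is not an absolute URL,
  // so malformed input is rejected before it reaches the queue.
  const parsed = new URL(instagramUrl);
  return {
    endpoint: new URL("/api/queue", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: parsed.toString() }),
    },
  };
}

// Usage (not executed here):
//   const { endpoint, init } = buildEnqueueRequest("https://localhost:5173", postUrl);
//   const response = await fetch(endpoint, init);
```

Keeping request construction separate from `fetch` makes the validation step easy to unit-test without a running server.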
### Push Notifications
- `POST /api/notifications/subscribe` - Subscribe to push notifications
- `DELETE /api/notifications/subscribe` - Unsubscribe from notifications
- `GET /api/notifications/vapid-key` - Get VAPID public key
### Legacy Endpoints (Deprecated)
- ~~`POST /api/extract`~~ - Use `/api/queue` instead
- ~~`GET /api/extract-stream`~~ - Use `/api/queue/stream` instead
## 🛠 Development Setup
### Prerequisites
- Node.js 18+
- npm or pnpm
- Tandoor Recipe Manager instance (optional)
- LLM API access (OpenAI, Anthropic, or local)
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd insta-recipe
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Configure your environment variables (see Configuration section)
```
### Local Development
```bash
# Start development server with HTTPS
npm run dev
# Open in browser (certificate must be trusted; on Linux use xdg-open instead of open)
open https://localhost:5173
```
The development server uses HTTPS by default because the following features rely on it:
- Service worker support (required for PWA)
- Push notifications
- Browser share target API
- Instagram cookie handling
### SSL Certificate Setup
The application uses HTTPS in development with SSL certificates signed by an external Caddy CA container. The current certificate is valid until **December 20, 2035** (10 years).
**Certificate Information:**
- Location: `.ssl/` directory
- CA Certificate: `.ssl/root.crt` (already trusted on the system)
- Server Certificate: `.ssl/localhost.crt`
- Server Private Key: `.ssl/localhost.key`
Since the Caddy CA is already trusted on the system, the certificate should work without additional trust steps. If you encounter browser warnings:
**Linux (Ubuntu/Debian):**
```bash
sudo cp .ssl/root.crt /usr/local/share/ca-certificates/caddy-local.crt
sudo update-ca-certificates
```
**Chrome/Chromium:**
1. Go to `chrome://settings/certificates`
2. Click "Authorities" → "Import"
3. Select `.ssl/root.crt`
4. Check "Trust this certificate for identifying websites"
**Checking Certificate Expiration:**
```bash
openssl x509 -enddate -noout -in .ssl/localhost.crt
```
**Regenerating the Certificate (if needed):**
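If the certificate has expired or its details need to change, regenerate it with the commands below. They require access to the Caddy container that holds the CA private key: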
```bash
# Identify the Caddy container (usually named caddy-local)
CADDY_CONTAINER="caddy-local"
# Copy Caddy's CA certificate and private key
docker cp $CADDY_CONTAINER:/data/caddy/pki/authorities/local/root.crt .ssl/root.crt
docker cp $CADDY_CONTAINER:/data/caddy/pki/authorities/local/root.key .ssl/caddy-ca.key
# Generate new server private key
openssl genrsa -out .ssl/localhost.key 2048
# Generate Certificate Signing Request (CSR)
openssl req -new \
-key .ssl/localhost.key \
-out .ssl/localhost.csr \
-subj "/O=Caddy Local Authority/CN=localhost"
# Create OpenSSL config for Subject Alternative Names
cat > .ssl/localhost.ext << 'EOF'
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = localhost
DNS.2 = *.localhost
IP.1 = 127.0.0.1
IP.2 = ::1
EOF
# Sign certificate with Caddy's CA (10 years = 3650 days)
openssl x509 -req \
-in .ssl/localhost.csr \
-CA .ssl/root.crt \
-CAkey .ssl/caddy-ca.key \
-CAcreateserial \
-out .ssl/localhost.crt \
-days 3650 \
-sha256 \
-extfile .ssl/localhost.ext
# Clean up temporary files (including the copied CA private key) and set permissions
rm -f .ssl/localhost.csr .ssl/localhost.ext .ssl/caddy-ca.key .ssl/root.srl
chmod 600 .ssl/localhost.key
chmod 644 .ssl/localhost.crt .ssl/root.crt
# Verify the certificate
openssl verify -CAfile .ssl/root.crt .ssl/localhost.crt
```
## ⚙️ Configuration
### Environment Variables
Create a `.env` file with the following variables:
```env
# LLM Configuration
LLM_API_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your-api-key
LLM_MODEL=gpt-4o-mini
# Tandoor Integration (optional)
TANDOOR_BASE_URL=https://your-tandoor.com
TANDOOR_API_KEY=your-tandoor-token
# Queue Processing
QUEUE_CONCURRENCY=2
QUEUE_TIMEOUT_MS=30000
# Push Notifications (optional)
VAPID_PUBLIC_KEY=your-vapid-public-key
VAPID_PRIVATE_KEY=your-vapid-private-key
# Instagram Authentication (optional)
AUTH_SCHEDULER_ENABLED=true
AUTH_SCHEDULER_INTERVAL_MINUTES=720
```
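The queue settings arrive as strings, so they need parsing and validation before use. The sketch below shows one way to do that; `readQueueConfig` and `parsePositiveInt` are hypothetical names, and the fallback values simply mirror the sample `.env` above rather than any confirmed application defaults.

```typescript
interface QueueConfig {
  concurrency: number;
  timeoutMs: number;
}

// Assumption: fallbacks copy the sample values shown in the .env example.
const DEFAULTS: QueueConfig = { concurrency: 2, timeoutMs: 30000 };

// Returns the parsed value only if it is a positive integer; otherwise the fallback.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const value = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

function readQueueConfig(env: Record<string, string | undefined>): QueueConfig {
  return {
    concurrency: parsePositiveInt(env.QUEUE_CONCURRENCY, DEFAULTS.concurrency),
    timeoutMs: parsePositiveInt(env.QUEUE_TIMEOUT_MS, DEFAULTS.timeoutMs),
  };
}

// Usage (not executed here): const config = readQueueConfig(process.env);
```

Falling back on invalid input keeps a typo in `.env` from starting the queue with a concurrency of zero or `NaN`.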
### Tandoor Setup
To automatically upload extracted recipes to Tandoor:
1. Create an API token in your Tandoor instance
2. Set `TANDOOR_BASE_URL` and `TANDOOR_API_KEY` in `.env`
3. Recipes will be automatically uploaded after successful extraction
### Push Notifications
To enable web push notifications:
1. Generate VAPID keys:
```bash
npx web-push generate-vapid-keys
```
2. Set `VAPID_PUBLIC_KEY` and `VAPID_PRIVATE_KEY` in `.env`
3. Users can enable notifications in the dashboard settings
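On the client, the VAPID public key is usually converted from its base64url text form into bytes before being passed to the Push API. The helper below is a common sketch of that conversion; its name and the wiring shown in the comments are illustrative, and the response shape of `/api/notifications/vapid-key` is an assumption.

```typescript
// Converts a base64url-encoded key (as printed by `web-push generate-vapid-keys`)
// into the byte array form accepted by PushManager.subscribe().
function urlBase64ToUint8Array(base64Url: string): Uint8Array {
  // Restore the padding that base64url omits, then map back to standard base64.
  const padding = "=".repeat((4 - (base64Url.length % 4)) % 4);
  const base64 = (base64Url + padding).replace(/-/g, "+").replace(/_/g, "/");
  const raw = atob(base64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);
  }
  return bytes;
}

// Browser wiring (not executed here; assumes the endpoint returns the key as plain text):
//   const key = await (await fetch("/api/notifications/vapid-key")).text();
//   const registration = await navigator.serviceWorker.ready;
//   const subscription = await registration.pushManager.subscribe({
//     userVisibleOnly: true,
//     applicationServerKey: urlBase64ToUint8Array(key),
//   });
//   await fetch("/api/notifications/subscribe", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(subscription),
//   });
```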
## 🏗 Architecture Overview
### Queue System
```
User submits URL → Queue Manager → Queue Processor
Extraction Phase ← → Parsing Phase ← → Upload Phase
Push Notifications ← → SSE Updates ← → Dashboard Updates
```
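To make the idea of a concurrency-limited in-memory queue concrete, here is a minimal sketch. It is not the project's queue and not the fastq API; `TinyQueue` and its methods are hypothetical names chosen for illustration.

```typescript
type Task = () => Promise<void>;

class TinyQueue {
  private readonly concurrency: number;
  private readonly waiting: Task[] = [];
  private active = 0;

  constructor(concurrency: number) {
    this.concurrency = concurrency;
  }

  // Fire-and-forget: the caller does not wait for the task to finish.
  push(task: Task): void {
    this.waiting.push(task);
    this.drain();
  }

  stats(): { active: number; waiting: number } {
    return { active: this.active, waiting: this.waiting.length };
  }

  // Starts waiting tasks until the concurrency limit is reached.
  private drain(): void {
    while (this.active < this.concurrency && this.waiting.length > 0) {
      const task = this.waiting.shift() as Task;
      this.active++;
      const done = (): void => {
        this.active--;
        this.drain();
      };
      // Run on success or failure alike, so one failed task never blocks the queue.
      Promise.resolve().then(task).then(done, done);
    }
  }
}
```

With a limit of 2 (the sample `QUEUE_CONCURRENCY`), pushing three tasks leaves two active and one waiting until a slot frees up.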
### Processing Pipeline
1. **Extraction Phase**: Browser automation extracts text and images
2. **Parsing Phase**: LLM converts text to structured recipe data
3. **Upload Phase**: Automatic upload to Tandoor (if configured)
Each phase reports its own progress and can fail independently; failures are classified as described under Error Classification.
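The progression through the three phases can be modelled as a small state transition. The sketch below is an assumption about how such tracking might look; the `QueueItemProgress` shape and function names are hypothetical, and skipping the upload phase when Tandoor is not configured is inferred from the "if configured" note above.

```typescript
const PHASES = ["extraction", "parsing", "uploading"] as const;
type Phase = (typeof PHASES)[number];

interface QueueItemProgress {
  phase: Phase;       // phase currently running (or last run, once done)
  completed: Phase[]; // phases that finished successfully
  done: boolean;
}

function startItem(): QueueItemProgress {
  return { phase: "extraction", completed: [], done: false };
}

// Marks the current phase complete and moves to the next one.
// When Tandoor is not configured, the item finishes after parsing.
function completePhase(item: QueueItemProgress, tandoorConfigured: boolean): QueueItemProgress {
  if (item.done) return item;
  const completed = [...item.completed, item.phase];
  const next = PHASES[PHASES.indexOf(item.phase) + 1];
  if (next === undefined || (next === "uploading" && !tandoorConfigured)) {
    return { phase: item.phase, completed, done: true };
  }
  return { phase: next, completed, done: false };
}
```

Returning a new object on each transition keeps every intermediate state available for progress reporting.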
### Error Classification
- **Recoverable Errors** (`unhealthy`): Temporary issues, can be retried
- **Non-recoverable Errors** (`error`): Invalid URLs, parsing failures, etc.
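A classification along these lines could be expressed as a simple lookup. In the sketch below, only the `unhealthy` and `error` status names come from this README; the failure kinds, the `classifyFailure` function, and the choice of which kinds count as recoverable are assumptions for illustration.

```typescript
type FailureStatus = "unhealthy" | "error";

// Hypothetical failure kinds; the real codebase may use different names or error classes.
type FailureKind = "timeout" | "network" | "rate_limited" | "invalid_url" | "parse_failed";

// Assumption: temporary conditions are treated as recoverable.
const RECOVERABLE: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "timeout",
  "network",
  "rate_limited",
]);

function classifyFailure(kind: FailureKind): { status: FailureStatus; retryable: boolean } {
  const recoverable = RECOVERABLE.has(kind);
  return { status: recoverable ? "unhealthy" : "error", retryable: recoverable };
}
```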
## 🧪 Testing
```bash
# Run all tests
npm test
# Run specific test suites
npm run test:unit # Unit tests only
npm run test:client # Browser tests only
npm run test:server # Server tests only
# Run tests in watch mode
npm run test:watch
```
Test Coverage:
- **138 total tests** covering all major components
- Queue Manager: 28 tests
- Queue Processor: 5 integration tests
- API Endpoints: 17 tests
- SSE Streaming: 6 tests
- Frontend Components: Browser tests
## 📦 Building & Deployment
### Production Build
```bash
# Build for production
npm run build
# Preview production build locally
npm run preview
```
### Deployment
The app is built as a Node.js application with the following outputs:
- `/.svelte-kit/output/server/` - Server bundle
- `/.svelte-kit/output/client/` - Static assets
- `/build/` - Adapter output
Start the server from the adapter output with:
```bash
node build/index.js
```
### Docker Deployment
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY build ./build
EXPOSE 3000
CMD ["node", "build"]
```
## 🔄 Migration from Synchronous System
### What Changed
The app was migrated from a synchronous extraction system to an async queue-based system:
**Before (Synchronous)**:
- User waited for entire extraction process to complete
- No progress tracking during processing
- No retry capability for failures
- Single-threaded processing
- Limited error handling
**After (Async Queue)**:
- Fire-and-forget: submit URL and redirect immediately
- Real-time progress tracking via SSE
- Comprehensive retry system for failures
- Concurrent processing (configurable)
- Detailed error classification and reporting
- Push notifications for background updates
### API Migration
**Old Synchronous Endpoints** (Deprecated):
```
POST /api/extract # Submit URL and wait for completion
GET /api/extract-stream # Long-polling for progress
```
**New Queue Endpoints**:
```
POST /api/queue # Submit URL, get queue ID immediately
GET /api/queue # List all queue items
GET /api/queue/{id} # Get specific item status
POST /api/queue/{id}/retry # Retry failed items
GET /api/queue/stream # Real-time SSE updates
```
### Migration Steps
If migrating from the old system:
1. **Update Client Code**: Replace `/api/extract` calls with `/api/queue`
2. **Handle Async Responses**: Process queue ID instead of waiting for completion
3. **Add Progress Tracking**: Implement SSE listeners for real-time updates
4. **Update Error Handling**: Handle new error classification system
5. **Add Retry Logic**: Implement retry functionality for failed items
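For steps 2 and 3, a migrated client typically keeps a local map of queue items and updates it as SSE messages arrive. The sketch below assumes a JSON payload with `id`, `status`, and an optional `phase`; that shape, and the use of default `message` events, are assumptions rather than the documented stream format.

```typescript
// Hypothetical event shape; the real SSE payload may differ.
interface QueueEvent {
  id: string;
  status: string;
  phase?: string;
}

type Dashboard = Record<string, QueueEvent>;

// Parses the `data` field of one SSE message; returns null for anything unexpected.
function parseQueueEvent(data: string): QueueEvent | null {
  try {
    const value: unknown = JSON.parse(data);
    if (typeof value !== "object" || value === null) return null;
    const candidate = value as Record<string, unknown>;
    const id = candidate["id"];
    const status = candidate["status"];
    const phase = candidate["phase"];
    if (typeof id !== "string" || typeof status !== "string") return null;
    const event: QueueEvent = { id, status };
    if (typeof phase === "string") event.phase = phase;
    return event;
  } catch (e) {
    return null; // malformed JSON
  }
}

// Merges an update into the dashboard state without mutating the previous state.
function applyEvent(state: Dashboard, event: QueueEvent): Dashboard {
  return { ...state, [event.id]: { ...state[event.id], ...event } };
}

// Browser wiring (not executed here):
//   let state: Dashboard = {};
//   const source = new EventSource("/api/queue/stream");
//   source.onmessage = (message) => {
//     const event = parseQueueEvent(message.data);
//     if (event) state = applyEvent(state, event);
//   };
```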
### Backward Compatibility
The legacy endpoints remain routed but no longer perform extraction:
- They return a `410 Gone` status with migration instructions
- The routes themselves will be removed in a future version
- All new development should use the queue endpoints
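A client that may still hit a legacy route can translate a `410 Gone` response into a pointer to the replacement. The mapping below comes from the deprecation notes in this README; the helper names and message wording are hypothetical.

```typescript
// Legacy route -> replacement, as listed under "Legacy Endpoints (Deprecated)".
const LEGACY_TO_QUEUE = new Map<string, string>([
  ["/api/extract", "/api/queue"],
  ["/api/extract-stream", "/api/queue/stream"],
]);

function replacementFor(pathname: string): string | null {
  return LEGACY_TO_QUEUE.get(pathname) ?? null;
}

// Returns a human-readable hint for a 410 response, or null for any other status.
function describeGone(status: number, pathname: string): string | null {
  if (status !== 410) return null;
  const replacement = replacementFor(pathname);
  return replacement !== null
    ? `${pathname} has been retired; use ${replacement} instead.`
    : `${pathname} has been retired.`;
}
```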
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run the test suite (`npm test`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
### Development Guidelines
- Follow TypeScript strict mode
- Add tests for all new functionality
- Use the existing architecture patterns (Hexagonal Architecture)
- Update documentation for API changes
- Ensure PWA functionality remains intact
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- [SvelteKit](https://kit.svelte.dev/) - Application framework
- [Tandoor Recipe Manager](https://docs.tandoor.dev/) - Recipe management system
- [Workbox](https://developers.google.com/web/tools/workbox) - PWA capabilities
- [fastq](https://github.com/mcollina/fastq) - High-performance queue processing