# InstaRecipe - Async Instagram Recipe Extractor

A modern web application that extracts recipes from Instagram posts and saves them to Tandoor Recipe Manager using an async queue-based processing system.

## 🚀 Features

### Core Functionality

- **Async Queue Processing**: Fire-and-forget recipe extraction with background processing
- **Real-time Updates**: Server-Sent Events for live progress tracking
- **Push Notifications**: Background notifications when recipes complete
- **Instagram Integration**: Extract recipes from Instagram posts and stories
- **Tandoor Integration**: Automatic upload to Tandoor Recipe Manager
- **PWA Support**: Installable Progressive Web App with offline capabilities

### User Experience

- **Queue Dashboard**: Monitor all recipe extractions in real-time
- **Share Integration**: Browser share target for easy URL submission
- **Responsive Design**: Works on desktop, tablet, and mobile
- **Error Recovery**: Retry failed extractions with one click
- **Progress Tracking**: Visual progress through extraction phases

### Technical Architecture

- **SvelteKit Frontend**: Modern reactive UI with TypeScript
- **Hexagonal Architecture**: Clean separation of concerns
- **In-Memory Queue**: High-performance processing with configurable concurrency
- **Three-Phase Pipeline**: Extraction → Parsing → Uploading
- **Comprehensive Testing**: 138 tests covering all components

## 📋 API Endpoints

### Queue Management

- `POST /api/queue` - Enqueue Instagram URL for processing
- `GET /api/queue` - List queue items with filtering and pagination
- `GET /api/queue/{id}` - Get specific queue item details
- `POST /api/queue/{id}/retry` - Retry failed item
- `GET /api/queue/stream` - Server-Sent Events for real-time updates

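A minimal client sketch for the queue endpoints above. The README documents the routes but not their payloads, so the request body `{ url }`, the response fields `id` and `status`, and the type names here are assumptions for illustration only. The HTTP layer is passed in as a parameter (`HttpCall`, a hypothetical fetch-like type) so the logic can be exercised without a running server.

```typescript
// Hypothetical payload shape: the README lists routes, not response bodies.
interface QueueItem {
  id: string;
  status: string; // assumed to hold values such as "pending" or "completed"
}

// Minimal fetch-like signature so the sketch does not depend on a real server.
type HttpCall = (
  path: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST /api/queue: submit an Instagram URL and return the queued item.
async function enqueueRecipe(http: HttpCall, instagramUrl: string): Promise<QueueItem> {
  const res = await http("/api/queue", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: instagramUrl }), // assumed field name
  });
  if (!res.ok) throw new Error(`Enqueue failed with HTTP ${res.status}`);
  return (await res.json()) as QueueItem;
}

// GET /api/queue/{id}: fetch the current state of one item.
async function getQueueItem(http: HttpCall, id: string): Promise<QueueItem> {
  const res = await http(`/api/queue/${encodeURIComponent(id)}`);
  if (!res.ok) throw new Error(`Lookup failed with HTTP ${res.status}`);
  return (await res.json()) as QueueItem;
}

// POST /api/queue/{id}/retry: ask the server to retry a failed item.
async function retryQueueItem(http: HttpCall, id: string): Promise<QueueItem> {
  const res = await http(`/api/queue/${encodeURIComponent(id)}/retry`, { method: "POST" });
  if (!res.ok) throw new Error(`Retry failed with HTTP ${res.status}`);
  return (await res.json()) as QueueItem;
}
```

In a browser, `(path, init) => fetch(path, init)` should satisfy `HttpCall`, so the same functions can be used against the real API once the actual payload shapes are confirmed.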
### Push Notifications

- `POST /api/notifications/subscribe` - Subscribe to push notifications
- `DELETE /api/notifications/subscribe` - Unsubscribe from notifications
- `GET /api/notifications/vapid-key` - Get VAPID public key

### Legacy Endpoints (Deprecated)

- ~~`POST /api/extract`~~ - Use `/api/queue` instead
- ~~`GET /api/extract-stream`~~ - Use `/api/queue/stream` instead

## 🛠 Development Setup

### Prerequisites

- Node.js 18+
- npm or pnpm
- Tandoor Recipe Manager instance (optional)
- LLM API access (OpenAI, Anthropic, or local)

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd insta-recipe

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

# Configure your environment variables (see Configuration section)
```

### Local Development

```bash
# Start development server with HTTPS
npm run dev

# Open in browser (certificate must be trusted)
# `open` is the macOS command; on Linux use `xdg-open` instead
open https://localhost:5173
```

The app runs on HTTPS by default to support:

- Service worker support (required for PWA)
- Push notifications
- Browser share target API
- Instagram cookie handling

### SSL Certificate Setup

The application uses HTTPS in development with SSL certificates signed by an external Caddy CA container. The current certificate is valid until **December 20, 2035** (10 years).

**Certificate Information:**

- Location: `.ssl/` directory
- CA Certificate: `.ssl/root.crt` (already trusted on the system)
- Server Certificate: `.ssl/localhost.crt`
- Server Private Key: `.ssl/localhost.key`

Since the Caddy CA is already trusted on the system, the certificate should work without additional trust steps. If you still see browser warnings, add the CA certificate to the relevant trust store:

**Linux (Ubuntu/Debian):**

```bash
sudo cp .ssl/root.crt /usr/local/share/ca-certificates/caddy-local.crt
sudo update-ca-certificates
```

**Chrome/Chromium:**

1. Go to `chrome://settings/certificates`
2. Click "Authorities" → "Import"
3. Select `.ssl/root.crt`
4. Check "Trust this certificate for identifying websites"

**Checking Certificate Expiration:**

```bash
openssl x509 -enddate -noout -in .ssl/localhost.crt
```

**Regenerating the Certificate (if needed):**

If the certificate expires or needs to be regenerated:

```bash
# Identify the Caddy container (usually named caddy-local)
CADDY_CONTAINER="caddy-local"

# Copy Caddy's CA certificate and private key
docker cp "$CADDY_CONTAINER":/data/caddy/pki/authorities/local/root.crt .ssl/root.crt
docker cp "$CADDY_CONTAINER":/data/caddy/pki/authorities/local/root.key .ssl/caddy-ca.key

# Generate new server private key
openssl genrsa -out .ssl/localhost.key 2048

# Generate Certificate Signing Request (CSR)
openssl req -new \
  -key .ssl/localhost.key \
  -out .ssl/localhost.csr \
  -subj "/O=Caddy Local Authority/CN=localhost"

# Create OpenSSL config for Subject Alternative Names
cat > .ssl/localhost.ext << 'EOF'
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = localhost
DNS.2 = *.localhost
IP.1 = 127.0.0.1
IP.2 = ::1
EOF

# Sign certificate with Caddy's CA (10 years = 3650 days)
openssl x509 -req \
  -in .ssl/localhost.csr \
  -CA .ssl/root.crt \
  -CAkey .ssl/caddy-ca.key \
  -CAcreateserial \
  -out .ssl/localhost.crt \
  -days 3650 \
  -sha256 \
  -extfile .ssl/localhost.ext

# Cleanup temporary files (including the copied CA private key) and set permissions
# -f avoids an error if the serial file was written under a different name
rm -f .ssl/localhost.csr .ssl/localhost.ext .ssl/caddy-ca.key .ssl/root.srl
chmod 600 .ssl/localhost.key
chmod 644 .ssl/localhost.crt .ssl/root.crt

# Verify the certificate
openssl verify -CAfile .ssl/root.crt .ssl/localhost.crt
```

## ⚙️ Configuration

### Environment Variables

Create a `.env` file with the following variables:

```env
# LLM Configuration
LLM_API_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=your-api-key
LLM_MODEL=gpt-4o-mini

# Tandoor Integration (optional)
TANDOOR_BASE_URL=https://your-tandoor.com
TANDOOR_API_KEY=your-tandoor-token

# Queue Processing
QUEUE_CONCURRENCY=2
QUEUE_TIMEOUT_MS=30000

# Push Notifications (optional)
VAPID_PUBLIC_KEY=your-vapid-public-key
VAPID_PRIVATE_KEY=your-vapid-private-key

# Instagram Authentication (optional)
AUTH_SCHEDULER_ENABLED=true
AUTH_SCHEDULER_INTERVAL_MINUTES=720
```

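As an illustration of how the queue settings might be read, the sketch below parses `QUEUE_CONCURRENCY` and `QUEUE_TIMEOUT_MS` from an environment-like object. The function and type names are hypothetical, and the fallback values simply mirror the sample `.env` above; the project's actual defaults and validation rules are not documented here.

```typescript
interface QueueConfig {
  concurrency: number;
  timeoutMs: number;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Assumed defaults (2 workers, 30 s timeout) are copied from the sample .env.
function loadQueueConfig(env: Record<string, string | undefined>): QueueConfig {
  return {
    concurrency: parsePositiveInt(env["QUEUE_CONCURRENCY"], 2),
    timeoutMs: parsePositiveInt(env["QUEUE_TIMEOUT_MS"], 30000),
  };
}
```

On the server this could be called as `loadQueueConfig(process.env)`. Rejecting zero, negative, and fractional values keeps a typo in `.env` from silently stalling the queue.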
### Tandoor Setup

To automatically upload extracted recipes to Tandoor:

1. Create an API token in your Tandoor instance
2. Set `TANDOOR_BASE_URL` and `TANDOOR_API_KEY` in `.env`
3. Recipes will be automatically uploaded after successful extraction

### Push Notifications

To enable web push notifications:

1. Generate VAPID keys:
   ```bash
   npx web-push generate-vapid-keys
   ```
2. Set `VAPID_PUBLIC_KEY` and `VAPID_PRIVATE_KEY` in `.env`
3. Users can enable notifications in the dashboard settings

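VAPID keys generated by `web-push` are base64url strings, while the browser's `PushManager.subscribe()` accepts the `applicationServerKey` as binary data. The helper below is a sketch of that conversion, written without `atob` so it runs in any JavaScript runtime. The name `base64UrlToBytes` is hypothetical, and how `/api/notifications/vapid-key` formats its response is not specified in this README.

```typescript
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decode a base64url string (padding optional) into raw bytes.
function base64UrlToBytes(input: string): Uint8Array {
  const clean = input.replace(/=+$/, ""); // tolerate trailing padding
  const bytes: number[] = [];
  let buffer = 0; // bit accumulator
  let bits = 0; // number of valid bits currently in the accumulator
  for (let i = 0; i < clean.length; i++) {
    const char = clean.charAt(i);
    const value = BASE64URL_ALPHABET.indexOf(char);
    if (value === -1) throw new Error(`Invalid base64url character: ${char}`);
    buffer = (buffer << 6) | value; // each character carries 6 bits
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff); // emit the top 8 bits
      buffer &= (1 << bits) - 1; // keep only the leftover low bits
    }
  }
  return Uint8Array.from(bytes);
}
```

In a service-worker-enabled page, the result can be passed as `applicationServerKey` together with `userVisibleOnly: true` when calling `registration.pushManager.subscribe(...)`.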
## 🏗 Architecture Overview

### Queue System

```
User submits URL → Queue Manager → Queue Processor
                        ↓
Extraction Phase ← → Parsing Phase ← → Upload Phase
                        ↓
Push Notifications ← → SSE Updates ← → Dashboard Updates
```

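The idea of an in-memory queue with configurable concurrency can be sketched in a few lines. This is a hand-rolled stand-in for illustration, not the project's queue (the Acknowledgments credit `fastq` for that); the class name `ConcurrencyQueue` and its `push` method are hypothetical.

```typescript
type Task<T> = () => Promise<T>;

class ConcurrencyQueue {
  private running = 0; // number of occupied worker slots
  private readonly waiting: Array<() => void> = []; // parked callers, FIFO
  private readonly concurrency: number;

  constructor(concurrency: number) {
    this.concurrency = concurrency;
  }

  // Runs the task as soon as a slot is free and resolves with its result.
  async push<T>(task: Task<T>): Promise<T> {
    if (this.running >= this.concurrency) {
      // All slots busy: park until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.running++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // transfer the slot directly to the next waiter
      } else {
        this.running--; // nobody waiting: free the slot
      }
    }
  }
}
```

With `new ConcurrencyQueue(2)`, which corresponds to `QUEUE_CONCURRENCY=2`, at most two extractions run at once and later submissions wait in submission order. Because the state lives in memory, pending items do not survive a process restart.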
### Processing Pipeline

1. **Extraction Phase**: Browser automation extracts text and images
2. **Parsing Phase**: LLM converts text to structured recipe data
3. **Upload Phase**: Automatic upload to Tandoor (if configured)

Each phase tracks progress and can fail independently with proper error handling.

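The three phases can be pictured as a sequential pipeline in which each step reports progress and a failure is attributed to the phase that raised it. The sketch below assumes hypothetical names (`runPipeline`, `PhaseHandlers`, `PipelineResult`) and treats each phase as an opaque async function; it does not reflect the project's actual extractor, parser, or uploader.

```typescript
type Phase = "extraction" | "parsing" | "uploading";

interface PipelineResult {
  ok: boolean;
  completed: Phase[]; // phases that finished successfully, in order
  failedPhase?: Phase; // set only when a phase threw
  error?: string;
}

// One async handler per phase; each receives the previous phase's output.
type PhaseHandlers = Record<Phase, (input: unknown) => Promise<unknown>>;

const PHASE_ORDER: Phase[] = ["extraction", "parsing", "uploading"];

async function runPipeline(
  url: string,
  handlers: PhaseHandlers,
  onProgress?: (phase: Phase) => void
): Promise<PipelineResult> {
  const completed: Phase[] = [];
  let data: unknown = url;
  for (const phase of PHASE_ORDER) {
    onProgress?.(phase); // e.g. forward to SSE subscribers
    try {
      data = await handlers[phase](data);
      completed.push(phase);
    } catch (err) {
      // Stop at the first failure and record which phase caused it.
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, completed, failedPhase: phase, error: message };
    }
  }
  return { ok: true, completed };
}
```

When Tandoor is not configured, the `uploading` handler could simply return its input unchanged so the pipeline shape stays the same.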
### Error Classification

- **Recoverable Errors** (`unhealthy`): Temporary issues, can be retried
- **Non-recoverable Errors** (`error`): Invalid URLs, parsing failures, etc.

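To make the two classes concrete, the sketch below maps a few failure kinds to the `unhealthy` and `error` statuses. The list of kinds and the function names are assumptions chosen to match the examples in the bullets above; the project's real taxonomy may differ, and whether `error` items can also be retried is not stated in this README.

```typescript
type FailureStatus = "unhealthy" | "error";

// Hypothetical failure kinds, chosen to match the examples in the list above.
type FailureKind = "timeout" | "network" | "rate_limited" | "invalid_url" | "parse_failure";

// Kinds treated as temporary and therefore recoverable (an assumption).
const RECOVERABLE: ReadonlyArray<FailureKind> = ["timeout", "network", "rate_limited"];

function classifyFailure(kind: FailureKind): FailureStatus {
  return RECOVERABLE.indexOf(kind) !== -1 ? "unhealthy" : "error";
}

// Assumes only recoverable items are offered for retry.
function canRetry(status: FailureStatus): boolean {
  return status === "unhealthy";
}
```

A dashboard could use `canRetry` to decide whether to show the one-click retry button for a failed item.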
## 🧪 Testing

```bash
# Run all tests
npm test

# Run specific test suites
npm run test:unit    # Unit tests only
npm run test:client  # Browser tests only
npm run test:server  # Server tests only

# Run tests in watch mode
npm run test:watch
```

Test Coverage:

- **138 total tests** covering all major components
- Queue Manager: 28 tests
- Queue Processor: 5 integration tests
- API Endpoints: 17 tests
- SSE Streaming: 6 tests
- Frontend Components: Browser tests

## 📦 Building & Deployment

### Production Build

```bash
# Build for production
npm run build

# Preview production build locally
npm run preview
```

### Deployment

The app is built as a Node.js application with the following outputs (paths relative to the project root):

- `.svelte-kit/output/server/` - Server bundle
- `.svelte-kit/output/client/` - Static assets
- `build/` - Adapter output

Start the production server from the adapter output:

```bash
node build/index.js
```

### Docker Deployment

```dockerfile
# Expects `npm run build` to have produced ./build on the host beforehand
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY build ./build
EXPOSE 3000
CMD ["node", "build"]
```

## 🔄 Migration from Synchronous System

### What Changed

The app was migrated from a synchronous extraction system to an async queue-based system:

**Before (Synchronous)**:

- User waited for entire extraction process to complete
- No progress tracking during processing
- No retry capability for failures
- Single-threaded processing
- Limited error handling

**After (Async Queue)**:

- Fire-and-forget: submit URL and redirect immediately
- Real-time progress tracking via SSE
- Comprehensive retry system for failures
- Concurrent processing (configurable)
- Detailed error classification and reporting
- Push notifications for background updates

### API Migration

**Old Synchronous Endpoints** (Deprecated):

```bash
POST /api/extract         # Submit URL and wait for completion
GET  /api/extract-stream  # Long-polling for progress
```

**New Queue Endpoints**:

```bash
POST /api/queue             # Submit URL, get queue ID immediately
GET  /api/queue             # List all queue items
GET  /api/queue/{id}        # Get specific item status
POST /api/queue/{id}/retry  # Retry failed items
GET  /api/queue/stream      # Real-time SSE updates
```

### Migration Steps

If migrating from the old system:

1. **Update Client Code**: Replace `/api/extract` calls with `/api/queue`
2. **Handle Async Responses**: Process queue ID instead of waiting for completion
3. **Add Progress Tracking**: Implement SSE listeners for real-time updates
4. **Update Error Handling**: Handle new error classification system
5. **Add Retry Logic**: Implement retry functionality for failed items

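For the progress-tracking step, browsers can consume `/api/queue/stream` directly with the built-in `EventSource`. Where that is not available (for example in scripts or tests), the Server-Sent Events wire format can be parsed by hand, as in the sketch below. The function name `parseSseChunk`, the event name `progress`, and the JSON payload used in the usage note are assumptions; the README does not document the stream's event schema. The sketch also assumes each chunk contains only complete events, so a real stream reader would need to buffer partial input.

```typescript
interface SseEvent {
  event: string; // "message" when the server sends no explicit event name
  data: string; // multiple data lines are joined with "\n"
}

// Parse a text chunk holding zero or more complete SSE events.
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message";
  let dataLines: string[] = [];
  for (const line of chunk.split(/\r\n|\n|\r/)) {
    if (line === "") {
      // A blank line terminates the current event.
      if (dataLines.length > 0) {
        events.push({ event: eventName, data: dataLines.join("\n") });
      }
      eventName = "message";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") continue; // comment line, often a keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
    // Other fields (id, retry) are ignored in this sketch.
  }
  return events;
}
```

A caller would typically run `JSON.parse(event.data)` on each result and update the matching queue item in the UI.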
### Backward Compatibility

The legacy endpoints are deprecated. They are still routed, but no longer perform extraction:

- They return a `410 Gone` status with migration instructions
- Support will be removed in a future version
- All new development should use the queue endpoints

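One way to picture the `410 Gone` behaviour is a small lookup from legacy path to replacement. The mapping below comes from the Legacy Endpoints list in this README; the function name `legacyGone` and the response body shape are assumptions, since the actual migration message format is not documented.

```typescript
// Legacy path → replacement, as listed under "Legacy Endpoints (Deprecated)".
const LEGACY_REPLACEMENTS: Record<string, string> = {
  "/api/extract": "/api/queue",
  "/api/extract-stream": "/api/queue/stream",
};

interface GoneResponse {
  status: 410;
  body: { error: string; replacement: string }; // assumed body shape
}

// Returns a 410 description for legacy paths, or undefined for any other path.
function legacyGone(path: string): GoneResponse | undefined {
  const replacement = LEGACY_REPLACEMENTS[path];
  if (replacement === undefined) return undefined;
  return {
    status: 410,
    body: {
      error: `${path} has been removed; use ${replacement} instead`,
      replacement,
    },
  };
}
```

Clients that still call the old routes can check for status `410` and surface the replacement path to whoever maintains the integration.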
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run the test suite (`npm test`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

### Development Guidelines

- Follow TypeScript strict mode
- Add tests for all new functionality
- Use the existing architecture patterns (Hexagonal Architecture)
- Update documentation for API changes
- Ensure PWA functionality remains intact

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- [SvelteKit](https://kit.svelte.dev/) - Application framework
- [Tandoor Recipe Manager](https://docs.tandoor.dev/) - Recipe management system
- [Workbox](https://developers.google.com/web/tools/workbox) - PWA capabilities
- [fastq](https://github.com/mcollina/fastq) - High-performance queue processing