# Architecture This document explains how `jobqueue` is structured, what each module owns, and how data moves through the system. ## 1. Top-level structure `JobQueue` is the orchestrator. Everything else is a collaborator around four concerns: 1. **Persistence** - `SqliteStorage` 2. **Execution** - `WorkerPool` + `PhaseRunner` 3. **Notifications** - `TypedEventBus`, `SseSerializer`, `WebhookDispatcher` 4. **Lifecycle management** - `RetryStrategy`, `RetentionScheduler`, shutdown logic ```mermaid flowchart LR A[Consumer app] -->|enqueue / retry / cancel / list| B[JobQueue] B --> C[SqliteStorage] B --> D[WorkerPool] D --> E[PhaseRunner] E --> F[Phase handlers registered by consumer] B --> G[TypedEventBus] G --> H[SSE stream subscribers] B --> I[WebhookDispatcher] I --> J[Webhook endpoints] B --> K[RetentionScheduler] K --> C ``` ## 2. Module responsibilities | Module | Responsibility | Key behavior | | --- | --- | --- | | `src/JobQueue.ts` | Public API and orchestration | Owns startup, enqueue, cancel, retry, pumping, event emission, webhook dispatch, shutdown | | `src/storage/SqliteStorage.ts` | SQLite persistence | Creates schema, claims jobs, persists progress, completion, failure, retry, cancellation, stale/delete | | `src/processor/WorkerPool.ts` | Concurrency limit | Wraps `p-limit`, tracks running promises, supports drain with timeout | | `src/processor/PhaseRunner.ts` | Multi-phase execution | Runs handlers in order, computes overall progress, stops on cancellation | | `src/retry/RetryStrategy.ts` | Retry policy | Classifies errors, computes backoff, decides retry vs fail | | `src/events/EventBus.ts` | In-process pub/sub | Strongly typed wrapper around Node `EventEmitter` | | `src/events/SseSerializer.ts` | SSE formatting | Serializes event name + JSON payload into SSE wire format | | `src/webhook/WebhookDispatcher.ts` | Outbound HTTP callbacks | Sends POST requests, signs payloads, retries transient failures | | `src/retention/RetentionScheduler.ts` | Background cleanup | Periodically marks old jobs stale and later deletes them | ## 3. Public API surface Exports from `src/index.ts` expose both high-level and low-level building blocks: - `JobQueue` - `SqliteStorage` - `WorkerPool` - `PhaseRunner` - `RetryStrategy` - `TypedEventBus` - `SseSerializer` - `WebhookDispatcher` - `RetentionScheduler` - shared queue/job/event types That split makes the package usable in two modes: 1. **Normal mode** - instantiate `JobQueue` and let it coordinate everything 2. **Advanced mode** - reuse lower-level pieces independently in custom orchestration ## 4. Persistence model `SqliteStorage` keeps a single `jobs` table. The queue is effectively modeled as persisted state transitions on that table. ### Important columns | Column | Meaning | | --- | --- | | `id` | Stable job identifier | | `status` | `pending`, `active`, `completed`, `failed`, `cancelled`, `stale` | | `data` | Original enqueue payload, JSON-encoded | | `current_phase` | Phase currently executing or last failed/retried phase | | `phases_json` | Array of per-phase state objects | | `phase_results` | JSON object keyed by phase name | | `progress` / `progress_message` | Latest overall progress snapshot | | `error_json` | Persisted failure metadata | | `retry_count` / `max_attempts` | Retry bookkeeping | | `webhook_url` / `webhook_sent` | Delivery configuration and latest success flag | | `scheduled_at` | Delayed execution / retry wake-up time | | `completed_at` / `cancelled_at` / `updated_at` | Lifecycle timestamps | ### Why SQLite works well here - queue selection is simple and local - state transitions are small, synchronous writes - WAL mode supports concurrent reads while jobs are executing - no separate broker is required for a single-process runtime ## 5. Phase model Each job stores an array of `JobPhaseState` entries: | Field | Meaning | | --- | --- | | `name` | Phase identifier from `QueueConfig.phases` | | `status` | `pending`, `active`, `completed`, `failed`, `cancelled` | | `progress` | Per-phase progress percentage | | `message` | Human-oriented phase status | | `startedAt` / `completedAt` | Phase timestamps | | `error` | Last phase-level error string | `PhaseRunner` walks those phases sequentially and computes overall progress as: ```text ((phaseIndex + phaseProgress / 100) / totalPhases) * 100 ``` That design gives one stable persisted representation for: - single-step jobs (`phases: ['run']`) - multi-step pipelines (`['download', 'process', 'upload']`) - retries that restart only unfinished phases ## 6. Event model `JobQueue` emits in-process typed events first. The SSE stream and webhook flow are adapters on top of that state machine. ### Core queue events - `job:enqueued` - `job:started` - `job:progress` - `job:phase:completed` - `job:completed` - `job:failed` - `job:retrying` - `job:cancelled` - `job:stale` - `job:deleted` - `job:webhook:delivered` - `job:webhook:failed` ### Event ordering rule The queue persists state before emitting the corresponding event. That means listeners observe already-persisted state, not speculative state. This is important for consumers that mix: - `queue.on(...)` - `queue.getJob(id)` - `queue.listJobs(...)` - `queue.createEventStream(...)` ## 7. SSE model `createEventStream()` creates a web `ReadableStream`. Behavior: 1. Optional snapshot of current jobs is sent first 2. Queue subscribes the stream to in-process events 3. Each event is serialized as `event: ` + JSON `data: ...` 4. Keepalive `ping` events are emitted on an interval 5. Cancelling the reader removes subscriptions and the keepalive timer ### Payload shapes Most runtime events include a full `job` object. Two notable exceptions: | Event | Payload detail | | --- | --- | | `job:deleted` | Includes `deletedJobId` because the record was removed from storage | | `ping` | No job payload, only a timestamp | ## 8. Webhook model Webhooks are outbound notifications, not part of the core execution loop. ### Flow 1. `JobQueue` decides whether an event should trigger a webhook 2. `WebhookDispatcher` POSTs JSON to queue-level or job-level URL 3. Optional HMAC SHA-256 signature is attached as `X-JobQueue-Signature` 4. 5xx and transport failures retry with exponential backoff 5. Success sets `webhook_sent = 1` and emits `job:webhook:delivered` ### Scope Supported webhook-triggering events: - `job:completed` - `job:failed` - `job:retrying` - `job:cancelled` - `job:stale` ## 9. Retention model Retention is deliberately two-stage: ```mermaid flowchart LR A[completed / failed / cancelled] -->|older than staleAfterMs| B[stale] B -->|older than deleteAfterMs| C[deleted] ``` Why two stages: - consumers get a visible grace period before deletion - `onStale` and `onDelete` hooks can clean external artifacts - `job:stale` is externally observable before hard deletion ## 10. State machine ```mermaid stateDiagram-v2 [*] --> pending : enqueue pending --> active : claimPendingJob pending --> cancelled : cancel active --> completed : all phases succeed active --> failed : fatal error / retries exhausted active --> pending : recoverable error + retry active --> cancelled : cancel / abort completed --> stale : retention mark failed --> stale : retention mark cancelled --> stale : retention mark stale --> [*] : retention delete failed --> pending : manual retry cancelled --> pending : manual retry stale --> pending : manual retry ``` ## 11. Build and packaging - ESM only - Node 20 target - bundled with `tsup` - type declarations emitted from the same entrypoint - tests run under Vitest in Node environment ## 12. LikeC4 companion charts See [`jobqueue.c4`](./jobqueue.c4) for: - system landscape - library/container view - runtime/component view - enqueue-to-complete dynamic flow - retry dynamic flow