Tasks Specification

Welcome to the Tasks platform specification.

Tasks is a human-in-the-loop platform that orchestrates coding agents to get project work done. It combines a long-running server, an AI orchestrator, and a fleet of implementor agents into a system where:

  • A human describes work in natural language. The orchestrator decomposes it into well-formed issues and dispatches agents to implement them.
  • Each task gets an isolated session with its own sandbox, git branch, and agent process. The human can drop into any session to steer, answer questions, or add context.
  • A merge queue manages the pipeline from completed work to shipped code. The human controls how much autonomy the system has — from fully manual review to fully autonomous merging.
  • An append-only event system provides a complete audit trail of everything that happens across all tasks, agents, and decisions.

Documents

This specification is organized into the following documents:

Core Specification

  • Tasks Platform Specification — The main platform specification covering architecture, domain model, operating modes, merge queue, event system, sessions, and more.

Detail Specifications

  • Session Runtime — Container provider, supervisor protocol, and workspace provisioning. Expands spec.md §10-11, §18.
  • GitHub Integration — Normalized model, GraphQL queries, and polling mechanisms. Expands spec.md §12.

Reference

  • Contributing — Guidelines for writing and maintaining spec documents.

Status

Current specification status: Draft v1

Versioning

This specification is versioned. Use the version selector in the navigation to view previous versions.

Tasks

Status: Draft v1 (TypeScript)

Purpose: Define a human-in-the-loop platform that orchestrates coding agents to get project work done.

1. Problem Statement

Tasks is a platform for collaborating with AI coding agents on real project work. It combines a long-running server, an AI orchestrator, and a fleet of implementor agents into a system where:

  • A human describes work in natural language. The orchestrator decomposes it into well-formed issues and dispatches agents to implement them.
  • Each task gets an isolated session with its own sandbox, git branch, and agent process. The human can drop into any session to steer, answer questions, or add context.
  • A merge queue manages the pipeline from completed work to shipped code. The human controls how much autonomy the system has — from fully manual review to fully autonomous merging.
  • An append-only event system provides a complete audit trail of everything that happens across all tasks, agents, and decisions.

The platform solves five operational problems:

  • It turns issue execution into a repeatable, observable workflow instead of manual scripts.
  • It isolates agent execution in per-task sandboxed workspaces.
  • It provides a controllable autonomy model: the human chooses how much to delegate and can intervene at any level at any time.
  • It gives the human an AI collaborator (the orchestrator) that manages project state, unblocks agents, evaluates quality, and makes merge decisions when trusted to do so.
  • It keeps a complete, immutable record of all activity through its event system.

Important boundaries:

  • Tasks reads from GitHub to discover work. All GitHub writes (comments, labels, state changes, PR creation) are performed by agents working inside their sessions.
  • The orchestrator is an AI agent, not scheduling logic. The scheduler discovers work; the orchestrator manages the project.
  • A successful task may end at a workflow-defined handoff state (for example awaiting_merge), not necessarily a GitHub-closed issue.

2. Goals and Non-Goals

2.1 Goals

  • Provide a human-in-the-loop platform where the human controls the level of autonomy granted to the system.
  • Present an AI orchestrator that the human can collaborate with via chat or voice to manage project work.
  • Poll the issue tracker on a fixed cadence and dispatch work with bounded concurrency.
  • Create isolated, per-task sessions with sandboxed workspaces and dedicated git branches.
  • Manage a merge queue with configurable authority (human, orchestrator, or held).
  • Support three operating modes (Stop, Pause, Play) with the orchestrator able to lower the mode but only the human able to raise it.
  • Maintain a complete audit trail through an append-only event log.
  • Expose every session as a persistent chat conversation that any actor can join.
  • Recover from transient failures with exponential backoff.
  • Support multi-project management across repositories and organizations.
  • Keep the agent provider pluggable — the spec defines the session contract, not the agent.

2.2 Non-Goals

  • Multi-tenant platform or team management. Tasks serves a single human operator.
  • Fully autonomous operation without human oversight. The human always controls the autonomy level and can intervene at any time.
  • General-purpose workflow engine or distributed job scheduler.
  • Built-in CI/CD pipeline. Tasks integrates with existing CI but does not replace it.
  • Rich issue tracker features (milestones, sprints, boards). GitHub is the system of record for project management; Tasks is the execution layer.

3. System Overview

3.1 Server

The server is the platform. It is the long-running process that everything else runs on.

  • Always running. The GUI connects to it on demand.
  • Hosts the event log, task state, merge queue, and scheduler.
  • Serves a web GUI for human interaction.
  • Exposes the orchestrator and task conversations to the GUI over websockets (or equivalent).
  • Tracks human presence based on active GUI connections.

3.2 Scheduler

The scheduler is responsible for discovering new work and detecting state changes on tracked issues.

  • Polls GitHub on a configurable cadence for issue and PR updates.
  • May also accept push notifications from GitHub webhooks when available, with polling as a fallback and reconciliation mechanism.
  • When new or changed issues are detected, the scheduler emits events into the event bus (system:scheduler:tick, task:created, state changes, etc.).
  • The scheduler does not make decisions about what to do with the work — it discovers and queues. Dispatch decisions are made by the server's task dispatch logic.

3.3 Projects

A project maps to a single repository (initially, for simplicity).

  • The server can manage multiple projects across repos and orgs.
  • Each project has its own set of tasks, workspace root, and configuration.
  • The orchestrator can create tasks in any project the server manages, including directing work to a different project when the current one does not cover the needed scope.

3.4 External Dependencies

  • GitHub API (Issues, PRs, webhooks) for issue tracking.
  • Local filesystem and embedded database for persistent state (see §3.5).
  • Container runtime for session environments (see session-runtime.md).
  • Git CLI for branch and repository operations.
  • Coding agent executable (Claude Code initially) that supports chat-style interaction over stdio.
  • Host environment authentication for GitHub and the coding agent's AI provider.

3.5 Data Storage

The platform stores two categories of data with different access patterns:

Structured state. Projects, tasks, merge queue entries, and configuration. This data is read-write, queried by various fields (e.g., "all tasks in waiting state for project X"), and must survive server restarts. Stored in a local embedded database (SQLite). The database is the source of truth for current state.

Event log. The append-only record of everything that happened (spec §9). Events are written sequentially and read sequentially (replay, live subscriptions). Stored as per-task JSONL files on the local filesystem. The event log is the audit trail — it can reconstruct state, but the database is the primary read path for current state.

The separation is intentional. The database is optimized for point queries and state mutations. The event log is optimized for append and sequential scan. Both live on the local filesystem — there is no external database server.
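As a minimal sketch, the per-task JSONL append path might look like the following, assuming a Node.js runtime (the event shape here is illustrative; the real schema is defined in spec §9):

```typescript
import * as fs from "fs";
import * as path from "path";

// Illustrative event shape — not the normative §9 schema.
interface TaskEvent {
  type: string;
  task: string;
  at: string;
  data?: unknown;
}

// Append one event to the per-task JSONL log:
// {data_dir}/events/{task-id}/events.jsonl
function appendEvent(dataDir: string, event: TaskEvent): void {
  const dir = path.join(dataDir, "events", event.task);
  fs.mkdirSync(dir, { recursive: true });
  // One JSON object per line; the file is only ever appended to.
  fs.appendFileSync(
    path.join(dir, "events.jsonl"),
    JSON.stringify(event) + "\n"
  );
}
```

Because the file is append-only and line-delimited, replay is a sequential scan and live subscription is a tail.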

Data directory. All persistent data lives under a single configurable root directory:

  • {data_dir}/db.sqlite — structured state
  • {data_dir}/events/{task-id}/events.jsonl — per-task event logs
  • {data_dir}/workspaces/ — container workspace state (managed by the container runtime)

The data directory defaults to ~/.local/state/tasks/ (following the XDG Base Directory convention) and is configurable at startup via TASKS_DATA_DIR.
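Under those conventions, resolution might look like this sketch (honoring XDG_STATE_HOME is an assumption beyond the spec, which only names TASKS_DATA_DIR):

```typescript
import * as os from "os";
import * as path from "path";

// Hypothetical helper: TASKS_DATA_DIR wins; otherwise fall back to the
// XDG state directory (~/.local/state by default).
function resolveDataDir(env: Record<string, string | undefined>): string {
  if (env.TASKS_DATA_DIR) return env.TASKS_DATA_DIR;
  const stateHome =
    env.XDG_STATE_HOME ?? path.join(os.homedir(), ".local", "state");
  return path.join(stateHome, "tasks");
}
```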

4. Actors and Roles

Tasks has three actor classes: the human operator, the orchestrator, and implementor agents. They interact through the server, which is the shared platform all actors operate on.

4.1 Human Operator

The human is the project owner. They set direction, make final calls, and can intervene at any level.

Capabilities:

  • Talk to the orchestrator directly via chat (or voice) to create work, give direction, or ask questions about project state.
  • Drop into any individual task conversation to steer an implementor agent, answer its questions, or add context.
  • Review the merge queue: inspect pending merges, leave notes, approve or reject.
  • Control the operating mode (play, pause, stop).

Presence:

  • The server tracks whether a GUI client is connected.
  • Connected = the human is present. The orchestrator and agents may surface questions and expect timely responses.
  • Disconnected = autonomous mode. The orchestrator makes judgment calls instead of waiting on the human. Questions that would normally surface to the human are either resolved by the orchestrator or parked until the human returns.

4.2 Orchestrator

The orchestrator is an AI agent that manages the project. It may be one or more agents under the hood, but presents as a single entity to the human and to implementors.

Responsibilities:

  • Triage and decomposition. When the human describes work in natural language, the orchestrator turns that into well-formed issues — organized, contextualized, and split into sub-issues as needed.
  • Unblocking agents. When an implementor agent is stuck and has a question, it can ask the orchestrator. The orchestrator either answers directly using project context, or escalates to the human — when the human is present, or when the question is important enough to warrant waiting for them.
  • Quality gate. The orchestrator evaluates whether an implementation meets the quality bar for the task. This is the checkpoint before work enters the merge queue.
  • Merge authority (delegated). In Play mode, the orchestrator owns merge decisions continuously — reviewing pending merges and shipping them as they pass quality evaluation. In Pause/Stop, merge authority is held (though the human can Flush approved items in Pause).
  • Surfacing information. The orchestrator proactively nudges the human about important decisions, blockers, or choices that are holding up downstream work — but only when the human is around or when the issue is important enough to notify asynchronously.

The orchestrator does not generally execute code or make file changes directly — it operates through implementor agents and the issue tracker. However, this is not a hard constraint. The human may ask the orchestrator to perform small tasks directly when that is more expedient than creating a session.

4.3 Implementor Agents

Implementor agents are the workers. Each one is assigned a task and gets an isolated workspace.

Characteristics:

  • The agent provider is an implementation detail (Claude Code initially, but pluggable).
  • Each agent runs inside a session, which owns a sandboxed copy of the repo on a real git branch.
  • Agents do not interact with the Tasks event system directly. The session wrapper monitors the agent's output and emits events to the bus on its behalf.
  • When the agent is stuck, the session detects this and can escalate to the orchestrator for guidance.
  • Every session is a chat conversation: the human can drop in at any time to steer, answer questions, or add context. The orchestrator can do the same.
  • The spec defines the session and task lifecycle, not the agent's internal behavior.

4.4 Interaction Patterns

Human <--chat/voice--> Orchestrator
                            |
                     delegates / unblocks / evaluates
                            |
                   Sessions (1 per active task)
                    [sandbox + agent + chat + event emitter]
                            |
                     events emitted to bus
                            |
                        Server (platform)
                            |
                     UI, merge queue, event log

The human primarily interacts with the orchestrator. Direct interaction with sessions is available for steering or answering questions, but the default flow is autonomous.

4.5 Orchestrator Chat Interface

The human can talk to the orchestrator directly via chat (§4.1). This section specifies the interface.

Chat Context

When processing a message, the orchestrator receives a snapshot of current system state:

  • Current operating mode (Stop, Pause, Play)
  • All projects
  • All tasks with their current state
  • Recent orchestrator events (decisions, escalations, mode changes)
  • Whether a human is currently connected

This context enables the orchestrator to answer questions about system status, explain recent activity, and make informed suggestions.

Conversation History

The orchestrator maintains conversation history across messages within a server session.

  • History is bounded to 40 messages to prevent unbounded growth.
  • History persists while the server is running but is not persisted to disk.
  • When the server restarts, conversation history starts fresh.
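A minimal in-memory sketch of this bounded history (the message shape and class API are assumptions, not spec types):

```typescript
// Illustrative message shape.
interface ChatMessage {
  role: "human" | "orchestrator";
  text: string;
}

// In-memory only: history is lost on server restart, per the spec.
class ConversationHistory {
  private messages: ChatMessage[] = [];
  constructor(private readonly limit: number = 40) {}

  push(msg: ChatMessage): void {
    this.messages.push(msg);
    // Drop the oldest entries once the cap is exceeded.
    if (this.messages.length > this.limit) {
      this.messages.splice(0, this.messages.length - this.limit);
    }
  }

  size(): number {
    return this.messages.length;
  }
}
```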

Queue Bypass

Human chat messages to the orchestrator bypass the evaluation queue (§7.1) and are processed immediately. This ensures the human can always communicate with the orchestrator without waiting for queued PR evaluations to complete.

Events

  • orchestrator:message — Emitted when the human sends a message to the orchestrator. Contains { message: string }. The task field is "system" (not scoped to a task).
  • orchestrator:response — Emitted when the orchestrator responds. Contains { message: string } and optionally { error: true } if an error occurred. The task field is "system".

5. Domain Model

5.1 Task

A task is the internal representation of a unit of work. It may originate from a GitHub issue, a GitHub PR, or be created by the orchestrator.

Fields:

  • id (string) — internal task ID
  • source (object) — origin reference (GitHub issue ID/number, PR ID/number, or internal)
  • title (string)
  • description (string or null)
  • state (string) — current task state (see 5.2)
  • parent_id (string or null) — parent task ID, if this is a sub-task
  • blocked_by (list of task IDs) — tasks that must complete before this one can proceed
  • project (string) — project ID this task belongs to
  • labels (list of strings)
  • priority (integer or null) — lower numbers are higher priority
  • session_id (string or null) — active session ID, if any
  • workspace_id (string or null) — workspace ID, if provisioned
  • created_at (timestamp)
  • updated_at (timestamp)

5.2 Task States

  • waiting — no agent slot available / max concurrency reached
  • blocked — waiting on another task to finish
  • running — agent is actively working
  • question — agent is waiting on human or orchestrator for input
  • testing — agent done, CI/deterministic testing running
  • awaiting_merge — implementation complete, in merge queue
  • conflict — merge conflict needs resolution
  • changes_requested — PR needs work before re-evaluation; gets priority dispatch
  • completed — task finished successfully
  • failed — task failed
  • cancelled — task was cancelled
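As an illustrative sketch, the fields from §5.1 and the states above map onto TypeScript types like these (the discriminated shape of source is an assumption; the spec does not fix a representation):

```typescript
type TaskState =
  | "waiting" | "blocked" | "running" | "question" | "testing"
  | "awaiting_merge" | "conflict" | "changes_requested"
  | "completed" | "failed" | "cancelled";

interface Task {
  id: string;
  // Origin reference; the union discriminant is an assumption.
  source: { kind: "issue" | "pr" | "internal"; ref?: string };
  title: string;
  description: string | null;
  state: TaskState;
  parent_id: string | null;
  blocked_by: string[];
  project: string;
  labels: string[];
  priority: number | null; // lower numbers are higher priority
  session_id: string | null;
  workspace_id: string | null;
  created_at: string;
  updated_at: string;
}
```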

5.3 Task Hierarchy

Tasks can have parent/child relationships. This is primarily an organizational tool — a way to break a large issue into smaller pieces of work.

  • Sub-tasks are dispatched independently. A parent task does not implicitly block on its children.
  • A sub-task that gets implemented may produce a PR, a comment on the parent issue, an update to the parent task's state, or any combination.
  • Explicit blocking relationships can exist between any tasks (not just parent/child). Blocking is typically emergent: an agent working on a task recognizes that it depends on work that hasn't been done yet and emits a task:state:blocked event referencing the blocking task.
  • When a blocking task completes, blocked tasks should be re-evaluated for dispatch.
  • Parent tasks can subscribe to their children's event streams to track progress.

5.4 Session

See Section 10 for full session specification.

Fields:

  • id (string) — session ID
  • task_id (string) — the task this session is executing
  • workspace_path (string) — path to the sandboxed workspace
  • branch (string) — git branch name
  • status (string) — session status (starting, running, completed, failed, terminated)
  • started_at (timestamp)
  • ended_at (timestamp or null)

5.5 Merge Queue Entry

See Section 7 for full merge queue specification.

Each entry represents a pull request that is a candidate for merging. The merge queue is PR-centric — it tracks pull requests, not tasks. A task may produce a PR, in which case the entry links back to it, but the queue operates on PRs independently.

Fields:

  • id (string) — queue entry ID
  • task_id (string) — the originating task, if any
  • pr_url (string) — pull request URL
  • status (string) — pending, approved, merging, rejected, merged, conflict, changes_requested
  • queued_at (timestamp)
  • changes_requested_feedback (string or null) — feedback when status is changes_requested

5.6 Project

Fields:

  • id (string)
  • repo (string) — repository reference (owner/repo)
  • default_branch (string) — typically main
  • config (object) — project-level configuration

6. Operating Modes

The system operates in one of three modes. The current mode primarily controls merge queue behavior: agents are dispatched and work normally in every mode except Stop.

6.1 Stop

  • No new work is dispatched.
  • Running agent processes are terminated. The sandbox and git branch persist, but in-flight agent work is lost. When the system resumes, affected sessions restart from scratch.
  • The merge queue is held.
  • The system is idle until the human resumes.

6.2 Pause

  • Agents are dispatched and work on tasks normally.
  • Eligible pull requests enter the merge queue (see §7.0 for eligibility criteria).
  • The merge queue is held: nothing merges automatically.
  • The orchestrator continues to manage agents, answer questions, and evaluate quality — but does not approve merges.
  • Pause is the typical review state. The human reviews the queue, triages pending items, leaves feedback on individual tasks, and once satisfied, either flushes approved items or switches to Play.
  • The Flush action is available in Pause: it pushes through everything currently approved in the queue. The system remains in Pause afterward.

6.3 Play

  • Agents are dispatched and work on tasks normally.
  • The merge queue is active: as pull requests arrive, they are queued for evaluation. The orchestrator evaluates one entry per tick (§7.1) and approved entries merge automatically.
  • The orchestrator owns merge authority. The human can still intervene at any time.
  • Play is the fully autonomous mode. The human delegates merge authority to the orchestrator and may step away.

6.4 Mode Transitions

Mode transitions follow a severity ordering: Stop < Pause < Play.

  • The human can change the mode in any direction.
  • The orchestrator can lower the mode (for example, Play -> Pause if something goes wrong), but only a human can raise it.
  • This ensures the system can protect itself, but only the human can grant more autonomy.
  • Transitions take effect immediately for new dispatches and merge decisions.
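The asymmetric rule can be captured in a small predicate (the actor and mode encodings are illustrative):

```typescript
type Mode = "stop" | "pause" | "play";
type Actor = "human" | "orchestrator";

// Severity ordering: Stop < Pause < Play.
const SEVERITY: Record<Mode, number> = { stop: 0, pause: 1, play: 2 };

function canChangeMode(actor: Actor, from: Mode, to: Mode): boolean {
  if (actor === "human") return true; // human moves in any direction
  // The orchestrator may only lower the mode.
  return SEVERITY[to] < SEVERITY[from];
}
```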

Mode Lowering Triggers

The orchestrator monitors for failure patterns that warrant lowering operating mode from Play to Pause:

  • Consecutive evaluation failures: 3 consecutive API or internal errors during merge evaluation
  • Consecutive PR rejections: 5 consecutive PR rejections (pattern of bad PRs)
  • Repeated merge conflicts: 3 conflicts within a 10-minute window
  • Repeated agent errors: 3 agent errors within a 10-minute window
  • Repeated task failures: 3 task failures within a 10-minute window

These thresholds are defaults. When any threshold is reached, the orchestrator emits an escalation event and transitions from Play to Pause.

When the human raises the mode back to Play, the problem tracker resets, allowing it to detect new failure patterns.
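A sketch of a windowed problem tracker for the time-boxed triggers above (the consecutive-failure triggers would use a simple counter instead; the class shape is an assumption):

```typescript
// Tracks failure timestamps within a sliding window, e.g.
// "3 merge conflicts within a 10-minute window".
class ProblemTracker {
  private events: number[] = [];
  constructor(
    private readonly threshold: number,
    private readonly windowMs: number
  ) {}

  // Record a failure at time `now` (ms). Returns true when the
  // threshold is reached within the window, i.e. the mode should
  // be lowered from Play to Pause.
  record(now: number): boolean {
    this.events.push(now);
    this.events = this.events.filter((t) => now - t <= this.windowMs);
    return this.events.length >= this.threshold;
  }

  // Called when the human raises the mode back to Play (§6.4).
  reset(): void {
    this.events = [];
  }
}

// Default: 3 conflicts within 10 minutes.
const conflicts = new ProblemTracker(3, 10 * 60 * 1000);
```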

7. Merge Queue

The merge queue is an ordered list of pull requests waiting to be merged. It is independent of tasks — a task may produce a PR that enters the queue, but the queue itself operates on pull requests regardless of how they originated.

7.0 PR Eligibility

A pull request enters the merge queue when all of the following are true:

  • State is Open — Closed or merged PRs are not queued.
  • Not a draft — Draft PRs are excluded until marked ready for review.
  • On a tracked repository — The PR must be on a repository added to the platform.

The platform does not filter PRs by author, branch name, or label. All eligible PRs on tracked repositories enter the queue, regardless of whether they were created by an agent or a human. This uniform treatment avoids the need to track "agent-created" status and supports repos where both humans and agents contribute PRs.

Implications:

  • When you add a repository with existing open PRs, those PRs immediately enter the merge queue.
  • Human-authored PRs on tracked repos will be evaluated by the orchestrator.
  • If this behavior is undesired, consider using a dedicated repository for agent work, or using draft PRs for human work-in-progress to keep them out of the queue until ready.
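The eligibility rules reduce to a simple predicate (the Pr shape and tracked-repo set are illustrative, not spec types):

```typescript
interface Pr {
  state: "open" | "closed" | "merged";
  draft: boolean;
  repo: string; // "owner/repo"
}

// All three conditions from §7.0 must hold; there is no filtering
// by author, branch name, or label.
function isEligible(pr: Pr, trackedRepos: Set<string>): boolean {
  return pr.state === "open" && !pr.draft && trackedRepos.has(pr.repo);
}
```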

7.1 Queue Entry Lifecycle

  1. An open pull request is discovered on a tracked repository.
  2. A merge queue entry is created with status pending.
  3. The entry is added to the orchestrator's evaluation FIFO queue.
  4. The orchestrator evaluates one entry per tick (default: 15 seconds, configurable via TASKS_ORCHESTRATOR_EVAL_INTERVAL). Once evaluated, a PR is not re-evaluated until it receives new commits.
  5. The merge authority (human or orchestrator, depending on mode) reviews and either approves, requests changes, or rejects.
  6. Approved entries are merged (in Play mode, continuously; in Pause mode, via Flush).
  7. If changes are requested, the entry transitions to changes_requested status with feedback. The associated task transitions to changes_requested state and gets priority dispatch to address the feedback. The PR and entry remain in the queue.
  8. If rejected, the entry is removed from the queue. If the PR originated from a task, that task may be re-engaged with feedback.

Human chat messages to the orchestrator bypass the evaluation queue and are processed immediately (see §4.5).

7.2 Two-Phase Model

The merge queue operates in two distinct phases. Understanding this split clarifies the lifecycle and guides both implementation and UI design.

7.2.1 Review Phase

The review phase encompasses entries that are accumulating signal before a merge decision.

Statuses in this phase:

  • pending — Awaiting evaluation. The orchestrator has not yet reviewed the PR.
  • changes_requested — Feedback provided. The entry needs work before re-evaluation.
  • conflict — Merge conflict detected. Resolution required before the entry can proceed.

What happens in this phase:

  • The orchestrator evaluates PRs against associated issues (§7.4).
  • CI checks run and report status.
  • Humans can review, leave feedback, or approve.
  • Entries accumulate signals until a decision is reached.

Actions available:

  • Approve — Move the entry to the merge phase.
  • Request changes — Keep the entry in review with actionable feedback.
  • Reject — Remove the entry from the queue (terminal).

7.2.2 Merge Phase

The merge phase encompasses entries that have been approved and are ready to merge.

Statuses in this phase:

  • approved — Ready to merge, waiting in line.
  • merging — Actively being merged (GitHub API call in progress).

What happens in this phase:

  • Entries are processed serially — one entry merges at a time.
  • The merge process: checkout branch → check for conflicts → merge to base branch.
  • On success, status transitions to merged (terminal).
  • On failure (e.g., conflict detected during merge), the entry returns to the review phase with status conflict, not outright rejected. This preserves the work and allows resolution.

Serial processing rationale:

Serial merging prevents race conditions where two PRs both pass conflict checks but then conflict with each other. By processing one at a time, each merge has a stable base.

7.2.3 Lifecycle Diagram

                    ┌─────────────────────────────────────────────────────────┐
                    │                     REVIEW PHASE                        │
                    │                                                         │
PR Detected ───────►│  pending ◄───────► changes_requested                    │
                    │     │                     │                             │
                    │     │                     │                             │
                    │     ▼                     ▼                             │
                    │  conflict ◄─────────────────────────────────────┐       │
                    │     │                                           │       │
                    └─────┼───────────────────────────────────────────┼───────┘
                          │                                           │
                    ┌─────┼───────────────────────────────────────────┼───────┐
                    │     │              MERGE PHASE                  │       │
                    │     ▼                                           │       │
                    │  approved ──────► merging ──────────────────────┘       │
                    │                      │         (conflict)               │
                    └──────────────────────┼──────────────────────────────────┘
                                           │
                                           ▼
                                        merged
                                       (terminal)

Rejected is also terminal — entries exit the queue entirely.
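The status graph above can be encoded as an adjacency map; treat the exact edge set as illustrative rather than normative (e.g. conflict → approved after resolution follows the diagram and §7.5–7.6):

```typescript
type EntryStatus =
  | "pending" | "changes_requested" | "conflict"
  | "approved" | "merging" | "merged" | "rejected";

const TRANSITIONS: Record<EntryStatus, EntryStatus[]> = {
  pending: ["approved", "changes_requested", "conflict", "rejected"],
  changes_requested: ["pending", "conflict"], // new commits re-enter review
  conflict: ["approved"],                     // resolved conflicts proceed to merge
  approved: ["merging"],
  merging: ["merged", "conflict"],            // a failed merge returns to review
  merged: [],                                 // terminal
  rejected: [],                               // terminal
};

function canTransition(from: EntryStatus, to: EntryStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```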

7.2.4 Operating Mode Behavior by Phase

Mode    Review Phase                             Merge Phase
Stop    Held — no evaluation occurs              Held — no merging occurs
Pause   Orchestrator evaluates, human reviews    Held — Flush available for approved items
Play    Continuous evaluation                    Continuous merging

This table clarifies how operating modes affect each phase independently:

  • Stop: Both phases are completely paused. The queue is frozen.
  • Pause: Review continues (PRs are evaluated, feedback given), but nothing merges automatically. The Flush action allows the human to push approved entries through the merge phase on demand.
  • Play: Both phases operate continuously. PRs are evaluated as they arrive, and approved entries merge automatically.

7.3 Merge Authority

Merge authority determines who approves merges from the queue.

Mode    Merge authority
Stop    Nobody (held)
Pause   Nobody (held). Human triages and reviews. Flush available
Play    Orchestrator (continuous). Human can override any time

The human always has the ability to intervene — reject a merge, pull something out of the queue, or drop into a task to give feedback. The mode controls the default flow, not the human's access.

7.4 Quality Evaluation

Before approving a merge, the orchestrator reviews the actual code diff:

Pass 1 — Diff triage. The orchestrator fetches the unified diff from GitHub and evaluates it against the associated issue. It checks:

  • Does the diff actually address the issue? (Not "does the PR description say it does.")
  • Are there obvious correctness issues — missing error handling, incomplete removals, broken dependents?
  • Is the change complete relative to the issue scope?
  • Are there merge conflicts or failing CI checks?

Self-reported "test plans" in PR descriptions are ignored — these are written by the same agent that wrote the code and are not verification.

If the change is obviously correct and complete, the orchestrator approves or rejects immediately.

Pass 2 — Deep review (conditional). If the diff is substantial, ambiguous, or the orchestrator isn't confident the change is correct, it requests up to 5 specific files from the PR branch for deeper context. With the full file contents, it evaluates whether the change integrates correctly with the surrounding code — checking for broken callers, missed dependents, and incomplete migrations.

If the work isn't ready, the orchestrator rejects the entry with specific, actionable feedback referencing the actual code. If the PR originated from a task, that task may be re-engaged with the feedback rather than queuing a bad merge.

7.5 Changes Requested

When work needs minor fixes rather than outright rejection, the orchestrator or human can request changes instead of rejecting:

  • The entry transitions to changes_requested status with specific, actionable feedback.
  • The associated task transitions to changes_requested state.
  • changes_requested tasks get priority dispatch over regular waiting tasks. This ensures work that's close to completion gets addressed quickly rather than sitting in a queue behind new work.
  • When the agent addresses the feedback and pushes new commits, the entry returns to pending status for re-evaluation.

This preserves the PR and work-in-progress rather than throwing it away, allowing the agent to make targeted fixes. It's appropriate when:

  • The implementation is mostly correct but has minor issues
  • The code needs a rebase onto the latest base branch
  • Small edge cases or error handling need to be addressed
  • Documentation or tests need minor additions

For major rework or fundamentally incorrect approaches, rejection with a new task is more appropriate.
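The priority-dispatch rule from §7.5 can be sketched as a comparator (the Dispatchable shape is an assumption):

```typescript
interface Dispatchable {
  state: "changes_requested" | "waiting";
  priority: number | null; // lower numbers are higher priority (§5.1)
}

// changes_requested tasks sort ahead of waiting tasks; within a group,
// lower `priority` dispatches first, with null treated as lowest.
function dispatchOrder(a: Dispatchable, b: Dispatchable): number {
  if (a.state !== b.state) {
    return a.state === "changes_requested" ? -1 : 1;
  }
  const pa = a.priority ?? Number.MAX_SAFE_INTEGER;
  const pb = b.priority ?? Number.MAX_SAFE_INTEGER;
  return pa - pb;
}
```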

7.6 Conflicts

When a pending merge has conflicts:

  • The entry transitions to the conflict status.
  • The merge remains in the queue but is not eligible until the conflict is resolved.
  • The orchestrator triages the conflict:
    • In Play mode: the orchestrator resolves the conflict autonomously — typically re-engaging the implementor agent, or resolving it directly when the resolution is mechanical (rebases, trivial merge conflicts).
    • In Pause or Stop mode: the orchestrator surfaces non-trivial conflicts to the human for guidance. When the human is not present, mechanical conflicts are resolved directly while complex conflicts wait for human input.

8. Reflections

Reflections are a post-merge review stage. After pull requests are merged, they enter a reflection window where the human can review the merged changes asynchronously and provide feedback.

Reflections address a timing gap in the merge queue model: the human may not always be present when work is evaluated and merged, especially in Play mode. Reflections provide a second chance to review merged changes, flag concerns, and create a feedback loop that improves future work.

8.1 Purpose

Reflections serve several purposes:

  • Async review. Comment on merged changes that slipped through the queue before the human could review them in real time.
  • Post-merge analysis. Raise security, performance, or architectural concerns after seeing the code in context of the merged codebase.
  • Feedback loop. Provide input that shapes how the orchestrator evaluates and guides future tasks (e.g., "in past reflections, X approach was flagged as problematic").
  • Questioning approaches. Challenge design decisions or implementation choices with the benefit of hindsight.

8.2 Reflection Model (Phase 1)

In the initial implementation, reflections are GitHub issues with a special label.

Implementation:

  • Reflections are issues labeled with reflection (or a configurable label).
  • The reflection label is added to the workflow configuration's [labels].ignore list, so reflection issues are not imported as tasks. They exist for review and discussion only.
  • The UI includes a Reflections tab that displays issues with the reflection label, filtered separately from the task list.
  • Reflections reference the merged PR they pertain to (via a link in the issue body or title).

Creating reflections:

  • The human creates a reflection issue manually, linking it to the merged PR or commit range.
  • The issue body contains the human's observations, questions, or concerns.

Workflow configuration:

[labels]
ignore = ["wontfix", "duplicate", "ignore", "reflection"]

[reflections]
label = "reflection"  # The label that marks an issue as a reflection (default: "reflection")

8.3 Reflection Lifecycle

Reflections are simpler than tasks — they do not trigger agent sessions or enter a dispatch queue.

  1. Creation. A reflection issue is created (manually by the human, or automatically in Phase 2).
  2. Discussion. The human and orchestrator discuss the reflection. The orchestrator may provide context about why certain decisions were made, reference the original task, or acknowledge the feedback.
  3. Resolution. The human closes the reflection when satisfied. Resolution options:
    • Acknowledged — feedback noted, no action needed.
    • Action taken — a follow-up task was created to address the concern.
    • Deferred — valid concern, but not worth addressing now.
  4. Archival. Closed reflections remain in the repository history and can be referenced in future task prompts.

8.4 Design Decisions

Reflections do not block future work. Reflections are advisory, not blocking. A reflection flagging a concern about area X does not prevent new tasks from working on area X. If the concern is severe enough to warrant blocking, the human should create a regular issue or mark existing tasks as blocked.

Rationale: blocking on reflections would create an approval bottleneck that undermines the system's autonomy model. Reflections are a feedback mechanism, not a gate.

No automatic reflection window. Reflections can be created at any time after a PR merges. There is no automatic "reflection period" during which merges are held or flagged. The human reviews merged work at their own pace.

Rationale: a mandatory window would slow down Play mode without clear benefit. The human can always review the merge queue history and merged PR list.

Orchestrator cannot create reflections autonomously (Phase 1). In the initial implementation, only the human creates reflections. The orchestrator can respond to reflections in conversation and reference past reflection feedback when evaluating tasks, but it does not initiate the reflection process.

Rationale: reflections are a human review mechanism. Having the orchestrator create them would defeat the purpose. In future phases, the orchestrator may flag items for human reflection.

8.5 Future Extensions

Phase 2: Auto-populated reflections. When a PR merges, the system automatically creates a reflection issue pre-populated with:

  • The PR title and link
  • A summary of the diff (generated by the orchestrator)
  • The merge queue evaluation notes (what the orchestrator checked, what it approved)
  • Links to the original task and issue

The human can then review this pre-populated reflection, add comments, or close it without action if the change looks good. This reduces the friction of creating reflections while maintaining human control over the review process.

Phase 3: Feedback integration. Reflection feedback is incorporated into the orchestrator's context for future task evaluation:

  • When evaluating a new task or PR, the orchestrator retrieves relevant past reflections (by area of code, author, type of change) and factors them into its assessment.
  • Patterns from reflections (e.g., "security concerns in auth code are frequently flagged") are summarized and included in agent prompts for related tasks.
  • The system tracks whether feedback from reflections was acted upon or acknowledged.

8.6 Reflection Events

Reflections integrate with the event system through a small set of events:

  • reflection:created — a new reflection issue was detected
  • reflection:comment — a comment was added to a reflection
  • reflection:closed — a reflection was resolved

These events are informational — they trigger no automatic actions in the dispatch system or merge queue. They allow the UI to display reflection activity and the orchestrator to maintain awareness of ongoing review discussions.

9. Event System

Tasks uses an append-only event log as the backbone for all communication between components. Agents, the orchestrator, the scheduler, and the human all produce events. The UI, orchestrator, and parent tasks consume them.

9.1 Design

  • All events are immutable. Once written, an event is never modified or deleted.
  • Events are persisted to an append-only log, stored per-task (one log file per task).
  • A lightweight in-memory pub/sub layer sits in front of the log for live subscriptions.
  • Consumers can subscribe to live events and replay historical events from the log.

9.2 Event Shape

Every event has the same base structure:

  • id (string) — unique event ID
  • type (string) — colon-delimited event type (see 9.3)
  • task (string) — task ID this event belongs to
  • actor (string) — who produced this event: human, orchestrator, scheduler, agent, or system
  • ts (timestamp) — when the event occurred
  • data (object) — event-type-specific payload
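
The base structure above transcribes directly into a TypeScript interface. In this sketch, the `Actor` union name and the epoch-milliseconds representation of `ts` are assumptions for illustration; the spec only says "timestamp".

```typescript
// Base event shape from §9.2. Actor mirrors the listed actor values.
type Actor = "human" | "orchestrator" | "scheduler" | "agent" | "system";

interface TaskEvent {
  id: string;                    // unique event ID
  type: string;                  // colon-delimited event type (§9.3)
  task: string;                  // task ID this event belongs to
  actor: Actor;                  // who produced the event
  ts: number;                    // when the event occurred (epoch ms assumed)
  data: Record<string, unknown>; // event-type-specific payload
}

// Example instance:
const example: TaskEvent = {
  id: "evt-001",
  type: "task:state:running",
  task: "task-42",
  actor: "agent",
  ts: Date.now(),
  data: {},
};
```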

9.3 Event Types

Task events:

  • task:created — a new task exists
  • task:state:running — agent is actively working
  • task:state:question — agent is waiting on human or orchestrator for input
  • task:state:waiting — no agent slot available / max concurrency reached
  • task:state:blocked — waiting on another task to finish
  • task:state:testing — agent done, CI/deterministic testing running
  • task:state:awaiting_merge — implementation complete, in merge queue
  • task:state:conflict — merge conflict needs resolution
  • task:state:completed — task finished successfully
  • task:state:failed — task failed
  • task:state:cancelled — task was cancelled

Agent events:

  • agent:message — agent emitted a status update, progress note, or response
  • agent:question — agent is asking for help (triggers task:state:question)
  • agent:error — something went wrong inside the agent session

Merge events:

  • merge:queued — work entered the merge queue
  • merge:approved — approved for merge
  • merge:rejected — sent back with feedback
  • merge:completed — actually merged
  • merge:conflict — conflict detected

Orchestrator events:

  • orchestrator:feedback — orchestrator sent feedback to an agent
  • orchestrator:escalation — orchestrator surfaced something to the human
  • orchestrator:decision — orchestrator made a judgment call
  • orchestrator:message — human sent a chat message to the orchestrator (§4.5)
  • orchestrator:response — orchestrator replied to human chat (§4.5)

System events:

  • system:started — server started
  • system:mode:play — mode changed to Play
  • system:mode:pause — mode changed to Pause
  • system:mode:stop — mode changed to Stop
  • system:flush — merge queue flush triggered
  • system:config:reloaded — configuration was reloaded
  • system:scheduler:tick — scheduler polled for updates

9.4 Subscriptions

Consumers subscribe to events using colon-delimited patterns with wildcard support:

  • task:* — all task events
  • task:state:* — all state changes
  • agent:* — all agent communication
  • merge:completed — just that one event type

A parent task can subscribe to events from its child tasks by task ID. Any consumer can subscribe to any task's event stream. The bus handles routing — tasks never communicate directly with each other.
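
A minimal sketch of how the wildcard matching above might work. Treating a trailing `*` as also matching any deeper suffix (so `task:*` covers `task:state:running`) is an assumption consistent with the examples; the function name is illustrative.

```typescript
// Colon-delimited pattern matching per §9.4. "*" matches one segment;
// a trailing "*" matches any remaining suffix.
function matchesPattern(pattern: string, type: string): boolean {
  const p = pattern.split(":");
  const t = type.split(":");
  for (let i = 0; i < p.length; i++) {
    if (p[i] === "*") {
      if (i === p.length - 1) return t.length > i; // trailing * matches the rest
      if (t[i] === undefined) return false;        // non-trailing * needs a segment
      continue;
    }
    if (p[i] !== t[i]) return false;               // literal segments must match
  }
  return p.length === t.length;
}
```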

9.5 Storage and Cleanup

  • Events are stored per-task as append-only files (e.g., <task-id>/events.jsonl).
  • A global index or secondary log may be maintained for cross-task queries.
  • Cleanup policy is configurable:
    • Completed/cancelled task logs can be archived or deleted after a retention period.
    • Active task logs are never pruned.

10. Sessions and Agent Runner

A session is the unit of execution. When the server dispatches a task for implementation, it creates a session. The session owns everything needed to work on that task: an isolated runtime environment with its own copy of the repo, a git branch, an agent process, a chat history, and an event emitter.

The session runtime is a multi-process environment — not just an agent process. It hosts the agent, a supervisor that manages the agent's lifecycle, and any processes the agent spawns (test runners, build tools, git operations). See session-runtime.md for the runtime architecture.

10.1 Session Lifecycle

  1. Creation. Dispatch logic determines a task needs work (implementation, comment, sub-issue creation, etc.). A new session is created for that task.
  2. Runtime setup. The session provisions an isolated runtime environment and workspace (see Section 11) with its own copy of the repo checked out to a dedicated git branch.
  3. Agent launch. An agent process is started inside the runtime. The session sends a custom prompt to the agent based on the task's details (issue description, context, relevant project information).
  4. Agentic flow. The agent works autonomously. The session wrapper monitors the agent's output, interprets what's happening, and emits events to the bus.
  5. Completion. The agent finishes (task done, error, or killed). The session emits final state events. The runtime environment and git branch persist for review or re-use.

10.2 Session as Chat

Every session is a chat conversation. This is the universal interface.

  • The agent's output appears as messages in the chat.
  • The human can join the chat at any time to send messages, steer the agent, answer questions, or add context.
  • The orchestrator can also join to provide guidance or unblock the agent.
  • Most sessions run without external participants. The chat interface is always available regardless of whether anyone joins.
  • Chat history is persistent. The human can review what happened in any session after the fact.

10.3 Event Emission

The agent itself does not know about the Tasks event system. The session wrapper is responsible for observing the agent's behavior and emitting appropriate events.

  • The session reads the agent's output stream and interprets status.
  • Some events are direct mappings (agent produced output -> agent:message).
  • Some events are inferred from context (agent is waiting for user input with choices -> agent:question + task:state:question).
  • The session emits all events to the bus on behalf of the agent.
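
The direct-versus-inferred mapping might look like the following sketch. The `Observation` shape and the minimal event payloads are hypothetical; a real wrapper would interpret a richer agent output stream.

```typescript
// Sketch of §10.3: translating observed agent behavior into bus events.
type Observation =
  | { kind: "output"; text: string }
  | { kind: "awaiting_input"; choices: string[] };

function eventsFor(taskId: string, obs: Observation): { type: string; task: string }[] {
  switch (obs.kind) {
    case "output":
      // Direct mapping: agent produced output -> agent:message.
      return [{ type: "agent:message", task: taskId }];
    case "awaiting_input":
      // Inferred from context: waiting with choices -> agent:question
      // plus the corresponding task state change.
      return [
        { type: "agent:question", task: taskId },
        { type: "task:state:question", task: taskId },
      ];
  }
}
```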

10.4 Agent Provider

The agent provider is an implementation detail. Claude Code is the initial provider, but the session contract is designed to be provider-agnostic.

The session needs the agent provider to support:

  • Starting a chat session with an initial prompt in a given working directory.
  • Streaming output (so the session can monitor and emit events).
  • Accepting human/orchestrator messages as chat input.
  • Being terminated gracefully.
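
These four capabilities can be captured in a provider-agnostic contract. The interface and method names below are illustrative; the spec fixes only the capabilities, not the API surface.

```typescript
// Provider contract sketch for §10.4.
interface AgentProvider {
  // Start a chat session with an initial prompt in a working directory.
  start(workdir: string, prompt: string): Promise<AgentHandle>;
}

interface AgentHandle {
  // Streaming output, so the session wrapper can monitor and emit events.
  onOutput(listener: (chunk: string) => void): void;
  // Human/orchestrator messages delivered as chat input.
  send(message: string): Promise<void>;
  // Graceful termination.
  stop(): Promise<void>;
}
```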

10.5 Pause, Resume, and Interruption

For the initial implementation, the model is simple:

  • Stop mode: Running agent processes are terminated. When the system returns to Pause or Play, sessions are restarted from scratch with the full task prompt. Work in progress is lost, but the git branch and workspace persist so the agent can pick up from the repo state.
  • Human/orchestrator drops in: The message is delivered to the agent's chat as a normal chat message. The agent responds as part of its ongoing flow. If the agent has already finished, the session is restarted with context that includes the new message.
  • Future improvement: Resume agent sessions where they left off rather than restarting. This depends on agent provider support and is not required for the initial implementation.

10.6 One Session Per Task

A task has at most one active session at a time. If a task needs to be re-run (retry, restart after Stop, feedback from human), the previous session is ended and a new one is created in the same sandbox/branch.

Previous session chat history remains accessible for context and audit.

11. Workspace Management

11.1 Workspace Creation

When a session is created, it provisions an isolated workspace:

  • The workspace is created inside an isolated runtime environment provisioned by a configurable container provider. The runtime provides process isolation, filesystem isolation, and a multi-process environment for the agent and its subprocesses. See session-runtime.md for provider details.
  • The workspace gets its own copy of the repository, cloned inside the runtime environment. The copy method (shallow clone, full clone, etc.) is configurable.
  • A new git branch is created off of main (or the project's default branch) unless the task is explicitly stacking on another in-progress branch.
  • The branch is initially named with a generated ID (UUID or similar). It may be renamed once work starts or when a PR is created, to something human-readable based on the task.

11.2 Workspace Reuse

  • A workspace persists across session restarts for the same task. If a session is killed (Stop mode) and restarted, the new session reuses the existing workspace and branch — the agent starts fresh but the repo state reflects prior work.
  • One workspace per task. If a session happens to address multiple tasks, the workspace belongs to the primary task.

11.3 Cleanup

Workspaces are cleaned up when they are no longer needed:

  • PR merged: The related workspace is deleted.
  • Task completed/cancelled: The workspace is eligible for cleanup.
  • Stale/idle: Workspaces with no active session for a configurable period are eligible for cleanup.
  • Chat history and event logs are retained independently of workspace cleanup — deleting a workspace does not delete the session's history.

12. Issue Tracker Integration

12.1 GitHub as Source of Truth

GitHub Issues and PRs are the external source of work. Tasks reads from GitHub to discover and track work, but does not write back — all GitHub mutations (comments, labels, state changes, PR creation) are performed by agents working inside their sessions.

12.2 What Gets Tracked

The scheduler monitors both issues and pull requests:

  • Issues are the primary source of tasks (implement this, fix that, explore this).
  • PRs can also be a source of tasks (review this PR, resolve conflicts, fit this into the merge queue).

For each issue/PR, the scheduler reads all available fields:

  • Title, body, labels, assignees, milestone
  • Comments (full history)
  • Sub-issues / linked issues
  • Linked PRs (for issues) or linked issues (for PRs)
  • Open/closed state
  • Timestamps (created, updated)

12.3 State Mapping

Tasks owns its own internal state (Section 5.1) independently of GitHub's open/closed status.

  • A GitHub issue being open is a precondition for creating a task, but after that, task state is managed internally based on session activity, merge queue progress, etc.
  • When a GitHub issue is closed externally, the corresponding task should be cancelled or completed (depending on context).
  • Tasks does not push its internal states back to GitHub labels or issue fields. Progress is communicated through agent-authored comments and PR activity.

12.4 Discovery

The scheduler discovers new and changed work through:

  • Polling: Periodic check for new/updated issues and PRs on a configurable cadence.
  • Webhooks (optional): GitHub webhooks can push issue/PR events to the server for faster response. Polling remains as a fallback and reconciliation mechanism.

12.5 Normalization

The scheduler normalizes GitHub payloads into a stable internal model before emitting events. This keeps the rest of the system decoupled from GitHub-specific API shapes.

See github.md for the full normalized model, GraphQL query design, client API, polling interface, and testing strategy.

13. Scheduling and Dispatch

The dispatch system determines which tasks get worked on and when. The scheduler discovers work (§3.2, §12); the dispatcher decides what to run.

13.1 Dispatch Loop

The dispatcher is triggered in two ways:

Event-driven dispatch. The dispatcher evaluates immediately when any of these events fire:

  • task:created — new task is available
  • task:state:completed, task:state:failed, task:state:cancelled — a slot freed up
  • task:state:waiting — a blocked task became unblocked
  • A question-state task receives an answer (human or orchestrator message)
  • system:mode:pause, system:mode:play — mode changed to one that allows dispatch

Reconciliation tick. A periodic sweep (configurable, default 30 seconds) runs the same dispatch logic. This catches missed events, stuck states after restarts, and race conditions in event processing.

Both triggers invoke the same dispatch evaluation function. The function is idempotent — running it multiple times in quick succession is harmless.

Mode gate. Dispatch is only active in Pause and Play modes. In Stop mode, the dispatch function returns immediately. When transitioning from Stop to Pause or Play, the reconciliation tick triggers a full evaluation.

13.2 Candidate Selection

The dispatcher divides actionable tasks into two pools:

Resume candidates. Tasks with existing sessions that need re-engagement:

  • Tasks in question state that have received an answer. These already hold a session slot — resuming them is free from a concurrency perspective. The dispatcher sends the message to the existing session immediately.
  • Tasks in blocked state that have become unblocked (all blocked_by tasks are in terminal states). These transition to waiting and are dispatched with their existing workspace and branch.

Resume candidates are always processed before new work candidates. They represent in-progress work closer to completion and are cheaper to start.

New work candidates. Tasks in waiting state with no active session. These require a new session, container, and workspace. Tasks with an unclosed pull request already in the merge queue are skipped — work is already in progress on that task and should not be re-dispatched.

13.3 Prioritization

Within each pool, candidates are sorted by:

  1. Explicit priority. Lower priority number first. Tasks with null priority sort after all explicitly prioritized tasks.
  2. Unblocking value. Tasks that appear in other tasks' blocked_by lists sort before tasks that don't. This favors completing work that unblocks downstream tasks.
  3. Source order. For GitHub tasks, lower issue/PR numbers sort first (older issues before newer ones). This ensures related tasks (e.g., "Phase 1", "Phase 2", "Phase 3") are processed in logical creation order. For internal tasks without source numbers, older creation dates sort first.

The orchestrator influences dispatch indirectly by setting task priorities and creating or cancelling tasks, not by participating in the dispatch loop itself. This keeps dispatch fast and deterministic.
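
The three-level sort can be expressed as a single comparator. In this sketch the `Candidate` field names (`priority`, `unblocksOthers`, `sourceNumber`, `createdAt`) are assumptions for illustration.

```typescript
// Candidate ordering per §13.3.
interface Candidate {
  id: string;
  priority: number | null;      // lower runs first; null sorts after all others
  unblocksOthers: boolean;      // appears in some other task's blocked_by list
  sourceNumber: number | null;  // GitHub issue/PR number, if any
  createdAt: number;            // epoch ms, used for internal tasks
}

function compareCandidates(a: Candidate, b: Candidate): number {
  // 1. Explicit priority: lower number first, nulls last.
  if (a.priority !== b.priority) {
    if (a.priority === null) return 1;
    if (b.priority === null) return -1;
    return a.priority - b.priority;
  }
  // 2. Unblocking value: tasks that unblock downstream work come first.
  if (a.unblocksOthers !== b.unblocksOthers) return a.unblocksOthers ? -1 : 1;
  // 3. Source order: lower issue/PR numbers first, else older creation dates.
  if (a.sourceNumber !== null && b.sourceNumber !== null) {
    return a.sourceNumber - b.sourceNumber;
  }
  return a.createdAt - b.createdAt;
}
```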

13.4 Concurrency Limits

Two limits control how many tasks can run simultaneously:

  • Global limit (max_sessions, required). Total active sessions across all projects. This is the primary resource constraint — each session is a container consuming host CPU and memory. Default: 5.
  • Per-project limit (max_sessions on project config, optional). Prevents one project from consuming all available slots. Defaults to 1 if unset, which allows multiple projects to make progress concurrently. The default can be overridden via TASKS_MAX_SESSIONS_PER_PROJECT.

13.5 Slot Accounting

A task holds a slot from when its session is created until the session ends:

  • running — holds a slot (agent actively working)
  • question — holds a slot (session is live, container running, agent waiting)
  • testing — holds a slot (CI may be running inside the container)
  • waiting, blocked — does not hold a slot
  • awaiting_merge, conflict — does not hold a slot (session has ended, work is complete)
  • Terminal states (completed, failed, cancelled) — does not hold a slot

The dispatcher counts slots by counting tasks in slot-holding states, not by tracking session objects. This keeps the accounting simple and derivable from task state.

13.6 Dispatch Evaluation

On each evaluation, the dispatcher:

  1. Checks mode. If Stop, return immediately.
  2. Processes all resume candidates (free — no slot cost for question answers).
  3. Counts active slots globally and per-project.
  4. Collects new work candidates, sorted by priority rules (§13.3).
  5. For each candidate in order: if both global and project slot limits have room, create a session and start the task. Otherwise, the task remains in waiting.
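
The evaluation steps above can be sketched as follows. Types, the `startSession` callback, and the limit defaults are hypothetical illustrations, and resume-candidate processing (step 2) is elided since it carries no slot cost.

```typescript
// Dispatch evaluation sketch for §13.6, with slot accounting per §13.5.
type Mode = "play" | "pause" | "stop";

interface TaskView {
  id: string;
  project: string;
  state: string;
}

// States that hold a session slot (§13.5).
const SLOT_HOLDING = new Set(["running", "question", "testing"]);

function evaluateDispatch(
  mode: Mode,
  tasks: TaskView[],
  candidates: TaskView[], // new-work candidates, already sorted per §13.3
  maxGlobal: number,
  maxPerProject: Map<string, number>,
  startSession: (t: TaskView) => void,
): void {
  if (mode === "stop") return; // mode gate (§13.1)

  // Slots are derived from task state, not from tracked session objects.
  const perProject = new Map<string, number>();
  let global = 0;
  for (const t of tasks) {
    if (SLOT_HOLDING.has(t.state)) {
      global++;
      perProject.set(t.project, (perProject.get(t.project) ?? 0) + 1);
    }
  }

  let free = maxGlobal - global;
  for (const c of candidates) {
    const used = perProject.get(c.project) ?? 0;
    const limit = maxPerProject.get(c.project) ?? 1; // per-project default: 1
    if (free > 0 && used < limit) {
      startSession(c);
      free--;
      perProject.set(c.project, used + 1);
    }
    // Otherwise the task remains in waiting.
  }
}
```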

14. Retry and Recovery

14.1 Failure Classes

The system categorizes failures to determine the appropriate response:

Transient failures. Temporary problems likely to resolve on their own:

  • Network errors (GitHub API timeouts, DNS failures)
  • Container startup failures (resource pressure, daemon hiccups)
  • Agent process crashes (non-deterministic, may succeed on retry)

Response: Retry with exponential backoff.

Deterministic failures. Problems that will recur if retried with the same inputs:

  • Agent repeatedly fails on the same task (same error multiple times)
  • Invalid task configuration (missing repo, bad branch)
  • Authentication failures (expired token, revoked access)

Response: Mark the task as failed, emit an event, surface to the orchestrator or human. Do not retry automatically.

Infrastructure failures. The host or server itself has a problem:

  • Server crash and restart
  • Container runtime unavailable
  • Disk full

Response: Recover on restart via reconciliation (§14.3).

The system distinguishes transient from deterministic failures using a retry counter per task. If a task fails and is retried N times (configurable, default: 3) without making progress, it is reclassified as deterministic and marked failed.

"Making progress" means the agent produced commits, changed task state, or ran for longer than a minimum duration (configurable, default: 60 seconds). A task that crashes immediately on start 3 times in a row is deterministic. A task that runs for 10 minutes and then hits an edge case is still worth retrying.

14.2 Retry Behavior

Exponential backoff. When a transient failure occurs, the system retries with increasing delays:

  • Base delay: 5 seconds
  • Multiplier: 2x per attempt
  • Maximum delay: 5 minutes
  • Jitter: ±25%, deterministic (derived from task ID and retry count, preventing thundering herd while remaining predictable across dispatch evaluations)

Sequence: ~5s, ~10s, ~20s, ~40s, ~80s, ~160s, capped at ~300s.
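
The delay and its deterministic jitter might be computed as in this sketch. The hash function is an arbitrary illustrative choice; any stable hash of the task ID and retry count would do.

```typescript
// Backoff calculation per §14.2: base 5s, 2x multiplier, 5min cap,
// deterministic ±25% jitter derived from task ID + retry count.
function backoffDelayMs(taskId: string, retryCount: number): number {
  const BASE_MS = 5_000;
  const MAX_MS = 300_000;
  const raw = Math.min(BASE_MS * 2 ** retryCount, MAX_MS);

  // Simple stable string hash so the same task always computes the same
  // delay across dispatch evaluations (no thundering herd, still predictable).
  let h = 0;
  for (const ch of `${taskId}:${retryCount}`) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  const jitter = ((h % 1000) / 1000 - 0.5) * 0.5; // in [-0.25, +0.25)
  return Math.round(raw * (1 + jitter));
}

// A failed task is dispatch-eligible only after its backoff window elapses.
function retryEligible(
  task: { id: string; retry_count: number; last_failure_at: number | null },
  now: number,
): boolean {
  if (task.last_failure_at === null) return true;
  return now >= task.last_failure_at + backoffDelayMs(task.id, task.retry_count);
}
```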

Retry scope. Retries apply at two levels:

  • API retries. GitHub API calls, container runtime commands, and other infrastructure operations retry transparently within the client code. The caller does not see transient failures unless retries are exhausted. Max attempts: 3.
  • Task retries. When an agent session fails (agent crashes, container dies), the dispatcher can restart the session for the same task. The workspace and branch persist, so the new session picks up from the repo state. Max attempts: 3 (configurable per project).

Retry state. Each task tracks:

  • retry_count (integer) — number of times this task has been retried
  • last_failure_at (timestamp or null) — when the most recent failure occurred

These fields are used by the dispatcher to calculate backoff delay. A task whose last_failure_at plus its current backoff interval is still in the future is not eligible for dispatch.

Retry vs. new session. A retry creates a new session in the existing workspace. The agent starts fresh but the repo state (commits, branch) reflects prior work. This is the same behavior as restarting after Stop mode (§10.5).

14.3 Restart Recovery

When the server restarts, it reconciles its in-memory state with persistent state:

  1. Reload task state. Tasks are persisted (event log is the source of truth). Replay each task's event stream to reconstruct current state.
  2. Detect orphaned sessions. Tasks in running or question state may have had their agent process killed by the restart. The server checks whether each session's container is still alive.
  3. Recover or fail. For tasks with dead sessions:
    • If retry_count < max retries, transition to waiting with an incremented retry count. The dispatcher picks them up on the next evaluation.
    • If retries exhausted, transition to failed.
  4. Resume the dispatch loop. The reconciliation tick fires, evaluating all waiting tasks.

Container state may persist across server restarts (the containers are independent processes). If a container is still running, the server re-attaches to its stdio and resumes the session without restarting the agent.

14.4 Failure Surfacing

When a task fails (retries exhausted or deterministic failure):

  • A task:state:failed event is emitted with failure details in the event data.
  • If the human is present, the orchestrator surfaces the failure in conversation.
  • If the human is absent, the failure is logged and visible in the UI on return.
  • The orchestrator may attempt to diagnose the failure and suggest a course of action (retry with different parameters, break the task into smaller pieces, escalate to the human).

15. Workflow Configuration

Each project can customize how tasks are handled through a workflow configuration file in the repository.

15.1 Configuration File

The workflow configuration lives at workflow.toml in the project's repository root. This file is read when the project is added to the server and can be reloaded dynamically.

[project]
max_sessions = 3                # Per-project concurrency limit (§13.4)
default_branch = "main"         # Override project default branch

[dispatch]
max_retries = 3                 # Task retry limit (§14.2)
retry_base_delay = 5            # Base backoff delay in seconds
progress_threshold = 60         # Minimum runtime (seconds) to count as "progress" (§14.1)

[labels]
# Map GitHub labels to task behavior.
# Tasks with "blocked" label start in blocked state.
# Tasks with "ignore" label are not imported.
ignore = ["wontfix", "duplicate", "ignore"]
blocked = ["blocked", "waiting-on-external"]

[prompt]
# Path to a system prompt file included in every agent session for this project.
# Relative to repo root.
system_prompt = "system-prompt.md"

15.2 Label Mapping

The [labels] section controls how GitHub labels affect task behavior:

  • ignore: Issues with any of these labels are skipped during import. The scheduler does not create tasks for them.
  • blocked: Issues with any of these labels start in blocked state instead of waiting.

15.2.1 Canonical Skip Label

The label tasks/skip is a reserved, canonical label that always causes issues and pull requests to be skipped during import, regardless of project configuration. This label is checked before the configurable ignore list and provides a consistent, cross-project mechanism to prevent specific items from becoming tasks. Use cases include:

  • Meta-issues or tracking epics that should remain open for organizational purposes
  • Issues awaiting external decisions or dependencies not suitable for agent work
  • Items temporarily excluded from agent processing without modifying workflow.toml

Labels not listed in the configuration (aside from the canonical tasks/skip label described above) have no special meaning to the dispatch system. The orchestrator and human can still use them for their own organizational purposes.

15.3 Dynamic Reload

The server watches for configuration changes:

  • When the configuration file changes (detected via polling the repo or webhook), the server reloads it and emits a system:config:reloaded event.
  • Active sessions are not affected — configuration changes apply to newly created sessions and future dispatch decisions.
  • Invalid configuration is rejected with a warning. The previous valid configuration remains in effect.

16. Prompt Construction

When a session starts, the server constructs a prompt for the agent based on the task's details and project context. The prompt is the agent's entire understanding of what it needs to do.

16.1 Prompt Structure

The prompt is assembled from several layers, concatenated in order:

  1. System prompt (project-level). The contents of the file referenced by [prompt].system_prompt in the workflow configuration (§15.1). This typically contains project conventions, coding standards, and repository-specific context. If not configured, this layer is omitted.

  2. Task description. The core of the prompt — what the agent needs to do:

    • Issue/PR title and body
    • Comments: the first 10 and last 10 comments, chronologically ordered. If there are more than 20 comments, a note is inserted between the two groups indicating how many were omitted and that the agent can use gh CLI to fetch the full history.
    • Labels and assignees
    • Sub-issues (titles and states, for context)
    • Linked PRs or issues (titles and states)
  3. Task context. Additional context the server provides:

    • Parent task details (if this is a sub-task)
    • Related task summaries (tasks in the same project that are in progress or recently completed, to help the agent avoid conflicts)
    • The git branch name and whether prior work exists on it
  4. Behavioral instructions. Instructions that control how the agent operates:

    • Commit and push work to the branch when done
    • Do not merge — the merge queue handles that
    • If stuck, describe the problem clearly so the orchestrator or human can help
    • If the task is ambiguous, ask for clarification rather than guessing
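
The comment-window rule in the task description layer might be implemented as in this sketch. The `Comment` shape and the wording of the omission note are illustrative.

```typescript
// Comment selection per §16.1: first 10 and last 10 comments, with a note
// about omissions inserted between the two groups when more than 20 exist.
interface Comment {
  author: string;
  body: string;
}

function selectComments(comments: Comment[], head = 10, tail = 10): (Comment | string)[] {
  if (comments.length <= head + tail) return [...comments];
  const omitted = comments.length - head - tail;
  return [
    ...comments.slice(0, head),
    `... (${omitted} comments omitted; use the gh CLI for full history)`,
    ...comments.slice(comments.length - tail),
  ];
}
```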

16.2 Retry and Continuation Context

When a task is being retried (§14.2), additional context is prepended:

  • A note that this is a retry, not a first attempt
  • The previous session's failure mode (crash, error message, timeout)
  • What progress was made (commits on the branch, if any)
  • Guidance to try a different approach if the previous one failed

When a task receives a human or orchestrator message while in question state, the message is delivered via the session's chat interface (§10.2), not by reconstructing the prompt.

16.3 Prompt Rendering

The prompt is rendered as plain Markdown. No template engine — the server concatenates the sections with clear headings. This keeps the system simple and the prompts inspectable.

# Project Context

{contents of system-prompt.md}

# Task

**{title}** (#{number})

{body}

## Comments

**{author}** ({timestamp}):
{comment body}

... (showing first 10 and last 10 of {total} comments — use `gh issue view {number} --comments` for full history)

**{author}** ({timestamp}):
{comment body}

## Context

- Branch: `tasks/{task-id}`
- Parent task: #{parent_number} — {parent_title}
- Related in-progress tasks: #{n1} — {title1}, #{n2} — {title2}

## Instructions

- Work on the branch `tasks/{task-id}`. Commit and push your changes when done.
- Do not merge into main. The merge queue handles merging.
- If you are stuck or the task is ambiguous, describe the problem clearly.

17. Observability

17.1 Structured Logging

All server components emit structured log entries (JSON) with consistent fields:

  • ts — timestamp
  • level — trace, debug, info, warn, error
  • component — which subsystem (scheduler, dispatcher, session, merge_queue, orchestrator)
  • task_id — if the log relates to a specific task
  • session_id — if the log relates to a specific session
  • message — human-readable description
  • data — additional structured data

Logs are written to stdout and optionally to a file. The log level is configurable at startup and can be changed at runtime.
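A sketch of a log entry with the fields above, in TypeScript. The `emitLog` helper name is an assumption; the field names follow the list in this section:

```typescript
type LogLevel = "trace" | "debug" | "info" | "warn" | "error";

interface LogEntry {
  ts: string;                      // ISO-8601 timestamp
  level: LogLevel;
  component: string;               // scheduler, dispatcher, session, merge_queue, orchestrator
  task_id?: string;                // present if the log relates to a task
  session_id?: string;             // present if the log relates to a session
  message: string;                 // human-readable description
  data?: Record<string, unknown>;  // additional structured data
}

// Illustrative helper: one JSON object per line on stdout.
function emitLog(entry: Omit<LogEntry, "ts">): LogEntry {
  const full: LogEntry = { ts: new Date().toISOString(), ...entry };
  console.log(JSON.stringify(full));
  return full;
}
```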

17.2 GUI Dashboard

The web GUI (§3.1) provides a real-time view of system state:

  • System status. Current operating mode, active session count, slot utilization.
  • Task list. All tasks with current state, priority, and session status. Filterable by project, state, and label.
  • Session view. For each active session: agent output stream (live), chat history, task details, git branch status.
  • Merge queue. Pending, approved, and recently merged items. Review and approve/reject from the UI.
  • Event stream. Live feed of events across all tasks, filterable by type and task.
  • Orchestrator chat. Persistent conversation with the orchestrator.

17.3 Runtime Snapshots

The server exposes a snapshot endpoint (HTTP GET) that returns the full system state as JSON:

  • All tasks and their current states
  • All active sessions and their statuses
  • Merge queue contents
  • Current operating mode
  • Rate limit state for each project's GitHub connection
  • Slot utilization (active / max, global and per-project)

This is useful for debugging, monitoring integrations, and the GUI's initial page load.
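An illustrative shape for the snapshot response, mirroring the bullet list above. The field names are assumptions, not a fixed wire format:

```typescript
interface Snapshot {
  mode: "stop" | "pause" | "play";
  tasks: Array<{ id: string; state: string; priority: number }>;
  sessions: Array<{ id: string; task_id: string; status: string }>;
  merge_queue: Array<{ pr_url: string; status: string }>;
  // Rate limit state per project's GitHub connection.
  rate_limits: Record<string, { remaining: number; reset_at: string }>;
  // Slot utilization: active / max, global and per-project.
  slots: {
    active: number;
    max: number;
    per_project: Record<string, { active: number; max: number }>;
  };
}

// Example consumer: fraction of global slots in use, e.g. for a dashboard gauge.
function slotUtilization(s: Snapshot): number {
  return s.slots.max === 0 ? 0 : s.slots.active / s.slots.max;
}
```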

17.4 Token and Cost Accounting

The server tracks resource consumption per task and per project:

  • Agent tokens. Input and output token counts per session, sourced from agent output parsing (agent-provider-specific). Accumulated per task and per project.
  • API calls. GitHub API calls and rate limit point consumption per project per polling cycle.
  • Session duration. Wall-clock time per session, from creation to termination.
  • Container resources. CPU and memory utilization per session (if available from the container runtime).

Accounting data is stored as events (system:accounting:*) and surfaced in the GUI dashboard. Cost estimation (mapping tokens to dollars) is not built in — the accounting provides the raw numbers, and the human can interpret them with their provider's pricing.

18. Security and Safety

18.1 Workspace Isolation

Session isolation is provided by the container runtime (session-runtime.md §2):

  • Each session runs in its own lightweight VM (apple/container).
  • Processes in one session cannot see or affect processes in another.
  • Each session has its own filesystem. No shared mounts between sessions.
  • The host filesystem is not accessible from inside containers.

18.2 Secret Handling

Secrets are injected into containers as environment variables at creation time (session-runtime.md §3.1):

  • GITHUB_TOKEN — for git operations and gh CLI.
  • Agent-provider API keys (e.g., ANTHROPIC_API_KEY).

Security properties:

  • Secrets are never written to disk inside the container (environment variables only).
  • Secrets are not included in event logs, task state, or any persisted data.
  • Secrets are not passed through the supervisor protocol — they are set at container creation and available to all processes inside the container.
  • Each project can use a different GitHub token with scoped permissions (e.g., repo-level fine-grained PAT).

Secrets are configured on the server side (environment variables, config file, or secret manager). The server reads them and passes them to the container runtime at session creation. The mechanism for configuring secrets on the server is an operational concern, not specified here.

18.3 Trust Boundaries

The system has three trust boundaries:

  1. Host ↔ Container. The container is untrusted from the host's perspective. The agent can execute arbitrary code inside the container, but cannot affect the host. Communication is limited to the supervisor protocol over stdio.

  2. Server ↔ GitHub. The server reads from GitHub using authenticated API calls. GitHub is trusted as the source of truth for issues and PRs. The server does not write to GitHub directly — all mutations happen through agents inside containers.

  3. Server ↔ Agent provider. API keys for the agent's AI provider are passed into containers. The server trusts the agent provider's API but limits exposure by scoping keys to the minimum required permissions where possible.

18.4 Agent Sandboxing

Agents run inside containers with the following constraints:

  • Network access. Agents have unrestricted network access. They need it for git operations, package installation (npm, cargo, pip), AI provider APIs, and potentially browsing documentation. Network restriction is not enforced at the container level.
  • Filesystem. Agents can read and write anywhere inside their container. The container's filesystem is ephemeral and isolated — nothing persists beyond the container's lifetime except git pushes.
  • Process execution. Agents can spawn arbitrary processes inside their container (build tools, test runners, language servers). This is required for them to do their job.
  • Resource limits. CPU and memory limits are set at container creation (session-runtime.md §2.1). Default limits are configurable per project.
  • Time limits. Sessions have a soft limit and a hard limit on wall-clock duration:
    • Soft limit (configurable, default: 1 hour). When reached, the server nudges the orchestrator or human that the session is running long. The orchestrator may intervene (provide guidance, break the task into smaller pieces) or the human may extend or steer.
    • Hard limit (soft limit + 15 minutes). If no intervention occurs after the nudge, the session is terminated and the task is retried or failed per §14.

The sandboxing model is: give agents everything they need to do their work, but contain the blast radius to a single disposable VM.
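The soft/hard time-limit escalation can be sketched as a pure decision function. The 15-minute grace window comes from the spec; the `SessionClockAction` name and `nudged` flag are assumptions for illustration:

```typescript
type SessionClockAction = "ok" | "nudge" | "terminate";

const HARD_GRACE_MS = 15 * 60 * 1000; // hard limit = soft limit + 15 minutes

function checkSessionClock(
  elapsedMs: number,
  softLimitMs: number,
  nudged: boolean, // has the soft-limit nudge already been emitted?
): SessionClockAction {
  if (elapsedMs >= softLimitMs + HARD_GRACE_MS) return "terminate";
  if (elapsedMs >= softLimitMs && !nudged) return "nudge"; // emit once
  return "ok";
}
```

The caller would emit the nudge to the orchestrator/human on "nudge" and route "terminate" into the retry/failure handling of §14.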

19. Reference Algorithms

19.1 Dispatch Tick

function dispatch_tick(server):
    if server.mode == Stop:
        return

    # Phase 1: Resume candidates (free — no slot cost).
    for task in server.tasks where task.state == Question:
        if task has pending message:
            send message to task.session
            set task.state = Running

    # Phase 2: Unblock tasks whose dependencies completed.
    for task in server.tasks where task.state == Blocked:
        if all tasks in task.blocked_by are terminal:
            set task.state = Waiting

    # Phase 3: Dispatch new work.
    candidates = server.tasks
        where state == Waiting
        and retry_backoff_elapsed(task)
        and not has_unclosed_pr_in_merge_queue(task)
        sorted by priority_sort(task)

    for task in candidates:
        global_slots = count(server.tasks where state in {Running, Question, Testing})
        project_slots = count(server.tasks
            where project == task.project
            and state in {Running, Question, Testing})

        if global_slots >= server.max_sessions:
            break
        if project_slots >= task.project.max_sessions:
            continue

        session = create_session(task)
        prompt = build_prompt(task)
        session.start(prompt)
        set task.state = Running
        emit task:state:running

19.2 Session Lifecycle

function create_session(task):
    container = runtime.create(task.project.image, task.project.env)
    runtime.start(container)
    transport = runtime.attach(container)

    wait for system:ready event on transport

    session = Session {
        id: new_uuid(),
        task_id: task.id,
        container: container,
        transport: transport,
        status: Ready,
    }

    task.session_id = session.id
    return session

function start_session(session, prompt):
    send start command {
        repo: session.task.project.repo_url,
        branch: session.task.branch,
        prompt: prompt,
    } over session.transport

    # Monitor agent output.
    loop:
        event = session.transport.recv()
        match event:
            agent:started -> emit task:state:running
            agent:stdout  -> emit agent:message, check for question patterns
            agent:stderr  -> log warning
            agent:exit(0) -> emit task:state:testing or task:state:awaiting_merge
            agent:exit(n) -> handle_failure(session, exit_code=n)

function handle_failure(session, exit_code):
    task = session.task
    task.retry_count += 1
    task.last_failure_at = now()

    if task.retry_count >= max_retries:
        set task.state = Failed
        emit task:state:failed
    else:
        set task.state = Waiting
        emit task:state:waiting
        # Dispatcher will pick it up after backoff.

19.3 Merge Queue Processing

The orchestrator maintains a FIFO evaluation queue. Entries are added when merge:queued events arrive and popped one at a time on a configurable interval (default 15s). Once a PR has been evaluated, it is not re-evaluated until it receives new commits.

function process_merge_queue_tick(server, eval_queue, evaluated_prs):
    if server.mode == Stop:
        return
    if server.mode == Pause:
        return  # Queue is held. Only Flush triggers processing.

    entry = eval_queue.pop_front()
    if entry is None:
        return
    if entry.pr_url in evaluated_prs:
        return  # Already evaluated this commit

    # Play mode: orchestrator has merge authority.
    evaluation = orchestrator.evaluate(entry.task)
    evaluated_prs.add(entry.pr_url)  # Don't re-evaluate until new commits

    if evaluation.approved:
        entry.status = Approved
        emit merge:approved

        conflict = check_merge_conflicts(entry)
        if conflict:
            entry.status = Conflict
            entry.task.state = Conflict
            emit merge:conflict
            return

        perform_merge(entry)
        entry.status = Merged
        entry.task.state = Completed
        emit merge:completed
        emit task:state:completed
    else:
        entry.status = Rejected
        emit merge:rejected
        # Send task back to implementor with feedback.
        restart_with_feedback(entry.task, evaluation.feedback)

function flush_merge_queue(server):
    # Only callable in Pause mode.
    for entry in server.merge_queue where status == Approved:
        conflict = check_merge_conflicts(entry)
        if conflict:
            entry.status = Conflict
            entry.task.state = Conflict
            emit merge:conflict
            continue

        perform_merge(entry)
        entry.status = Merged
        entry.task.state = Completed
        emit merge:completed
        emit task:state:completed

    emit system:flush

19.4 Event Routing

function publish(bus, event):
    # Persist to task-specific log.
    bus.store.append(event.task, event)

    # Broadcast to live subscribers.
    for subscriber in bus.subscribers:
        if matches(subscriber.pattern, event.type)
           and matches(subscriber.task_filter, event.task):
            subscriber.send(event)

function matches(pattern, event_type):
    # Colon-delimited pattern matching with wildcard support.
    pattern_parts = pattern.split(":")
    type_parts = event_type.split(":")

    for i in 0..pattern_parts.len():
        if pattern_parts[i] == "*":
            return true  # Wildcard matches all remaining segments.
        if i >= type_parts.len():
            return false
        if pattern_parts[i] != type_parts[i]:
            return false

    return pattern_parts.len() == type_parts.len()
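The pattern matcher above transcribes directly into TypeScript (the function name is illustrative). A "*" segment matches all remaining segments of the event type; without a wildcard, pattern and type must have the same number of segments:

```typescript
function matchesPattern(pattern: string, eventType: string): boolean {
  const patternParts = pattern.split(":");
  const typeParts = eventType.split(":");

  for (let i = 0; i < patternParts.length; i++) {
    if (patternParts[i] === "*") return true; // wildcard matches all remaining segments
    if (i >= typeParts.length) return false;  // pattern is longer than the type
    if (patternParts[i] !== typeParts[i]) return false;
  }
  return patternParts.length === typeParts.length;
}
```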

20. Test and Validation Matrix

20.1 Unit Tests

Each crate has unit tests covering its core logic in isolation:

  • events — Event serialization, pattern matching, wildcards, store append/read, bus publish/subscribe/replay
  • github — Response normalization, GraphQL response deserialization, rate limit parsing, pagination cursor handling, filter construction
  • runtime — Protocol codec (encode/decode/partial lines), command/event serialization
  • server — Mode transitions (all actor/direction combinations), task state transitions, merge queue operations (enqueue/approve/reject/flush/conflict/cleanup), presence tracking, slot accounting

20.2 Mock Integration Tests

Tests that use mock servers or in-process fakes to test cross-component behavior:

  • GitHub client — List/get issues and PRs, pagination, nested comment/review fetching, error handling (auth, not found, GraphQL errors, rate limiting), since-based filtering
  • Poller — High-water mark advancement, failure recovery (mark not advanced), empty poll stability
  • Dispatcher — Candidate selection (resume vs new), priority sorting, concurrency enforcement (global and per-project), mode gating, backoff eligibility
  • Merge queue — Mode-dependent behavior (Stop/Pause/Play), flush in Pause, conflict detection, rejection with feedback

20.3 Container Integration Tests

Tests that exercise the full session lifecycle with real containers. These are slower and require the container runtime to be available.

  • Session start — Container creation, supervisor ready, agent launch
  • Agent execution — Send prompt, receive output, agent exits cleanly
  • Chat injection — Send message to running agent, receive response
  • Exec command — Run command inside container, receive result
  • Session restart — Stop agent, restart in same workspace, verify repo state persists
  • Session cleanup — Destroy container, verify resources released

These tests use a mock agent (a simple echo process) to avoid depending on a real AI provider. The existing verify.ts script (session-runtime.md) is the foundation for these tests.

20.4 End-to-End Tests

Full system tests that exercise the platform from issue discovery to merge:

  • Happy path — Issue created → task dispatched → agent completes → merge queue → merged
  • Question flow — Agent asks question → human answers → agent resumes → completes
  • Retry on failure — Agent crashes → task retried → succeeds on second attempt
  • Concurrency — Multiple tasks dispatched up to limit, excess tasks wait
  • Mode transitions — Stop halts agents, Pause holds merges, Play resumes everything
  • Conflict resolution — Two tasks complete, second has conflict, gets re-engaged
  • Priority ordering — Higher-priority task dispatched before lower-priority

End-to-end tests use a fixture GitHub repository (or mock server) and a mock agent. They exercise the full dispatch → session → merge pipeline.

20.5 Test Environment

  • Unit and mock tests: Run with cargo test and require no external dependencies.
  • Container tests: Require the container runtime (container CLI) and a pre-built base image. Gated behind a --features container flag.
  • End-to-end tests: Require container runtime and optionally a GitHub token for live API tests. Gated behind --features e2e.
  • GitHub integration tests: Require a GITHUB_TOKEN and a fixture repository. Gated behind --features integration.

21. Implementation Checklist

A conforming implementation must satisfy all of the following:

21.1 Core Platform

  • Server starts, tracks mode (Stop/Pause/Play), and enforces transition rules
  • Event system: append-only log with per-task storage, pub/sub with pattern matching
  • Persistent storage: embedded database for structured state, JSONL for event log (§3.5)
  • Human presence tracking based on active GUI connections
  • Multi-project support with per-project configuration

21.2 GitHub Integration

  • GraphQL client fetches issues and PRs with full metadata
  • Normalized model decoupled from GitHub API shapes
  • Polling with high-water mark for incremental discovery
  • Rate limit tracking and backoff

21.3 Scheduling and Dispatch

  • Event-driven dispatch with reconciliation tick
  • Candidate selection: resume candidates before new work
  • Priority sorting: explicit priority → unblocking value → recency
  • Global and per-project concurrency limits with slot accounting
  • Mode-gated dispatch (no dispatch in Stop)

21.4 Sessions and Agent Runner

  • Container lifecycle: create, start, attach, stop, destroy
  • Supervisor protocol: start, chat, stop, exec commands; all event types
  • Session lifecycle: creation → ready → running → ended
  • Chat injection from human and orchestrator
  • Workspace persistence across session restarts
  • Session soft/hard time limits with escalation nudge

21.5 Merge Queue

  • Queue entry lifecycle: pending → approved/rejected → merged/conflict
  • Mode-dependent merge authority (Stop: held, Pause: held with flush, Play: orchestrator)
  • Conflict detection and re-engagement
  • Quality evaluation by orchestrator before queuing

21.6 Reflections (Phase 1)

  • Reflection label filtering: issues with reflection label excluded from task import
  • UI Reflections tab displaying filtered reflection issues
  • Reflection events: reflection:created, reflection:comment, reflection:closed

21.7 Retry and Recovery

  • Failure classification (transient vs deterministic)
  • Exponential backoff with jitter
  • Progress detection to distinguish transient from deterministic failures
  • Server restart recovery: state reconstruction from event log, orphaned session detection
  • Failure surfacing to orchestrator and human

21.8 Prompt Construction

  • Layered prompt assembly: system prompt, task description, context, instructions
  • Retry context for failed tasks
  • Project-level system prompt from workflow configuration

21.9 Observability

  • Structured JSON logging with consistent fields
  • Runtime snapshot endpoint (full system state as JSON)
  • Token and cost accounting per task and per project
  • GUI dashboard with live task list, session view, merge queue, event stream

21.10 Security

  • Session isolation via container runtime
  • Secret injection via environment variables (not persisted in logs or state)
  • Session time limits

Session Runtime

Status: Draft

This document specifies the session runtime architecture — how the platform creates isolated execution environments, provisions workspaces, and manages agent processes. It is a companion to the main spec (spec.md §10 Sessions and Agent Runner, §11 Workspace Management, §18 Security and Safety).

1. Overview

The session runtime provides the execution environment for agent sessions. Each session runs in an isolated container with its own filesystem, processes, and network namespace. Inside the container, a supervisor process (PID 1) manages the agent lifecycle and bridges communication with the host.

Host (Rust server)
  └── Session Manager (spec.md §10)
        │
        ├── creates containers via ContainerRuntime trait
        ├── communicates via JSON-line protocol over stdio
        │
        └── Container (apple/container lightweight VM)
              └── Supervisor (PID 1)
                    ├── clones repo and sets up workspace
                    ├── starts agent process
                    ├── forwards agent I/O as protocol events
                    └── handles stop/chat commands from host

The runtime crate (crates/runtime/) implements the host side. The supervisor binary (crates/supervisor/) runs inside containers.

2. Container Provider

2.1 Isolation Model

Sessions run in lightweight Linux VMs using apple/container. Each container provides:

  • Process isolation. Processes in one session cannot see or affect processes in another.
  • Filesystem isolation. Each container has its own root filesystem. No shared mounts between sessions. The host filesystem is not accessible from inside containers.
  • Network namespace. Containers have their own network stack with unrestricted outbound access (required for git, package managers, AI provider APIs).

2.2 Resource Limits

Containers can be configured with resource limits at creation time:

  • cpus — CPU core limit (fractional, e.g., 2.0 for 2 cores)
  • memory — Memory limit (e.g., "8G")
  • dns — DNS server (defaults to 8.8.8.8)

These are passed to the container create command and enforced by the VM runtime.
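A hypothetical mapping from these limits to CLI flags, in TypeScript. The exact flag names accepted by the container create command are an assumption here, as is the `resourceArgs` helper:

```typescript
interface ResourceLimits {
  cpus?: number;   // fractional CPU core limit, e.g. 2.0
  memory?: string; // memory limit, e.g. "8G"
  dns?: string;    // DNS server; spec default is 8.8.8.8
}

// Build the argument list appended to the create command.
function resourceArgs(limits: ResourceLimits): string[] {
  const args: string[] = [];
  if (limits.cpus !== undefined) args.push("--cpus", String(limits.cpus));
  if (limits.memory !== undefined) args.push("--memory", limits.memory);
  args.push("--dns", limits.dns ?? "8.8.8.8"); // default from the spec
  return args;
}
```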

2.3 Container Lifecycle

The ContainerRuntime trait defines the container lifecycle operations:

trait ContainerRuntime {
    async fn create(&self, config: &ContainerConfig) -> Result<String, ContainerError>;
    async fn start(&self, container_id: &str) -> Result<StdioTransport, ContainerError>;
    async fn stop(&self, container_id: &str) -> Result<(), ContainerError>;
    async fn destroy(&self, container_id: &str) -> Result<(), ContainerError>;
    async fn container_exists(&self, container_id: &str) -> Result<bool, ContainerError>;
}
  1. Create. Allocates a container with the given config. Returns a container ID.
  2. Start. Boots the container and attaches to its stdio. Returns a transport handle.
  3. Stop. Sends shutdown signal to the container.
  4. Destroy. Removes the container and cleans up resources.
  5. Exists. Checks if a container exists (for restart recovery, spec.md §14.3).

3. Workspace Provisioning

3.1 Secret Injection

Secrets are injected as environment variables at container creation time. They are available to all processes inside the container but are never written to disk.

Required secrets:

  • GITHUB_TOKEN — Used for git clone/push operations and gh CLI. Embedded in HTTPS URLs for authentication.
  • ANTHROPIC_API_KEY — Agent provider API key.

Optional configuration:

  • AGENT_CMD — Command to run the agent (default: claude)
  • AGENT_ARGS — Arguments for the agent command
  • AGENT_USER — Non-root user to run the agent (default: agent)

3.2 Repository Setup

When the supervisor receives a start command, it provisions the workspace:

  1. Git configuration. Creates .gitconfig for both root and agent users with:

    • User identity (tasks@localhost)
    • Safe directory setting for /workspace
  2. Clone. Clones the repository to /workspace. The GITHUB_TOKEN is embedded in the clone URL for HTTPS authentication:

    https://x-access-token:{token}@github.com/owner/repo
    
  3. Branch. Checks out (or creates) the specified branch.

  4. Ownership. Changes ownership of /workspace to the agent user (clone runs as root).

If the workspace already contains a git repository (workspace reuse), the clone step is skipped.
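An illustrative helper for building the authenticated clone URL shown in step 2, plus a redaction helper for log-safe display. The function names are assumptions; the token must never be logged or written to disk:

```typescript
// Embed the token in the HTTPS URL, per the x-access-token convention above.
function cloneUrl(owner: string, repo: string, token: string): string {
  return `https://x-access-token:${token}@github.com/${owner}/${repo}`;
}

// Strip the credential before the URL appears in any log line or event.
function redactCloneUrl(url: string): string {
  return url.replace(/x-access-token:[^@]+@/, "x-access-token:***@");
}
```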

4. Supervisor Protocol

The supervisor communicates with the host via a JSON-line protocol over stdio. Commands flow host→supervisor on stdin; events flow supervisor→host on stdout.

4.1 Commands (Host → Supervisor)

Commands are single-line JSON objects with a cmd discriminator:

start — Initialize workspace and start the agent.

{"cmd": "start", "repo": "owner/repo", "branch": "task-123", "prompt": "Implement feature X..."}

chat — Send a message to the running agent.

{"cmd": "chat", "text": "Can you also add tests?"}

stop — Gracefully stop the agent process.

{"cmd": "stop"}

exec — Execute an arbitrary command in the container (for debugging/inspection).

{"cmd": "exec", "id": "req-1", "argv": ["git", "status"]}

4.2 Events (Supervisor → Host)

Events are single-line JSON objects with an ev discriminator:

system:ready — Supervisor is initialized and ready to accept commands.

{"ev": "system:ready"}

agent:started — Agent process has been spawned.

{"ev": "agent:started", "pid": 1234}

agent:stdout — Agent wrote to stdout.

{"ev": "agent:stdout", "data": "Analyzing codebase..."}

agent:stderr — Agent wrote to stderr.

{"ev": "agent:stderr", "data": "[debug] loading config"}

agent:exit — Agent process exited.

{"ev": "agent:exit", "code": 0, "signal": null}

exec:result — Result of an exec command.

{"ev": "exec:result", "id": "req-1", "code": 0, "stdout": "On branch main...", "stderr": ""}

4.3 Stream Conventions

  • stdout is reserved for protocol events. One JSON object per line, newline-delimited.
  • stderr is for supervisor diagnostic logging (prefixed with [supervisor]).
  • Agent stdout/stderr are captured and forwarded as agent:stdout/agent:stderr events.
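The newline-delimited convention implies a buffering decoder on the host side: accumulate partial reads and emit one parsed object per complete line. A minimal sketch (the `makeLineDecoder` name is an assumption):

```typescript
// Returns a feed function; call it with each raw chunk read from the
// container's stdout. Complete lines are parsed as JSON and dispatched;
// a trailing partial line is buffered until the next chunk arrives.
function makeLineDecoder(
  onMessage: (msg: Record<string, unknown>) => void,
): (chunk: string) => void {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (line.trim().length > 0) onMessage(JSON.parse(line));
    }
  };
}
```

Commands flowing host→supervisor can be encoded symmetrically: `JSON.stringify(cmd) + "\n"` written to stdin.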

4.4 Agent Process Management

The supervisor manages the agent process lifecycle:

  1. Start. On receiving start, the supervisor:

    • Provisions the workspace (§3.2)
    • Spawns the agent via sudo --preserve-env -u agent (Claude Code refuses to run as root)
    • Emits agent:started with the PID
  2. I/O forwarding. Agent stdout/stderr are read line-by-line and forwarded as events.

  3. Chat injection. On receiving chat, the supervisor writes the text to the agent's stdin.

  4. Graceful stop. On receiving stop:

    • Sends SIGTERM to the agent process
    • Waits up to 5 seconds for graceful exit
    • Sends SIGKILL if the process doesn't terminate
    • Emits agent:exit with the final status

5. Session Lifecycle

The full session flow, from the host perspective:

  1. Create container. Host calls runtime.create(config) with the container image and environment variables (secrets, agent config).

  2. Start container. Host calls runtime.start(container_id), which boots the container and returns a transport for stdio communication.

  3. Wait for ready. Host waits for system:ready event indicating the supervisor is initialized.

  4. Start agent. Host sends start command with repo URL, branch name, and initial prompt.

  5. Monitor events. Host processes events:

    • agent:started — agent is running
    • agent:stdout/agent:stderr — forward to session chat, emit to event bus
    • agent:exit — session completed
  6. Human/orchestrator interaction. If a participant sends a chat message, the host sends a chat command to inject it into the agent's stdin.

  7. Stop session. When the session needs to end (mode change to Stop, task cancelled):

    • Host sends stop command
    • Waits for agent:exit
    • Calls runtime.stop() then runtime.destroy()

6. Container Image

The container image is built with container build (apple/container CLI), not Docker. The image includes:

  • Base Linux environment (Debian-based)
  • Git, curl, common build tools
  • Node.js, Python, Rust (language runtimes)
  • The claude CLI (agent provider)
  • A non-root agent user for running the agent
  • The supervisor binary (cross-compiled from crates/supervisor/)

Build command:

make container-image   # cross-compile supervisor + build image

The supervisor is cross-compiled on the host for aarch64-unknown-linux-gnu to avoid building it inside the container image build. See CLAUDE.md for toolchain prerequisites.

GitHub Integration

Status: Draft

This document specifies the GitHub integration layer — how the platform discovers, fetches, and normalizes work from GitHub. It is a companion to the main spec (spec.md §12 Issue Tracker Integration, §3.2 Scheduler).

1. Overview

The GitHub crate is the platform's interface to GitHub. It fetches issues, pull requests, and their associated metadata from GitHub's GraphQL API, normalizes them into a stable internal model, and provides a polling mechanism for discovering new and changed work.

The crate also provides write operations for the orchestrator and server to interact with GitHub directly (posting comments, updating issues, adding labels, merging PRs, managing branches). Agents working inside sessions may also use the gh CLI with credentials injected into the container environment (session-runtime.md §3.1).

Server
  └── Scheduler (spec.md §3.2)
        │
        ├── uses GitHubClient to poll for changes
        ├── normalizes responses into internal model
        ├── emits events to the event bus
        │
        └── GitHubClient (this crate)
              ├── GraphQL queries against api.github.com
              ├── rate limit tracking
              └── pagination handling

The crate is consumed by the scheduler but is otherwise independent — it has no dependency on the event system, server, or runtime crates.

2. Normalized Model

The crate normalizes GitHub's API responses into a stable internal model. The rest of the system works with these types, never with raw GitHub API shapes.

2.1 Issue

  • owner (string) — repository owner
  • repo (string) — repository name
  • number (u64) — issue number
  • node_id (string) — GitHub's global GraphQL node ID (used for pagination and cross-references)
  • title (string)
  • body (string or null)
  • state (enum) — Open, Closed
  • state_reason (enum or null) — Completed, NotPlanned, Reopened (GitHub's close reason)
  • labels (list of Label)
  • assignees (list of User)
  • milestone (Milestone or null)
  • comments (list of Comment) — full comment history, ordered chronologically
  • parent (ParentIssueRef or null) — parent issue if this is a sub-issue
  • sub_issues (list of SubIssueRef) — issues linked as sub-issues via GitHub's sub-issue feature
  • blocked_by (list of BlockingIssueRef) — issues that block this one (must be resolved before this issue can be worked on)
  • linked_pull_requests (list of LinkedPR) — PRs that reference this issue (via closing keywords or manual links)
  • author (User)
  • created_at (timestamp)
  • updated_at (timestamp)
  • closed_at (timestamp or null)

2.2 Pull Request

  • owner (string) — repository owner
  • repo (string) — repository name
  • number (u64) — PR number
  • node_id (string)
  • title (string)
  • body (string or null)
  • state (enum) — Open, Closed, Merged
  • head_ref (string) — source branch name
  • head_sha (string) — current head commit SHA
  • base_ref (string) — target branch name
  • is_draft (bool)
  • mergeable (enum or null) — Mergeable, Conflicting, Unknown (GitHub may not have computed this yet)
  • labels (list of Label)
  • assignees (list of User)
  • review_decision (enum or null) — Approved, ChangesRequested, ReviewRequired
  • reviews (list of Review) — all reviews, ordered chronologically
  • comments (list of Comment) — issue-level comments (not review comments)
  • linked_issues (list of LinkedIssueRef) — issues this PR closes/references
  • author (User)
  • created_at (timestamp)
  • updated_at (timestamp)
  • closed_at (timestamp or null)
  • merged_at (timestamp or null)

2.3 Supporting Types

Label: name (string), color (string)

User: login (string), node_id (string)

Milestone: title (string), number (u64), state (Open | Closed)

Comment: id (string), author (User), body (string), created_at (timestamp), updated_at (timestamp)

Review: id (string), author (User), state (Approved | ChangesRequested | Commented | Dismissed), body (string or null), submitted_at (timestamp)

ParentIssueRef: number (u64), title (string), state (Open | Closed), node_id (string)

SubIssueRef: number (u64), title (string), state (Open | Closed), node_id (string)

BlockingIssueRef: owner (string), repo (string), number (u64), title (string), state (Open | Closed), node_id (string)

LinkedPR: number (u64), title (string), state (Open | Closed | Merged), node_id (string)

LinkedIssueRef: number (u64), title (string), state (Open | Closed), node_id (string)
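A condensed TypeScript sketch of the normalized types above (fields abbreviated; see §2.1–2.3 for the full field lists, and note timestamps are shown here as ISO-8601 strings, which is an assumption about the wire representation):

```typescript
interface Label { name: string; color: string }
interface User { login: string; node_id: string }
interface Comment {
  id: string;
  author: User;
  body: string;
  created_at: string; // ISO-8601
  updated_at: string;
}

interface Issue {
  owner: string;
  repo: string;
  number: number;
  node_id: string; // GitHub's global GraphQL node ID
  title: string;
  body: string | null;
  state: "Open" | "Closed";
  labels: Label[];
  assignees: User[];
  comments: Comment[]; // full history, ordered chronologically
  author: User;
  // ...plus state_reason, milestone, parent, sub_issues, blocked_by,
  // linked_pull_requests, and the created/updated/closed timestamps.
}
```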

3. GraphQL Queries

All data is fetched via GitHub's GraphQL API (POST https://api.github.com/graphql). The crate provides three query categories.

3.1 Repository Issues

Fetches issues for a repository, with filtering and pagination.

Parameters:

  • owner, repo — repository identifier
  • states (optional) — filter by Open, Closed, or both. Default: Open only (but see §5.5 — the RepoPoller fetches all states to detect external closures)
  • labels (optional) — filter to issues with any of these labels
  • since (optional) — only issues updated after this timestamp (for polling)
  • first / after (optional) — cursor-based pagination

Returns: Paginated list of Issue (§2.1), including all nested fields (comments, labels, assignees, sub-issues, linked PRs) in a single query.

3.2 Repository Pull Requests

Fetches PRs for a repository, with filtering and pagination.

Parameters:

  • owner, repo — repository identifier
  • states (optional) — filter by Open, Closed, Merged. Default: Open only (but see §5.5 — the RepoPoller fetches all states to detect external closures)
  • since (optional) — only PRs updated after this timestamp
  • first / after (optional) — cursor-based pagination

Returns: Paginated list of PullRequest (§2.2), including reviews, comments, and linked issues in a single query.

3.3 Single Item Fetch

Fetches a single issue or PR by number, with full detail.

Parameters:

  • owner, repo, number

Returns: Issue or PullRequest with all fields populated.

This is used when the scheduler needs to refresh a specific item (e.g., after an event indicates it changed, or when fetching a linked issue referenced by another item).

3.4 Pagination

GitHub GraphQL uses cursor-based pagination. The crate handles this internally:

  • Each query requests up to 100 items per page (GitHub's maximum).
  • Comments and reviews are paginated within each item — the crate fetches all pages for these nested connections automatically.
  • The client exposes a stream/iterator interface so callers don't manage cursors directly.
  • A configurable maximum page limit prevents runaway queries on repositories with thousands of issues (default: 10 pages = 1000 items).
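The cursor-handling loop can be sketched roughly as follows. This is an illustrative stand-in, not the crate's actual implementation: `Page` and `fetch_page` are hypothetical, and the real client issues GraphQL queries with `first: 100, after: cursor`.

```rust
// Hypothetical sketch of the internal pagination loop (§3.4).
struct Page {
    items: Vec<String>,         // normalized items from this page
    end_cursor: Option<String>, // cursor to pass as `after` on the next page
    has_next_page: bool,        // from GraphQL pageInfo
}

// Stub fetch standing in for a real GraphQL request.
fn fetch_page(cursor: Option<&str>) -> Page {
    match cursor {
        None => Page {
            items: vec!["issue-1".into()],
            end_cursor: Some("c1".into()),
            has_next_page: true,
        },
        Some(_) => Page {
            items: vec!["issue-2".into()],
            end_cursor: None,
            has_next_page: false,
        },
    }
}

// Accumulate pages until GitHub reports no more, or the page limit is hit.
fn fetch_all(max_pages: usize) -> Vec<String> {
    let mut items = Vec::new();
    let mut cursor: Option<String> = None;
    for _ in 0..max_pages {
        let page = fetch_page(cursor.as_deref());
        items.extend(page.items);
        if !page.has_next_page {
            break; // no further pages
        }
        cursor = page.end_cursor;
    }
    items // exhausted, or page limit reached
}
```

The page limit bounds total work regardless of repository size; callers that need deeper history can raise `max_pages` explicitly.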

3.5 Rate Limiting

GitHub's GraphQL API has a point-based rate limit (typically 5,000 points per hour). Each query costs a variable number of points depending on the fields and pagination depth requested.

The client tracks rate limit state from response headers:

  • x-ratelimit-remaining — points remaining
  • x-ratelimit-reset — when the budget resets

Behavior:

  • If remaining points drop below a configurable threshold (default: 200), the client pauses requests and waits until the reset window.
  • Rate limit state is exposed to callers so the scheduler can adjust its polling cadence.
  • If a request receives a 403 with rate limit exceeded, the client waits for the reset time and retries once.
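The floor check reduces to a small pure function. The sketch below uses plain epoch-second timestamps for illustration; the real client presumably works with parsed header values and an async sleep.

```rust
// Hypothetical sketch of the rate-limit floor check (§3.5).
struct RateLimit {
    remaining: u64, // from x-ratelimit-remaining
    reset_at: u64,  // from x-ratelimit-reset (epoch seconds)
}

/// How long to pause before the next request: zero while we are at or
/// above the floor, otherwise until the reset window opens.
fn pause_before_next(limit: &RateLimit, floor: u64, now: u64) -> u64 {
    if limit.remaining >= floor {
        0
    } else {
        limit.reset_at.saturating_sub(now)
    }
}
```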

4. Client API

The GitHubClient is the public interface to the crate. It is a thin async wrapper around the GraphQL queries, rate limit tracking, and response normalization.

4.1 Construction

GitHubClient::new(token: String) -> GitHubClient

Takes a personal access token. The token is sent as Authorization: Bearer {token} on all requests. The client holds a single reqwest::Client internally for connection pooling.

An optional builder allows overriding:

  • base_url — for GitHub Enterprise or testing against a mock server (default: https://api.github.com)
  • max_pages — pagination limit (default: 10)
  • rate_limit_floor — minimum remaining points before pausing (default: 200)
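A builder along these lines would satisfy the spec. The names below follow the bullet list above, but the struct and method shapes are assumptions, not the crate's actual API.

```rust
// Hypothetical sketch of the client builder described in §4.1.
struct GitHubClientConfig {
    token: String,
    base_url: String,
    max_pages: u32,
    rate_limit_floor: u64,
}

struct GitHubClientBuilder {
    config: GitHubClientConfig,
}

impl GitHubClientBuilder {
    fn new(token: String) -> Self {
        Self {
            config: GitHubClientConfig {
                token,
                base_url: "https://api.github.com".into(), // spec default
                max_pages: 10,                             // spec default
                rate_limit_floor: 200,                     // spec default
            },
        }
    }
    fn base_url(mut self, url: &str) -> Self {
        self.config.base_url = url.into();
        self
    }
    fn max_pages(mut self, n: u32) -> Self {
        self.config.max_pages = n;
        self
    }
    fn build(self) -> GitHubClientConfig {
        self.config
    }
}
```

Defaults match the spec; overriding `base_url` is what makes the mock-server tests in §6.3 possible without touching production GitHub.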

4.2 Methods

Issues:

  • list_issues(owner, repo, filters) -> Result<Vec<Issue>> — paginated, returns all pages up to limit
  • get_issue(owner, repo, number) -> Result<Issue> — single issue with full detail

Pull Requests:

  • list_pull_requests(owner, repo, filters) -> Result<Vec<PullRequest>> — paginated
  • get_pull_request(owner, repo, number) -> Result<PullRequest> — single PR with full detail

Rate Limit:

  • rate_limit() -> RateLimit — current rate limit state (remaining points, reset time)

4.3 Filters

IssueFilters {
    states: Option<Vec<IssueState>>,
    labels: Option<Vec<String>>,
    since: Option<DateTime<Utc>>,
}

PullRequestFilters {
    states: Option<Vec<PullRequestState>>,
    since: Option<DateTime<Utc>>,
}
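Every field is optional, so omitted filters simply contribute no query variables. A minimal sketch of that mapping, with simplified string-based types standing in for the real `IssueState` and `DateTime<Utc>`:

```rust
// Hypothetical sketch: turning IssueFilters into GraphQL query variables (§4.3).
// The variable names here are illustrative; the exact mapping onto GitHub's
// `issues(states:, labels:, filterBy:)` arguments is an implementation detail.
#[derive(Default)]
struct IssueFilters {
    states: Option<Vec<String>>,
    labels: Option<Vec<String>>,
    since: Option<String>, // ISO 8601 timestamp
}

fn to_variables(f: &IssueFilters) -> Vec<(String, String)> {
    let mut vars = Vec::new();
    if let Some(states) = &f.states {
        vars.push(("states".into(), states.join(",")));
    }
    if let Some(labels) = &f.labels {
        vars.push(("labels".into(), labels.join(",")));
    }
    if let Some(since) = &f.since {
        vars.push(("since".into(), since.clone()));
    }
    vars // only set filters appear in the query
}
```

This is also the shape exercised by the filter construction tests in §6.1.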

4.4 Errors

GitHubError {
    Auth            — 401, bad or expired token
    NotFound        — issue/PR/repo doesn't exist
    RateLimited     — rate limit exceeded after retry
    GraphQL(Vec)    — GitHub returned GraphQL-level errors
    Network         — connection/timeout failures
    Decode          — response didn't match expected shape
}
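One practical consequence of this taxonomy is that callers can classify errors as transient or permanent. The sketch below mirrors the variants above in simplified form; the retry policy itself is an assumption about how a scheduler might react, not something the spec mandates.

```rust
// Hypothetical sketch of caller-side error classification for §4.4.
#[derive(Debug)]
enum GitHubError {
    Auth,                // 401, bad or expired token
    NotFound,            // issue/PR/repo doesn't exist
    RateLimited,         // rate limit exceeded after retry
    GraphQL(Vec<String>), // GraphQL-level errors from GitHub
    Network,             // connection/timeout failures
    Decode,              // response didn't match expected shape
}

/// Whether a later retry could plausibly succeed. Auth, NotFound, and
/// Decode are permanent for a given request; GraphQL errors are surfaced
/// rather than retried blindly.
fn is_retryable(err: &GitHubError) -> bool {
    match err {
        GitHubError::RateLimited | GitHubError::Network => true,
        GitHubError::Auth
        | GitHubError::NotFound
        | GitHubError::Decode
        | GitHubError::GraphQL(_) => false,
    }
}
```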

5. Polling and Discovery

The crate provides a higher-level polling interface on top of the raw client. This is what the scheduler (spec.md §3.2) uses to discover new and changed work.

5.1 Repository Poller

The RepoPoller tracks the last-seen updated_at timestamp per repository and fetches only items that changed since the last poll.

RepoPoller::new(client: GitHubClient, owner: String, repo: String) -> RepoPoller

Methods:

  • poll() -> Result<PollResult> — fetches issues and PRs updated since the last successful poll. On first call, fetches all open items.
  • poll_issues() -> Result<Vec<Issue>> — issues only
  • poll_pull_requests() -> Result<Vec<PullRequest>> — PRs only

PollResult:

  • issues — list of new or updated issues
  • pull_requests — list of new or updated PRs
  • timestamp — the updated_at high-water mark from this poll (used as since on the next call)
  • rate_limit — rate limit state after this poll

Merge queue population: The scheduler uses pull_requests from the poll result to populate the merge queue. PRs that are open and not drafts are added as merge queue entries; see spec.md §7.0 for the full eligibility criteria. This happens automatically on each poll cycle. The GitHub crate does not filter PRs for merge queue purposes: it returns all PRs matching the query filters, and the scheduler applies the eligibility rules.
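The scheduler-side eligibility filter described above can be sketched as a pure function. `Pr` and its fields are illustrative stand-ins for the normalized `PullRequest` model, and this covers only the open/non-draft check; the full criteria live in spec.md §7.0.

```rust
// Hypothetical sketch of scheduler-side merge queue candidate selection (§5.1).
#[derive(PartialEq)]
enum PrState { Open, Closed, Merged }

struct Pr {
    number: u64,
    state: PrState,
    draft: bool,
}

/// Open, non-draft PRs from a poll result become merge queue entries.
fn merge_queue_candidates(prs: &[Pr]) -> Vec<u64> {
    prs.iter()
        .filter(|pr| pr.state == PrState::Open && !pr.draft)
        .map(|pr| pr.number)
        .collect()
}
```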

5.2 Change Detection

The poller returns all items updated since the last poll. It is the caller's (scheduler's) responsibility to determine what changed — the poller does not diff against previous state.

This is intentional. The scheduler already maintains task state and is the right place to compare incoming GitHub state against internal state. The poller is a data-fetching layer, not a state machine.

5.3 High-Water Mark

The poller tracks a single since timestamp per repository:

  • After a successful poll, since advances to the maximum updated_at across all returned items.
  • If a poll fails, since is not advanced — the next poll retries the same window.
  • The timestamp is held in memory. If the server restarts, the first poll after restart fetches all open items (equivalent to a cold start). Persisting the high-water mark is a future optimization.
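The advance rule is small enough to state directly. The sketch below uses plain epoch-second timestamps instead of the real `DateTime<Utc>`; an empty poll result leaves the mark unchanged, matching the retry-the-same-window behavior.

```rust
// Hypothetical sketch of the high-water mark rule (§5.3).
/// On a successful poll, advance `since` to the maximum `updated_at`
/// across returned items; on failure (or an empty result), keep the
/// old value so the next poll retries the same window.
fn advance_since(since: Option<u64>, poll: Result<&[u64], ()>) -> Option<u64> {
    match poll {
        Ok(updated_ats) => updated_ats.iter().copied().max().or(since),
        Err(()) => since,
    }
}
```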

5.4 Polling Cadence

The poller does not own its own timer. The scheduler calls poll() on whatever cadence it chooses (spec.md §3.2 says configurable). This keeps the crate free of tokio::time dependencies and scheduling opinions.

5.5 State Filtering for Closure Detection

Although the raw GraphQL queries (§3.1, §3.2) default to fetching only open items, the RepoPoller intentionally fetches all states (Open and Closed for issues; Open, Closed, and Merged for PRs) when polling.

This is necessary to detect external closures (spec.md §12.3). When an issue or PR is closed externally (by a human or another automation), its updated_at timestamp changes. By including closed/merged items in the query with a since filter, the poller sees these state changes and can report them to the scheduler.

Without this behavior, externally closed items would disappear from poll results entirely — the scheduler would never learn that they closed, and the corresponding tasks would remain in stale states.

Implementation note: The high-water mark (§5.3) ensures that each closed item is only returned once — in the first poll after its updated_at changes. Subsequent polls will have a since value newer than the closed item's timestamp, so it won't appear again.
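On the scheduler side, the closure check reduces to comparing polled state against internal records. This is a sketch of what that comparison might look like; `known_open` and the simplified `IssueState` are illustrative, and per §5.2 this diffing belongs to the scheduler, not the poller.

```rust
// Hypothetical sketch of scheduler-side external closure detection (§5.5).
use std::collections::HashSet;

#[derive(PartialEq)]
enum IssueState { Open, Closed }

/// Issue numbers the scheduler believed were open but that the latest
/// poll reports as closed, i.e. items closed externally.
fn detect_closures(known_open: &HashSet<u64>, polled: &[(u64, IssueState)]) -> Vec<u64> {
    polled
        .iter()
        .filter(|(num, state)| *state == IssueState::Closed && known_open.contains(num))
        .map(|(num, _)| *num)
        .collect()
}
```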

6. Testing

6.1 Unit Tests

Normalization tests. Given raw GraphQL JSON responses (captured from real API calls or hand-written), verify that normalization produces the correct model structs. These tests exercise the deserialization and mapping logic without making network calls. Cover:

  • Issues with all fields populated
  • Issues with null/missing optional fields (no milestone, no assignees, closed without reason)
  • PRs in each state (open, closed, merged) with varying mergeable/review states
  • Nested pagination (issue with >100 comments)
  • Sub-issues and linked PRs/issues
  • Malformed or unexpected fields (should produce Decode errors, not panics)

Rate limit tracking tests. Verify that rate limit state is correctly parsed from response headers and that the floor threshold triggers waiting behavior.

Pagination tests. Verify cursor handling across multiple pages, including the stop condition when has_next_page is false or the page limit is reached.

Filter construction tests. Verify that IssueFilters and PullRequestFilters produce the correct GraphQL query variables.

6.2 Integration Tests

Integration tests run against a real GitHub API. They are gated behind a feature flag (--features integration) and require a GITHUB_TOKEN environment variable.

Target repository: Tests run against a public fixture repository (e.g., tasks-test/fixture) with known issues, PRs, comments, and labels. The fixture repo is set up once and not modified by tests — all operations are reads.

Tests:

  • Fetch a known issue by number and verify all fields
  • Fetch a known PR by number and verify all fields (including reviews)
  • List open issues with label filter
  • List open PRs
  • Pagination across multiple pages (fixture repo needs enough issues)
  • since filter returns only recently updated items
  • Rate limit state is populated after a request
  • Bad token returns Auth error
  • Nonexistent repo returns NotFound error

6.3 Mock Server Tests

For testing polling behavior and error handling without depending on GitHub uptime or rate limits, the crate includes tests that run a local HTTP server (using wiremock or similar) serving canned GraphQL responses.

Tests:

  • RepoPoller advances high-water mark after successful poll
  • RepoPoller does not advance after failed poll
  • Rate limit floor triggers wait behavior
  • 403 rate-limit response triggers retry-after-reset
  • Network timeout produces Network error
  • GraphQL error response produces GraphQL error

7. Open Questions

  • Webhook support. The main spec (spec.md §11.4) mentions optional webhook push notifications. This crate covers the polling path. Webhook ingestion may be a separate module in the server crate, since it requires an HTTP endpoint and ties into the server's request handling.
  • GraphQL schema changes. GitHub evolves its GraphQL schema. Sub-issues in particular are relatively new. The normalization layer should degrade gracefully if a field is absent from the response.
  • Nested pagination limits. An issue with thousands of comments would require many nested pagination calls. A practical limit on nested page depth (e.g., 10 pages = 1000 comments) may be needed.

Specification Contribution Guide

This document defines the conventions for writing and maintaining specification documents in this directory.

Document Types

Main Specification (spec.md)

The platform specification covering the overall architecture, domain model, operating modes, and high-level behavior. This is the primary reference for understanding the system.

Detail Specifications

Companion documents that expand on specific subsystems in implementation-focused detail. These are for readers implementing or deeply understanding a particular component.

Current detail specs:

  • github.md — GitHub integration (polling, GraphQL, normalized model)
  • session-runtime.md — Container provider, supervisor protocol, workspace provisioning

Document Format

Required Header

Every specification document must begin with:

# [Title]

Status: Draft | Review | Stable

[One-paragraph purpose statement explaining what this document covers and its relationship
to the main spec.]

Status values:

  • Draft — Work in progress, may change significantly
  • Review — Ready for review, expected to stabilize
  • Stable — Finalized, changes require versioning

Section Numbering

Use numbered sections for easy cross-referencing:

## 1. First Section

### 1.1 Subsection

### 1.2 Another Subsection

## 2. Second Section

Table of Contents

Documents over 200 lines should include a table of contents after the header.

Cross-Referencing

Within the Same Document

Reference sections by number: (see §3.2) or as described in §1.

Between Documents

Reference other specs with filename and section: (see github.md §2.1) or (spec.md §10 Sessions).

Back-References

Detail specs should note which main spec sections they expand in their overview:

This document specifies [...]. It is a companion to the main spec (spec.md §10 Sessions
and Agent Runner, §11 Workspace Management).

Code References

When spec sections are referenced in code, use the format:

// See spec/session-runtime.md §4.1

This allows searching the codebase for spec references and identifying drift.

Keeping Specs in Sync

Adding New Sections

  1. Add the section to the appropriate spec document
  2. Update cross-references in other specs if needed
  3. Update SUMMARY.md if adding a new document
  4. Search for existing references to ensure consistency

Removing or Moving Sections

  1. Search for references: rg "spec-name.md §N"
  2. Update all references before removing/moving
  3. Consider leaving a note: [Section moved to other-spec.md §M]

Code-Spec Alignment

For protocol definitions and API schemas, keep the spec and code in sync:

  • Protocol types should match between spec and crates/runtime/src/protocol/
  • Update both when making changes

File Organization

spec/
├── README.md          # Overview and document index
├── SUMMARY.md         # mdBook-style table of contents
├── CONTRIBUTING.md    # This file
├── CHANGELOG.md       # Version history
├── VERSION            # Current version number
├── spec.md            # Main platform specification
├── github.md          # GitHub integration detail spec
└── session-runtime.md # Session runtime detail spec

Changelog

All notable changes to the Tasks specification will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.1.0] - 2026-03-17

Added

  • Initial specification release
  • Tasks Platform Specification (spec.md)
  • GitHub Integration documentation (github.md)
  • GitHub Pages hosting with versioning support