GitHub Integration
Status: Draft
This document specifies the GitHub integration layer — how the platform discovers, fetches, and normalizes work from GitHub. It is a companion to the main spec (spec.md §12 Issue Tracker Integration, §3.2 Scheduler).
1. Overview
The GitHub crate is the platform's interface to GitHub. It fetches issues, pull requests, and their associated metadata from GitHub's GraphQL API, normalizes them into a stable internal model, and provides a polling mechanism for discovering new and changed work.
The crate also provides write operations for the orchestrator and server to interact with GitHub
directly (posting comments, updating issues, adding labels, merging PRs, managing branches). Agents
working inside sessions may also use the gh CLI with credentials injected into the container
environment (session-runtime.md §3.1).
Server
  └── Scheduler (spec.md §3.2)
        │
        ├── uses GitHubClient to poll for changes
        ├── normalizes responses into internal model
        ├── emits events to the event bus
        │
        └── GitHubClient (this crate)
              ├── GraphQL queries against api.github.com
              ├── rate limit tracking
              └── pagination handling
The crate is consumed by the scheduler but is otherwise independent — it has no dependency on the event system, server, or runtime crates.
2. Normalized Model
The crate normalizes GitHub's API responses into a stable internal model. The rest of the system works with these types, never with raw GitHub API shapes.
2.1 Issue
- owner (string): repository owner
- repo (string): repository name
- number (u64): issue number
- node_id (string): GitHub's global GraphQL node ID (used for pagination and cross-references)
- title (string)
- body (string or null)
- state (enum): Open, Closed
- state_reason (enum or null): Completed, NotPlanned, Reopened (GitHub's reason for the most recent state change)
- labels (list of Label)
- assignees (list of User)
- milestone (Milestone or null)
- comments (list of Comment): full comment history, ordered chronologically
- parent (ParentIssueRef or null): parent issue if this is a sub-issue
- sub_issues (list of SubIssueRef): issues linked as sub-issues via GitHub's sub-issue feature
- blocked_by (list of BlockingIssueRef): issues that block this one (must be resolved before this issue can be worked on)
- linked_pull_requests (list of LinkedPR): PRs that reference this issue (via closing keywords or manual links)
- author (User)
- created_at (timestamp)
- updated_at (timestamp)
- closed_at (timestamp or null)
2.2 Pull Request
- owner (string): repository owner
- repo (string): repository name
- number (u64): PR number
- node_id (string)
- title (string)
- body (string or null)
- state (enum): Open, Closed, Merged
- head_ref (string): source branch name
- head_sha (string): current head commit SHA
- base_ref (string): target branch name
- is_draft (bool)
- mergeable (enum or null): Mergeable, Conflicting, Unknown (GitHub may not have computed this yet)
- labels (list of Label)
- assignees (list of User)
- review_decision (enum or null): Approved, ChangesRequested, ReviewRequired
- reviews (list of Review): all reviews, ordered chronologically
- comments (list of Comment): issue-level comments (not review comments)
- linked_issues (list of LinkedIssueRef): issues this PR closes/references
- author (User)
- created_at (timestamp)
- updated_at (timestamp)
- closed_at (timestamp or null)
- merged_at (timestamp or null)
2.3 Supporting Types
Label: name (string), color (string)
User: login (string), node_id (string)
Milestone: title (string), number (u64), state (Open | Closed)
Comment: id (string), author (User), body (string), created_at (timestamp),
updated_at (timestamp)
Review: id (string), author (User), state (Approved | ChangesRequested | Commented |
Dismissed), body (string or null), submitted_at (timestamp)
ParentIssueRef: number (u64), title (string), state (Open | Closed), node_id (string)
SubIssueRef: number (u64), title (string), state (Open | Closed), node_id (string)
BlockingIssueRef: owner (string), repo (string), number (u64), title (string), state (Open | Closed), node_id (string)
LinkedPR: number (u64), title (string), state (Open | Closed | Merged),
node_id (string)
LinkedIssueRef: number (u64), title (string), state (Open | Closed),
node_id (string)
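As an illustration of how these field lists could map onto Rust types, here is a minimal sketch covering a subset of the Issue model. Field names follow §2.1 and §2.3; the `is_blocked` helper is hypothetical and only encodes the stated rule that blockers must be resolved before an issue can be worked on.

```rust
// Sketch of a subset of the normalized model (§2.1, §2.3).
// `is_blocked` is a hypothetical helper, not part of the spec.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueState {
    Open,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateReason {
    Completed,
    NotPlanned,
    Reopened,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockingIssueRef {
    pub owner: String,
    pub repo: String,
    pub number: u64,
    pub title: String,
    pub state: IssueState,
    pub node_id: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Issue {
    pub owner: String,
    pub repo: String,
    pub number: u64,
    pub title: String,
    pub body: Option<String>,
    pub state: IssueState,
    pub state_reason: Option<StateReason>,
    pub blocked_by: Vec<BlockingIssueRef>,
    // Remaining §2.1 fields (labels, assignees, comments, ...) omitted for brevity.
}

impl Issue {
    /// Hypothetical helper: an issue counts as blocked while any blocker is still open.
    pub fn is_blocked(&self) -> bool {
        self.blocked_by.iter().any(|b| b.state == IssueState::Open)
    }
}
```

Nullable fields become `Option`, and list fields become `Vec`, so callers never need to inspect raw GitHub response shapes.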
3. GraphQL Queries
All data is fetched via GitHub's GraphQL API (POST https://api.github.com/graphql). The crate
provides three query categories.
3.1 Repository Issues
Fetches issues for a repository, with filtering and pagination.
Parameters:
- owner, repo: repository identifier
- states (optional): filter by Open, Closed, or both. Default: Open only (but see §5.5: the RepoPoller fetches all states to detect external closures)
- labels (optional): filter to issues with any of these labels
- since (optional): only issues updated after this timestamp (for polling)
- first / after (optional): cursor-based pagination
Returns: Paginated list of Issue (§2.1), including all nested fields (comments, labels, assignees, sub-issues, linked PRs) in a single query.
3.2 Repository Pull Requests
Fetches PRs for a repository, with filtering and pagination.
Parameters:
- owner, repo: repository identifier
- states (optional): filter by Open, Closed, Merged. Default: Open only (but see §5.5: the RepoPoller fetches all states to detect external closures)
- since (optional): only PRs updated after this timestamp
- first / after (optional): cursor-based pagination
Returns: Paginated list of PullRequest (§2.2), including reviews, comments, and linked issues in a single query.
3.3 Single Item Fetch
Fetches a single issue or PR by number, with full detail.
Parameters:
- owner, repo, number
Returns: Issue or PullRequest with all fields populated.
This is used when the scheduler needs to refresh a specific item (e.g., after an event indicates it changed, or when fetching a linked issue referenced by another item).
3.4 Pagination
GitHub GraphQL uses cursor-based pagination. The crate handles this internally:
- Each query requests up to 100 items per page (GitHub's maximum).
- Comments and reviews are paginated within each item — the crate fetches all pages for these nested connections automatically.
- The client exposes a stream/iterator interface so callers don't manage cursors directly.
- A configurable maximum page limit prevents runaway queries on repositories with thousands of issues (default: 10 pages = 1000 items).
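The cursor loop described above can be sketched roughly as follows. This is a simplified, eager version that collects into a `Vec` rather than exposing a stream; `Page`, `fetch_all`, and the `fetch_page` callback are hypothetical names standing in for the real GraphQL request.

```rust
/// One page of results plus the cursor state GitHub returns in `pageInfo`.
pub struct Page<T> {
    pub items: Vec<T>,
    pub end_cursor: Option<String>,
    pub has_next_page: bool,
}

/// Follows cursors until `has_next_page` is false or `max_pages` is reached.
/// `fetch_page` is a hypothetical stand-in for the real GraphQL request; it
/// receives the cursor to pass as `after` (None for the first page).
pub fn fetch_all<T, E, F>(max_pages: usize, mut fetch_page: F) -> Result<Vec<T>, E>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, E>,
{
    let mut items = Vec::new();
    let mut cursor: Option<String> = None;
    for _ in 0..max_pages {
        let page = fetch_page(cursor.as_deref())?;
        items.extend(page.items);
        if !page.has_next_page {
            break;
        }
        cursor = page.end_cursor;
    }
    Ok(items)
}
```

With the default of 10 pages and 100 items per page, this loop stops after at most 1000 items even if GitHub reports more pages.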
3.5 Rate Limiting
GitHub's GraphQL API has a point-based rate limit (typically 5,000 points per hour). Each query costs a variable number of points depending on the fields and pagination depth requested.
The client tracks rate limit state from response headers:
- x-ratelimit-remaining: points remaining
- x-ratelimit-reset: when the budget resets
Behavior:
- If remaining points drop below a configurable threshold (default: 200), the client pauses requests until the reset time reported in x-ratelimit-reset.
- Rate limit state is exposed to callers so the scheduler can adjust its polling cadence.
- If a request receives a 403 with rate limit exceeded, the client waits for the reset time and retries once.
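A minimal sketch of the header tracking and floor check might look like this. It assumes headers have already been collected into a map with lowercased names, and that the reset header carries UTC epoch seconds (the format GitHub uses for `x-ratelimit-reset`); `from_headers` and `pause_for` are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RateLimit {
    pub remaining: u32,
    /// Reset time as UTC epoch seconds.
    pub reset_epoch_secs: u64,
}

impl RateLimit {
    /// Parses the two headers; returns None if either is missing or malformed.
    /// Header names are assumed to be lowercased already.
    pub fn from_headers(headers: &HashMap<String, String>) -> Option<Self> {
        let remaining = headers.get("x-ratelimit-remaining")?.trim().parse().ok()?;
        let reset_epoch_secs = headers.get("x-ratelimit-reset")?.trim().parse().ok()?;
        Some(Self { remaining, reset_epoch_secs })
    }

    /// Returns how long to pause before the next request, or None if the
    /// remaining budget is still at or above the configured floor.
    pub fn pause_for(&self, floor: u32, now_epoch_secs: u64) -> Option<Duration> {
        if self.remaining >= floor {
            return None;
        }
        // saturating_sub avoids underflow if the reset time has already passed.
        Some(Duration::from_secs(
            self.reset_epoch_secs.saturating_sub(now_epoch_secs),
        ))
    }
}
```

Taking the current time as a parameter keeps the decision logic pure, which makes the floor behavior easy to cover in unit tests without sleeping.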
4. Client API
The GitHubClient is the public interface to the crate. It is a thin async wrapper around the
GraphQL queries, rate limit tracking, and response normalization.
4.1 Construction
GitHubClient::new(token: String) -> GitHubClient
Takes a personal access token. The token is sent as Authorization: Bearer {token} on all
requests. The client holds a single reqwest::Client internally for connection pooling.
An optional builder allows overriding:
- base_url: for GitHub Enterprise or testing against a mock server (default: https://api.github.com)
- max_pages: pagination limit (default: 10)
- rate_limit_floor: minimum remaining points before pausing (default: 200)
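One way the builder and its defaults could be shaped is sketched below. The type names `GitHubClientBuilder` and `ClientConfig` are assumptions, and `build` returns a plain config struct here rather than a real client so the example stays free of HTTP dependencies.

```rust
/// Hypothetical configuration produced by the builder.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClientConfig {
    pub token: String,
    pub base_url: String,
    pub max_pages: usize,
    pub rate_limit_floor: u32,
}

/// Hypothetical builder; defaults mirror the values listed in §4.1.
pub struct GitHubClientBuilder {
    config: ClientConfig,
}

impl GitHubClientBuilder {
    pub fn new(token: impl Into<String>) -> Self {
        Self {
            config: ClientConfig {
                token: token.into(),
                base_url: "https://api.github.com".to_string(),
                max_pages: 10,
                rate_limit_floor: 200,
            },
        }
    }

    pub fn base_url(mut self, url: impl Into<String>) -> Self {
        self.config.base_url = url.into();
        self
    }

    pub fn max_pages(mut self, pages: usize) -> Self {
        self.config.max_pages = pages;
        self
    }

    pub fn rate_limit_floor(mut self, floor: u32) -> Self {
        self.config.rate_limit_floor = floor;
        self
    }

    /// In the real crate this would construct the client; the sketch
    /// returns the resolved configuration instead.
    pub fn build(self) -> ClientConfig {
        self.config
    }
}
```

A mock-server test would then override only `base_url` and leave the other defaults in place.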
4.2 Methods
Issues:
- list_issues(owner, repo, filters) -> Result<Vec<Issue>>: paginated, returns all pages up to limit
- get_issue(owner, repo, number) -> Result<Issue>: single issue with full detail
Pull Requests:
- list_pull_requests(owner, repo, filters) -> Result<Vec<PullRequest>>: paginated
- get_pull_request(owner, repo, number) -> Result<PullRequest>: single PR with full detail
Rate Limit:
- rate_limit() -> RateLimit: current rate limit state (remaining points, reset time)
4.3 Filters
struct IssueFilters {
    states: Option<Vec<IssueState>>,
    labels: Option<Vec<String>>,
    since: Option<DateTime<Utc>>,
}

struct PullRequestFilters {
    states: Option<Vec<PullRequestState>>,
    since: Option<DateTime<Utc>>,
}
4.4 Errors
GitHubError {
Auth — 401, bad or expired token
NotFound — issue/PR/repo doesn't exist
RateLimited — rate limit exceeded after retry
GraphQL(Vec) — GitHub returned GraphQL-level errors
Network — connection/timeout failures
Decode — response didn't match expected shape
}
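The error variants above could be expressed as a Rust enum along these lines. This is a sketch under assumptions: the payload types (`Vec<String>`, `String`) are placeholders, and `classify` is a hypothetical helper that relies on GitHub's GraphQL errors carrying a `type` field such as `NOT_FOUND`. `RateLimited` is left to the retry logic in §3.5 and is not produced here.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GitHubError {
    Auth,
    NotFound,
    RateLimited,
    GraphQL(Vec<String>), // payload type is an assumption
    Network(String),      // payload type is an assumption
    Decode(String),       // payload type is an assumption
}

impl fmt::Display for GitHubError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GitHubError::Auth => write!(f, "authentication failed (401)"),
            GitHubError::NotFound => write!(f, "issue, PR, or repository not found"),
            GitHubError::RateLimited => write!(f, "rate limit exceeded after retry"),
            GitHubError::GraphQL(msgs) => write!(f, "GraphQL errors: {}", msgs.join("; ")),
            GitHubError::Network(msg) => write!(f, "network error: {msg}"),
            GitHubError::Decode(msg) => write!(f, "unexpected response shape: {msg}"),
        }
    }
}

impl std::error::Error for GitHubError {}

/// Hypothetical classification of a response into an error, if any.
/// `errors` holds (type, message) pairs taken from the GraphQL `errors` array.
pub fn classify(status: u16, errors: &[(&str, &str)]) -> Option<GitHubError> {
    if status == 401 {
        return Some(GitHubError::Auth);
    }
    if errors.iter().any(|(kind, _)| *kind == "NOT_FOUND") {
        return Some(GitHubError::NotFound);
    }
    if !errors.is_empty() {
        let messages = errors.iter().map(|(_, msg)| msg.to_string()).collect();
        return Some(GitHubError::GraphQL(messages));
    }
    None
}
```

Implementing `std::error::Error` lets callers propagate these errors with `?` alongside other error types.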
5. Polling and Discovery
The crate provides a higher-level polling interface on top of the raw client. This is what the scheduler (spec.md §3.2) uses to discover new and changed work.
5.1 Repository Poller
The RepoPoller tracks the last-seen updated_at timestamp per repository and fetches only items
that changed since the last poll.
RepoPoller::new(client: GitHubClient, owner: String, repo: String) -> RepoPoller
Methods:
- poll() -> Result<PollResult>: fetches issues and PRs updated since the last successful poll. On first call, fetches all open items.
- poll_issues() -> Result<Vec<Issue>>: issues only
- poll_pull_requests() -> Result<Vec<PullRequest>>: PRs only
PollResult:
- issues: list of new or updated issues
- pull_requests: list of new or updated PRs
- timestamp: the updated_at high-water mark from this poll (used as since on the next call)
- rate_limit: rate limit state after this poll
Merge queue population: The scheduler uses pull_requests from the poll result to populate
the merge queue. PRs that are open and not drafts are added as merge queue entries. See spec.md
§7.0 for full eligibility criteria. This happens automatically on each poll cycle — the GitHub
crate does not filter PRs for merge queue purposes; it returns all PRs matching the query filters,
and the scheduler applies the merge queue eligibility rules.
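On the scheduler side, the basic filter described above (open and not a draft) might be sketched as follows. The trimmed `PullRequest` struct and the `merge_queue_candidates` function are hypothetical, and the full eligibility rules in spec.md §7.0 are not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PullRequestState {
    Open,
    Closed,
    Merged,
}

/// Trimmed-down view of the §2.2 model with only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PullRequest {
    pub number: u64,
    pub state: PullRequestState,
    pub is_draft: bool,
}

/// Hypothetical scheduler-side filter: keeps PRs that are open and not drafts.
/// The GitHub crate itself returns every PR matching the query filters.
pub fn merge_queue_candidates(prs: &[PullRequest]) -> Vec<u64> {
    prs.iter()
        .filter(|pr| pr.state == PullRequestState::Open && !pr.is_draft)
        .map(|pr| pr.number)
        .collect()
}
```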
5.2 Change Detection
The poller returns all items updated since the last poll. It is the caller's (scheduler's) responsibility to determine what changed — the poller does not diff against previous state.
This is intentional. The scheduler already maintains task state and is the right place to compare incoming GitHub state against internal state. The poller is a data-fetching layer, not a state machine.
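To illustrate the division of labor, a scheduler-side comparison could look roughly like this. The `Change` enum and `detect_change` function are hypothetical; the sketch compares only the open/closed state, whereas a real scheduler would likely also compare labels, assignees, and other fields.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IssueState {
    Open,
    Closed,
}

/// Hypothetical classification of an incoming item against known state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    New,
    Closed,
    Reopened,
    StateUnchanged,
}

/// Compares the state reported by the poller with the state the scheduler
/// already holds for that issue number.
pub fn detect_change(
    known: &HashMap<u64, IssueState>,
    number: u64,
    incoming: IssueState,
) -> Change {
    match (known.get(&number), incoming) {
        (None, _) => Change::New,
        (Some(IssueState::Open), IssueState::Closed) => Change::Closed,
        (Some(IssueState::Closed), IssueState::Open) => Change::Reopened,
        (Some(_), _) => Change::StateUnchanged,
    }
}
```

Because the poller only reports current state, this comparison is the point where an external closure becomes visible to the rest of the system.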
5.3 High-Water Mark
The poller tracks a single since timestamp per repository:
- After a successful poll, since advances to the maximum updated_at across all returned items.
- If a poll fails, since is not advanced; the next poll retries the same window.
- The timestamp is held in memory. If the server restarts, the first poll after restart fetches all open items (equivalent to a cold start). Persisting the high-water mark is a future optimization.
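The advancement rules can be sketched as a small state holder. Two assumptions are made here: timestamps are modelled as UTC epoch seconds for brevity (the crate's filters use DateTime<Utc>), and a successful poll that returns no items leaves the mark unchanged, which the rules above do not state explicitly. `HighWaterMark` and `record` are hypothetical names.

```rust
/// In-memory high-water mark for one repository.
#[derive(Debug, Default)]
pub struct HighWaterMark {
    since: Option<u64>,
}

impl HighWaterMark {
    pub fn new() -> Self {
        Self::default()
    }

    /// Value to pass as `since` on the next poll (None means cold start).
    pub fn since(&self) -> Option<u64> {
        self.since
    }

    /// Records the outcome of one poll. `outcome` carries the `updated_at`
    /// values of all returned items on success.
    pub fn record<E>(&mut self, outcome: &Result<Vec<u64>, E>) {
        // A failed poll leaves the mark untouched so the same window is retried.
        if let Ok(updated_ats) = outcome {
            // Assumption: an empty successful poll also leaves the mark untouched.
            if let Some(max) = updated_ats.iter().copied().max() {
                self.since = Some(self.since.map_or(max, |s| s.max(max)));
            }
        }
    }
}
```

Taking the maximum against the previous value guards against the mark moving backwards if a poll happens to return only older items.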
5.4 Polling Cadence
The poller does not own a timer. The scheduler calls poll() on whatever cadence it chooses
(configurable, per spec.md §3.2). This keeps the crate free of tokio::time dependencies and
scheduling opinions.
5.5 State Filtering for Closure Detection
Although the raw GraphQL queries (§3.1, §3.2) default to fetching only open items, the RepoPoller
intentionally fetches all states (Open and Closed for issues; Open, Closed, and Merged for PRs)
when polling.
This is necessary to detect external closures (spec.md §12.3). When an issue or PR is closed
externally (by a human or another automation), its updated_at timestamp changes. By including
closed/merged items in the query with a since filter, the poller sees these state changes and can
report them to the scheduler.
Without this behavior, externally closed items would disappear from poll results entirely — the scheduler would never learn that they closed, and the corresponding tasks would remain in stale states.
Implementation note: The high-water mark (§5.3) ensures that each closed item is only returned
once, in the first poll after its updated_at changes. Subsequent polls will have a since value
at or past the closed item's timestamp, and since only items updated after since are fetched
(§3.1), it won't appear again.
6. Testing
6.1 Unit Tests
Normalization tests. Given raw GraphQL JSON responses (captured from real API calls or hand-written), verify that normalization produces the correct model structs. These tests exercise the deserialization and mapping logic without making network calls. Cover:
- Issues with all fields populated
- Issues with null/missing optional fields (no milestone, no assignees, closed without reason)
- PRs in each state (open, closed, merged) with varying mergeable/review states
- Nested pagination (issue with >100 comments)
- Sub-issues and linked PRs/issues
- Malformed or unexpected fields (should produce Decode errors, not panics)
Rate limit tracking tests. Verify that rate limit state is correctly parsed from response headers and that the floor threshold triggers waiting behavior.
Pagination tests. Verify cursor handling across multiple pages, including the stop condition
when has_next_page is false or the page limit is reached.
Filter construction tests. Verify that IssueFilters and PullRequestFilters produce the
correct GraphQL query variables.
6.2 Integration Tests
Integration tests run against the real GitHub API. They are gated behind a feature flag
(--features integration) and require a GITHUB_TOKEN environment variable.
Target repository: Tests run against a public fixture repository (e.g., tasks-test/fixture)
with known issues, PRs, comments, and labels. The fixture repo is set up once and not modified by
tests — all operations are reads.
Tests:
- Fetch a known issue by number and verify all fields
- Fetch a known PR by number and verify all fields (including reviews)
- List open issues with label filter
- List open PRs
- Pagination across multiple pages (fixture repo needs enough issues)
- since filter returns only recently updated items
- Rate limit state is populated after a request
- Bad token returns Auth error
- Nonexistent repo returns NotFound error
6.3 Mock Server Tests
For testing polling behavior and error handling without depending on GitHub uptime or rate limits,
the crate includes tests that run a local HTTP server (using wiremock or similar) serving canned
GraphQL responses.
Tests:
- RepoPoller advances high-water mark after successful poll
- RepoPoller does not advance after failed poll
- Rate limit floor triggers wait behavior
- 403 rate-limit response triggers retry-after-reset
- Network timeout produces Network error
- GraphQL error response produces GraphQL error
7. Open Questions
- Webhook support. The spec (§11.4) mentions optional webhook push notifications. This crate covers the polling path. Webhook ingestion may be a separate module in the server crate, since it requires an HTTP endpoint and ties into the server's request handling.
- GraphQL schema changes. GitHub evolves its GraphQL schema. Sub-issues in particular are relatively new. The normalization layer should degrade gracefully if a field is absent from the response.
- Nested pagination limits. An issue with thousands of comments would require many nested pagination calls. A practical limit on nested page depth (e.g., 10 pages = 1000 comments) may be needed.