State, Roadmap & Limitations


Current State

The Hall is in beta. The end state is a published GitHub App: install it on any org and get a working Hall instance in minutes, with invoker onboarding as the only manual step. That is not where things stand today.

What exists today is a template repository: all workflows, composite actions, scripts, routing logic, and documentation are there and replicable by anyone willing to read through it. There is no published GitHub App in the Marketplace, no Helm chart, no installer. Deploying the Hall in a new org means:

  1. Forking (or using the template) and customising the repo.
  2. Registering a GitHub App manually and pointing its webhook at a self-hosted relay.
  3. Onboarding at least one invoker before any dispatch can run.

This is intentional for now — the architecture is still settling, and locking it into a polished installation surface too early would create migration debt. The beta is the right vehicle for validating the design under real conditions.

What is stable:

  • Invoker pool selection, weekly quota tracking, and nightly retry.
  • Full dispatch lifecycle: label trigger → authorize → dispatch → status card → cleanup.
  • Old Major onboarding and PR creation flow.
  • CI loop (re-dispatch on failure up to max_retries).
  • Cross-repo dispatch via App webhook + relay (self-hosted on Aruba VPS).
  • Composite action interface (authorize, dispatch, memory, counter, status-card, cleanup, post-dispatch).
  • Federation: hall.sync propagates operator-managed paths to all installed orgs on tag push.
  • Model selection per agent (--model flag in agents.yml).
  • MCP config driven by agents.yml — no per-agent branching in dispatch logic.
  • Post-mortem loop: failed dispatches auto-trigger Old Major via hall:post-mortem.
  • Audit log schema: agent, model, MCP servers, turns efficiency, duration, outcome.
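As an illustration of that schema, a record with those fields might look like the sketch below. The field names and types are assumptions for the example, not the Hall's actual log format:

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    # Field names are illustrative; the Hall's actual schema may differ.
    agent: str
    model: str
    mcp_servers: list[str]
    turns_used: int
    max_turns: int
    duration_s: float
    outcome: str  # e.g. "success", "failed", "queued"

    @property
    def turns_efficiency(self) -> float:
        # Fraction of the turn budget actually consumed.
        return self.turns_used / self.max_turns

rec = AuditRecord("old-major", "claude-sonnet", ["github"], 12, 40, 312.5, "success")
print(f"{rec.turns_efficiency:.0%}")  # → 30%
```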

What is not stable:

  • Agent-to-agent coordination.
  • The relay protocol (may change before v1).
  • Turns tuning: max_turns will be calibrated once 2+ weeks of turns_used data exists per agent.
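As a sketch of what that calibration could look like once turns_used data exists: take a high percentile of observed turns per agent and add headroom, so the cap rarely truncates a legitimately long run. The percentile and headroom values here are placeholders, not a decided policy:

```python
import math

def calibrate_max_turns(turns_used: list[int], percentile: float = 0.95,
                        headroom: float = 1.25) -> int:
    # Assumes at least one observation; a high percentile of the observed
    # turns, padded with headroom, becomes the new cap.
    ordered = sorted(turns_used)
    idx = min(len(ordered) - 1, math.ceil(percentile * len(ordered)) - 1)
    return math.ceil(ordered[idx] * headroom)

history = [8, 10, 11, 12, 12, 13, 14, 15, 18, 30]
print(calibrate_max_turns(history))  # → 38
```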

Federation Model

The Hall is federated. Orgs install the GitHub App; the relay provisions their Hall instance from the operator template. Operator-managed paths sync to all installed orgs on tag push via hall.sync.

Execution isolation

  • GitHub App: single published App; orgs install it from the Marketplace.
  • Relay: managed relay on operator VPS — routes webhooks and syncs updates.
  • Invoker pool: per-org invoker/<handle> environments — no cross-org token sharing.
  • Execution: GitHub-hosted runners in the org's own Hall repo — secrets never leave the org.
  • Sync: hall.sync (repository_dispatch) updates operator-managed workflows, actions, and scripts; org-specific files (roster, custom agents) are never overwritten.
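The sync boundary can be illustrated as a path filter: org-owned paths always win over operator-managed globs. The glob lists below are hypothetical examples, not the Hall's actual sync manifest:

```python
from fnmatch import fnmatch

# Hypothetical operator-managed globs; the real manifest may differ.
OPERATOR_MANAGED = [".github/workflows/*", ".github/actions/*", "scripts/*"]
# Org-owned paths that hall.sync must never overwrite.
ORG_OWNED = ["roster.yml", "agents/custom/*"]

def synced_by_hall_sync(path: str) -> bool:
    # Org-owned files win even if an operator glob would otherwise match.
    if any(fnmatch(path, pat) for pat in ORG_OWNED):
        return False
    return any(fnmatch(path, pat) for pat in OPERATOR_MANAGED)

print(synced_by_hall_sync(".github/workflows/dispatch.yml"))  # → True
print(synced_by_hall_sync("roster.yml"))                      # → False
```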

What never changes

The principle of zero shared state and zero shared secrets is non-negotiable.

  • No org's OAuth tokens, invoker counters, or task memories touch another org's environment.
  • The shared surface is exclusively: the App identity, the relay routing logic, and the workflow templates.
  • Secrets remain in GitHub Environments scoped to each org's own Hall repository.
  • Anthropic API tokens never leave the runner that owns them.

This means the federated model is not a multi-tenant SaaS — it is a distributed system where the coordination layer is shared and the execution layer is always isolated. An org's Hall instance is architecturally equivalent to a self-hosted deployment; it just installs faster.


Known Limitations

Invoker pool race condition

Pool selection and counter increment are not atomic. The detect job reads all invoker usage counts, picks the least-used eligible invoker, and emits it as a job output. The dispatch job then increments the counter — but between detection and increment, a parallel dispatch can select the same invoker. Under concurrent load, an invoker can be dispatched past their weekly cap by the number of concurrent detections that happen before any increment lands.

Mitigation in practice: Concurrency groups (hall-{agent}-{issue}) serialize per agent+issue. The race only affects cross-issue parallelism, which is bounded by the org's dispatch rate. Acceptable for beta; needs a test-and-set at detection time for high-throughput orgs.
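The test-and-set fix suggested above can be sketched in miniature. This is a toy in-memory model: a lock stands in for whatever atomic conditional update the real counter store would need, and the class and method names are invented for the example:

```python
import threading

class InvokerPool:
    # Toy pool demonstrating test-and-set selection: selection and
    # increment happen in one critical section, so a concurrent dispatch
    # cannot claim the same headroom twice.
    def __init__(self, counts: dict[str, int], weekly_cap: int):
        self._counts = counts
        self._cap = weekly_cap
        self._lock = threading.Lock()

    def select_and_increment(self):
        with self._lock:
            eligible = [(n, h) for h, n in self._counts.items() if n < self._cap]
            if not eligible:
                return None  # quota exhausted: queue for nightly retry
            _, handle = min(eligible)  # least-used eligible invoker
            self._counts[handle] += 1
            return handle

pool = InvokerPool({"alice": 3, "bob": 1}, weekly_cap=5)
print(pool.select_and_increment())  # → bob (least used)
```

The separation the Hall has today corresponds to doing the `min` outside the lock and the increment inside it, which is exactly the window the race exploits.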

Quota exhaustion retry has no jitter

Queued dispatches are retried by a nightly cron at 03:00 UTC. All queued issues fire simultaneously. If quota is still exhausted, all of them are re-queued for the following night. Under sustained quota shortage (API downtime, billing issues), this creates a thundering herd at the same wall-clock time every night. Fix: add per-issue jitter or stagger the retry window.
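One way to add that jitter: derive a deterministic per-issue offset within a retry window, so each queued issue fires at its own point after 03:00 UTC instead of all at once. A sketch, assuming the retry workflow can sleep before dispatching; the window size is an arbitrary example:

```python
import hashlib

def retry_delay_seconds(issue_id: int, window_s: int = 3600) -> int:
    # Hash the issue number into a one-hour window. Deterministic, so the
    # same issue always retries at the same offset, but different issues
    # spread out instead of forming a thundering herd.
    digest = hashlib.sha256(str(issue_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_s

delays = {i: retry_delay_seconds(i) for i in (101, 102, 103)}
print(delays)  # issue → offset in seconds after 03:00 UTC
```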

GitHub API rate limit under large pools

The detect and select-invoker scripts paginate all invoker/* environments on every dispatch. With many invokers and a high dispatch rate, each round of dispatches makes several paginated API calls. The scripts have no backoff; under sustained load they would begin hitting the 5000 req/hr ceiling. Not a concern for small orgs; becomes critical past ~20 simultaneous invokers under continuous dispatch.
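A minimal backoff sketch for that pagination loop. `list_page` is a stand-in for the paginated GitHub REST call (the real scripts would watch the rate-limit headers or a 403/429 status rather than an exception); everything else is stdlib:

```python
import time

class RateLimited(Exception):
    pass

def fetch_invoker_envs(list_page, sleep=time.sleep, max_attempts=5):
    # list_page(page) returns a list of environment names for that page
    # (empty list means no more pages) or raises RateLimited when the
    # API refuses. On refusal, back off exponentially: 1s, 2s, 4s, ...
    names, page = [], 1
    while True:
        for attempt in range(max_attempts):
            try:
                batch = list_page(page)
                break
            except RateLimited:
                sleep(2 ** attempt)
        else:
            raise RuntimeError("rate limit did not recover")
        if not batch:
            return [n for n in names if n.startswith("invoker/")]
        names += batch
        page += 1

# Demo with a fake page source (page 3 is empty, so pagination stops).
pages = {1: ["invoker/alice", "production"], 2: ["invoker/bob"], 3: []}
print(fetch_invoker_envs(pages.get))  # → ['invoker/alice', 'invoker/bob']
```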

Agent-to-agent coordination

Routing a complex task to Old Major with the expectation that he will split it into subtasks and dispatch multiple agents in parallel is not a supported workflow and not recommended.

The problem is structural: agents run in isolation. There is no inter-agent messaging, no shared working memory, no dependency graph, and no coordinator that can block agent B until agent A's PR lands. If Old Major creates three sub-issues and labels each for a different specialist, those three specialists will:

  • Each check out the target repo from main (or the same branch) independently.
  • Race to open PRs with potentially conflicting changes.
  • Have no visibility into what the others are doing or have done.

Managing the merge order, preventing conflicts, and synthesising the results falls entirely back on the human. In practice this is more work than doing the task linearly, not less.

The correct pattern for multi-component tasks is:

  1. Open one issue describing the full task in Advise.
  2. Ask explicitly for the task to be decomposed into sub-issues.
  3. Old Major analyses the task and decomposes it.
  4. Review the resulting sub-issues and assign them.

This happens automatically when Old Major detects that a task is non-trivial and directly proposes decomposition.