
Agent-state monitor over the experimental control-mode engine #692

Open
tony wants to merge 50 commits into engine-ops from engine-ops-supatui

@tony tony commented Jun 27, 2026


Summary

  • Add libtmux.experimental.agents — a headless, self-healing monitor that reports, per pane, which coding agent is RUNNING, AWAITING_INPUT, IDLE, EXITED, or UNKNOWN, without polling or scraping pane output. Answers the orchestration question "which of my parallel agents needs me right now?"
  • Ingest two cooperative signal channels, a local tmux option subscription (@agent_state → %subscription-changed) and a remote OSC 3008 escape carried in control-mode %output, reconciled by a lock-free last-writer-wins merge so the two channels can race freely.
  • Extend AsyncControlModeEngine with a supervised reconnect loop, a death-sentinel that closes subscribers cleanly, and per-subscriber broadcast queues, so a tmux restart or socket blip self-heals and concurrent consumers never steal each other's events.
  • Add typed refresh-client -B/-C to the RefreshClient operation — the substrate the monitor subscribes through (debounced, server-side change detection) instead of raw %output.
  • Expose the monitor over MCP: list_agents, watch_agents, and install_agent_hooks tools, plus a lifespan that starts/stops the monitor and gates the tools behind an engine capability check.
  • Ship non-clobbering shell hooks for Claude Code (~/.claude/settings.json) and Codex (~/.codex/config.toml), installable into a running session at any time.

Stacks on #690. Everything is additive under libtmux.experimental: no existing public API is touched, and the package is mypy-strict clean.

Changes by area

src/libtmux/experimental/agents/ (new package)

  • state.py: the AgentState enum (with from_signal() that maps unknown values to UNKNOWN rather than raising) and the frozen Agent record (name, state, since, source, pid, alive, plus is_running/is_awaiting).
  • merge.py: Stamp(counter, writer) ordering and latest() — the convergent, idempotent, out-of-order-tolerant merge rule. The clock is a pluggable callable (MonotonicCounter now, HLC later).
  • store.py: the durable value tier, made up of a frozen AgentStore, the pure apply(store, event, *, now) reducer, Observed/Vanished events, a Storage protocol, and JsonFile (atomic temp-write + rename). State changes only through apply(); because the store is frozen, each change yields a new store rather than mutating in place.
  • signals.py: OptionSignal (local) and OscSignal (remote) — the latter reassembles the byte-fragmented OSC 3008 stream tmux delivers in %output.
  • health.py: is_alive(pid) via os.kill(pid, 0); None (a PID-less remote pane) is treated as alive so a remote agent is never falsely expired.
  • tree.py: panes_of() / diff_panes() derived from models snapshots — the live session→window→pane projection.
  • monitor.py: AgentMonitor — the supervised start/stop/reconcile/status/agents contract, the reducer pipeline, and the reconcile sweep that self-attaches (excluding its own tmux -C session) and self-heals across reconnects.
  • hooks/: the AgentHook protocol (detect/install/uninstall/status), the shared emit() (local set-option, else remote OSC 3008 → /dev/tty), ClaudeCodeHook, CodexHook, and a registry().
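The "atomic temp-write + rename" behaviour attributed to JsonFile in the store.py bullet can be sketched as follows. This is an illustration of the general technique, not the code in store.py; the function name `atomic_write_json` and the payload shape are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a half-written file.

    Hypothetical helper: the real JsonFile may differ in name and detail.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so os.replace stays on one
    # filesystem (a cross-device rename would not be atomic).
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap of old and new content
    except BaseException:
        # Best-effort cleanup if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A reader that opens the target at any moment sees either the previous complete file or the new complete file, which is the property a durable store needs across a crash.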

src/libtmux/experimental/engines/async_control_mode.py

  • Supervised reconnect (_supervisor, deterministic jittered _backoff, backoff reset on a healthy connect), desired-state replay (add_subscription/set_attach_targets → _replay_subscriptions/_replay_attach), a _STREAM_END death-sentinel broadcast to per-subscriber queues, and a subscribe() that no longer hangs when called after aclose().

src/libtmux/experimental/ops/_ops/refresh_client.py

  • Typed subscribe (-B) and size (-C) parameters, version-gated and serializable like every other operation.

src/libtmux/experimental/mcp/

  • vocabulary/agents.py: register_agents() registers list_agents / watch_agents / install_agent_hooks, behind a supports_monitor() capability gate.
  • _lifespan.py, events.py, fastmcp_adapter.py: start/stop the monitor in the server lifespan and skip the agent tools when the lifespan won't bring a monitor up.

CHANGES

  • An Agent-state monitor entry under ### What's new.

Design decisions

  • Source of truth is split by kind. tmux is authoritative for the observed tree (derive it, never store it); the monitor is authoritative for intent/run-state (agent identity + state), which tmux cannot hold and never persists. Deltas drive the fast path; a periodic full list-* snapshot diff is the correctness backstop, because tmux's change feed has blind spots (pane-died, window-resized, and title changes emit no notification).
  • Both signal channels, by necessity. The option path is lossless and re-queryable (show-options -p -v self-heals a dropped notification) but slow (~1 s debounce); the OSC path is instant and the only one that survives SSH (a remote set-option can't reach the local socket) but rides the lossy %output stream. Neither alone is sufficient.
| Channel | Transport | Latency | Loss model | Reach |
| --- | --- | --- | --- | --- |
| OptionSignal | @agent_state → %subscription-changed | ~1 s | lossless, re-queryable | local socket |
| OscSignal | OSC 3008 → %output (byte-fragmented) | instant | lossy (stream) | survives SSH |
  • Never infer death from silence. A missing notification never marks an agent EXITED. Local panes expire only on a failed PID probe; PID-less remote panes stay at their last-known state.
  • Lock-free convergence. The (counter, writer) latest() guard runs before the coalescing value write, so the two channels merge deterministically without locks regardless of arrival order.
  • Decoupled from the classic ORM. The package depends only on the AsyncTmuxEngine protocol, the models snapshots, and a Storage sink — not on Server/Session/Window/Pane.
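The lock-free convergence rule can be illustrated with a small sketch. The Stamp(counter, writer) pair and latest() are named in this PR; the `Reading` shape used here, its field names, and the writer labels are assumptions for illustration and may not match merge.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Stamp:
    """Ordering key: compare by counter first, then by writer as tie-break."""

    counter: int
    writer: str


@dataclass(frozen=True)
class Reading:
    """Hypothetical value carried by either channel."""

    state: str
    stamp: Stamp


def latest(current: Reading | None, incoming: Reading) -> Reading:
    """Keep whichever reading carries the greater stamp.

    Equal or older stamps are ignored, which makes replays harmless
    and makes the result independent of arrival order.
    """
    if current is None or incoming.stamp > current.stamp:
        return incoming
    return current


option = Reading("running", Stamp(1, "option"))
osc = Reading("awaiting_input", Stamp(2, "osc"))

# Either arrival order converges on the same winner.
in_order = latest(latest(None, option), osc)
reversed_order = latest(latest(None, osc), option)
```

Because the guard is a pure comparison over immutable values, no lock is needed: applying the same reading twice, or applying the two channels in either order, yields the same result.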
| MCP tool | Purpose |
| --- | --- |
| list_agents | snapshot of every known pane's agent + state |
| watch_agents | live stream of agent-state transitions |
| install_agent_hooks | install Claude Code / Codex hooks into the running session |

Verification

The package is decoupled from the classic ORM (depends only on the engine protocol + models):

$ rg -n "from libtmux\.(server|session|window|pane|common) import|import libtmux\b" src/libtmux/experimental/agents

The remote signal is written to the pane tty, not stdout (which agent hooks capture):

$ rg -n "/dev/tty" src/libtmux/experimental/agents/hooks/emit.py

Async tests follow the repo convention (asyncio.run, no pytest-asyncio):

$ rg -n "pytest_asyncio|pytest\.mark\.asyncio" tests/experimental/agents tests/experimental/engines/test_async_control_mode_supervisor.py

MCP agent tools are gated behind an engine capability check:

$ rg -n "supports_monitor" src/libtmux/experimental/mcp

Test plan

  • Pure tier, no tmux: state / merge / store / signals / tree / health unit tests + doctests (reducer idempotence, byte-fragmented OSC reassembly, last-writer-wins).
  • Engine resilience: test_async_control_mode_supervisor.py and test_async_control_mode_sentinel.py cover supervised reconnect, attach replay on reconnect, broadcast to concurrent subscribers, and subscribe-after-close.
  • Live monitor: test_live_monitor.py validates self-attach, reconcile, and survives-reconnect against a real tmux server via the libtmux fixtures.
  • Hooks: non-clobbering install/uninstall/status for Claude Code and Codex, including the legacy-Codex notify fallback.
  • MCP: test_agents_tools.py / test_attach_reset.py cover tool registration and the capability gate.
  • Full repo gate: ruff check, ruff format --check, mypy src tests, pytest, and just build-docs (the CHANGES entry + catalog directive) all green.

Refs #688, #689. Builds on #690.


codecov Bot commented Jun 27, 2026


Codecov Report

❌ Patch coverage is 89.64174% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (8f2c6a1) to head (6a0a06a).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/libtmux/experimental/agents/monitor.py | 76.07% | 33 Missing and 17 partials ⚠️ |
| ...libtmux/experimental/engines/async_control_mode.py | 86.18% | 14 Missing and 7 partials ⚠️ |
| src/libtmux/experimental/agents/hooks/codex.py | 78.37% | 10 Missing and 6 partials ⚠️ |
| tests/experimental/agents/test_monitor.py | 81.25% | 6 Missing and 3 partials ⚠️ |
| tests/experimental/mcp/test_agents_tools.py | 86.53% | 5 Missing and 2 partials ⚠️ |
| src/libtmux/experimental/agents/hooks/claude.py | 90.90% | 5 Missing and 1 partial ⚠️ |
| src/libtmux/experimental/agents/store.py | 91.52% | 4 Missing and 1 partial ⚠️ |
| src/libtmux/experimental/agents/hooks/emit.py | 84.00% | 2 Missing and 2 partials ⚠️ |
| tests/experimental/mcp/test_events.py | 88.23% | 4 Missing ⚠️ |
| tests/experimental/agents/test_live_monitor.py | 95.08% | 1 Missing and 2 partials ⚠️ |
... and 5 more
Additional details and impacted files
@@              Coverage Diff               @@
##           engine-ops     #692      +/-   ##
==============================================
+ Coverage       74.22%   75.74%   +1.51%     
==============================================
  Files             214      246      +32     
  Lines           12563    13819    +1256     
  Branches         1671     1794     +123     
==============================================
+ Hits             9325    10467    +1142     
- Misses           2586     2661      +75     
- Partials          652      691      +39     

@tony tony force-pushed the engine-ops-supatui branch from cd305cf to 68e6a55 Compare June 27, 2026 22:17
tony added 26 commits June 28, 2026 17:37
why: The shared vocabulary every agents module reads/writes.

what:
- AgentState (running/awaiting_input/idle/exited/unknown) with from_signal
- frozen Agent record + is_running/is_awaiting helpers
why: Out-of-order/replayed agent-state updates must converge to newest.

what:
- Stamp(counter, writer) with deterministic tie-break
- latest() guard + pluggable Clock + MonotonicCounter default
why: A dead stream left consumers hanging on queue.get(), so settle
reported a false 'settled' (success-shaped) instead of stream_end.

what:
- broadcast a _STREAM_END sentinel to subscriber queues on death/close
- subscribe() ends its async for on the sentinel
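The sentinel pattern this commit describes can be sketched with plain asyncio queues. The name `_STREAM_END` comes from the commit; the `Broadcaster` class and its methods are hypothetical stand-ins for the engine, kept minimal to show only the fan-out and the clean shutdown.

```python
import asyncio

_STREAM_END = object()  # identity-compared sentinel, never a real event


class Broadcaster:
    """Hypothetical fan-out hub: one queue per subscriber."""

    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def publish(self, event: object) -> None:
        # Every subscriber gets its own copy, so consumers never
        # steal events from each other.
        for queue in self._subscribers:
            queue.put_nowait(event)

    def close(self) -> None:
        # Wake every consumer so none stays blocked on queue.get().
        self.publish(_STREAM_END)
        self._subscribers.clear()

    async def subscribe(self):
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        while True:
            event = await queue.get()
            if event is _STREAM_END:
                return  # ends the caller's `async for`
            yield event


async def demo() -> list:
    hub = Broadcaster()

    async def consume() -> list:
        return [event async for event in hub.subscribe()]

    tasks = [asyncio.create_task(consume()) for _ in range(2)]
    await asyncio.sleep(0)  # let both consumers register their queues
    hub.publish("running")
    hub.close()
    return await asyncio.gather(*tasks)
```

Without the sentinel, both consumers in `demo()` would wait on `queue.get()` indefinitely after `close()`, which is the hang the commit message describes.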
why: A reconnect left _attached_session set, so the next _ensure_attached
call skipped re-attaching and %output was silently missing.

what:
- Pin contract with test_reset_attach_clears_flag (Task 10)
- Add _attached_session to _StreamEngine Protocol; drop type: ignore
why: Tasks 14/15 need a stable protocol + registry to import from;
the monitor needs the canonical event→state map to translate hook names.

what:
- Add EVENT_STATE canonical lifecycle-event→state dict in base.py
- Add AgentHook runtime_checkable Protocol (name/detect/install/uninstall/status)
- Add registry() + get() with lazy imports to avoid import cycles
- Add stub ClaudeCodeHook (claude.py) + CodexHook (codex.py) for Tasks 14/15
- Add test_registry.py (3 tests: canonical map, registry members, KeyError)
why: Per-pane %subscription-changed only flows to an *attached*
control client, but AgentMonitor.start() never attached — so the
monitor was silent against a live server. The spec's "re-attach
declared sessions" step was deferred in Tasks 9/10 and never wired.
This closes that gap and proves the whole pipeline end to end.

what:
- monitor.start(): after add_subscription, pick a real session via
  list-sessions, set_attach_targets([id]), and perform the initial
  attach-session through the engine (set _attached_session). A
  tmux -C client creates its own throwaway session on connect, which
  sorts first in list-sessions; _own_session_id (display-message -p
  '#{session_id}') identifies it so _primary_session_id skips it and
  attaches to a real session. Single-session v1 limit documented.
- async_control_mode: add _replay_attach(), called after
  _replay_subscriptions on every (re)connect — mirrors the
  subscription replay (direct stdin write, swallowed pending future,
  _write_lock/FIFO discipline) so the engine re-attaches across
  reconnects. No-op when _desired_attach is empty.
- tests/experimental/agents/test_live_monitor.py: live tests against
  real tmux. test_monitor_observes_running sets @agent_state running
  and polls (no manual attach — asserts start() self-attached).
  test_reconcile_parses_live_panes proves _parse_pane_rows handles
  real list-panes -F output and the Vanished->EXITED path.
why: Complete the feature branch with user-facing documentation and a
changelog entry so the agent-state monitor is discoverable.

what:
- Add superpowers/** to exclude_patterns in docs/conf.py so the spec
  and plan are tracked in git without generating toctree warnings
- Add ## Agents section to docs/experimental.md with prose intro and
  MyST-role cross-refs for AgentMonitor, AgentState, Agent, and the
  three MCP tools
- Add Agent-state monitor deliverable to CHANGES under ### What's new
  in prose (not bullets), linking {class} roles for AgentMonitor,
  AgentState, and AgentHook
- Stage docs/superpowers/ plans and specs into version control
why: AgentMonitor._drain ran a single subscribe() that ended on the
engine's death-sentinel; the supervisor reconnected but nothing
re-subscribed and reconcile ran only once in start(), so after any
blip list_agents served a stale snapshot forever (broke acceptance
#3 / D2). health.is_alive was dead code and the docs described
behavior the code did not do.

what:
- monitor: replace _drain with a supervised _run loop that reconciles
  FIRST each iteration (so subscribe() only runs against a live
  engine), then drains until the stream ends; on disconnect it retries
  reconcile with a bounded _reconnect_poll until the supervisor
  reconnects (replaying subs + attach), then re-subscribes. Split
  reconcile() into a defensive public wrapper + a raising
  _reconcile_once so the loop can wait for the engine to revive.
  stop() sets _stopping then cancels (no hang on a closed engine).
- monitor: wire health in reconcile via _apply_health — refresh each
  tracked agent's pid/alive from the pane tree; mark a LOCAL pane
  (pid set) EXITED when its process is dead; never auto-EXIT a
  PID-less remote pane (D5). Note the receive-time clock seam (D1).
- tests: add live test_monitor_survives_engine_reconnect (kill the
  control proc, confirm a NEW state is observed after reconnect) and
  unit tests for _apply_health (dead-local→EXITED, live refresh,
  pidless-never-exits).
- docs/experimental.md: correct the liveness, reconcile-on-reconnect,
  and hook-install (settings.json / config.toml + libtmux-agent-emit,
  not set-hook) descriptions; add an AgentMonitor usage snippet.
why: The "keepalive TTL" / "age out" wording described a mechanism
that does not exist in v1 — there is no TTL, keepalive, or staleness
timer. Remote PID-less panes are simply never auto-expired; they are
left at last-known state and become EXITED only via the Vanished diff
when their tmux pane actually disappears.

what:
- docs/experimental.md: reword the reconcile paragraph to drop the
  keepalive-TTL claim and state the real behavior.
- monitor._apply_health docstring: remote pid-less panes expire only
  via the Vanished/pane-gone path, not a TTL.
- health.py module docstring: this probe never declares remote panes
  dead; a keepalive/TTL is a possible future enhancement, not v1.
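The liveness rule described for health.py (probe with os.kill(pid, 0), treat a PID-less pane as alive) can be sketched as below. This assumes a POSIX system; treating PermissionError as "alive" is an assumption of this sketch, not something the PR states.

```python
from __future__ import annotations

import os


def is_alive(pid: int | None) -> bool:
    """Return False only when the process is known to be gone.

    Sketch under POSIX semantics: signal 0 performs the existence and
    permission checks without delivering a signal.
    """
    if pid is None:
        # A remote pane has no local PID to probe, so it is never
        # declared dead by this check.
        return True
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user (assumption)
    return True
```

The asymmetry is deliberate: a missing PID is absence of evidence, so only a failed probe of a real PID counts as proof of exit.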
why: If monitor.start() raised, the existing finally block was skipped
because start() ran between the two try blocks, leaking the drain task.

what:
- Move monitor.start() + yield inside try/finally within the
  if monitor is not None: branch so stop() always runs on exit
- Add else: yield branch to cover the monitor=None path
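The shape of this fix can be sketched with an async context manager. The `lifespan` signature and the `FailingMonitor` fake are hypothetical; the point is only that start() now runs inside the try, so stop() runs even when start() raises.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def lifespan(monitor):
    """Hypothetical lifespan: stop() runs on every exit path."""
    if monitor is not None:
        try:
            await monitor.start()  # inside the try, so a raise still hits finally
            yield monitor
        finally:
            await monitor.stop()
    else:
        yield None  # no monitor configured: nothing to start or stop


class FailingMonitor:
    """Test double whose start() always fails."""

    def __init__(self) -> None:
        self.stopped = False

    async def start(self) -> None:
        raise RuntimeError("start failed")

    async def stop(self) -> None:
        self.stopped = True


async def demo() -> bool:
    monitor = FailingMonitor()
    try:
        async with lifespan(monitor):
            pass
    except RuntimeError:
        pass  # the start() failure still propagates to the caller
    return monitor.stopped
```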
why: With lifespan=False the monitor is never started, so registering
agent tools against it yields a list_agents that silently returns [].

what:
- Guard register_agents call with `monitor_enabled and lifespan`
  so tools are only wired when the lifespan will actually start them
why: _BLOCK_RE.sub(new_block, content) replaces every match, so a
manually-malformed config with two marker blocks produced two copies.

what:
- Rewrite the existing-block branch to strip ALL marker blocks
  (via _BLOCK_WITH_SEP_RE then _BLOCK_RE for start-of-file), then
  append exactly one fresh block
- Add test seeding a two-block config and asserting install collapses
  to one block with status "installed"
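A simplified sketch of the strip-all-then-append-one approach follows. The marker strings and the `install_block` name are hypothetical, and a single regex plus `rstrip` stands in for the two-regex handling (_BLOCK_WITH_SEP_RE, then _BLOCK_RE) the commit describes.

```python
import re

# Hypothetical marker lines; the real markers in codex.py may differ.
BEGIN = "# >>> libtmux-agent >>>"
END = "# <<< libtmux-agent <<<"

_BLOCK_RE = re.compile(
    re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?",
    re.DOTALL,
)


def install_block(content: str, body: str) -> str:
    """Remove every existing marker block, then append exactly one."""
    stripped = _BLOCK_RE.sub("", content).rstrip("\n")
    block = f"{BEGIN}\n{body}\n{END}\n"
    # Separate the block from user content with one blank line.
    return f"{stripped}\n\n{block}" if stripped else block
```

Substituting the new block for each match would reproduce every duplicate; removing all matches first and appending once makes the install idempotent even on a malformed file.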
why: The OscSignal regex accepts both ST and BEL terminators but only
the ST path had a working doctest example.

what:
- Add BEL-terminated doctest to OscSignal.feed showing
  b"\033]3008;state=idle\007" yields a Reading with state idle
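Reassembling a byte-fragmented OSC 3008 stream with both terminators can be sketched as below. The payload form `state=idle` is taken from the doctest named in this commit; the `OscReassembler` class, its return type (a list of state strings rather than Reading objects), and the buffering strategy are assumptions of this sketch.

```python
import re

# A complete sequence: ESC ] 3008 ; payload, ended by BEL or ST (ESC \).
_OSC_RE = re.compile(rb"\x1b\]3008;([^\x07\x1b]*)(?:\x07|\x1b\\)")
# A possibly incomplete sequence at the very end of the buffer.
_PARTIAL_RE = re.compile(rb"\x1b(?:\][^\x07\x1b]*\x1b?)?\Z")


class OscReassembler:
    """Hypothetical stand-in for OscSignal: buffers split sequences."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list[str]:
        self._buffer += chunk
        states: list[str] = []
        end = 0
        for match in _OSC_RE.finditer(self._buffer):
            payload = match.group(1).decode("ascii", "replace")
            key, _, value = payload.partition("=")
            if key == "state":
                states.append(value)
            end = match.end()
        # Keep only a trailing fragment that could still complete later;
        # everything else is ordinary pane output and can be dropped.
        tail = self._buffer[end:]
        partial = _PARTIAL_RE.search(tail)
        self._buffer = partial.group(0) if partial else b""
        return states
```

Dropping everything except a trailing partial sequence keeps the buffer bounded by the length of one escape, regardless of how much unrelated output passes through.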
tony added 23 commits June 28, 2026 17:37
why: The status() == "outdated" branch (marker present, content stale)
had no test coverage.

what:
- Add test that installs, mutates the emit command in the block,
  and asserts status() returns "outdated"
why: The class docstring and parse() method docstring carried identical
doctests; --doctest-modules ran both, which is pure noise.

what:
- Replace class-level Examples block with prose description
- Keep the method-level doctest as the single runnable example
why: AgentStore was imported at runtime and again under TYPE_CHECKING;
the runtime import already satisfies the annotation, so the second
import is dead.

what:
- Remove the redundant TYPE_CHECKING import of AgentStore
why: When the display-message own-id probe fails, own is None, so the
`sid != own` guard is always true and _primary_session_id falls through
to return ids[0] — tmux's phantom `tmux -C` session (it sorts first),
which holds no agent panes. Attaching there leaves the option channel
effectively silent.

what:
- Return None from _primary_session_id when the own-session probe fails,
  so start() skips attach instead of binding to the phantom session
- Cover the case with a fake engine whose display-message probe raises
…close

why: aclose() sets _closing, broadcasts the stream-end sentinel, and
clears _subscribers. A subscribe() call afterwards registers a fresh
queue no broadcast will ever touch, hanging the consumer forever on
queue.get().

what:
- Gate subscribe() on _closing at the top: a permanently-closing engine
  yields nothing and ends at once. A merely _dead (reconnecting) engine
  still keeps the subscriber so the post-reconnect reader feeds it
- Cover with a regression test asserting the async for ends within a
  timeout after aclose()
…connect

why: The supervisor backoff counter climbed for the engine's whole life,
so a reconnect after a long healthy session waited near the 5s cap.

what:
- Reset the attempt counter to 0 once a spawn succeeds (its startup ACK
  was consumed = a healthy connection); only consecutive connect failures
  now escalate the backoff. Verified by inspection + the existing
  reconnect tests (a counter-reset assertion would require exposing the
  internal attempt, deferred).
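The interaction between a deterministic jittered backoff and the reset-on-success rule can be sketched as follows. The 5 s cap comes from the commit message; the base delay, the jitter range, the seeding scheme, and both function names are assumptions for illustration.

```python
import random


def backoff(attempt: int, *, base: float = 0.1, cap: float = 5.0, seed: int = 0) -> float:
    """Delay before reconnect attempt `attempt` (0-based).

    Assumed formula: exponential growth clamped to `cap`, scaled by a
    jitter factor that is reproducible for a given (seed, attempt).
    """
    ceiling = min(cap, base * (2 ** min(attempt, 16)))
    jitter = random.Random(seed * 1_000_003 + attempt).uniform(0.5, 1.0)
    return ceiling * jitter


def delays_for(outcomes: list[bool]) -> list[float]:
    """Replay a sequence of connect outcomes and collect the waits."""
    attempt = 0
    delays: list[float] = []
    for connected in outcomes:
        if connected:
            attempt = 0  # healthy connect: next failure starts small again
        else:
            delays.append(backoff(attempt))
            attempt += 1
    return delays
```

With the reset in place, only consecutive failures escalate: `delays_for([False, False, True, False])` ends with the same short delay as the very first failure.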
why: tmux reports a failed attach (e.g. stale session id) as a non-zero
returncode, not an exception, so the monitor recorded _attached_session
even when the attach failed -- silencing the option channel.

what:
- monitor.start() now records the sticky attach only when attach-session
  returns returncode 0; logs the stderr on failure
- document _replay_attach's optimistic fire-and-forget attach and that a
  failed re-attach self-corrects on the monitor's next reconcile
why: The agent monitor needs to tell a floating overlay (e.g. a status
HUD) apart from a real agent pane; the snapshot had no floating flag and
the monitor's pane format did not request it.

what:
- Add PaneSnapshot.floating, parsed from #{pane_floating_flag} (tmux
  3.7+; renders empty -> False on older tmux, so no version gate)
- Request pane_floating_flag in the monitor's PANE_FORMAT
- Cover the snapshot floating flag and the format request

Foundation for the floating HUD; the renderer + monitor self-exclusion
land next.
why: The agent monitor was observation-only; on tmux 3.7 a floating
overlay can surface live agent state in the session itself, with no
external UI.

what:
- Add HudRenderer: a pure AgentStore -> text frame plus the typed
  RespawnPane paint op (the frame is shell-quoted and held open with
  `tail -f /dev/null` so it persists between repaints)
- AgentMonitor gains opt-in `hud=True`: start() creates one floating
  NewPane over the primary session and captures its id; the supervised
  drain repaints on every store change (dirty flag set in _observe and
  reconcile); stop() kills the HUD pane
- Exclude the HUD's own pane from _reconcile_once so it never enters the
  diff or the health sweep (tracking is signal-driven, so this is the
  only exclusion needed); best-effort throughout (no session, engine
  error, or tmux < 3.7 silently skips the HUD)
- Cover the renderer, the repaint op, HUD create/teardown, and the
  reconcile self-exclusion
why: _replay_attach sent attach-session fire-and-forget then cached
self._attached_session optimistically, without confirming success. If a
session was killed during the disconnect, the events layer's
_ensure_attached trusted the stale cache, skipped the re-attach, and a
wait_for_output caller got a silently-empty capture (its docstring
promises to raise instead).

what:
- Stop setting _attached_session in _replay_attach; the events layer
  owns that cache (set on a confirmed attach, re-attached on a miss).
  The fire-and-forget re-attach itself is unchanged.
- Update the now-stale docstrings on _replay_attach / set_attach_targets
- Rewrite the reconnect test to assert the cache stays unset across a
  reconnect (replay no longer caches optimistically)
why: On engine death the supervisor's _broadcast_stream_end ends the
subscribe stream, so _EventRing._drain completes. _ensure_started's bare
`if self._task is None` guard never restarts a *completed* task, so after
the first reconnect poll_events froze on a stale cursor.

what:
- Restart the drain when the task is None OR done, and clear any stale
  error so a healthy restart isn't masked
- Cover restart-when-done vs keep-when-running (parametrized)
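The guard change can be sketched in isolation. The `Ring` class below is a hypothetical reduction of _EventRing that keeps only the task bookkeeping.

```python
import asyncio


class Ring:
    """Hypothetical reduction of the event ring's start logic."""

    def __init__(self, drain_factory) -> None:
        self._drain_factory = drain_factory
        self._task = None
        self._error = None

    def ensure_started(self):
        # A finished task is as useless as a missing one: restart in
        # both cases, and clear any error left by the previous run.
        if self._task is None or self._task.done():
            self._error = None
            self._task = asyncio.create_task(self._drain_factory())
        return self._task


async def demo() -> int:
    starts = 0

    async def drain() -> None:
        nonlocal starts
        starts += 1  # stands in for a drain that ends when the stream ends

    ring = Ring(drain)
    first = ring.ensure_started()
    assert ring.ensure_started() is first  # still pending: kept as-is
    await first
    await ring.ensure_started()  # done: a fresh task is created
    return starts
```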
why: watch_agents opened its own engine.subscribe() and re-ingested the
fan-out stream while the monitor's drain already ingests it, so every
event was processed twice -- drifting the MonotonicCounter and `since`
stamps. It was also tagged readOnlyHint=True despite mutating the store.

what:
- Drop the redundant subscribe()+ingest loop; observe the monitor's live
  store over the window (the monitor's drain is the sole ingester), so
  readOnlyHint=True is now accurate
- Cover that watch_agents never calls engine.subscribe()
why: _ensure_hud ran only in start(). After a full tmux restart the HUD
pane id is dead; _repaint_hud ignored the arun result (which reports a
tmux-side failure as data, not a raise), so _hud_pane_id was never
cleared and the HUD stayed dark for the rest of the session.

what:
- _repaint_hud now checks the result: a failed/errored repaint drops
  _hud_pane_id (leaving it dirty) so it can be recreated; the dirty flag
  is cleared only on a successful paint
- _run recreates the HUD (_ensure_hud) after a reconcile when it is
  enabled but has no pane id -- covering both a restart and an initial
  create that had no session yet
- Cover ok-keeps-pane vs fail-drops-pane (parametrized)
why: The module docstring said the HUD "repaints on every store change",
but _hud_dirty is set unconditionally on each notification and reconcile,
so it repaints on those events rather than only on an actual mutation.

what:
- Reword to "repaints after each notification and reconcile"
why: The AgentHook.install/uninstall doctests ran a bare
ClaudeCodeHook(), which defaults settings_path to the real
~/.claude/settings.json and rewrites it on every doctest run. The
"no-op on stub" comment was wrong -- these methods always write.

what:
- Redirect each doctest into a tempfile.TemporaryDirectory(), mirroring
  the already-isolated claude.py/codex.py doctests
why: The agent-monitor entry described the liveness/reconcile sweep as
"a periodic health probe" running "every few seconds". It is actually
event/reconnect-driven: _run reconciles at startup and again each time
the subscribe stream ends (a disconnect/reconnect). The only sleep is a
retry backoff on the failure path, not a timer.

what:
- "refreshed by a periodic health probe" -> "refreshed on each
  reconciliation"
- "runs every few seconds" -> "runs at startup and on each engine
  reconnect"
why: Two monitor test fakes failed strict mypy: a stream-engine fake
missing the _StreamEngine Protocol's _attached_session member, and a
capturing FastMCP stand-in passed where FastMCP is expected.

what:
- Add _attached_session to FakeStreamEngine/_BlockingStreamEngine
- Cast the capturing MCP fake to FastMCP at the register_agents call
why: from_dict raised ValueError on a state string absent from the
current enum (e.g. a store written by a newer version), crashing the
monitor on startup when it loads the store.

what:
- Deserialize state via AgentState.from_signal (unknown -> UNKNOWN),
  mirroring signal ingestion
- Add parametrized tests for known/unknown/garbage states
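The tolerant parse can be sketched as below. The five state values come from the commit that introduced AgentState; the strip/lower normalization and the exact method body are assumptions, not the code in state.py.

```python
import enum


class AgentState(enum.Enum):
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    IDLE = "idle"
    EXITED = "exited"
    UNKNOWN = "unknown"

    @classmethod
    def from_signal(cls, raw: object) -> "AgentState":
        """Map any input to a state; unrecognized values become UNKNOWN."""
        try:
            # Normalization here is an assumption of this sketch.
            return cls(str(raw).strip().lower())
        except ValueError:
            return cls.UNKNOWN
```

Routing deserialization through the same method means a store written by a newer version degrades to UNKNOWN for the unfamiliar entries instead of failing the whole load.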
why: `emit ... --name` with the flag as the final CLI arg raised
IndexError instead of a clean exit, surfacing a traceback to the
agent's hook runner.

what:
- Fall back to name=None when --name has no following value
- Add parametrized tests for main's --name parsing
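The bounds check behind this fix can be sketched as a small helper. The function name `parse_name` is hypothetical; the real main() may parse its arguments differently.

```python
from __future__ import annotations


def parse_name(argv: list[str]) -> str | None:
    """Return the value after --name, or None when absent or dangling."""
    if "--name" not in argv:
        return None
    index = argv.index("--name")
    if index + 1 >= len(argv):
        # Flag given as the last argument: fall back instead of
        # raising IndexError into the agent's hook runner.
        return None
    return argv[index + 1]
```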
why: install_agent_hooks awaited blocking file I/O (read, fsync,
atomic replace) directly on the event loop, stalling concurrent
MCP tools during an install.

what:
- Run hook install/status via asyncio.to_thread
- Add parametrized tests for the install tool (known/unknown agent)
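The effect of moving blocking work off the event loop can be sketched as below. `blocking_install` is a hypothetical stand-in for the hook's file I/O, simulated with a sleep; `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_install() -> str:
    """Stand-in for read + fsync + atomic replace on a config file."""
    time.sleep(0.05)
    return "installed"


async def install_tool() -> str:
    # The blocking call runs in a worker thread; the loop stays free.
    return await asyncio.to_thread(blocking_install)


async def demo() -> tuple:
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            ticks += 1  # progresses only while the loop is not blocked
            await asyncio.sleep(0.005)

    task = asyncio.create_task(ticker())
    status = await install_tool()
    task.cancel()
    return status, ticks
```

If `install_tool` called `blocking_install()` directly, the ticker would not advance during the install, which is the stall on concurrent MCP tools that the commit describes.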
why: A PR does not own its release version; the monitor entry named a
concrete version that also went stale after rebasing onto newer master.

what:
- Open the entry with the package as subject, not "libtmux X.Y.Z ships"
- Add the (#692) PR ref to the deliverable heading
@tony tony force-pushed the engine-ops-supatui branch from 9bc1f82 to 6a0a06a Compare June 28, 2026 23:52