Agent-state monitor over the experimental control-mode engine #692
Open
tony wants to merge 50 commits into
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## engine-ops #692 +/- ##
==============================================
+ Coverage 74.22% 75.74% +1.51%
==============================================
Files 214 246 +32
Lines 12563 13819 +1256
Branches 1671 1794 +123
==============================================
+ Hits 9325 10467 +1142
- Misses 2586 2661 +75
- Partials 652 691 +39
why: The shared vocabulary every agents module reads/writes.
what:
- AgentState (running/awaiting_input/idle/exited/unknown) with from_signal
- frozen Agent record + is_running/is_awaiting helpers
why: Out-of-order/replayed agent-state updates must converge to newest.
what:
- Stamp(counter, writer) with deterministic tie-break
- latest() guard + pluggable Clock + MonotonicCounter default
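A minimal sketch of the last-writer-wins rule this commit describes, assuming `Stamp` orders by `counter` first and breaks ties on the `writer` name. The field comparison, the `merge` helper, and the counter's starting value are illustrative assumptions, not the package's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class Stamp:
    """Assumed ordering key: compare by counter, then by writer name."""

    counter: int
    writer: str


def latest(current: Optional[Stamp], incoming: Stamp) -> bool:
    """Return True when ``incoming`` is strictly newer than ``current``."""
    return current is None or incoming > current


class MonotonicCounter:
    """Assumed default clock: every call returns a strictly larger integer."""

    def __init__(self) -> None:
        self._value = 0

    def __call__(self) -> int:
        self._value += 1
        return self._value


def merge(updates: list[tuple[Stamp, str]]) -> Optional[str]:
    """Hypothetical helper: fold (stamp, state) pairs, keeping the newest."""
    best_stamp: Optional[Stamp] = None
    best_state: Optional[str] = None
    for stamp, state in updates:
        if latest(best_stamp, stamp):
            best_stamp, best_state = stamp, state
    return best_state
```

Because the guard compares stamps rather than arrival order, replaying or reordering the same updates yields the same final state.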
why: A dead stream left consumers hanging on queue.get(), so settle reported a false 'settled' (success-shaped) instead of stream_end.
what:
- broadcast a _STREAM_END sentinel to subscriber queues on death/close
- subscribe() ends its async for on the sentinel
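The sentinel pattern can be sketched with plain `asyncio.Queue` objects. `Broadcaster`, `publish`, and `close` are hypothetical names for illustration; only the `_STREAM_END` sentinel and the per-subscriber queue idea come from the commit message.

```python
import asyncio
from typing import AsyncIterator, List

_STREAM_END = object()  # unique marker that no real event can equal


class Broadcaster:
    """Hypothetical fan-out hub with one queue per subscriber."""

    def __init__(self) -> None:
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> AsyncIterator[object]:
        # Register the queue eagerly so no event published later is missed.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return self._drain(queue)

    async def _drain(self, queue: asyncio.Queue) -> AsyncIterator[object]:
        while True:
            event = await queue.get()
            if event is _STREAM_END:
                return  # ends the consumer's ``async for`` cleanly
            yield event

    def publish(self, event: object) -> None:
        for queue in self._queues:
            queue.put_nowait(event)

    def close(self) -> None:
        # Wake every consumer instead of leaving it parked on queue.get().
        self.publish(_STREAM_END)
        self._queues.clear()


async def demo() -> List[object]:
    hub = Broadcaster()
    stream = hub.subscribe()
    hub.publish("running")
    hub.publish("idle")
    hub.close()
    return [event async for event in stream]
```

This sketch does not handle a `subscribe()` call made after `close()`; that case is the subject of a later commit in this PR.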
why: A reconnect left _attached_session set, so the next _ensure_attached call skipped re-attaching and %output was silently missing.
what:
- Pin contract with test_reset_attach_clears_flag (Task 10)
- Add _attached_session to _StreamEngine Protocol; drop type: ignore
why: Tasks 14/15 need a stable protocol + registry to import from; the monitor needs the canonical event→state map to translate hook names.
what:
- Add EVENT_STATE canonical lifecycle-event→state dict in base.py
- Add AgentHook runtime_checkable Protocol (name/detect/install/uninstall/status)
- Add registry() + get() with lazy imports to avoid import cycles
- Add stub ClaudeCodeHook (claude.py) + CodexHook (codex.py) for Tasks 14/15
- Add test_registry.py (3 tests: canonical map, registry members, KeyError)
why: Per-pane %subscription-changed only flows to an *attached*
control client, but AgentMonitor.start() never attached — so the
monitor was silent against a live server. The spec's "re-attach
declared sessions" step was deferred in Tasks 9/10 and never wired.
This closes that gap and proves the whole pipeline end to end.
what:
- monitor.start(): after add_subscription, pick a real session via
list-sessions, set_attach_targets([id]), and perform the initial
attach-session through the engine (set _attached_session). A
tmux -C client creates its own throwaway session on connect, which
sorts first in list-sessions; _own_session_id (display-message -p
'#{session_id}') identifies it so _primary_session_id skips it and
attaches to a real session. Single-session v1 limit documented.
- async_control_mode: add _replay_attach(), called after
_replay_subscriptions on every (re)connect — mirrors the
subscription replay (direct stdin write, swallowed pending future,
_write_lock/FIFO discipline) so the engine re-attaches across
reconnects. No-op when _desired_attach is empty.
- tests/experimental/agents/test_live_monitor.py: live tests against
real tmux. test_monitor_observes_running sets @agent_state running
and polls (no manual attach — asserts start() self-attached).
test_reconcile_parses_live_panes proves _parse_pane_rows handles
real list-panes -F output and the Vanished->EXITED path.
why: Complete the feature branch with user-facing documentation and a
changelog entry so the agent-state monitor is discoverable.
what:
- Add superpowers/** to exclude_patterns in docs/conf.py so the spec
and plan are tracked in git without generating toctree warnings
- Add ## Agents section to docs/experimental.md with prose intro and
MyST-role cross-refs for AgentMonitor, AgentState, Agent, and the
three MCP tools
- Add Agent-state monitor deliverable to CHANGES under ### What's new
in prose (not bullets), linking {class} roles for AgentMonitor,
AgentState, and AgentHook
- Stage docs/superpowers/ plans and specs into version control
why: AgentMonitor._drain ran a single subscribe() that ended on the engine's death-sentinel; the supervisor reconnected but nothing re-subscribed and reconcile ran only once in start(), so after any blip list_agents served a stale snapshot forever (broke acceptance #3 / D2). health.is_alive was dead code and the docs described behavior the code did not do.
what:
- monitor: replace _drain with a supervised _run loop that reconciles FIRST each iteration (so subscribe() only runs against a live engine), then drains until the stream ends; on disconnect it retries reconcile with a bounded _reconnect_poll until the supervisor reconnects (replaying subs + attach), then re-subscribes. Split reconcile() into a defensive public wrapper + a raising _reconcile_once so the loop can wait for the engine to revive. stop() sets _stopping then cancels (no hang on a closed engine).
- monitor: wire health in reconcile via _apply_health: refresh each tracked agent's pid/alive from the pane tree; mark a LOCAL pane (pid set) EXITED when its process is dead; never auto-EXIT a PID-less remote pane (D5). Note the receive-time clock seam (D1).
- tests: add live test_monitor_survives_engine_reconnect (kill the control proc, confirm a NEW state is observed after reconnect) and unit tests for _apply_health (dead-local→EXITED, live refresh, pidless-never-exits).
- docs/experimental.md: correct the liveness, reconcile-on-reconnect, and hook-install (settings.json / config.toml + libtmux-agent-emit, not set-hook) descriptions; add an AgentMonitor usage snippet.
why: The "keepalive TTL" / "age out" wording described a mechanism that does not exist in v1: there is no TTL, keepalive, or staleness timer. Remote PID-less panes are simply never auto-expired; they are left at last-known state and become EXITED only via the Vanished diff when their tmux pane actually disappears.
what:
- docs/experimental.md: reword the reconcile paragraph to drop the keepalive-TTL claim and state the real behavior.
- monitor._apply_health docstring: remote pid-less panes expire only via the Vanished/pane-gone path, not a TTL.
- health.py module docstring: this probe never declares remote panes dead; a keepalive/TTL is a possible future enhancement, not v1.
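The liveness rule described here (probe a local PID, never expire a PID-less remote pane) could look roughly like the following on a POSIX system. Treating `PermissionError` as "alive" is an assumption of this sketch rather than something the PR states, and `os.kill(pid, 0)` should not be used this way on Windows.

```python
import os
from typing import Optional


def is_alive(pid: Optional[int]) -> bool:
    """Sketch of a signal-0 liveness probe (POSIX only)."""
    if pid is None:
        # A remote pane has no local PID: never declare it dead from here.
        return True
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering anything
    except ProcessLookupError:
        return False
    except PermissionError:
        # Assumption: the process exists but belongs to another user.
        return True
    return True
```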
why: If monitor.start() raised, the existing finally block was skipped because start() ran between the two try blocks, leaking the drain task.
what:
- Move monitor.start() + yield inside try/finally within the if monitor is not None: branch so stop() always runs on exit
- Add else: yield branch to cover the monitor=None path
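The control flow in this fix can be illustrated with `contextlib.asynccontextmanager`. `FakeMonitor` and the `lifespan` signature are hypothetical stand-ins; the point is only that `start()` sits inside the `try`, so `stop()` runs even when startup fails.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Optional


class FakeMonitor:
    """Hypothetical stand-in that records the calls made on it."""

    def __init__(self, fail_on_start: bool = False) -> None:
        self.fail_on_start = fail_on_start
        self.events: List[str] = []

    async def start(self) -> None:
        self.events.append("start")
        if self.fail_on_start:
            raise RuntimeError("engine unavailable")

    async def stop(self) -> None:
        self.events.append("stop")


@asynccontextmanager
async def lifespan(monitor: Optional[FakeMonitor]) -> AsyncIterator[None]:
    if monitor is not None:
        try:
            await monitor.start()  # inside the try, so a failure still hits finally
            yield
        finally:
            await monitor.stop()
    else:
        yield  # no monitor configured: nothing to start or stop


async def demo(fail_on_start: bool) -> List[str]:
    monitor = FakeMonitor(fail_on_start)
    try:
        async with lifespan(monitor):
            monitor.events.append("serving")
    except RuntimeError:
        monitor.events.append("start failed")
    return monitor.events
```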
why: With lifespan=False the monitor is never started, so registering agent tools against it yields a list_agents that silently returns [].
what:
- Guard register_agents call with `monitor_enabled and lifespan` so tools are only wired when the lifespan will actually start them
why: _BLOCK_RE.sub(new_block, content) replaces every match, so a manually-malformed config with two marker blocks produced two copies.
what:
- Rewrite the existing-block branch to strip ALL marker blocks (via _BLOCK_WITH_SEP_RE then _BLOCK_RE for start-of-file), then append exactly one fresh block
- Add test seeding a two-block config and asserting install collapses to one block with status "installed"
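A simplified sketch of the strip-all-then-append approach. The marker strings and the `install_block` helper are hypothetical, and a single regex stands in for the two patterns named in the commit.

```python
import re

# Hypothetical markers; the real hook writers use their own.
START = "# >>> agent-hook >>>"
END = "# <<< agent-hook <<<"

# Non-greedy so two blocks in one file are matched separately.
_BLOCK_RE = re.compile(
    re.escape(START) + r".*?" + re.escape(END) + r"\n?",
    re.DOTALL,
)


def install_block(content: str, body: str) -> str:
    """Remove every existing marker block, then append exactly one."""
    stripped = _BLOCK_RE.sub("", content).rstrip("\n")
    block = f"{START}\n{body}\n{END}\n"
    return f"{stripped}\n\n{block}" if stripped else block
```

Stripping first and appending once makes the operation idempotent: running it again on its own output returns the same text.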
why: The OscSignal regex accepts both ST and BEL terminators but only the ST path had a working doctest example.
what:
- Add BEL-terminated doctest to OscSignal.feed showing b"\033]3008;state=idle\007" yields a Reading with state idle
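To illustrate the two terminators and the byte-fragmented delivery, here is a hedged sketch of a buffering parser. `OscBuffer` is a hypothetical class that returns plain state strings instead of the `Reading` objects the real `OscSignal.feed` produces; the `state=<name>` payload shape is taken from the doctest input quoted above.

```python
import re
from typing import List

# OSC 3008 payload ended by BEL (\x07) or ST (ESC followed by a backslash).
_OSC_RE = re.compile(rb"\x1b\]3008;state=([a-z_]+)(?:\x07|\x1b\\)")


class OscBuffer:
    """Hypothetical reassembler for sequences split across chunks."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> List[str]:
        self._pending += chunk
        states: List[str] = []
        last_end = 0
        for match in _OSC_RE.finditer(self._pending):
            states.append(match.group(1).decode("ascii"))
            last_end = match.end()
        # Keep only a possibly unfinished sequence for the next chunk.
        tail = self._pending[last_end:]
        start = tail.rfind(b"\x1b]")
        if start != -1:
            self._pending = tail[start:]
        elif tail.endswith(b"\x1b"):
            self._pending = b"\x1b"  # lone ESC may begin the next sequence
        else:
            self._pending = b""
        return states
```

A production parser would also bound the pending buffer; this sketch keeps an unterminated sequence indefinitely.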
why: The status() == "outdated" branch (marker present, content stale) had no test coverage.
what:
- Add test that installs, mutates the emit command in the block, and asserts status() returns "outdated"
why: The class docstring and parse() method docstring carried identical doctests; --doctest-modules ran both, which is pure noise.
what:
- Replace class-level Examples block with prose description
- Keep the method-level doctest as the single runnable example
why: AgentStore was imported at runtime and again under TYPE_CHECKING; the runtime import already satisfies the annotation, so the second import is dead.
what:
- Remove the redundant TYPE_CHECKING import of AgentStore
why: When the display-message own-id probe fails, own is None, so the `sid != own` guard is always true and _primary_session_id falls through to return ids[0], which is tmux's phantom `tmux -C` session (it sorts first) and holds no agent panes. Attaching there leaves the option channel effectively silent.
what:
- Return None from _primary_session_id when the own-session probe fails, so start() skips attach instead of binding to the phantom session
- Cover the case with a fake engine whose display-message probe raises
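The selection rule can be sketched as a pure function. The signature is hypothetical (the real method probes tmux itself), and returning `None` when only the control client's own session exists is an assumption of this sketch.

```python
from typing import List, Optional


def primary_session_id(session_ids: List[str], own: Optional[str]) -> Optional[str]:
    """Pick the first session that is not the control client's own."""
    if own is None:
        # Probe failed: the phantom session cannot be told apart, so skip attach.
        return None
    for sid in session_ids:
        if sid != own:
            return sid
    return None
```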
…close
why: aclose() sets _closing, broadcasts the stream-end sentinel, and clears _subscribers. A subscribe() call afterwards registers a fresh queue no broadcast will ever touch, hanging the consumer forever on queue.get().
what:
- Gate subscribe() on _closing at the top: a permanently-closing engine yields nothing and ends at once. A merely _dead (reconnecting) engine still keeps the subscriber so the post-reconnect reader feeds it
- Cover with a regression test asserting the async for ends within a timeout after aclose()
…l start() callers
…connect
why: The supervisor backoff counter climbed for the engine's whole life, so a reconnect after a long healthy session waited near the 5s cap.
what:
- Reset the attempt counter to 0 once a spawn succeeds (its startup ACK was consumed = a healthy connection); only consecutive connect failures now escalate the backoff. Verified by inspection + the existing reconnect tests (a counter-reset assertion would require exposing the internal attempt, deferred).
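A small simulation of the reset behavior, assuming exponential backoff with seeded (deterministic) jitter. Only the 5 s cap comes from the commit message; the base delay, the jitter range, and both function names are illustrative.

```python
import random
from typing import List


def backoff_delay(attempt: int, *, base: float = 0.1, cap: float = 5.0) -> float:
    """Capped exponential delay with jitter derived from the attempt number."""
    ceiling = min(cap, base * (2 ** attempt))
    jitter = random.Random(attempt).uniform(0.5, 1.0)  # same attempt, same jitter
    return ceiling * jitter


def simulate(outcomes: List[bool]) -> List[float]:
    """Replay connect outcomes; return the delay slept after each failure."""
    attempt = 0
    delays: List[float] = []
    for connected in outcomes:
        if connected:
            attempt = 0  # healthy connect: only consecutive failures escalate
            continue
        delays.append(backoff_delay(attempt))
        attempt += 1
    return delays
```

With the reset in place, the first failure after a healthy connection waits the same short delay as the very first failure.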
why: tmux reports a failed attach (e.g. stale session id) as a non-zero returncode, not an exception, so the monitor recorded _attached_session even when the attach failed -- silencing the option channel.
what:
- monitor.start() now records the sticky attach only when attach-session returns returncode 0; logs the stderr on failure
- document _replay_attach's optimistic fire-and-forget attach and that a failed re-attach self-corrects on the monitor's next reconcile
why: The agent monitor needs to tell a floating overlay (e.g. a status
HUD) apart from a real agent pane; the snapshot had no floating flag and
the monitor's pane format did not request it.
what:
- Add PaneSnapshot.floating, parsed from #{pane_floating_flag} (tmux
3.7+; renders empty -> False on older tmux, so no version gate)
- Request pane_floating_flag in the monitor's PANE_FORMAT
- Cover the snapshot floating flag and the format request
Foundation for the floating HUD; the renderer + monitor self-exclusion
land next.
why: The agent monitor was observation-only; on tmux 3.7 a floating overlay can surface live agent state in the session itself, with no external UI.
what:
- Add HudRenderer: a pure AgentStore -> text frame plus the typed RespawnPane paint op (the frame is shell-quoted and held open with `tail -f /dev/null` so it persists between repaints)
- AgentMonitor gains opt-in `hud=True`: start() creates one floating NewPane over the primary session and captures its id; the supervised drain repaints on every store change (dirty flag set in _observe and reconcile); stop() kills the HUD pane
- Exclude the HUD's own pane from _reconcile_once so it never enters the diff or the health sweep (tracking is signal-driven, so this is the only exclusion needed); best-effort throughout (no session, engine error, or tmux < 3.7 silently skips the HUD)
- Cover the renderer, the repaint op, HUD create/teardown, and the reconcile self-exclusion
why: _replay_attach sent attach-session fire-and-forget then cached self._attached_session optimistically, without confirming success. If a session was killed during the disconnect, the events layer's _ensure_attached trusted the stale cache, skipped the re-attach, and a wait_for_output caller got a silently-empty capture (its docstring promises to raise instead).
what:
- Stop setting _attached_session in _replay_attach; the events layer owns that cache (set on a confirmed attach, re-attached on a miss). The fire-and-forget re-attach itself is unchanged.
- Update the now-stale docstrings on _replay_attach / set_attach_targets
- Rewrite the reconnect test to assert the cache stays unset across a reconnect (replay no longer caches optimistically)
why: On engine death the supervisor's _broadcast_stream_end ends the subscribe stream, so _EventRing._drain completes. _ensure_started's bare `if self._task is None` guard never restarts a *completed* task, so after the first reconnect poll_events froze on a stale cursor.
what:
- Restart the drain when the task is None OR done, and clear any stale error so a healthy restart isn't masked
- Cover restart-when-done vs keep-when-running (parametrized)
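The guard change is small enough to show in isolation. `Ring` is a hypothetical reduction of the event ring: its `_drain` just yields once to stand in for a stream that ends, and `starts` counts how often the drain task was created.

```python
import asyncio
from typing import Optional


class Ring:
    """Hypothetical reduction of a ring buffer that owns a drain task."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self._error: Optional[BaseException] = None
        self.starts = 0

    async def _drain(self) -> None:
        await asyncio.sleep(0)  # stand-in for consuming until the stream ends

    def ensure_started(self) -> None:
        # Restart when there is no task OR the previous one already finished.
        if self._task is None or self._task.done():
            self._error = None  # a healthy restart must not surface an old error
            self.starts += 1
            self._task = asyncio.create_task(self._drain())


async def demo() -> int:
    ring = Ring()
    ring.ensure_started()  # first start
    ring.ensure_started()  # still running: kept as-is
    await ring._task       # the stream ended, so the task is done
    ring.ensure_started()  # done: restarted
    await ring._task
    return ring.starts
```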
why: watch_agents opened its own engine.subscribe() and re-ingested the fan-out stream while the monitor's drain already ingests it, so every event was processed twice -- drifting the MonotonicCounter and `since` stamps. It was also tagged readOnlyHint=True despite mutating the store.
what:
- Drop the redundant subscribe()+ingest loop; observe the monitor's live store over the window (the monitor's drain is the sole ingester), so readOnlyHint=True is now accurate
- Cover that watch_agents never calls engine.subscribe()
why: _ensure_hud ran only in start(). After a full tmux restart the HUD pane id is dead; _repaint_hud ignored the arun result (which reports a tmux-side failure as data, not a raise), so _hud_pane_id was never cleared and the HUD stayed dark for the rest of the session.
what:
- _repaint_hud now checks the result: a failed/errored repaint drops _hud_pane_id (leaving it dirty) so it can be recreated; the dirty flag is cleared only on a successful paint
- _run recreates the HUD (_ensure_hud) after a reconcile when it is enabled but has no pane id -- covering both a restart and an initial create that had no session yet
- Cover ok-keeps-pane vs fail-drops-pane (parametrized)
why: The module docstring said the HUD "repaints on every store change", but _hud_dirty is set unconditionally on each notification and reconcile, so it repaints on those events rather than only on an actual mutation.
what:
- Reword to "repaints after each notification and reconcile"
why: The AgentHook.install/uninstall doctests ran a bare ClaudeCodeHook(), which defaults settings_path to the real ~/.claude/settings.json and rewrites it on every doctest run. The "no-op on stub" comment was wrong -- these methods always write.
what:
- Redirect each doctest into a tempfile.TemporaryDirectory(), mirroring the already-isolated claude.py/codex.py doctests
why: The agent-monitor entry described the liveness/reconcile sweep as "a periodic health probe" running "every few seconds". It is actually event/reconnect-driven: _run reconciles at startup and again each time the subscribe stream ends (a disconnect/reconnect). The only sleep is a retry backoff on the failure path, not a timer.
what:
- "refreshed by a periodic health probe" -> "refreshed on each reconciliation"
- "runs every few seconds" -> "runs at startup and on each engine reconnect"
why: Two monitor test fakes failed strict mypy: a stream-engine fake missing the _StreamEngine Protocol's _attached_session member, and a capturing FastMCP stand-in passed where FastMCP is expected.
what:
- Add _attached_session to FakeStreamEngine/_BlockingStreamEngine
- Cast the capturing MCP fake to FastMCP at the register_agents call
why: from_dict raised ValueError on a state string absent from the current enum (e.g. a store written by a newer version), crashing the monitor on startup when it loads the store.
what:
- Deserialize state via AgentState.from_signal (unknown -> UNKNOWN), mirroring signal ingestion
- Add parametrized tests for known/unknown/garbage states
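A sketch of the tolerant lookup, using the five states listed in the PR. The string values and the `strip().lower()` normalization are assumptions made for the example.

```python
from enum import Enum


class AgentState(Enum):
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    IDLE = "idle"
    EXITED = "exited"
    UNKNOWN = "unknown"

    @classmethod
    def from_signal(cls, raw: object) -> "AgentState":
        """Map any unrecognized value to UNKNOWN instead of raising."""
        try:
            return cls(str(raw).strip().lower())
        except ValueError:
            return cls.UNKNOWN
```

Deserializing through the same method means a store written by a newer version degrades to `UNKNOWN` rather than failing to load.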
why: `emit ... --name` with the flag as the final CLI arg raised IndexError instead of a clean exit, surfacing a traceback to the agent's hook runner.
what:
- Fall back to name=None when --name has no following value
- Add parametrized tests for main's --name parsing
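The bounds check can be sketched as follows. `parse_name` is a hypothetical helper and the argument layout is assumed; the real `main` may parse its arguments differently.

```python
from typing import List, Optional


def parse_name(argv: List[str]) -> Optional[str]:
    """Return the value after ``--name``, or None when absent or dangling."""
    if "--name" not in argv:
        return None
    index = argv.index("--name") + 1
    # A trailing --name has no value: fall back to None instead of indexing past the end.
    return argv[index] if index < len(argv) else None
```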
why: install_agent_hooks awaited blocking file I/O (read, fsync, atomic replace) directly on the event loop, stalling concurrent MCP tools during an install.
what:
- Run hook install/status via asyncio.to_thread
- Add parametrized tests for the install tool (known/unknown agent)
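A minimal sketch of moving the blocking write off the event loop with `asyncio.to_thread` (Python 3.9+). `write_atomic` and `install` are hypothetical names, and the write-fsync-replace sequence is one common way to get an atomic replace, not necessarily the hook writers' exact steps.

```python
import asyncio
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, text: str) -> None:
    """Blocking helper: write a temp file, fsync it, then replace the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)


async def install(path: Path, text: str) -> str:
    # The blocking work runs in a worker thread, so other coroutines keep running.
    await asyncio.to_thread(write_atomic, path, text)
    return "installed"


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "settings.json"
        status = asyncio.run(install(target, "{}"))
        return status + ":" + target.read_text(encoding="utf-8")
```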
why: A PR does not own its release version; the monitor entry named a concrete version that also went stale after rebasing onto newer master.
what:
- Open the entry with the package as subject, not "libtmux X.Y.Z ships"
- Add the (#692) PR ref to the deliverable heading
Summary

- `libtmux.experimental.agents`: a headless, self-healing monitor that reports, per pane, which coding agent is `RUNNING`, `AWAITING_INPUT`, `IDLE`, `EXITED`, or `UNKNOWN`, without polling or scraping pane output. Answers the orchestration question "which of my parallel agents needs me right now?"
- Two signal channels: a local option (`@agent_state` → `%subscription-changed`) and a remote `OSC 3008` escape carried in control-mode `%output`, reconciled by a lock-free last-writer-wins merge so the two channels can race freely.
- `AsyncControlModeEngine` with a supervised reconnect loop, a death-sentinel that closes subscribers cleanly, and per-subscriber broadcast queues, so a `tmux` restart or socket blip self-heals and concurrent consumers never steal each other's events.
- `refresh-client -B/-C` added to the `RefreshClient` operation: the substrate the monitor subscribes through (debounced, server-side change detection) instead of raw `%output`.
- `list_agents`, `watch_agents`, and `install_agent_hooks` tools, plus a lifespan that starts/stops the monitor and gates the tools behind an engine capability check.
- Hooks for Claude Code (`~/.claude/settings.json`) and Codex (`~/.codex/config.toml`), installable into a running session at any time.

Stacks on #690. Everything is additive under `libtmux.experimental`: no existing public API is touched, and the package is mypy-strict clean.

Changes by area
`src/libtmux/experimental/agents/` (new package)

- `state.py`: the `AgentState` enum (with `from_signal()` that maps unknown values to `UNKNOWN` rather than raising) and the frozen `Agent` record (`name`, `state`, `since`, `source`, `pid`, `alive`, plus `is_running`/`is_awaiting`).
- `merge.py`: `Stamp(counter, writer)` ordering and `latest()`, the convergent, idempotent, out-of-order-tolerant merge rule. The clock is a pluggable callable (`MonotonicCounter` now, HLC later).
- `store.py`: the durable value tier, made up of a frozen `AgentStore`, the pure `apply(store, event, *, now)` reducer, `Observed`/`Vanished` events, a `Storage` protocol, and `JsonFile` (atomic temp-write + rename). Only `apply()` mutates state.
- `signals.py`: `OptionSignal` (local) and `OscSignal` (remote); the latter reassembles the byte-fragmented `OSC 3008` stream tmux delivers in `%output`.
- `health.py`: `is_alive(pid)` via `os.kill(pid, 0)`; `None` (a PID-less remote pane) is treated as alive so a remote agent is never falsely expired.
- `tree.py`: `panes_of()`/`diff_panes()` derived from `models` snapshots, the live session→window→pane projection.
- `monitor.py`: `AgentMonitor`, with the supervised `start`/`stop`/`reconcile`/`status`/`agents` contract, the reducer pipeline, and the reconcile sweep that self-attaches (excluding its own `tmux -C` session) and self-heals across reconnects.
- `hooks/`: the `AgentHook` protocol (`detect`/`install`/`uninstall`/`status`), the shared `emit()` (local `set-option`, else remote `OSC 3008` → `/dev/tty`), `ClaudeCodeHook`, `CodexHook`, and a `registry()`.

`src/libtmux/experimental/engines/async_control_mode.py`

- Supervised reconnect (`_supervisor`, deterministic jittered `_backoff`, backoff reset on a healthy connect), desired-state replay (`add_subscription`/`set_attach_targets` → `_replay_subscriptions`/`_replay_attach`), a `_STREAM_END` death-sentinel broadcast to per-subscriber queues, and a `subscribe()` that no longer hangs when called after `aclose()`.

`src/libtmux/experimental/ops/_ops/refresh_client.py`

- `subscribe` (`-B`) and `size` (`-C`) parameters, version-gated and serializable like every other operation.

`src/libtmux/experimental/mcp/`

- `vocabulary/agents.py`: `register_agents()` registers `list_agents`/`watch_agents`/`install_agent_hooks`, behind a `supports_monitor()` capability gate.
- `_lifespan.py`, `events.py`, `fastmcp_adapter.py`: start/stop the monitor in the server lifespan and skip the agent tools when the lifespan won't bring a monitor up.
- `CHANGES`: entry under `### What's new`.

Design decisions
- The `list-*` snapshot diff is the correctness backstop, because tmux's change feed has blind spots (pane-died, window-resized, and title changes emit no notification).
- The option path self-heals a dropped notification (`show-options -p -v`) but is slow (~1 s debounce); the OSC path is instant and the only one that survives SSH (a remote `set-option` can't reach the local socket) but rides the lossy `%output` stream. Neither alone is sufficient.
  - `OptionSignal`: `@agent_state` → `%subscription-changed`
  - `OscSignal`: `OSC 3008` → `%output` (byte-fragmented)
- A vanished pane becomes `EXITED`. Local panes expire only on a failed PID probe; PID-less remote panes stay at their last-known state.
- The `(counter, writer)` `latest()` guard runs before the coalescing value write, so the two channels merge deterministically without locks regardless of arrival order.
- The package depends only on the `AsyncTmuxEngine` protocol, the `models` snapshots, and a `Storage` sink, not on `Server`/`Session`/`Window`/`Pane`.
- Tools: `list_agents`, `watch_agents`, `install_agent_hooks`.

Verification
The package is decoupled from the classic ORM (depends only on the engine protocol + models):

$ rg -n "from libtmux\.(server|session|window|pane|common) import|import libtmux\b" src/libtmux/experimental/agents

The remote signal is written to the pane tty, not stdout (which agent hooks capture):

$ rg -n "/dev/tty" src/libtmux/experimental/agents/hooks/emit.py

Async tests follow the repo convention (`asyncio.run`, no `pytest-asyncio`):

$ rg -n "pytest_asyncio|pytest\.mark\.asyncio" tests/experimental/agents tests/experimental/engines/test_async_control_mode_supervisor.py

MCP agent tools are gated behind an engine capability check:

$ rg -n "supports_monitor" src/libtmux/experimental/mcp

Test plan
- `state`/`merge`/`store`/`signals`/`tree`/`health` unit tests + doctests (reducer idempotence, byte-fragmented OSC reassembly, last-writer-wins).
- `test_async_control_mode_supervisor.py` and `test_async_control_mode_sentinel.py` cover supervised reconnect, attach replay on reconnect, broadcast to concurrent subscribers, and subscribe-after-close.
- `test_live_monitor.py` validates self-attach, reconcile, and survives-reconnect against a real tmux server via the libtmux fixtures.
- `notify` fallback.
- `test_agents_tools.py`/`test_attach_reset.py` cover tool registration and the capability gate.
- `ruff check`, `ruff format --check`, `mypy src tests`, `pytest`, and `just build-docs` (the CHANGES entry + catalog directive) all green.

Refs #688, #689. Builds on #690.