Typed operations & engines: spine, 6 engines, plans, models, facades (#689)#690

Open
tony wants to merge 108 commits into master from engine-ops
@tony tony commented Jun 21, 2026

Member

Summary

Implements the typed operations + engines architecture under libtmux.experimental.{ops,engines,models,facade} — an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with ;-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API — everything is additive under libtmux.experimental (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine — libtmux.experimental.ops (pure, no tmux):

  • Operation[ResultT]: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure render() with declarative version gating; build_result() adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
  • Typed Result hierarchy with opt-in raise_for_status(): AckResult (no-output commands — success/failure only), SplitWindowResult/CreateResult (captured ids), CapturePaneResult (lines), ListPanes/Windows/SessionsResult (snapshot-deriving rows).
  • Closed Target sum, fail-closed OperationRegistry, stdlib serialization, and catalog() (registry-derived docs data).
  • LazyPlan (record → resolve SlotRef forward refs → execute) with chainability: >> / OpChain composition and execute(fold=True) folding chainable runs into one tmux a ; b dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's cmdq_remove_group).
  • Read seam: ListPanes/ListWindows/ListSessions ops render the same -F template neo uses (imported, not copied) and parse into models snapshots — a typed read surface parallel to neo, leaving the ORM untouched.
  • 57 operations across client/pane/window/session/server scopes.

Engines — libtmux.experimental.engines (all behind TmuxEngine/AsyncTmuxEngine, all returning the same CommandResult):

| Family | Sync | Async |
| --- | --- | --- |
| Subprocess (classic) | SubprocessEngine | AsyncSubprocessEngine |
| Concrete (in-memory) | ConcreteEngine | AsyncConcreteEngine |
| Control mode (tmux -C) | ControlModeEngine | AsyncControlModeEngine (event stream via subscribe()) |
| Native imsg (binary protocol) | ImsgEngine (opt-in easter egg) | |

Control engines use an I/O-free, bytes-based ControlModeParser with FIFO/skip correlation (the startup ACK is consumed up front; unsolicited hook blocks are skipped). The imsg engine speaks tmux's binary peer protocol directly (AF_UNIX + SCM_RIGHTS, PROTOCOL_VERSION 8) and has a live parity test against the subprocess engine, which the prototype never had.
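The idea of an I/O-free parser can be sketched as follows. This is a simplified illustration, not the PR's `ControlModeParser`: the `Block` and `BlockParser` names are hypothetical, notifications are dropped rather than surfaced, and it relies only on tmux's documented control-mode framing (`%begin`, then output lines, then `%end` or `%error`, each guard line carrying a time, a command number, and flags).

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Block:
    number: int  # command number from the guard line
    ok: bool  # True for %end, False for %error
    lines: tuple[str, ...]


class BlockParser:
    """I/O-free: feed() accepts bytes and returns completed blocks."""

    def __init__(self) -> None:
        self._buffer = b""
        self._current: list[str] | None = None

    def feed(self, data: bytes) -> list[Block]:
        self._buffer += data
        done: list[Block] = []
        # Only complete lines are consumed; a partial line stays buffered.
        while b"\n" in self._buffer:
            raw, self._buffer = self._buffer.split(b"\n", 1)
            line = raw.decode("utf-8", errors="backslashreplace")
            if line.startswith("%begin "):
                self._current = []
            elif self._current is not None and line.startswith(("%end ", "%error ")):
                number = int(line.split()[2])
                ok = line.startswith("%end ")
                done.append(Block(number, ok, tuple(self._current)))
                self._current = None
            elif self._current is not None:
                self._current.append(line)
            # Lines outside a block are notifications; this sketch drops them.
        return done
```

Since the parser never reads from a socket or pipe itself, the same class can sit under a blocking engine and an asyncio engine, and it can be unit-tested by feeding it byte strings split at arbitrary points.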

Models — libtmux.experimental.models: frozen Pane/Window/Session/ServerSnapshot (typed core + raw field tail), from_pane_rows() builds the whole tree from one list-panes -a query, round-trips to plain dicts — neo-like but decoupled and serializable.
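Building the whole tree from one flat query amounts to an order-preserving group-by. The sketch below shows that step under simplifying assumptions: the snapshot classes carry only an id plus the raw fields, the rows are plain dicts keyed by tmux format names, and the function returns sessions directly rather than a `ServerSnapshot`.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class PaneSnapshot:
    pane_id: str
    fields: dict[str, str]  # raw tmux-format tail, nothing is dropped


@dataclasses.dataclass(frozen=True)
class WindowSnapshot:
    window_id: str
    panes: tuple[PaneSnapshot, ...]


@dataclasses.dataclass(frozen=True)
class SessionSnapshot:
    session_id: str
    windows: tuple[WindowSnapshot, ...]


def from_pane_rows(rows: list[dict[str, str]]) -> tuple[SessionSnapshot, ...]:
    """Group flat pane rows into session -> window -> pane, keeping row order."""
    tree: dict[str, dict[str, list[PaneSnapshot]]] = {}
    for row in rows:
        windows = tree.setdefault(row["session_id"], {})
        panes = windows.setdefault(row["window_id"], [])
        panes.append(PaneSnapshot(row["pane_id"], dict(row)))
    return tuple(
        SessionSnapshot(
            session_id,
            tuple(WindowSnapshot(wid, tuple(panes)) for wid, panes in windows.items()),
        )
        for session_id, windows in tree.items()
    )
```

Dicts preserve insertion order, so sessions, windows, and panes come out in the order tmux listed them.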

Facades — libtmux.experimental.facade ("mode lives in the type"): eager Server→Session→Window→Pane navigation, LazyWindow/LazyPane, AsyncWindow/AsyncPane — all over the same ops; control mode is just an engine choice.
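"Mode lives in the type" can be illustrated with two small classes whose `split()` methods have different, statically known return types. This is a sketch only: the `Runner` callable stands in for an engine, the `"slot:N"` strings stand in for `SlotRef`, and neither matches the PR's real facade signatures.

```python
import dataclasses
import typing as t

# Hypothetical stand-in for an engine: takes an argv, returns the new pane id.
Runner = t.Callable[[tuple[str, ...]], str]


@dataclasses.dataclass(frozen=True)
class EagerPane:
    pane_id: str
    run: Runner

    def split(self) -> "EagerPane":
        """Execute now and return a live handle to the new pane."""
        argv = ("split-window", "-t", self.pane_id, "-P", "-F", "#{pane_id}")
        return EagerPane(self.run(argv), self.run)


@dataclasses.dataclass(frozen=True)
class LazyPane:
    ref: str  # a concrete id, or a placeholder such as "slot:0"
    plan: list[tuple[str, ...]]

    def split(self) -> "LazyPane":
        """Record only; the returned handle points at a pane that does not exist yet."""
        self.plan.append(("split-window", "-t", self.ref, "-P", "-F", "#{pane_id}"))
        return LazyPane(f"slot:{len(self.plan) - 1}", self.plan)
```

A type checker sees `EagerPane.split() -> EagerPane` and `LazyPane.split() -> LazyPane` as distinct signatures, which a single `Pane` class with a runtime-selected engine could not express.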

Docs: an in-repo tmuxop-catalog Sphinx directive renders catalog() into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

  • ~240 experimental tests + doctests; the pure spine/models/concrete tests need no tmux, while classic/control/async/imsg engines and the facades are validated against a real tmux server via the libtmux fixtures.
  • Cross-engine contract suite: same typed result across engines; serialization round-trips.
  • Full repo gate green: ruff, ruff format, mypy --strict, pytest (1501 passed, 2 skipped), build-docs. (The occasional test_retry.py timing flake is pre-existing and unrelated — passes in isolation.)

Design notes

  • Revises "Design typed operations and engines" (#688): execution mode lives in the facade type, not a runtime-bound engine attribute (return types differ by mode).
  • Per-engine error policy: classic reproduces today's behavior; newer engines return typed results with opt-in raise_for_status(). Same result shape across engines.
  • Core is stdlib-dataclass-only; an OTel/MCP edge can sit behind an extra.
  • imsg is opt-in and non-default: it depends on tmux's internal protocol (v8), is POSIX-only, and cannot host attach (which falls back to a local spawn).

Refs #688, #689.

@codecov

codecov Bot commented Jun 21, 2026


Codecov Report

❌ Patch coverage is 77.67932% with 1335 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.22%. Comparing base (d88a212) to head (8f2c6a1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| scripts/mcp_swap.py | 26.72% | 314 Missing and 15 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/base.py | 51.59% | 163 Missing and 34 partials ⚠️ |
| src/libtmux/experimental/engines/control_mode.py | 65.43% | 70 Missing and 33 partials ⚠️ |
| ...libtmux/experimental/engines/async_control_mode.py | 71.74% | 42 Missing and 21 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/v8.py | 75.29% | 46 Missing and 17 partials ⚠️ |
| ...rc/libtmux/experimental/mcp/vocabulary/_resolve.py | 60.95% | 46 Missing and 11 partials ⚠️ |
| src/libtmux/experimental/mcp/middleware.py | 73.55% | 41 Missing and 14 partials ⚠️ |
| docs/_ext/tmuxop.py | 18.18% | 36 Missing ⚠️ |
| src/libtmux/experimental/mcp/vocabulary/pane.py | 76.15% | 32 Missing and 4 partials ⚠️ |
| src/libtmux/experimental/mcp/__init__.py | 47.61% | 28 Missing and 5 partials ⚠️ |
... and 60 more
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.89%   74.22%   +22.33%     
===========================================
  Files          25      214      +189     
  Lines        3623    12563     +8940     
  Branches      733     1671      +938     
===========================================
+ Hits         1880     9325     +7445     
- Misses       1439     2586     +1147     
- Partials      304      652      +348     

@tony tony changed the title Typed operations and engines: inert op spine (#689) Jun 21, 2026
@tony tony changed the title Typed operations and engines: spine + 4 engines + facades (#689) Jun 21, 2026
@tony tony changed the title Typed operations and engines Jun 21, 2026
tony added a commit that referenced this pull request Jun 21, 2026
why: Record the experimental operations/engines layer for the
upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section
  for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
@tony

tony commented Jun 21, 2026

Member Author

Code review

Found 2 issues:

  1. LazyPlan resolves a forward SlotRef only for target, never for src_target, so a dual-target op (JoinPane, SwapPane, MovePane, BreakPane, SwapWindow, MoveWindow, LinkWindow) whose src_target comes from an earlier plan.add(...) reaches render() with the slot unresolved and raises TypeError: cannot render an unresolved SlotRef. (bug: _resolve() substitutes operation.target but not operation.src_target, even though serialize.py already handles both)

    def _resolve(
        operation: Operation[t.Any],
        bindings: dict[int, str],
    ) -> Operation[t.Any]:
        """Substitute a :class:`SlotRef` target with a captured concrete id."""
        target = operation.target
        if not isinstance(target, SlotRef):
            return operation
        try:
            concrete = bindings[target.slot] + target.suffix
        except KeyError as error:
            msg = (
                f"slot {target.slot} has no captured id yet; a plan step can only "
                f"target an earlier step that creates an object"
            )
            raise OperationError(msg) from error
        return dataclasses.replace(operation, target=_target_from_id(concrete))
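One possible shape for a fix is to resolve every `SlotRef`-bearing field rather than `target` alone. The sketch below is self-contained and deliberately simplified: targets are plain strings instead of the closed `Target` sum, `LookupError` stands in for `OperationError`, and the `SwapPane` here is a hypothetical two-field stand-in, so it shows the approach rather than the actual patch.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SlotRef:
    slot: int
    suffix: str = ""


@dataclasses.dataclass(frozen=True)
class SwapPane:
    """Hypothetical dual-target operation with only the two target fields."""

    target: "str | SlotRef"
    src_target: "str | SlotRef | None" = None


def resolve_field(value, bindings: dict[int, str]):
    """Return a concrete id for a SlotRef, or the value unchanged."""
    if not isinstance(value, SlotRef):
        return value
    if value.slot not in bindings:
        raise LookupError(f"slot {value.slot} has no captured id yet")
    return bindings[value.slot] + value.suffix


def resolve(operation, bindings: dict[int, str]):
    """Substitute SlotRefs in both target fields, not just ``target``."""
    changes = {
        name: resolve_field(getattr(operation, name), bindings)
        for name in ("target", "src_target")
        if isinstance(getattr(operation, name, None), SlotRef)
    }
    # Return the original object untouched when nothing needed resolving.
    return dataclasses.replace(operation, **changes) if changes else operation
```

Driving the field list from one place also means a later dispatch path that forgets a field fails in a single, testable function instead of at render time.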

  2. SaveBuffer declares contradictory metadata: safety = "mutating" together with effects = Effects(read_only=True), where read_only is documented as "does not change tmux state". Its read peer ShowBuffer uses safety = "readonly", and a consumer filtering registry.select(lambda s: s.safety == "readonly") would omit save_buffer despite effects.read_only=True. (bug: inconsistent safety/effects declarations)

    result_cls = AckResult
    safety = "mutating"
    effects = Effects(read_only=True)
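A registry-wide invariant check would catch this class of mismatch mechanically. The sketch below assumes the rule "safety is readonly exactly when effects.read_only is true", which is an interpretation of the finding rather than a documented contract, and it uses minimal stand-in `Effects` and `OpSpec` classes instead of the real ones.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Effects:
    read_only: bool = False


@dataclasses.dataclass(frozen=True)
class OpSpec:
    kind: str
    safety: str  # "readonly", "mutating", or "destructive"
    effects: Effects


def inconsistent_specs(specs: list[OpSpec]) -> list[str]:
    """Return the kinds whose safety tier disagrees with effects.read_only."""
    return [
        spec.kind
        for spec in specs
        if (spec.safety == "readonly") != spec.effects.read_only
    ]
```

If save-buffer is meant to count as mutating because it writes a file outside tmux, the invariant would need an explicit exception, which is itself a useful thing to record in the metadata.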

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 21, 2026

Member Author

Code review

Found 1 issue:

  1. LazyPlan resolves forward SlotRefs on every dispatch path except the {marked} fold's decorates. _drive resolves the create op but builds decorates raw, so a chainable dual-target decorate (SwapPane/JoinPane/MovePane) whose src_target is a forward slot reaches render_marked unresolved and raises TypeError: cannot render an unresolved SlotRef. The single-op and chain paths both call _resolve; this one does not. (bug: decorates = [self._operations[i] for i in decorate_idx] skips _resolve, so src_target SlotRefs survive into render under MarkedPlanner)

    create_idx, *decorate_idx = step.indices
    create = _resolve(self._operations[create_idx], bindings)
    decorates = [self._operations[i] for i in decorate_idx]
    merged: CommandResult = yield _Chain(
        render_marked(create, decorates, version),
    )

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 22, 2026

Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code


tony added a commit that referenced this pull request Jun 27, 2026
why: Record the experimental operations/engines layer for the
upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section
  for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
tony added 16 commits June 28, 2026 16:48
why: Operationalizes the typed-operations/engines architecture
(issues 688, 689) with the pure substrate that was absent from every
prototype branch: an inert, statically-typed operation value that
renders tmux commands, carries its result type, and serializes without
a live tmux server. Engines stay transport-agnostic over it. None of
this touches or changes existing public APIs.

what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not
  under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the
  single source of truth; pure render() with declarative version gating
  (LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests
  precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/
  IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views
  and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys,
  select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/
  CommandResult, EngineSpec; run()/arun() execute bridge sharing one
  render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all
  runnable without a tmux server

why: Proves the operation/result contract is transport-agnostic -- the
same typed result whether produced by a real tmux subprocess or an
in-memory simulator -- and provides the offline engine that lets ops
doctests and tests run without a tmux server (issue 689 phases 2-3).

what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd
  (has-session stderr fold, backslashreplace, trailing-blank strip;
  tmux failure returned as data, only missing binary raises), with
  for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/
  window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/
  available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run
  offline via concrete, plus classic-vs-concrete parity against a real
  tmux server (same result type + argv, payload may differ)
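The "failure returned as data, only a missing binary raises" policy can be sketched with the standard library alone. This is an illustration rather than the PR's `SubprocessEngine`: the `CommandResult` fields and the free function `run` are assumptions, and the has-session stderr fold is left out.

```python
import dataclasses
import subprocess


@dataclasses.dataclass(frozen=True)
class CommandResult:
    argv: tuple[str, ...]
    returncode: int
    stdout: tuple[str, ...]
    stderr: tuple[str, ...]


def run(binary: str, *args: str) -> CommandResult:
    """Run one command; a nonzero exit is data, a missing binary raises."""
    argv = (binary, *args)
    # check=False keeps failures as data; FileNotFoundError still propagates.
    proc = subprocess.run(argv, capture_output=True, check=False)

    def lines(raw: bytes) -> tuple[str, ...]:
        out = raw.decode("utf-8", errors="backslashreplace").split("\n")
        while out and out[-1] == "":  # strip trailing blank lines
            out.pop()
        return tuple(out)

    return CommandResult(argv, proc.returncode, lines(proc.stdout), lines(proc.stderr))
```

Calling `run("tmux", "has-session", "-t", "missing")` would then return a result with a nonzero `returncode` instead of raising, leaving the decision to raise with the caller.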
why: Completes the sync/async-symmetric execution story plus the
deferred-execution and documentation mechanisms from issue 689
(phase 5 + docs), still without touching any existing API.

what:
- engines.asyncio: real AsyncSubprocessEngine on
  create_subprocess_exec (terminates the child on cancellation; not a
  thread wrapper), mirroring the classic engine's output handling so it
  returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and
  resolves SlotRef forward refs at execute time via a sans-I/O
  generator; sync execute() and async aexecute() share one resolution
  core (run vs await arun is the only divergence); whole-plan
  serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version
  gates, effects, safety, result type, summary) -- the single source a
  docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog
  coverage, async-vs-sync classic parity against a real tmux server

why: Proves control mode is just another engine returning the same
typed result (issue 689 phase 4) -- an operation run over a persistent
tmux -C connection is indistinguishable, at the result level, from one
run via fork-per-call subprocess.

what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C
  connection; run_batch pipelines commands and parses each command's
  %begin/%end/%error block into a CommandResult; selectors-based
  nonblocking reads with timeout; startup-ACK discard; lifecycle via
  close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable
  without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real
  pane, batched commands, control-vs-concrete parity)

why: Demonstrates the "mode lives in the type" model from issue 689 --
EagerPane.split() returns a live EagerPane while LazyPane.split() returns
a deferred LazyPane, each a single statically-known return type, both
backed by the same SplitWindow operation. One Pane class with a
runtime-bound engine could not type these return values distinctly.

what:
- facade.pane.EagerPane: executes immediately, returns live handles
  (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles
  (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution,
  and same-operation-backs-both-facades parity

why: Closes the two async gaps from issue 689: control mode and concrete
had no async sibling. The async control engine is the one async engine
that earns its place -- it adds an event stream subprocess cannot -- and
prior libtmux/mux control-mode work (surfaced across agent histories via
agentgrep, plus the asyncio-2 branches) shaped its correlation design.

what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent
  tmux -C (create_subprocess_exec + one reader task). FIFO future
  correlation with skip-when-empty so unsolicited %begin blocks (hook-
  triggered commands and the startup ACK) never desync results; the
  startup ACK is consumed synchronously in start() to close the
  correlation race our whole-block parser would otherwise have. DEAD
  state fails pending commands on reader EOF/error. Cancellation via
  asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded
  subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification
  lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation;
  removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split,
  pipelined batch, concrete parity, live event stream, lifecycle)
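FIFO correlation with skip-when-empty, as described in this commit, can be sketched in a few lines. The `FifoCorrelator` class below is hypothetical and ignores cancellation and the DEAD-state bookkeeping; it only shows how an unsolicited block is dropped when no command is waiting, so it cannot shift later results.

```python
import asyncio
import collections


class FifoCorrelator:
    """Match completed blocks to pending commands in send order."""

    def __init__(self) -> None:
        self._pending: collections.deque[asyncio.Future[str]] = collections.deque()
        self.skipped = 0

    def expect(self) -> "asyncio.Future[str]":
        """Register a command that was just written to tmux."""
        future = asyncio.get_running_loop().create_future()
        self._pending.append(future)
        return future

    def on_block(self, body: str) -> None:
        """Deliver a block to the oldest waiter, or skip it if nobody waits."""
        if not self._pending:
            self.skipped += 1  # unsolicited: hook output or the startup ACK
            return
        self._pending.popleft().set_result(body)

    def fail_all(self, error: Exception) -> None:
        """Reader hit EOF or an error: fail every pending command."""
        while self._pending:
            self._pending.popleft().set_exception(error)


async def demo() -> tuple[str, str, int]:
    correlator = FifoCorrelator()
    correlator.on_block("startup")  # nothing pending yet, so it is skipped
    first, second = correlator.expect(), correlator.expect()
    correlator.on_block("%1")
    correlator.on_block("%2")
    return await first, await second, correlator.skipped
```

Skip-when-empty only protects against blocks that arrive while the queue is idle; a block arriving while commands are pending still needs another signal to be told apart from a real reply.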
why: Many tmux commands print nothing (rename-window, kill-pane,
select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls
cmdq_error on failure, framed in control mode as %end vs %error (see
tmux cmd-queue.c) -- they never cmdq_print. They still need a typed
result that records success/failure without inventing a payload.

what:
- results.AckResult: a typed acknowledgement (no payload) whose
  raise_for_status() still surfaces the error path; documents the tmux
  success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane
  (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and
  destructive safety metadata; update classic/control parity assertions

why: A neo-like read model is useful, but neo.Obj is one flat ~200-field
class fused to the query/dispatch pipeline. The experimental namespace
lets us try a decoupled, immutable, serializable snapshot layer without
any risk to the shipped ORM APIs.

what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot /
  SessionSnapshot / ServerSnapshot, each a typed core plus the full raw
  tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping;
  ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row
  set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no
  live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip

why: The list/show read commands overlap neo's reader. Rather than
touch the ORM, add a parallel typed read surface in experimental.ops
that yields immutable models snapshots. The render version must thread
into result parsing first, because the -F template is version-gated and
the parser must split against the same fields it was rendered with.

what:
- operation: thread `version` through build_result -> _make_result so
  payload parsing matches the version-gated render (backward compatible;
  existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and
  formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly,
  chainable=False) render the same -F template neo builds and parse rows
  into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views
  (.panes/.server/.windows/.sessions) via properties, so results
  serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip,
  and live list-panes/sessions/windows against a real tmux server

why: The operation catalog is registry-derived data, so rendering it in
docs keeps the operation reference from drifting from the code -- and the
docs gate then exercises catalog() on every build.

what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that
  walks libtmux.experimental.ops.catalog() and emits a table, with
  :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding
  the catalog (full + readonly + destructive views), in the index toctree

why: The sync control engine skipped tmux's startup ACK with a fragile
one-shot flags==0 heuristic and had no defense against hook-emitted
%begin/%end blocks, so a stray block could desync request->result
alignment. The async engine already handles this; backport the approach.

what:
- consume the startup ACK synchronously at connect (_consume_startup),
  dropping the one-shot _startup_ack_pending heuristic, so the startup
  block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch
  (_drain_unsolicited), so a hook-triggered command's block left over
  from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result
  is real; each call drains before reading its own block)

A hook firing mid-pipelined-batch still needs per-command number
correlation to disambiguate; single-command run() is robust.

why: The chainable-commands prototype folds independent commands into one
"tmux a ; b" dispatch. Our typed-op model is a better host for it -- the
Operation already carries a `chainable` classvar and the result Status
already reserves `skipped` for exactly the chain-drop case. So yes, lazy
mode can adopt the prototype's chainability.

what:
- mark output/creation ops non-chainable (capture-pane, split-window;
  list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';',
  escaping a trailing-';' arg), ensure_chainable (fail closed), and
  attribute -- splitting one merged ';'-chain result into a typed result
  per op (success -> all complete; failure -> first failed, rest skipped,
  matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status()
  builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of
  chainable, resolved ops dispatch once via engine.run; the sans-I/O
  _drive yields _Single or _Chain so sync and async share the core;
  add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N
  dispatches, failure attribution, creators stay unfolded, add_chain
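The rendering half of the fold can be sketched as below. It assumes that tmux treats an argument ending in `;` as a command separator and that rewriting the trailing `;` to `\;` keeps it literal, which matches my reading of tmux's argument parsing but should be checked against the tmux version in use; the function names mirror the commit message, while the bodies are illustrative only.

```python
def escape_arg(arg: str) -> str:
    """Keep a trailing ';' literal so tmux does not read it as a separator."""
    if arg.endswith(";"):
        return arg[:-1] + "\\;"
    return arg


def render_chain(commands: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Join argv tuples into one argv with standalone ';' separators."""
    argv: list[str] = []
    for index, command in enumerate(commands):
        if index:
            argv.append(";")  # a bare ';' argument separates tmux commands
        argv.extend(escape_arg(arg) for arg in command)
    return tuple(argv)
```

Passing the separator as its own argv element avoids any shell quoting, since the engine hands the list straight to the tmux process.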
why: Extend the mode-in-the-type facades beyond the pane seed so a typed
return value distinguishes eager/lazy/async across scopes -- and add the
few creation ops the cross-scope navigation needs.

what:
- ops: NewWindow / NewSession (CreateResult, capture the new id),
  KillSession, RenameSession; generalize binding capture via
  Result.created_id (base None; SplitWindowResult -> new_pane_id;
  CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation
  (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a
  plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops.
  Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a
  live eager Server -> Session -> Window -> Pane build against real tmux
  with cleanup

why: The native binary peer-protocol engine is the strongest proof the
operation/result contract is transport-agnostic -- the same typed
CommandResult whether produced by a subprocess, tmux -C, or by speaking
tmux's imsg protocol directly. Research confirmed it is pure-stdlib and
CI-verifiable; the prototype it is ported from only ever tested against a
fake socketpair server, never real tmux.

what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines:
  ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and
  ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len,
  peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT
  handshake); posix_spawn local fallback for attach / start-server /
  no-server-running
- adapt to the experimental tuple CommandResult (drop the process field);
  add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion)
  and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX
  is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the
  live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine
  return identical stdout/returncode for read-only commands against a
  real tmux server (runs across the CI tmux matrix)
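The header framing mentioned above (`=IIII`, with the fd mark in the high bit of the length and the protocol version carried in `peerid`) can be sketched with `struct`. The field order (type, len, peerid, pid) and the message type value used in the usage note are assumptions for illustration; the codec in this PR is the reference for the real layout.

```python
import struct

HEADER = struct.Struct("=IIII")  # assumed order: type, len, peerid, pid
IMSG_FD_MARK = 0x80000000  # high bit of len: a file descriptor accompanies the message
PROTOCOL_VERSION = 8


def pack_message(msg_type: int, payload: bytes, pid: int, *, has_fd: bool = False) -> bytes:
    """Frame a payload; len covers header plus payload, peerid carries the version."""
    length = HEADER.size + len(payload)
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_header(data: bytes) -> tuple[int, int, bool, int, int]:
    """Return (type, length without the mark, has_fd, peerid, pid)."""
    msg_type, raw_len, peerid, pid = HEADER.unpack_from(data)
    return msg_type, raw_len & ~IMSG_FD_MARK, bool(raw_len & IMSG_FD_MARK), peerid, pid
```

For example, `unpack_header(pack_message(200, b"ls\0", 1234, has_fd=True))` recovers the length and the fd flag separately; the type value 200 is a placeholder, not a confirmed tmux constant.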
why: Finish the mode-in-the-type matrix so every tmux scope has
eager/lazy/async facades, and add the client-scoped ops a Client facade
needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.

what:
- ops: detach-client, refresh-client, switch-client (AckResult, client
  scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new
  client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server
  binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation,
  and eager/lazy/async client methods

why: The pre-commit gate now runs `uv run ty check`, so ty must be a
configured dev tool. Brings the ty setup from the add-ty-type-checker
branch and makes the experimental tree ty-clean.

what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented
  rule ignores for known ty false positives, ported verbatim
- fixes ty surfaced in experimental: Target is now a real union (ty
  rejects an implicit two-string type alias); OperationRegistry.list ->
  select so the `-> list[OpSpec]` return annotation is not shadowed by
  the method name

tony added 29 commits June 28, 2026 16:49
why: tmuxp's window_index config key places a window at a chosen
session index; the builder always appended, ignoring it.

what:
- ir: Window.window_index (threaded through analyze/to_dict)
- compiler: a created window (2..N) with window_index targets
  new-window at `session:N` by suffixing the session SlotRef (":N"),
  so the captured window-id binding is preserved -- zero Core change
- test: window_index renders new-window -t $1:5 and still binds the id

note: window 0 reuses the session's implicit window and keeps the base
index; append-into-existing-session mode (tmuxp load -a) is deferred as
a follow-up -- it restructures the build flow (no new-session, all
windows created) and the fresh-session reuse model is faithful for the
common case.

why: the async-first control-mode server lacked the Declarative tier —
build_workspace was sync-only — so an agent on an async engine could
not build a whole workspace in one call (a documented asymmetry).

what:
- plan_tools: abuild_workspace, the async sibling over
  analyze(spec).abuild(engine)
- fastmcp_adapter: register an async build_workspace on the async
  server, backed by abuild_workspace (mirrors execute_plan's
  conditional-variant type:ignore)
- export abuild_workspace from the mcp package
- test: the async server lists + calls build_workspace offline

why: porting libtmux-mcp's safety surface into the core adapter needs a
single source of truth for the safety tiers and the agent-correctable
error type, ahead of the middleware and the tag-gate.

what:
- _safety.py: TAG_readonly/mutating/destructive, VALID_SAFETY_LEVELS,
  _TIER_LEVELS, resolve_safety_level (None->mutating, valid->verbatim,
  invalid->warn+readonly fail-safe), ExpectedToolError(ToolError)
  (log_level=WARNING default + suggestion) — fastmcp+logging deps only,
  off the framework-agnostic import path
- tests: resolver defaults/fail-safe-with-warning + ExpectedToolError
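The resolver rules listed in this commit (None maps to mutating, a valid level passes through verbatim, an invalid level warns and falls back to readonly) can be sketched as below. The logger name and message wording are assumptions; only the three-way mapping comes from the commit message.

```python
import logging

logger = logging.getLogger("safety_sketch")  # hypothetical logger name

VALID_SAFETY_LEVELS = ("readonly", "mutating", "destructive")


def resolve_safety_level(value: "str | None") -> str:
    """Map a configured level to an effective tier, failing safe on bad input."""
    if value is None:
        return "mutating"  # default tier when nothing is configured
    if value in VALID_SAFETY_LEVELS:
        return value
    # An unrecognized value must never widen access, so drop to the lowest tier.
    logger.warning("invalid safety level %r; falling back to readonly", value)
    return "readonly"
```

Failing toward readonly means a typo in configuration can hide tools but can never expose destructive ones.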
why: fastmcp's stock transform funnels every expected failure through a
-32603 "Internal error:" catch-all, and its response limiter drops the
tail (terminal scrollback's useful output is at the bottom).

what:
- new mcp/middleware.py with the real fastmcp base-class imports
- ToolErrorResultMiddleware: tool failures -> ToolResult(is_error) with
  the clean message + typed meta (error_type/expected/suggestion);
  _log_error demotes ExpectedToolError + schema-validation to WARNING
- TailPreservingResponseLimitingMiddleware: keeps the tail, prefixes a
  truncation header, re-attaches is_error the base path drops
- the schema-validation + suggestion helpers (no raw input echoed),
  _RESPONSE_LIMITED_TOOLS (engine-ops scrollback tools)
- dropped libtmux-mcp's global fastmcp-log-filter side effect
- tests: tail-keep, error-result meta/suggestion, schema redaction
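Tail-preserving truncation, the core of the limiter described here, is a one-function idea. The sketch below works on characters and uses an invented header format; the real middleware operates on fastmcp tool results and may count differently.

```python
def keep_tail(text: str, limit: int) -> str:
    """Truncate to the last ``limit`` characters, prefixed by a truncation header."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    # Slice from an explicit start index so limit=0 yields an empty tail.
    return f"[truncated {dropped} leading characters]\n" + text[len(text) - limit:]
```

For terminal scrollback the newest output sits at the bottom, so dropping the head keeps the part a caller is most likely to need.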
why: complete the middleware stack — the runtime safety gate (defense in
depth behind the static tag-gate), a structured audit trail, and retries
scoped to readonly tools so a transient socket error never double-runs a
mutating tool.

what:
- SafetyMiddleware: fail-closed tier gate on list + call (untagged tool
  denied); raises ExpectedToolError on an over-tier call
- AuditMiddleware: one INFO record per call, restructured to the project
  logging standard (static message + structured extra: tmux_subcommand/
  outcome/duration_ms/tmux_args), payload args digested (len+sha256)
- ReadonlyRetryMiddleware: composes fastmcp RetryMiddleware, delegates
  only for readonly-tagged tools; trigger LibTmuxException
- loggers namespaced libtmux.experimental.mcp.audit/.retry
- single tier source: _TIER_LEVELS/TAG_* imported from _safety
- tests: audit redaction, fail-closed _is_allowed, retry pass-through
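Digesting payload arguments as a length plus a SHA-256 hash lets an audit record show that two calls sent the same input without storing the input itself. A minimal sketch, assuming the length is counted in UTF-8 bytes and the result is a plain dict (both are guesses about the real record shape):

```python
import hashlib


def digest_payload(value: str) -> dict[str, object]:
    """Replace a sensitive argument with its byte length and SHA-256 hex digest."""
    data = value.encode("utf-8")
    return {"len": len(data), "sha256": hashlib.sha256(data).hexdigest()}
```

The digest is deterministic, so repeated identical payloads can still be correlated across log lines.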
why: replacing libtmux-mcp needs the safety tier-gate and the middleware
stack on the engine-ops servers — gating destructive tools by
LIBTMUX_SAFETY (default mutating) and adding the timing/limit/error/
audit/retry/safety chain.

what:
- _apply_safety_gate (Option A, subtractive): disable only the over-tier
  tiers AFTER register_operations, so the per-op hide is never undone —
  destructive op_* stay hidden at every tier (regression-tested)
- _make_middleware builds the outer->inner stack (Safety innermost,
  fail-closed); passed at FastMCP(middleware=...) construction
- build_server/build_async_server grow safety_level + include_middleware;
  level resolved in-body (env read deferred -> monkeypatchable)
- main() gains --safety; default_server/main forward it
- tests: static visibility per tier, the per-op re-exposure regression,
  destructive-call blocked at readonly, plan-tool tier
- existing kill_*/op_kill_* tests opt into safety_level="destructive"
  (the new default tier hides destructive tools, as intended)

why: libtmux-mcp ships workflow prompts (run-and-wait, diagnose,
build-workspace, interrupt) that package operator-discovered best
practices; the engine-ops server should offer the same, in its own
vocabulary.

what:
- prompts.py: the four recipes rewritten over the engine-ops verbs
  (send_input/wait_for_output/capture_pane/create_session/split_pane),
  not libtmux-mcp's run_command/snapshot_pane/send_keys/split_window
- register_prompts(mcp) via Prompt.from_function; pure string builders,
  identical on the sync and async servers
- both builders gain include_prompts (default True); registered after
  the caller context
- tests: the four prompts register; rendered bodies name only engine-ops
  tools (guards prompt tool-name drift)
why: libtmux-mcp exposes the server->session->window->pane tree as MCP
resources (a read interface distinct from the list_* tools); the
engine-ops server should too, built on its own vocabulary.

what:
- resources.py: register_resources(mcp, engine, *, is_async) with six
  tmux:// resources (sessions, session detail, session windows, window
  detail, pane detail, pane content) over alist_sessions/windows/panes +
  acapture_pane; rows filtered by session_name/window_index/pane_id
- single async body set; a sync server's engine is wrapped once
  (SyncToAsyncEngine) so there is no sync/async duplication
- drop libtmux-mcp's {?socket_name} query var (one socket per engine)
- both builders gain include_resources (default True)
- tests: offline read returns JSON; live read lists the session + pane
  content over a real tmux server
why: fail fast when the engine cannot reach tmux at startup (missing
binary, broken connection) instead of surfacing it on the first tool
call — parity with libtmux-mcp's preflight.

what:
- _lifespan.py: make_lifespan(engine) runs list-sessions at startup and
  raises RuntimeError only on an engine-broken outcome (it raises), never
  on a tmux-side error (returned as a CommandResult, e.g. no server)
- build_async_server gains lifespan (default True), passed at FastMCP
  construction; the sync server stays lifespan-less
- tests: broken engine fails the preflight; a tmux-side error is tolerated

note: the paste-buffer GC half of libtmux-mcp's lifespan is deferred —
engine-ops does not namespace MCP-created buffers, so there is no prefix
to GC (a follow-up).
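
A rough sketch of the preflight distinction: an engine that raises fails startup, while a tmux-side error returned as data is tolerated. `FakeEngine`, its `list_sessions` method, and `make_preflight_lifespan` are hypothetical stand-ins, not the `_lifespan.py` code.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeEngine:
    """Hypothetical engine: raises when broken, else returns a result."""

    def __init__(self, broken: bool) -> None:
        self.broken = broken

    async def list_sessions(self) -> dict:
        if self.broken:
            raise OSError("tmux binary not found")
        # A tmux-side error comes back as data, not as an exception.
        return {"returncode": 1, "stderr": "no server running"}


def make_preflight_lifespan(engine):
    @asynccontextmanager
    async def lifespan(app):
        try:
            await engine.list_sessions()
        except Exception as exc:
            raise RuntimeError("engine cannot reach tmux") from exc
        yield  # the server runs while the context is open

    return lifespan
```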
why: the declarative workspace tier had no human entry point — building a
workspace meant calling analyze()+build() in Python. Mirror `tmuxp load`
so a .tmuxp.yaml launches from the shell.

what:
- workspace/cli.py: `python -m libtmux.experimental.workspace.cli load
  <file>` resolves a workspace file (path / directory -> .tmuxp.*/ bare
  name under $TMUXP_WORKSPACEDIR), expands ~/$VAR/./ paths relative to the
  file's dir (the cwd-bound step analyze() deliberately omits), analyzes +
  builds over a SubprocessEngine, then attaches (switch-client when inside
  tmux) unless -d; -L/-S socket, -s session-name override
- an already-running session is attached, not rebuilt (FileExistsError ->
  attach), matching tmuxp's behavior
- tests: file resolution (path/dir/missing), ./-relative path expansion,
  arg parsing, and a live detached build whose windows/panes match the file
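
The path-expansion step might look like the sketch below: expand `~` and `$VAR`, then anchor relative paths at the workspace file's directory rather than the process cwd. `expand_path` is an illustrative name, not the CLI's actual helper.

```python
import os


def expand_path(value: str, config_dir: str) -> str:
    """Expand ~ and $VAR, then resolve relative paths against config_dir."""
    expanded = os.path.expandvars(os.path.expanduser(value))
    if not os.path.isabs(expanded):
        expanded = os.path.normpath(os.path.join(config_dir, expanded))
    return expanded
```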
why: real .tmuxp.yaml files use `- blank` / `- pane` / `- ` to mean "an
empty pane" (no command) — the analyzer was sending those as literal
commands. And launching a file blind is risky; a dry run lets you see the
tmux commands first.

what:
- analyzer: a pane whose sole content is None / "blank" / "pane" / "" (a
  bare string or a single-element shell_command) is now an empty pane,
  matching tmuxp's expand_cmd; a blank mixed with real commands is left
  alone
- cli: `load --dry-run` prints the tmux command lines (resolved against
  the in-memory ConcreteEngine so ids render) with host steps as comments,
  executing nothing
- tests: blank/pane/empty shorthands -> empty panes; dry-run prints the
  commands (blank pane creates a split but sends no keys) and starts no
  tmux server
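
The blank-pane rule can be illustrated with a small predicate. This is a simplified sketch: it handles only bare values and single-element lists, and `is_empty_pane` is a hypothetical name rather than the analyzer's API.

```python
BLANK_MARKERS = (None, "", "blank", "pane")


def is_empty_pane(config) -> bool:
    """Return True when a pane config means 'an empty pane, no command'."""
    if config is None or isinstance(config, str):
        return config in BLANK_MARKERS
    if isinstance(config, list) and len(config) == 1:
        return config[0] in BLANK_MARKERS
    # A blank mixed with real commands is left alone.
    return False
```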
why: window 0 reuses the session's implicit window/pane, so its first
pane inherited the *session* start_directory (-c on new-session) instead
of the window's. A per-project tmuxp config (each window cd'd into its
repo) opened the first window's first pane in the session root, not the repo.

what:
- compiler: _creator_start_directory folds the window's (and its first
  pane's) start_directory into the creator's -c with pane -> window ->
  session precedence; used for both new-session (window 0) and new-window
  (every later window). A window without start_directory still falls back to
  the session's, so existing behavior is unchanged.
- test: window 0's start_directory drives new-session -c; fallback to the
  session dir; a first pane's own start_directory wins
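
The pane -> window -> session precedence reduces to a first-non-None pick. The function below is an illustrative sketch and does not reproduce `_creator_start_directory`'s real signature.

```python
from typing import Optional


def creator_start_directory(
    session_dir: Optional[str],
    window_dir: Optional[str],
    first_pane_dir: Optional[str],
) -> Optional[str]:
    """Pick the -c value: the first pane wins, then the window, then the session."""
    for candidate in (first_pane_dir, window_dir, session_dir):
        if candidate is not None:
            return candidate
    return None
```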
why: The declarative runner needs to fold tmux dispatches yet still
interleave host-side steps (sleeps, pane-ready waits) between them.
These additive Core primitives let any driver reuse the plan
trampoline for that without putting host I/O in the sans-I/O core.

what:
- Add StepReport + _Host sentinel; _drive yields it after each step
  binds its results (the sched.delayfunc(0) seam), performing no I/O
- Add an on_step hook to execute/aexecute; extract _adispatch as the
  async twin of _dispatch so both pumps share one dispatch seam
- Add BoundedPlanner: run an inner planner over the full op list,
  then split its steps wherever a host-step boundary falls (a marked
  fold demotes to plain ; chains past the boundary)
- Export BoundedPlanner and StepReport from the ops package
- Test the hook stream, sync/async parity, and bounded splitting
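
The boundary-splitting idea can be sketched over plain lists of op indices. Here a "step" is a list of indices folded into one dispatch, and `host_after` holds the indices that a host-side step must follow. Both representations are assumptions made for illustration, not BoundedPlanner's real types.

```python
def split_at_boundaries(steps: list, host_after: frozenset) -> list:
    """Split each folded step so no dispatch crosses a host-step boundary."""
    bounded = []
    for step in steps:
        current = []
        for index in step:
            current.append(index)
            if index in host_after:
                # A host step follows this op: close the dispatch here.
                bounded.append(current)
                current = []
        if current:
            bounded.append(current)
    return bounded
```

With one four-op fold and a host step after op 1, the result is two dispatches instead of one, and the host step can run between them.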
why: A declarative build paid one tmux dispatch per operation because
the runner forked its own per-op loop to interleave host steps,
bypassing the Core planner. A multi-pane window now renders in a few
round-trips instead of dozens, with the same result.

what:
- Drive build_workspace/abuild_workspace through LazyPlan.execute with
  BoundedPlanner(MarkedPlanner, frozenset(host_after)) and an on_step
  hook that replays each index's host steps and build events, deleting
  the hand-rolled per-op loop
- Default the build to folding; add planner= to the runner functions
  and Workspace.build/abuild so a caller can override (e.g.
  SequentialPlanner for one legible tmux call per op)
- host_after keys are the fold boundaries, so sleeps, the wait_pane
  anti-race, and before_script keep a fold from ever crossing a pause;
  the PlanResult is identical, only the dispatch count drops
- Add folding contract tests (dispatch reduction, planner equivalence,
  boundary rules, live subprocess) and a CHANGES deliverable
why: The dry run rendered the unfolded sequential plan, but the build
folds by default -- so the preview misrepresented the dispatches that
would actually run (one tmux line per op instead of the ; chains).

what:
- Drive the dry run through the same BoundedPlanner(MarkedPlanner) the
  build uses, via a recording engine, so the printed lines are the real
  folded dispatches; a standalone ; renders as \; (copy-pasteable) and
  the header reports the dispatch count and shape
- Add --no-fold to load (and a fold= param) that controls BOTH the dry
  run rendering and the real build planner, keeping them consistent
- Cover the folded/{marked} dry run, --no-fold, and flag parsing
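
A sketch of why the separator is printed as `\;`: a bare `;` would end the shell command, while the escaped form reaches tmux as its own command separator. `render_folded` is a hypothetical helper, not the CLI's renderer.

```python
import shlex


def render_folded(commands: list) -> str:
    """Render several tmux argv lists as one copy-pasteable shell line."""
    parts = []
    for position, argv in enumerate(commands):
        if position:
            parts.append("\\;")  # escaped so the shell passes ; through to tmux
        parts.extend(shlex.quote(arg) for arg in argv)
    return "tmux " + " ".join(parts)


line = render_folded([["split-window", "-h"], ["send-keys", "echo hi", "Enter"]])
```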
why: The engine-ops spine had 60 operations but none for tmux 3.7's
new-pane (floating panes); the workspace builder, facade, and MCP had
nothing to lower a floating pane into.

what:
- Add NewPane(Operation[SplitWindowResult]) rendering new-pane with
  absolute floating geometry (-x/-y size, -X/-Y position; cells or N%),
  -Z/-d/-E, styles, environment, and -P -F capture
- Reuse SplitWindowResult so SlotRef binding, facade, and MCP keep
  working unchanged; first op to set min_version='3.7' (whole-command
  version gate)
- Register + export NewPane; refresh the catalog all-kinds doctest
- Cover render/round-trip/registry/version-gate plus a live floating
  pane test asserting pane_floating_flag on tmux 3.7+
why: tmux 3.7 NULL-derefs the server on a nameless break-pane (fixed
upstream after 3.7) and ignores -n when one is given. The experimental
BreakPane op emitted no -n for nameless breaks, crashing the 3.7 server.
Mirrors the fix already shipped in Pane.break_pane (#693).

what:
- Inject a placeholder -n on exactly tmux 3.7 when no name is requested
- Gate via _normalize_tmux_version exact match; other builds render bare
- Document the workaround and cover placeholder/bare/named render paths

The gate fires only when a tmux version reaches args(); the engine
version resolution that activates it for live runs lands next.
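
The exact-match gate can be sketched as below, assuming the version string is already normalized. The placeholder value `"_"` and the function name are illustrative guesses, not what BreakPane actually emits.

```python
from typing import Optional


def break_pane_args(name: Optional[str], version: Optional[str]) -> list:
    """Render break-pane, adding a placeholder -n only on exactly tmux 3.7."""
    args = ["break-pane"]
    if name is not None:
        args += ["-n", name]
    elif version == "3.7":
        # Hypothetical placeholder: avoids the nameless break-pane crash.
        args += ["-n", "_"]
    return args
```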
why: Operations are version-aware, but execution defaulted to
version=None, so version-gating (flag drops, whole-command gates, the
break-pane 3.7 workaround) silently did nothing unless a caller threaded
the version by hand. This is why test_break_and_swap_live still crashed
even with the BreakPane workaround in place.

what:
- Add the optional SupportsTmuxVersion engine capability (base.py) and
  implement tmux_version() on the subprocess + asyncio engines (memoized
  `tmux -V`, None when unknown)
- Add resolve_engine_version() and use it in run()/arun() and at the
  LazyPlan execute()/aexecute() entry points so the live tmux version
  reaches rendering when the caller passes none
- Explicit version still wins; engines without the capability assume
  latest, so fakes and the in-memory engine are unaffected
- Cover resolution + gating activation for run/arun and a folded plan;
  this greens test_break_and_swap_live on tmux 3.7
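
An optional-capability lookup of this kind is commonly written with a runtime-checkable Protocol. The sketch below uses hypothetical names (`SupportsVersion`, `resolve_version`) and lets `None` stand for "unknown, assume latest"; the real `resolve_engine_version` may differ.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsVersion(Protocol):
    def tmux_version(self) -> Optional[str]: ...


def resolve_version(engine: object, explicit: Optional[str] = None) -> Optional[str]:
    """An explicit version wins; otherwise ask the engine if it can answer."""
    if explicit is not None:
        return explicit
    if isinstance(engine, SupportsVersion):
        return engine.tmux_version()
    return None  # no capability: callers treat this as "latest"


class VersionedEngine:
    def tmux_version(self) -> Optional[str]:
        return "3.7"


class BareEngine:
    pass
```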
why: The declarative workspace IR had no way to express tmux 3.7
floating panes; a user could not declare a floating overlay (e.g. a
lazygit popup) in a spec at all.

what:
- Add a Float geometry value type (width/height -> -x/-y size, x/y ->
  -X/-Y position; cells or N%) and FloatingPane (a Pane + Float +
  attach_to)
- Add Window.floats: Sequence[FloatingPane] overlays, kept as a plain
  declarative data shape like panes (NOT a live QueryList -- QueryList
  is the live object-query layer, not the spec)
- Round-trip floats through analyze()/to_dict(); export Float +
  FloatingPane from the workspace package
- Cover to_dict, defaults, and round-trip

Inert data only; the compiler emit + events/confirm wiring lands next.
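
The geometry-to-flag mapping described above (width/height to -x/-y, x/y to -X/-Y) might be sketched as a frozen dataclass. `FloatGeometry` and `to_flags` are illustrative names, not the workspace package's `Float` API.

```python
from dataclasses import dataclass
from typing import Optional, Union

Size = Union[int, str]  # cells (int) or a percentage string such as "80%"


@dataclass(frozen=True)
class FloatGeometry:
    width: Optional[Size] = None
    height: Optional[Size] = None
    x: Optional[Size] = None
    y: Optional[Size] = None

    def to_flags(self) -> list:
        """Emit only the flags whose values were declared."""
        pairs = (("-x", self.width), ("-y", self.height), ("-X", self.x), ("-Y", self.y))
        flags = []
        for flag, value in pairs:
            if value is not None:
                flags += [flag, str(value)]
        return flags
```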
why: Declared floating panes (from the prior commit) were inert -- the compiler
had no branch to lower them, so a float-bearing workspace ignored its
overlays.

what:
- Factor per-pane command sending into _emit_pane_commands, shared by
  tiled panes and floats (uniform wait_pane / suppress_history / sleeps)
- Emit each Window.floats overlay as a NewPane after the tiled layout,
  targeting the window's first pane and kept out of the split chain and
  select-layout; send the float's own commands and honor its focus
- events: emit PaneCreated for new_pane; confirm: fold floats into the
  expected pane count (tiled + floats) so confirm() does not flag a
  spurious mismatch
- Reject cross-window attach_to for now (the symbol table lands next)
- Cover compile order, geometry/command emission, the attach_to guard,
  an offline in-memory build, and the new_pane event
why: A floating pane could only attach to its host window; the compiler
rejected attach_to pointing at another window. Cross-window overlays
(e.g. a status float over a different window) need name-based references
resolved across the whole spec.

what:
- Add a Symbols registry (Django app-registry style): each declared
  window publishes its first-pane SlotRef by name, so a float's attach_to
  resolves to any window declared anywhere (forward or backward)
- Add _topo_order, a graphlib.TopologicalSorter primitive that orders the
  reference graph (floats after the windows they attach to) and rejects
  cycles -- the seam for future join-pane / cross-window ops
- Compile floats in a second wire phase after every window exists, so
  cross-window SlotRefs always resolve; lift the cross-window raise and
  instead raise only for an undeclared attach_to name
- Cover cross-window attach (forward ref), offline build, unknown
  attach_to, Symbols.resolve, and _topo_order ordering + cycle detection
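
The ordering primitive builds on the standard library's `graphlib`. A minimal sketch follows; the node names and the `topo_order` wrapper are made up for illustration and are not `_topo_order` itself.

```python
from graphlib import CycleError, TopologicalSorter


def topo_order(depends_on: dict) -> list:
    """Order nodes so each comes after everything it depends on.

    depends_on maps a node to the set of nodes it references;
    a cyclic graph raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = topo_order({"float:status": {"editor"}, "editor": set(), "logs": set()})
```

Because the float lists the window it attaches to as a dependency, the window always sorts first, whether the spec declared it before or after the float.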
why: The spine could list panes, but there was no ergonomic, chainable
way to filter/order/project live panes the way QueryList powers
server.panes -- the read half of the chainable-prototype DX.

what:
- Add panes() -> PaneQuery: an immutable, chainable query
  (filter/order_by/limit/all/first/map) over live panes
- Resolve against a source that is either a TmuxEngine (a list-panes
  read) or a pure Sequence[PaneSnapshot]; filtering reuses QueryList so
  Django-style lookups (active=True, current_command="vim") work on
  snapshots
- map() returns a MappedPaneQuery for pure data projections
- Cover filter/order/limit/map/first/immutability, the empty-engine
  source, and a live engine-backed query scoped by window

This is the live-object query layer (distinct from the declarative
workspace IR); the command-building half (PaneRef + commands) is next.
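
An immutable, chainable query can be sketched with frozen dataclasses, where every method returns a new query instead of mutating the receiver. This simplified version supports only exact-match lookups rather than QueryList's Django-style operators, and the `Pane` and `Query` types are stand-ins for PaneSnapshot and PaneQuery.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Pane:
    pane_id: str
    active: bool
    current_command: str


@dataclass(frozen=True)
class Query:
    source: tuple
    filters: tuple = ()
    order: Optional[str] = None
    max_rows: Optional[int] = None

    def filter(self, **lookups) -> "Query":
        return replace(self, filters=self.filters + tuple(lookups.items()))

    def order_by(self, field: str) -> "Query":
        return replace(self, order=field)

    def limit(self, count: int) -> "Query":
        return replace(self, max_rows=count)

    def all(self) -> list:
        rows = [p for p in self.source
                if all(getattr(p, key) == want for key, want in self.filters)]
        if self.order is not None:
            rows.sort(key=lambda p: getattr(p, self.order))
        return rows[: self.max_rows]  # a None limit keeps every row


base = Query(source=(Pane("%1", True, "vim"), Pane("%2", False, "zsh"), Pane("%3", True, "zsh")))
```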
why: The query read live panes but could not act on them. The chainable
prototype's headline DX is "do X to every pane matching Y in one tmux
call" -- bulk commands over a filtered set, folded to a single dispatch.

what:
- Add PaneRef (a matched pane + a cmd namespace) and BoundPaneCommands
  (send_keys/resize/select/respawn/clear_history/kill), each recording a
  typed op into a shared plan
- Add PaneQuery.commands(mapper) -> CommandPlan; CommandPlan.to_plan
  builds the ops against a snapshot (pure/inspectable) and CommandPlan.run
  reads the engine, builds, and dispatches folded (FoldingPlanner) by
  default
- Layered entirely over LazyPlan/SlotRef/Planner -- no new execution path
- Cover op-per-pane building, each command kind, the empty-match no-op,
  and a live folded run

The bulk-command layer over the live query (G18); fluent split/forward
handles remain a possible follow-up.
why: The typed pane facades exposed split() but not new_pane(), so
floating panes were reachable from the ops/workspace tiers but not the
eager/lazy/async handles that are the modern facade surface.

what:
- Add new_pane() to EagerPane (live handle), LazyPane (deferred handle
  over the plan), and AsyncPane (awaited live handle), each returning a
  handle to the created floating pane
- Share a _new_pane_op builder across the three facades so the floating
  geometry vocabulary (width/height/x/y/zoom/empty/styles/...) stays in
  one place
- Cover eager/lazy/async new_pane (live handle, recorded op + render,
  awaited handle)
why: NewPane auto-projects as op_new_pane, but that surface is hidden
behind the per-op tag; agents reach for the curated, always-visible
vocabulary. Floating panes had no curated tool, so they were effectively
undiscoverable.

what:
- Add anew_pane to the pane vocabulary (async-first) and new_pane =
  synced(anew_pane); FastMCP derives the input schema from the signature
  and the output schema from PaneResult
- Register ("new_pane", "mutating") in the adapter _TOOLS table; export
  anew_pane/new_pane from the vocabulary and new_pane from the mcp facade
- The tool description notes the tmux 3.7+ requirement
- Cover the curated new_pane tool over the in-memory engine

Surfacing whole-op min_version into the auto-projected op_* schema
(G8) remains a small follow-up.
why: The descriptor projected per-flag version gates but not a whole
operation's min_version, so the auto-projected op_new_pane advertised no
tmux requirement -- an agent on an older tmux would hit a raw
VersionUnsupported instead of a documented gate.

what:
- Add ToolDescriptor.min_version, populated from OpSpec.min_version
- Append "Requires tmux >= X.Y." to the projected tool description when a
  whole-command gate is set
- Cover op_new_pane surfacing min_version 3.7 (and an ungated op not)
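
The description suffix is a small string rule. A sketch with a hypothetical `describe` helper, not the ToolDescriptor projection code:

```python
from typing import Optional


def describe(base: str, min_version: Optional[str] = None) -> str:
    """Append the tmux requirement only when a whole-command gate is set."""
    if min_version is None:
        return base
    return f"{base} Requires tmux >= {min_version}."
```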
why: wait_for_output takes target=, not pane=; a recipe emitting
pane= would fail FastMCP schema validation before dispatch.

what:
- Replace pane= with target= in run_and_wait,
  diagnose_failing_pane and interrupt_gracefully
- Add parametrized regression test asserting target= usage
why: A non-str/non-Mapping shell_command item (int, float, list)
was silently dropped, hiding malformed config from the user.

what:
- Raise TypeError on unsupported shell_command items, matching the
  module's existing "unsupported pane config" error style
- Keep None tolerated (a blank mixed with commands, tmuxp parity)
- Add parametrized tests for rejected and normalized items
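
The stricter item handling could be sketched as below. It assumes a mapping item carries its command under a `cmd` key, as in tmuxp configs; the function name and error text are illustrative.

```python
from collections.abc import Mapping


def normalize_commands(items: list) -> list:
    """Keep str/Mapping items, tolerate None, reject everything else."""
    commands = []
    for item in items:
        if item is None:
            continue  # a blank mixed with real commands is tolerated
        if isinstance(item, str):
            commands.append(item)
        elif isinstance(item, Mapping):
            commands.append(str(item["cmd"]))  # assumed key
        else:
            raise TypeError(f"unsupported shell_command item: {item!r}")
    return commands
```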
why: A split pane with its own environment dropped the window
environment entirely, contradicting the documented "inherited by
its panes" contract and the first pane's merged creator env.

what:
- Merge window + pane environment for split-window -e (pane wins)
- Correct the creator-env test to assert the merged split env
- Add parametrized tests for window/pane env precedence
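
The merge rule (window environment inherited, pane values winning on conflict) is a plain dict merge. `split_environment` is a hypothetical name for this sketch.

```python
def split_environment(window_env: dict, pane_env: dict) -> dict:
    """Merge window and pane environments; pane keys override window keys."""
    return {**window_env, **pane_env}


merged = split_environment({"EDITOR": "vim", "ROLE": "window"}, {"ROLE": "pane"})
```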