fix(sandbox): add idle-timeout backstop to microsandbox exec stream#442
Open
arimxyer wants to merge 1 commit into
Open
fix(sandbox): add idle-timeout backstop to microsandbox exec stream#442arimxyer wants to merge 1 commit into
arimxyer wants to merge 1 commit into
Conversation
Signed-off-by: Ari Mayer <ari111097@gmail.com>
Contributor
|
@arimxyer is attempting to deploy a commit to the Vercel Team on Vercel. A member of the Team first needs to authorize it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A microsandbox sandbox command can hang indefinitely. In
adaptMicrosandboxExecToSandboxProcess(
packages/eve/src/execution/sandbox/bindings/microsandbox-process.ts) thecompletion loop consumes the SDK exec async-iterator:
While no
exitedevent has arrived (exitCode === undefined), it does a plainawait iterator.next()with no timeout. The 100 msnextWithTimeoutonlyapplies after an exit event, to drain trailing output. There is a guard for the
stream ending without an exit (
"Microsandbox command ended without an exit event.") but none for the stream stalling open — if the iterator never yieldsexitedand never closes, thisawait(and thefinishedpromise behindwait()) blocks forever. The microsandbox SDK exec layer wraps a native NAPIbinding with no timeout of its own, so there is no timeout anywhere in the JS
stack and a stalled exec becomes an infinite hang of the agent turn /
eve eval.Closes #440.
Fix (self-contained backstop)
This is the graduated fix #1 from the issue: an idle timeout on the no-exit
branch, with no public-API change.
await iterator.next()now goes throughnextWithTimeoutwith anidle deadline. Because each loop iteration starts a fresh race, the deadline
resets on every stdout/stderr/exit event — a command that keeps emitting
output is never killed; only total silence trips it.
finishedpromise is rejected(and the stdout/stderr controllers errored) with a clear error:
"Microsandbox command exceeded idle timeout (<N>ms with no output or exit event).", so a wedged exec surfaces as a failure instead of hanging.kill()is fire-and-forget (void command.kill().catch(() => {})),matching the existing cancellation path in
microsandbox-runtime.ts. Thepremise is that the native binding stalled;
kill()calls into that samebinding, so awaiting it could wedge again. The rejection must not depend on
kill completing.
The terminal branching in the
finallyblock was refactored to a singleterminalErrorvariable so the idle-timeout error, the iterator-threw error, andthe pre-existing "ended without an exit event" error all surface through one
path (first error wins;
ReadableStreamDefaultController.error()is a no-op oncethe stream is no longer readable).
Design tradeoff and default
A pure wall-clock timeout would wrongly kill long-but-progressing commands; an
idle timeout (reset on output) is better but still risks killing a legitimate
long compute that emits no output for the whole window (e.g.
sleep 600, asilent heavy calculation). So the ceiling is a named constant with a generous
5-minute default and an override knob.
A false kill (terminating a legit silent compute) is worse than waiting a few
extra minutes to kill a truly-wedged exec, so the default biases generous. Five
minutes of zero bytes — no stdout, no stderr, no exit — is well beyond any
normal tool command, while still bounding the previously-unbounded hang. Tune it
per environment via the
idleTimeoutMsoption or theEVE_MICROSANDBOX_EXEC_IDLE_TIMEOUT_MSenv var (a malformed value falls back tothe default rather than disabling the backstop).
Out of scope (follow-up)
The issue's fix #2 — threading a cancellation
AbortSignalend-to-end from thetool/turn context through
executeBashOnSandbox→runWithDevelopmentSandboxProgress→
sandbox.run({ command, abortSignal })— touches public API(
SessionContext/ callback context) and belongs in a separate change. It wouldalso make agent-level cancellation and eval
timeoutMsactually terminate arunning command. This PR is the self-contained backstop only.
Test
packages/eve/src/execution/sandbox/bindings/microsandbox-process.test.tsbuildsa fake async-iterable exec handle with a
kill()spy and covers:{kind:"exited", code:0}→wait()resolves{exitCode:0}, stdout delivers data,kill()not called.idle timeout,
wait()rejects with the idle-timeout error andkill()wascalled once. (This also exercises the
idleTimeoutMsoverride knob.)"ended without an exit event" error still fires (guards the
finallyrefactor).pnpm --filter eve run typecheckpasses.