perf(sidecar): cut guest fs RPC latency (fs-heavy workloads 5.7–41× faster)#77
Merged
…code default_compile_cache_root was keyed by process id, so every fresh sidecar got an empty cache and cold module imports never reused compiled bytecode. Use a stable temp path; entries remain namespaced + V8-validated, so sharing is safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
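The difference between a PID-keyed and a stable cache root can be sketched as follows. This is a minimal illustration, not the sidecar's actual code: the directory names and the `cache_entry_dir` helper are hypothetical, and the commit does not say how entries are namespaced.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Old behavior as described in the commit: the root embeds the process id,
/// so every fresh process computes a different, initially empty directory.
/// The directory name is hypothetical.
fn pid_keyed_cache_root() -> PathBuf {
    env::temp_dir().join(format!("sidecar-compile-cache-{}", std::process::id()))
}

/// New behavior: one stable root under the temp dir, shared across processes.
/// The directory name is hypothetical.
fn stable_cache_root() -> PathBuf {
    env::temp_dir().join("sidecar-compile-cache")
}

/// Hypothetical namespacing helper: entries live in a per-namespace
/// subdirectory (for example a runtime version) beneath the shared root.
fn cache_entry_dir(root: &Path, namespace: &str) -> PathBuf {
    root.join(namespace)
}
```

With the stable root, a second process started later resolves the same directory and can find entries written by the first; with the PID-keyed root it cannot, unless the PID happens to be reused.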
fs.readdirSync({withFileTypes:true}) previously returned names only, so the guest
issued one cross-thread stat RPC per entry to build each Dirent. The readdir
handler already openat2's every child to validate it stays beneath the mount, so
we now fstat that fd in-process and return {name,isDirectory}. The guest's
normalizeReaddirEntries already consumes typed entries. metadata() follows
symlinks, matching prior statSync semantics (file count unchanged). Recursive
walk of node_modules: ~32.4s -> ~4.6s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
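As a rough sketch of the idea, the function below lists a directory and resolves each child's type in the same call, so the caller needs no follow-up stat per entry. It uses path-based `std::fs` calls rather than the `openat2` + `fstat` fd path described above, and the `TypedEntry` type, the dangling-symlink handling, and the `walk_rpcs` counting model are all assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One directory entry in the shape returned to the guest: {name, isDirectory}.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct TypedEntry {
    name: String,
    is_directory: bool,
}

/// List `dir` and resolve each child's type in one pass.
/// `fs::metadata` follows symlinks, which matches statSync semantics.
fn readdir_typed(dir: &Path) -> io::Result<Vec<TypedEntry>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Assumption: a dangling symlink (metadata fails) is reported as a non-directory.
        let is_directory = fs::metadata(entry.path())
            .map(|m| m.is_dir())
            .unwrap_or(false);
        entries.push(TypedEntry {
            name: entry.file_name().to_string_lossy().into_owned(),
            is_directory,
        });
    }
    // Sort by name so the result is deterministic.
    entries.sort();
    Ok(entries)
}

/// Simplified RPC count for a recursive walk: one readdir per directory,
/// plus one stat per entry when readdir returns names only.
fn walk_rpcs(dirs: u64, entries: u64, typed_readdir: bool) -> u64 {
    if typed_readdir { dirs } else { dirs + entries }
}
```

Under this model a tree with 10 directories and 1000 entries costs 1010 RPCs with names-only readdir and 10 with typed entries.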
Guest sync fs/module RPCs are serviced by pump_process_events, which the stdio select loop only runs on EVENT_PUMP_INTERVAL. At 5ms a blocked guest call waited ~5ms before the host dequeued it (~5ms/stat). 250us (the sub-ms tokio timer is honored) cuts it dramatically: over the fs benchmark walk 32.4s->0.79s, stat 7.5s->1.3s, read 7.6s->1.2s. Idle pumps are cheap no-ops so the higher cadence costs negligible CPU. (A true event-driven wake would remove the residual timer wait but needs a notify channel from the execution layer; an adaptive interval was tried but proved unstable.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
397377a to
d71dc89
This was referenced Jun 19, 2026
Follow-up work split out into issues:
Summary
Three optimizations to the Rust sidecar runtime that speed up guest workloads which
read a mounted host node_modules (and any fs-heavy guest). Each is its own commit.
Benchmarked with an out-of-tree harness (mounted off-the-shelf node_modules;
recursive walk + 1500× stat + 1500× read + module imports), baseline vs. fixed stack
measured back-to-back on an idle host.
Headline: host-filesystem operations 5.7–41× faster (0.3 cold)
Module-import time is compile/eval-bound and was flat within noise — this work
targets cross-thread RPC latency, which dominates fs-heavy paths, not compile.
Commits
1. default_compile_cache_root was keyed by PID, so every fresh sidecar started with
   an empty V8 compile cache. Use a stable path (entries stay namespaced +
   V8-validated). Neutral on these micro-benchmarks but conceptually correct for
   bootstrap/repeated starts; 1 line, harmless.
2. fs.readdirSync({withFileTypes:true}) returned names only, so the guest issued one
   cross-thread stat RPC per entry to build each Dirent, making a directory walk
   O(total-entries) RPCs. The handler already openat2s each child to validate it
   stays beneath the mount; we now fstat that fd and return {name,isDirectory} (the
   guest already consumes typed entries). metadata() follows symlinks, matching prior
   statSync semantics (file count unchanged). Walk 32.4s → 4.6s on its own.
3. Guest sync fs/module RPCs are serviced by pump_process_events, which the stdio
   select loop only ran on the 5 ms EVENT_PUMP_INTERVAL timer, so each blocked guest
   call waited ~5 ms before the host dequeued it. The interval is now 250 µs: the
   sub-ms tokio timer is honored, and idle pumps are cheap no-ops, so the higher
   cadence costs negligible CPU. stat 7.5s→1.3s, read 7.6s→1.2s; with commit 2,
   walk 32.4s→0.79s.
Root cause
The guest V8 isolate runs inside the sidecar process; a guest fs.statSync is a
cross-thread synchronous RPC to the execution loop (not a socket, not cross-process).
Two things dominated: readdir not returning entry types (→ per-entry stat RPCs), and
the 5 ms pump timer gating when those RPCs were serviced.
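A back-of-envelope model makes the timer cost concrete. The sketch below assumes every blocking call waits one full pump interval and nothing else, which is a simplification; `timer_wait` is a hypothetical helper, not part of the sidecar.

```rust
use std::time::Duration;

/// Upper bound on time spent waiting for the pump timer when `calls`
/// blocking RPCs are issued back to back and each waits one full interval.
fn timer_wait(calls: u32, pump_interval: Duration) -> Duration {
    pump_interval * calls
}
```

For the benchmark's 1500 stat calls, `timer_wait(1500, Duration::from_millis(5))` is 7.5 s, which lines up with the 7.5s stat baseline. At 250 µs the same bound is 0.375 s; the measured 1.3s suggests that per-call costs other than the timer now make up most of the remainder.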
Tried and reverted / deferred
- resolveAndLoad (format+source in one RPC) regressed module import ~4×: module RPC
  responses are delivered as raw strings, so the {format,source} object wasn't
  consumed and every module hit a slower readFileSync fallback.
- Adaptive pump interval: proved unstable (oscillation) under load; the flat 250 µs
  interval ships instead.
- readFileSync (skip base64), fully traced: _fsReadFileBinary → fs.readFileSync(binary),
  and binary is {__agentOsType:"bytes", base64} because the sync-RPC return type is a
  JSON Value. True raw transfer needs switching readFileSync to the status=2
  raw-binary bridge-response path, a cross-cutting protocol change risking every
  binary read for a now-modest gain (read already 6.2× faster). Deferred as a
  follow-up rather than rushed.
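To give a sense of what the base64 envelope costs in size, here is a small sketch. It assumes standard padded base64 and a compact JSON serialization of the `{__agentOsType:"bytes", base64}` object; the exact wire format is not shown in this PR, so the wrapper string below is an assumption.

```rust
/// Length of standard padded base64 for `n` input bytes: 4 * ceil(n / 3).
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Approximate size of the JSON envelope for an `n`-byte payload,
/// assuming compact serialization with these two keys (an assumption).
fn envelope_len(n: usize) -> usize {
    let wrapper = r#"{"__agentOsType":"bytes","base64":""}"#;
    wrapper.len() + base64_len(n)
}
```

The payload grows by roughly a third (3000 bytes become 4000 base64 characters), on top of the encode and decode work on each side. A raw-binary response path would avoid both.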
Follow-ups
True event-driven pump (notify channel execution→stdio loop, removes the residual
timer wait), cache the host_dir mount-root fd, and the binary readFileSync path.
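The contrast between a timer-driven and an event-driven pump can be sketched with standard-library channels. This is only an analogy under stated assumptions: the real loop is a tokio select loop, and `Request`, `serve_polled`, `serve_event_driven`, and `sync_call` are hypothetical names. The handler returns the path length as a stand-in for real work.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// A guest request plus the channel used to send the reply back.
struct Request {
    path: String,
    reply: Sender<usize>,
}

/// Timer-driven host loop: queued requests are only noticed once per `interval`.
fn serve_polled(rx: Receiver<Request>, interval: Duration) {
    loop {
        // Drain everything that is currently queued.
        loop {
            match rx.try_recv() {
                Ok(req) => {
                    let _ = req.reply.send(req.path.len());
                }
                Err(mpsc::TryRecvError::Empty) => break,
                Err(mpsc::TryRecvError::Disconnected) => return,
            }
        }
        // A request arriving now waits up to `interval` before it is served.
        thread::sleep(interval);
    }
}

/// Event-driven host loop: recv() parks the thread until a request arrives,
/// so there is no timer wait at all.
fn serve_event_driven(rx: Receiver<Request>) {
    while let Ok(req) = rx.recv() {
        let _ = req.reply.send(req.path.len());
    }
}

/// Synchronous guest-side call: send the request, then block on the reply.
fn sync_call(tx: &Sender<Request>, path: &str) -> usize {
    let (reply, response) = mpsc::channel();
    tx.send(Request { path: path.to_string(), reply })
        .expect("host loop is running");
    response.recv().expect("host loop replied")
}
```

Both loops return the same answers; the difference is that the polled loop adds up to one interval of latency per call, while the event-driven loop wakes as soon as the request is enqueued.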