perf(sidecar): cut guest fs RPC latency (fs-heavy workloads 5.7–41× faster) by NathanFlurry · Pull Request #77 · rivet-dev/secure-exec

NathanFlurry · 2026-06-19T07:18:42Z

Summary

Three optimizations to the Rust sidecar runtime that speed up guest workloads which
read a mounted host node_modules (and any fs-heavy guest). Each is its own commit.
Benchmarked with an out-of-tree harness (mounted off-the-shelf node_modules;
recursive walk + 1500× stat + 1500× read + module imports), baseline vs. fixed stack
measured back-to-back on an idle host.

Headline: host-filesystem operations 5.7–41× faster (0.3 cold)

operation	baseline	fixed	speedup
recursive walk (909 readdir)	32396 ms	785 ms	41×
stat ×1500	7564 ms	1338 ms	5.7×
read ×1500 small files	7610 ms	1221 ms	6.2×

Module-import time is compile/eval-bound and was flat within noise — this work
targets cross-thread RPC latency, which dominates fs-heavy paths, not compile.

Commits

1. stable compile-cache root: default_compile_cache_root was keyed by PID, so
every fresh sidecar started with an empty V8 compile cache. Use a stable path
(entries stay namespaced and V8-validated). Neutral on these micro-benchmarks but
conceptually correct for bootstrap/repeated starts; 1 line, harmless.
2. typed readdir: fs.readdirSync({withFileTypes:true}) returned names only, so
the guest issued one cross-thread stat RPC per entry to build each Dirent,
making a directory walk cost O(total entries) RPCs. The handler already calls
openat2 on each child to validate that it stays beneath the mount; we now fstat
that fd and return {name,isDirectory} (the guest already consumes typed entries).
metadata() follows symlinks, matching prior statSync semantics (file count
unchanged). Walk 32.4s → 4.6s on its own.
3. 250 µs event-pump interval: guest sync fs/module RPCs are serviced by
pump_process_events, which the stdio select loop only ran on the 5 ms
EVENT_PUMP_INTERVAL timer, so each blocked guest call waited ~5 ms before the
host dequeued it. The interval is now 250 µs. The sub-ms tokio timer is honored
and idle pumps are cheap no-ops, so the higher cadence costs negligible CPU.
stat 7.5s→1.3s, read 7.6s→1.2s; combined with commit 2 (typed readdir), walk
32.4s→0.79s.
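
The typed readdir idea can be sketched in a few lines of Rust. This is a minimal illustration, not the sidecar's handler: it uses path-based std::fs::metadata instead of the openat2 + fstat on an already-open fd described above, and the names TypedEntry and typed_readdir are hypothetical. The point it shows is that the host resolves each entry's type in the same pass as the listing, so the caller needs no follow-up stat per entry.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One directory entry with its type already resolved on the host side.
/// (Hypothetical type; mirrors the {name,isDirectory} shape from the PR.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TypedEntry {
    pub name: String,
    pub is_directory: bool,
}

/// Lists `dir` and resolves every entry's type in the same pass,
/// so the caller does not need one extra stat round trip per entry.
pub fn typed_readdir(dir: &Path) -> io::Result<Vec<TypedEntry>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // fs::metadata follows symlinks (fs::symlink_metadata would not),
        // which matches the statSync-like semantics described in the PR.
        // Assumption: a dangling symlink is reported as a non-directory.
        let is_directory = fs::metadata(entry.path())
            .map(|m| m.is_dir())
            .unwrap_or(false);
        entries.push(TypedEntry {
            name: entry.file_name().to_string_lossy().into_owned(),
            is_directory,
        });
    }
    // Sorted only to make the output deterministic for callers and tests.
    entries.sort_by(|a, b| a.name.cmp(&b.name));
    Ok(entries)
}
```

Calling typed_readdir(Path::new("node_modules")) returns one vector per directory, so a recursive walk costs one call per directory rather than one call per entry.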

Root cause

The guest V8 isolate runs inside the sidecar process; a guest fs.statSync is a
cross-thread synchronous RPC to the execution loop (not a socket, not cross-process).
Two things dominated: readdir not returning entry types (→ per-entry stat RPCs), and
the 5 ms pump timer gating when those RPCs were serviced.
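
A back-of-the-envelope model makes the timer's contribution concrete. The sketch below assumes, as the description does, that each blocked call waits roughly one full pump interval before it is dequeued; timer_wait is a hypothetical helper, not a function in the sidecar.

```rust
use std::time::Duration;

/// Estimated time spent waiting on the pump timer alone, assuming every
/// synchronous RPC waits about one full interval before the host dequeues it.
pub fn timer_wait(rpc_count: u32, pump_interval: Duration) -> Duration {
    pump_interval * rpc_count
}
```

With the benchmark's 1500 stat calls, timer_wait(1500, Duration::from_millis(5)) gives 7.5 s, which lines up with the 7564 ms baseline. At 250 µs the same model gives 375 ms, so under this assumption the remainder of the measured 1338 ms would come from per-call costs other than the timer.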

Tried and reverted / deferred

- Combined resolveAndLoad (format+source in one RPC) regressed module import
~4×: module RPC responses are delivered as raw strings, so the {format,source}
object wasn't consumed and every module hit a slower readFileSync fallback.
- Adaptive (fine/coarse) and spin-wait pump variants were unstable (livelock /
oscillation) under load; the flat 250 µs interval ships instead.
- Binary readFileSync (skip base64): fully traced. _fsReadFileBinary →
fs.readFileSync(binary), and binary is {__agentOsType:"bytes", base64} because
the sync-RPC return type is a JSON Value. True raw transfer needs switching
readFileSync to the status=2 raw-binary bridge-response path, a cross-cutting
protocol change risking every binary read for a now-modest gain (read already 6.2×
faster). Deferred as a follow-up rather than rushed.
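
For a sense of what the base64 envelope costs in payload size, standard padded base64 encodes every 3 raw bytes as 4 characters. The helper below is a hypothetical illustration of that arithmetic only; it does not model the JSON wrapper's own fixed overhead or the encode/decode CPU time.

```rust
/// Length of standard padded base64 for `n` raw bytes: 4 * ceil(n / 3).
pub fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Extra bytes carried over the wire compared with a raw transfer.
pub fn base64_overhead(n: usize) -> usize {
    base64_len(n) - n
}
```

A 3000-byte file becomes 4000 base64 characters, roughly a one-third increase, which is the size cost a raw-binary response path would avoid.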

Follow-ups

True event-driven pump (notify channel execution→stdio loop, removes the residual
timer wait), cache the host_dir mount-root fd, and the binary readFileSync path.
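
One way to picture the event-driven pump is a loop that blocks on a notify channel and only falls back to a timer when nothing arrives. The sketch below uses std::sync::mpsc as a stand-in for whatever channel the sidecar would use (the real loop is tokio-based), and Wake and wait_for_work are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the pump loop woke up.
#[derive(Debug, PartialEq, Eq)]
pub enum Wake {
    /// The execution side signalled that an RPC is queued.
    Notified,
    /// Nothing arrived before the fallback interval elapsed.
    TimedOut,
    /// Every sender was dropped; the loop should shut down.
    Closed,
}

/// Blocks until the execution side notifies, or until `fallback` elapses.
/// A notification wakes the loop immediately, so a queued RPC does not
/// have to wait for the next timer tick.
pub fn wait_for_work(notify: &Receiver<()>, fallback: Duration) -> Wake {
    match notify.recv_timeout(fallback) {
        Ok(()) => Wake::Notified,
        Err(RecvTimeoutError::Timeout) => Wake::TimedOut,
        Err(RecvTimeoutError::Disconnected) => Wake::Closed,
    }
}
```

In this shape the execution side would call send(()) after enqueueing a sync RPC, and the loop would run its pump on both Notified and TimedOut, keeping the timer as a safety net rather than as the primary wake source.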

…code default_compile_cache_root was keyed by process id, so every fresh sidecar got an empty cache and cold module imports never reused compiled bytecode. Use a stable temp path; entries remain namespaced + V8-validated, so sharing is safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fs.readdirSync({withFileTypes:true}) previously returned names only, so the guest issued one cross-thread stat RPC per entry to build each Dirent. The readdir handler already openat2's every child to validate it stays beneath the mount, so we now fstat that fd in-process and return {name,isDirectory}. The guest's normalizeReaddirEntries already consumes typed entries. metadata() follows symlinks, matching prior statSync semantics (file count unchanged). Recursive walk of node_modules: ~32.4s -> ~4.6s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

railway-app · 2026-06-19T07:18:51Z

🚅 Environment secure-exec-pr-77 in rivet-frontend has no services deployed.

Guest sync fs/module RPCs are serviced by pump_process_events, which the stdio select loop only runs on EVENT_PUMP_INTERVAL. At 5ms a blocked guest call waited ~5ms before the host dequeued it (~5ms/stat). 250us (the sub-ms tokio timer is honored) cuts it dramatically: over the fs benchmark walk 32.4s->0.79s, stat 7.5s->1.3s, read 7.6s->1.2s. Idle pumps are cheap no-ops so the higher cadence costs negligible CPU. (A true event-driven wake would remove the residual timer wait but needs a notify channel from the execution layer; an adaptive interval was tried but proved unstable.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

NathanFlurry · 2026-06-19T08:17:22Z

Follow-up work split out into issues:

#80: event-driven pump (remove residual sync-RPC timer latency)
#81: raw-binary readFileSync transfer (skip base64)

NathanFlurry and others added 2 commits June 18, 2026 23:23

NathanFlurry force-pushed the perf-sidecar-guest-rpc branch from 397377a to d71dc89 Compare June 19, 2026 08:10

NathanFlurry changed the title ~~perf(sidecar): cut guest fs RPC latency (fs-heavy workloads 3.5–24× faster)~~ Jun 19, 2026

This was referenced Jun 19, 2026

perf(sidecar): event-driven pump to remove residual sync-RPC timer latency #80

Open

perf(sidecar): raw-binary readFileSync transfer (skip base64) #81

Open

NathanFlurry merged commit a621306 into main Jun 19, 2026
1 of 2 checks passed

NathanFlurry merged 3 commits into main from perf-sidecar-guest-rpc
