Skip to content

perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775

Draft
JohnMcLear wants to merge 1 commit into
developfrom
perf/session-getinfo-no-throw
Draft

perf: avoid throw-as-control-flow in SessionManager hot path (#7756)#7775
JohnMcLear wants to merge 1 commit into
developfrom
perf/session-getinfo-no-throw

Conversation

@JohnMcLear

Copy link
Copy Markdown
Member

Summary

CPU profile of the SUT at the 100→400 author dive sweep (load-test workflow run 25956384097) attributed ~6% of total process CPU to the throw + catch around getSessionInfo:

  • ~1.82% to new CustomError('sessionID does not exist', 'apierror') (stack-trace capture)
  • ~4.12% downstream, via the catch block's console.debug(...) routed through log4js → sendToListenerssendLogEventToAppender

Both internal callers (findAuthorID on every CLIENT_READY, listSessionsWithDBKey on session listing) caught apierror and discarded it. The public exports.getSessionInfo still has to throw for the HTTP API (returns code: 1 for missing sessionID), so this PR introduces a private getSessionInfoOrNull helper that returns null and switches the two hot-path callers to it. exports.getSessionInfo becomes a thin wrapper that preserves the throw semantics.

Profile evidence

Inverted callers of CustomError constructor in the profile:

 1.82% exports.getSessionInfo [src/node/db/SessionManager.ts]

Inverted (non-log4js) callers of log4js Logger.<computed>:

 4.12% exports.checkAccess [src/node/db/SecurityManager.ts]

(checkAccess is the entry point that drives findAuthorIDgetSessionInfoconsole.debug.)

Behavior

  • No behavior change for the HTTP API. getSessionInfo still throws `apierror` when the session doesn't exist; RestAPI.ts/APIHandler.ts translate that to code: 1.
  • The two internal callers preserve their existing semantics: findAuthorID returns undefined for unknown sessions, listSessionsWithDBKey continues to log a warning and set sessions[sessionID] = null.

Test plan

  • `pnpm exec mocha tests/backend/specs/api/sessionsAndGroups.ts` — all 32 cases pass, including "getSessionInfo of deleted session" (still expects code:1).
  • Re-run the dive sweep with this branch as core_ref and confirm steady-state CPU% drops at the cliff (300-400 authors).

Part of #7756. Profile capture pipeline is in etherpad-load-test#109/110/111.

CPU profile of the SUT at the 100-400 author dive sweep
(load-test workflow run 25956384097) attributed about 6% of total
process CPU to the throw + catch around getSessionInfo:

  - ~1.82% to `new CustomError('sessionID does not exist', 'apierror')`
    construction (stack trace capture)
  - ~4.12% downstream, via the catch block's `console.debug(...)`
    routed through log4js -> sendToListeners -> sendLogEventToAppender

Both call sites (`findAuthorID` on every CLIENT_READY, and
`listSessionsWithDBKey` on session listing) immediately caught
`apierror` and discarded it. The public `exports.getSessionInfo`
contract still has to throw for the HTTP API (returning code:1 for
missing sessionID), so introduce a private `getSessionInfoOrNull`
helper that returns null and have the hot-path callers use it
directly. `exports.getSessionInfo` is kept as a thin wrapper that
preserves the existing throw semantics.

No behaviour change for the HTTP API — sessionsAndGroups.ts test
file (32 cases, including "getSessionInfo of deleted session"
expecting code:1) passes unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Review Summary by Qodo

Avoid throw-as-control-flow in SessionManager hot path

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Eliminate throw-as-control-flow in hot-path callers of getSessionInfo
• Introduce private getSessionInfoOrNull helper returning null
• Refactor findAuthorID and listSessionsWithDBKey to use null checks
• Reduce CPU overhead by ~6% (1.8% CustomError construction + 4% logging)
Diagram
flowchart LR
  A["getSessionInfoOrNull<br/>returns null"] --> B["findAuthorID<br/>null check"]
  A --> C["listSessionsWithDBKey<br/>null check"]
  D["exports.getSessionInfo<br/>wrapper"] --> E["throws apierror<br/>for HTTP API"]
  A --> D
Loading

Grey Divider

File Changes

1. src/node/db/SessionManager.ts Performance optimization +20/-22

Replace throw-catch with null-check pattern

• Introduce private getSessionInfoOrNull helper that returns null instead of throwing
• Refactor findAuthorID to use null-returning helper with simple || check
• Refactor listSessionsWithDBKey to use null-returning helper with explicit null check
• Keep exports.getSessionInfo as thin wrapper preserving throw semantics for HTTP API

src/node/db/SessionManager.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 16, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Grey Divider

Qodo Logo

JohnMcLear added a commit that referenced this pull request May 16, 2026
CPU profile of develop at the 100-400 author dive sweep (load-test
run 25956384097) identified a ~6% process-CPU win in SessionManager:
throw-as-control-flow on every CLIENT_READY session lookup.

Add lever 9 section with the profile evidence, link the open PR
(#7775), and add a "Other CPU hotspots surfaced" subsection
documenting findings not yet acted on (Changeset internals,
appendRevision, ueberdb/dirty backing as test-harness artifact,
esbuild __name overhead). Update Recommendation to include #7775
as the highest-priority merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JohnMcLear added a commit that referenced this pull request May 16, 2026
Replace the "score pending" placeholder under lever 9 with the
actual numbers from runs 25957107195/25957108328/25957109418
(perf branch) vs 25954537767/25954538807/25954540108 (develop),
both at authors=100..500:step=50:dwell=8s:warmup=2s.

Result: consistent -1.4% to -5.3% CPU reduction across all 9 steps,
matching profile direction at 2-5% (vs 6% profile-attributed upper
bound). Latency delta sits inside the noise envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear

Copy link
Copy Markdown
Member Author

N=3 measured impact

Three perf-branch runs (25957107195, 25957108328, 25957109418) compared against three develop baselines (25954537767, 25954538807, 25954540108) — same authors=100..500:step=50:dwell=8s:warmup=2s sweep on each.

Medians:

step dev CPU% perf CPU% ΔCPU%
100 4.76 4.67 -1.7%
150 9.09 8.95 -1.5%
200 15.21 14.60 -4.0%
250 21.51 21.32 -0.9%
300 30.46 29.68 -2.6%
350 41.58 39.36 -5.3%
400 56.26 54.23 -3.6%
450 72.33 70.49 -2.5%
500 88.38 87.14 -1.4%

ΔCPU% is consistently negative (-1.4% to -5.3%) across all 9 steps. The realised magnitude is below the profile-attributed 6% upper bound because some of the log4js cost the profile attributed to the throw path was unrelated info logging — but the direction is exactly what the profile predicted.

Latency delta sits inside the noise envelope across all steps (raw p95 triples overlap heavily run-to-run).

JohnMcLear added a commit that referenced this pull request May 16, 2026
Three combined-branch runs (perf/dive-combined = #7776 cherry-picked
onto #7775 base; runs 25960003164/25960004223/25960005248) vs the
same three develop baselines: -12% to -20% CPU% across all 9 sweep
steps, with the p95 cliff effectively moving from ~400 to ~500
authors (at step 400, two of three combined runs land below the
cliff at 45ms and 112ms p95 vs develop [1758, 2275, 2463]).

Adds:
- Lever 10 section for #7776 with its own N=3 numbers (-3.6 to -8.9%
  alone).
- "Stacking" section showing super-additive interaction.
- Local vCPU experiment showing the cliff is single-event-loop-bound,
  not total-CPU-bound: 4-core and 8-core pinned SUTs hit the same
  cliff at the same step.
- Updated TL;DR, Recommendation order (merge both #7775+#7776 first),
  and "Where to take this next" with worker-thread offload as the
  smallest next architectural step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear

Copy link
Copy Markdown
Member Author

#7775 + #7776 stacked — N=3 combined measurement

I cherry-picked #7775 onto #7776 (branch perf/dive-combined) and ran 3 cliff sweeps (25960003164, 25960004223, 25960005248) against the same 3 develop baselines used for the individual PR scoring. The two fixes stack super-additively:

step dev CPU% #7775 alone #7776 alone both Δboth
100 4.76 4.67 4.51 3.99 -16.1%
200 15.21 14.60 14.33 12.48 -17.9%
300 30.46 29.68 28.50 24.39 -19.9%
350 41.58 39.36 37.87 33.04 -20.5%
400 56.26 54.23 53.67 44.78 -20.4%
450 72.33 70.49 68.80 61.18 -15.4%
500 88.38 87.14 85.17 77.70 -12.1%

The stacked impact (-12% to -20% CPU%) is well above the sum of the individual gains. Both fixes remove call sites feeding the same log4js cluster-mode dispatch chain (sendToListeners → sendLogEventToAppender); halving the LogEvent allocation rate appears to relieve queue/GC pressure beyond what either fix accounts for in isolation.

Latency: cliff has effectively moved past step 400. Raw p95 triples:

step develop p95 [3 runs] combined p95 [3 runs]
400 [1758, 2275, 2463] [45, 112, 634]
450 [5415, 6167, 6611] [3297, 3719, 3897] (-40%)
500 [10655, 11759, 12183] [8091, 8711, 9127] (-26%)

At step 400, two of three combined runs land below the cliff entirely. Recommend landing these two together to capture the full effect.

JohnMcLear added a commit that referenced this pull request May 16, 2026
Post-#7775/#7776 profile shows applyToAText splits cleanly:
- applyToText (Changeset.ts:404) is pure (cs, text) -> text; trivially
  offloadable to a worker via worker_threads structured-clone postMessage.
- applyToAttribution (Changeset.ts:684) mutates AttributePool; not
  trivially offloadable.

Document the obvious first-pass design (run them in parallel via
Promise.all inside applyToAText) and the realistic estimate (~6-8%
CPU moved off the main event loop). putAttrib is only 0.26% in the
post-fix profile, confirming the bulk of applyToAText's cost is in
the string-manipulation half.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JohnMcLear added a commit that referenced this pull request May 16, 2026
Add a "Roadmap for future effort" section ahead of Reproducing,
ranking the next concrete options by impact-per-time-spent.

Tier 1 (mechanical / <1 day each):
- merge ready perf PRs (#7775+#7776+#7774)
- implement #7780 room-broadcast fan-out
- additional post-fix profile pass

Tier 2 (medium, real cliff moves):
- selective fan-out / viewport-based broadcast (~2 weeks; cliff ~500 → 1000-1500)
- per-pad worker isolation PoC (~1-2 weeks PoC, 1-2 months prod)

Tier 3 (large bets):
- sticky-session cluster mode (~2-4 weeks PoC)
- CRDT migration (months; anti-recommended)

Tier 4 (operational):
- production telemetry hookup (~3-5 days)
- nightly dive in CI (~1 day)

Records the recommended sequence (Tier 1.2 → Tier 2.4) so the
next person picking this up doesn't need to re-derive it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear marked this pull request as draft May 18, 2026 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant