Benchmarks

These numbers measure the public secure-exec SDK paths that consumers actually use. The cold start matrix (packages/benchmarks/coldstart.bench.ts) times the full journey from booting a runtime to running guest code, and compares three ways of provisioning a runtime that differ in how much they reuse.

Scenarios

owned-sidecar: NodeRuntime.create() boots a fresh sidecar process for each runtime. This is the default, fully isolated path (see Process Isolation).
shared-sidecar: a reuse fast path. Spawn one Sidecar and pass it to NodeRuntime.create({ sidecar }) for every runtime, instead of letting the default NodeRuntime.create() path own a fresh sidecar per runtime. Sidecar setup is measured separately and excluded from cold start, so this isolates the per-runtime cost.
resident-runner: a shared sidecar plus a resident runner, so repeated small snippets reuse one live guest Node process instead of starting a new one each time.

Cold vs warm

Each scenario reports two latencies:

Cold: provisioning the runtime and running the first guest snippet end to end.
Warm: running a second snippet on the same runtime, after it is already up.

A single sequential runtime (batch size 1) gives the cleanest picture:

Scenario	Cold mean	Cold p50	Warm mean	Warm p50
owned-sidecar	772.88ms	771.96ms	452.68ms	452.37ms
shared-sidecar	774.17ms	774.50ms	457.19ms	452.75ms
resident-runner	351.00ms	349.56ms	1.27ms	1.27ms

The headline result is the resident runner: once the live guest process exists, a warm snippet runs in about 1.3ms, roughly 350x faster than the roughly 450ms warm execution of the other two scenarios, which still has to stand up a guest process. Owned and shared sidecars cost about the same per runtime at batch size 1, because the sidecar process itself is cheap to spawn (around 3ms); the two only diverge at larger batch sizes (see Concurrency).
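The speedup figure follows directly from the warm means in the table. A minimal check, with the numbers copied from the batch-size-1 rows (the variable and function names here are illustrative only, not part of the benchmark):

```typescript
// Warm mean latencies (ms) copied from the batch-size-1 table above.
const warmMeanMs = {
  "owned-sidecar": 452.68,
  "shared-sidecar": 457.19,
  "resident-runner": 1.27,
} as const;

// Speedup of the resident runner relative to another scenario's warm path.
function warmSpeedup(baseline: keyof typeof warmMeanMs): number {
  return warmMeanMs[baseline] / warmMeanMs["resident-runner"];
}

console.log(warmSpeedup("owned-sidecar").toFixed(0));  // "356"
console.log(warmSpeedup("shared-sidecar").toFixed(0)); // "360"
```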

Where the cold-start time goes

Breaking down a single owned-sidecar cold start (batch 1, sequential, p50 per phase):

Phase	p50
sidecar_spawn	2.78ms
session_open	2.34ms
vm_create	1.36ms
vm_configure	0.17ms
runtime_mount_node	2.71ms
runtime_mount_wasm	172.95ms
runtime_create_total	176.30ms
first_exec	596.26ms
warm_exec	452.37ms

Two phases dominate: mounting the WASM command set (about 173ms) and the first guest execution (about 596ms, which includes bringing up the guest Node runtime). Spawning the sidecar, opening a session, and creating the VM together cost only a few milliseconds. This is why the resident runner wins so decisively: it pays the first-exec cost once and then reuses the live process.
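To make the proportions concrete, the sketch below sums the per-phase p50 values from the table and reports each phase's share. Per-phase medians are not strictly additive, so treat the percentages as approximate. Leaving out runtime_create_total rests on the assumption that it aggregates the other runtime phases.

```typescript
// p50 per phase (ms) for one owned-sidecar cold start, copied from the table above.
// runtime_create_total is omitted on the assumption that it is an aggregate of
// other rows; warm_exec is omitted because it is not part of the cold path.
const phaseP50Ms: Record<string, number> = {
  sidecar_spawn: 2.78,
  session_open: 2.34,
  vm_create: 1.36,
  vm_configure: 0.17,
  runtime_mount_node: 2.71,
  runtime_mount_wasm: 172.95,
  first_exec: 596.26,
};

const totalMs = Object.values(phaseP50Ms).reduce((a, b) => a + b, 0);

// Percentage of the summed phase time spent in the named phases.
function sharePct(...phases: string[]): number {
  const ms = phases.reduce((sum, name) => sum + phaseP50Ms[name], 0);
  return (ms / totalMs) * 100;
}

console.log(sharePct("runtime_mount_wasm").toFixed(1)); // "22.2"
console.log(sharePct("first_exec").toFixed(1));         // "76.6"
console.log(sharePct("sidecar_spawn", "session_open", "vm_create").toFixed(1)); // "0.8"
```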

Concurrency

The matrix also runs each scenario at larger batch sizes, both sequentially and concurrently (up to the host concurrency cap). Cold mean latency by batch size:

Scenario	Mode	b=1	b=10	b=50	b=100	b=200
owned-sidecar	sequential	772.88ms	780.07ms	777.49ms	777.89ms	778.28ms
owned-sidecar	concurrent	776.22ms	1000.39ms	1041.81ms	1041.19ms	1044.03ms
shared-sidecar	sequential	774.17ms	627.58ms	610.76ms	619.24ms	616.20ms
shared-sidecar	concurrent	775.66ms	3493.51ms	3197.96ms	3187.99ms	3235.65ms
resident-runner	sequential	351.00ms	202.61ms	187.92ms	186.10ms	189.76ms
resident-runner	concurrent	347.91ms	205.00ms	187.45ms	187.07ms	191.33ms

Takeaways:

owned-sidecar holds flat when sequential. Run concurrently, per-runtime cold time rises to about 1040ms as many runtimes mount their WASM command sets at once and contend for CPU. Each runtime still has its own sidecar, so they make progress in parallel.
shared-sidecar is efficient sequentially (around 610ms once the shared sidecar is warm) but degrades sharply under concurrency (3000ms or more), because one sidecar process serializes the heavy concurrent mount and first-exec work. Share a sidecar when work arrives sequentially or at low concurrency, not for bursty parallel fan-out.
resident-runner improves with batch size as the one-time runner creation amortizes, settling around 187ms cold and about 1.3ms warm regardless of mode.
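The amortization in the resident-runner rows can be approximated with a two-parameter model: a one-time cost spread over the batch plus a fixed per-runtime cost. This model is an assumption made here for illustration, not something taken from the benchmark source. The sketch fits it to the b=1 and b=100 sequential means and compares its predictions with the other measured columns.

```typescript
// Resident-runner sequential cold means (ms) by batch size, from the table above.
const measured: Record<number, number> = {
  1: 351.0,
  10: 202.61,
  50: 187.92,
  100: 186.1,
  200: 189.76,
};

// Assumed model (illustrative, not from the benchmark): mean(b) = perRuntime + oneTime / b.
// Two measured points are enough to solve for both parameters.
function fitAmortization(b1: number, b2: number): { oneTime: number; perRuntime: number } {
  const oneTime = (measured[b1] - measured[b2]) / (1 / b1 - 1 / b2);
  const perRuntime = measured[b1] - oneTime / b1;
  return { oneTime, perRuntime };
}

function predict(fit: { oneTime: number; perRuntime: number }, b: number): number {
  return fit.perRuntime + fit.oneTime / b;
}

const fit = fitAmortization(1, 100);
for (const b of [10, 50, 200]) {
  console.log(`b=${b}: predicted ${predict(fit, b).toFixed(1)}ms, measured ${measured[b]}ms`);
}
```

Under this fit the one-time cost comes out near 167ms and the per-runtime cost near 184ms, and the predictions land within a few milliseconds of the measured b=10, b=50, and b=200 values.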

Choosing a strategy

Need strong isolation under unpredictable, bursty load: use owned-sidecar (the default). Each runtime is its own crash and resource domain.
Running many short tasks back to back where a shared failure domain is acceptable: use shared-sidecar, which amortizes setup.
Running the same kind of small snippet over and over (for example an evaluation loop): use resident-runner, which turns a roughly 450ms warm execution into a roughly 1.3ms one.
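These rules can be written down as a small decision function. This is one reading of the guidance above, not an SDK feature: the Workload fields and the chooseStrategy name are hypothetical, and the ordering of the checks is an assumption about which concern should win when several apply.

```typescript
type Strategy = "owned-sidecar" | "shared-sidecar" | "resident-runner";

// Hypothetical description of a workload; these fields are not part of secure-exec.
interface Workload {
  needsStrongIsolation: boolean; // each runtime must be its own crash and resource domain
  burstyConcurrency: boolean;    // many runtimes are provisioned in parallel
  repeatsSmallSnippets: boolean; // e.g. an evaluation loop over similar snippets
}

function chooseStrategy(w: Workload): Strategy {
  // Isolation wins: only owned-sidecar gives every runtime its own sidecar.
  if (w.needsStrongIsolation) return "owned-sidecar";
  // Repeated small snippets benefit most from a live guest process.
  if (w.repeatsSmallSnippets) return "resident-runner";
  // A shared sidecar degrades sharply under parallel fan-out, so fall back to owned.
  if (w.burstyConcurrency) return "owned-sidecar";
  // Sequential or low-concurrency work with an acceptable shared failure domain.
  return "shared-sidecar";
}

console.log(
  chooseStrategy({ needsStrongIsolation: false, burstyConcurrency: false, repeatsSmallSnippets: true }),
); // "resident-runner"
```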

See Process Isolation for what each of these shares and isolates.

Methodology

CPU:   12th Gen Intel(R) Core(TM) i7-12700KF
Cores: 20 | Max concurrency: 16 | Max live runtimes: 8
RAM:   62.6 GB | Node: v24.13.0
Iterations: 5 (+ 1 warmup)
Batch sizes: 1, 10, 50, 100, 200
Scenarios: owned-sidecar, shared-sidecar, resident-runner
Sidecar: release build of secure-exec-sidecar

Cold start is wall time from the start of runtime provisioning through the first guest execution. Warm is a second execution on the already-running runtime. For shared-sidecar, the one-time sidecar setup (around 4ms to 5ms) is measured separately and excluded from the per-runtime cold number. Phase timings (sidecar_spawn, session_open, vm_create, runtime_mount_wasm, first_exec, and the resident-runner phases) are recorded in the machine-readable JSON output.
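The cold and warm definitions above can be sketched as a small timing harness. This is not the code in coldstart.bench.ts: RuntimeLike, measureOnce, and runScenario are hypothetical names, the provision callback stands in for whichever provisioning path is under test, and p50 is computed here as a plain median, which may differ from how the benchmark computes it.

```typescript
// Hypothetical stand-in for a provisioned runtime; not the secure-exec API.
interface RuntimeLike {
  exec(code: string): Promise<unknown>;
}

interface Sample {
  coldMs: number; // provisioning plus first guest execution
  warmMs: number; // second execution on the same runtime
}

// Time one cold start and one warm execution on the same runtime.
async function measureOnce(provision: () => Promise<RuntimeLike>, snippet: string): Promise<Sample> {
  const t0 = performance.now();
  const runtime = await provision();
  await runtime.exec(snippet);
  const t1 = performance.now();
  await runtime.exec(snippet);
  const t2 = performance.now();
  return { coldMs: t1 - t0, warmMs: t2 - t1 };
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Median of the samples (average of the two middle values for even counts).
function p50(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Run warmup iterations first, discard them, then summarize the measured ones.
// Runtime teardown is left out of this sketch.
async function runScenario(
  provision: () => Promise<RuntimeLike>,
  snippet: string,
  iterations = 5,
  warmup = 1,
): Promise<{ coldMean: number; coldP50: number; warmMean: number; warmP50: number }> {
  const samples: Sample[] = [];
  for (let i = 0; i < warmup + iterations; i++) {
    const sample = await measureOnce(provision, snippet);
    if (i >= warmup) samples.push(sample);
  }
  const cold = samples.map((s) => s.coldMs);
  const warm = samples.map((s) => s.warmMs);
  return { coldMean: mean(cold), coldP50: p50(cold), warmMean: mean(warm), warmP50: p50(warm) };
}
```

The default arguments mirror the 5 iterations and 1 warmup listed above.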

Running the benchmarks

Build a release sidecar first for meaningful timings:

cargo build --release -p secure-exec-sidecar

Run the full suite:

pnpm --dir packages/benchmarks bench

Run only the cold start matrix, writing the JSON results to one file and the log output to another:

SECURE_EXEC_SIDECAR_BIN="$PWD/target/release/secure-exec-sidecar" \
  pnpm --silent --dir packages/benchmarks bench:coldstart \
  > packages/benchmarks/results/coldstart-local.json \
  2> packages/benchmarks/results/coldstart-local.log

A quick smoke run limits it to batch size 1, one iteration, and no warmup:

BENCH_BATCH_SIZES=1 \
BENCH_ITERATIONS=1 \
BENCH_WARMUP=0 \
BENCH_SCENARIOS=owned-sidecar,shared-sidecar,resident-runner \
SECURE_EXEC_SIDECAR_BIN="$PWD/target/release/secure-exec-sidecar" \
  pnpm --silent --dir packages/benchmarks bench:coldstart