```
$ cargo run --release wast ./reports/001-stack-switching-stale-trap-handler/repro.wast -Wgc,exceptions,function-references,stack-switching
    Finished `release` profile [optimized] target(s) in 0.12s
     Running `target/release/wasmtime wast ./reports/001-stack-switching-stale-trap-handler/repro.wast -Wgc,exceptions,function-references,stack-switching`
zsh: segmentation fault (core dumped)  cargo run --release wast -Wgc,exceptions,function-references,stack-switching
```
# Stack switching: parent-stack trap after `resume` reads stale `last_wasm_entry_sp` / `last_wasm_entry_trap_handler`
Scope: `crates/wasmtime/src/runtime/vm/stack_switching.rs`,
`crates/cranelift/src/func_environ/stack_switching/instructions.rs`,
`crates/wasmtime/src/runtime/vm/traphandlers.rs`.

Severity: Crash (SIGSEGV) on a code path that should produce a clean wasm trap. Stack switching is currently 🚧 (work-in-progress) on x86_64 Cranelift, so this is not yet a security issue per the stability tiers, but it is a soundness/runtime bug that must be fixed before stack switching can graduate to a tier-1 feature.

Required configuration: `Config::wasm_stack_switching(true)` (and its prerequisites: `wasm_function_references(true)`, `wasm_exceptions(true)`). The bug only manifests on unix + x86_64, which is the only platform on which stack switching currently compiles.
## Summary

`VMStackLimits` (the per-stack snapshot of `VMStoreContext` taken on `stack_switch`) only contains `stack_limit` and `last_wasm_entry_fp`. It is missing `last_wasm_entry_sp` and `last_wasm_entry_trap_handler`. As a result, when a continuation runs and then hands control back to its parent (via `suspend` or by returning normally), `VMStoreContext.last_wasm_entry_sp` and `VMStoreContext.last_wasm_entry_trap_handler` still hold the values that were written by the continuation's array-to-wasm trampoline. Those values point into the (now suspended or torn-down) continuation's stack frame.

The next time the parent's wasm traps via a hardware signal (e.g. `unreachable` → SIGILL, OOB memory access → SIGSEGV), the wasmtime signal handler reads `entry_trap_handler()` and uses those stale `sp`/`pc` values to set RSP and RIP via `store_handler_in_ucontext`. The kernel resumes the process with RSP and RIP pointing into the continuation's stack while RBP points into the parent's stack. The result is an immediate SIGSEGV (the observed symptom in the reproducer) or, depending on what is left on the continuation's stack, silent corruption or a confused stack switch back to the parent that swallows the trap.
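To make the mechanism concrete, here is a minimal, self-contained Rust model of the stale snapshot. Every type, field, and value below is illustrative only, not the actual wasmtime definitions:

```rust
/// Stand-in for the four "last wasm entry" fields of `VMStoreContext`.
#[derive(Clone, Copy)]
struct StoreCtx {
    stack_limit: usize,
    entry_fp: usize,
    entry_sp: usize,
    entry_trap_pc: usize,
}

fn main() {
    // 1. Parent enters wasm: its trampoline writes parent values.
    let mut ctx = StoreCtx { stack_limit: 0x1000, entry_fp: 0xa0, entry_sp: 0xa8, entry_trap_pc: 0x1111 };
    // 2. `resume`: the snapshot keeps only two of the four fields,
    //    like `VMStackLimits` does today.
    let (saved_stack_limit, saved_entry_fp) = (ctx.stack_limit, ctx.entry_fp);
    // 3. Continuation entered: its trampoline overwrites all four fields.
    ctx = StoreCtx { stack_limit: 0x2000, entry_fp: 0xb0, entry_sp: 0xb8, entry_trap_pc: 0x2222 };
    // 4. `suspend`/return: only the snapshotted fields are restored.
    ctx.stack_limit = saved_stack_limit;
    ctx.entry_fp = saved_entry_fp;
    // 5. Parent traps: the trap handler now mixes two stacks.
    assert_eq!(ctx.entry_fp, 0xa0);        // parent's fp
    assert_eq!(ctx.entry_sp, 0xb8);        // continuation's sp -- stale
    assert_eq!(ctx.entry_trap_pc, 0x2222); // continuation's trap pc -- stale
}
```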
## The broken invariant

The contract, written into the doc comments of `write_limits_to_vmcontext` and `load_limits_from_vmcontext`, says that on `resume`/`suspend`, `last_wasm_entry_sp` is saved and restored along with `stack_limit`:

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:718`:

```rust
/// Sets `last_wasm_entry_sp` and `stack_limit` fields in
/// `VMRuntimelimits` using the values from the `VMStackLimits` of this
/// object.
pub fn write_limits_to_vmcontext<'a>(...)
```

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:1343`:

```rust
// Note that the resume_contref libcall a few lines further below
// manipulates the stack limits as follows:
// 1. Copy stack_limit, last_wasm_entry_sp and last_wasm_exit* values from
//    VMRuntimeLimits into the currently active continuation (i.e., the
//    one that will become the parent of the to-be-resumed one)
//
// 2. Copy `stack_limit` and `last_wasm_entry_sp` in the
//    `VMStackLimits` of `resume_contref` into the `VMRuntimeLimits`.
```
But the actual `VMStackLimits` struct only holds two fields:

`crates/wasmtime/src/runtime/vm/stack_switching.rs:73`:

```rust
#[repr(C)]
#[derive(Debug, Default, Clone)]
pub struct VMStackLimits {
    /// Saved version of `stack_limit` field of `VMStoreContext`
    pub stack_limit: usize,
    /// Saved version of `last_wasm_entry_fp` field of `VMStoreContext`
    pub last_wasm_entry_fp: usize,
}
```

…and the cranelift lowering of `write_limits_to_vmcontext` and `load_limits_from_vmcontext` only copies those two fields:

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:746-756`:

```rust
let pointer_size = u8::try_from(env.pointer_type().bytes()).unwrap();
let stack_limit_offset = env.offsets.ptr.vmstack_limits_stack_limit();
let last_wasm_entry_fp_offset = env.offsets.ptr.vmstack_limits_last_wasm_entry_fp();
copy_to_vm_runtime_limits(
    stack_limit_offset,
    pointer_size.vmstore_context_stack_limit(),
);
copy_to_vm_runtime_limits(
    last_wasm_entry_fp_offset,
    pointer_size.vmstore_context_last_wasm_entry_fp(),
);
```
`last_wasm_entry_sp` and `last_wasm_entry_trap_handler`, however, are written by the array-to-wasm trampoline every time wasm is entered:

`crates/cranelift/src/compiler.rs:1700-1726` (`save_last_wasm_entry_context`):

```rust
let fp = builder.ins().get_frame_pointer(pointer_type);
builder.ins().store(MemFlags::trusted(), fp, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_fp());
let sp = builder.ins().get_stack_pointer(pointer_type);
builder.ins().store(MemFlags::trusted(), sp, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_sp());
let trap_handler = builder.ins()
    .get_exception_handler_address(pointer_type, block, 0);
builder.ins().store(MemFlags::trusted(), trap_handler, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_trap_handler());
```
`fiber_start` (which runs on every continuation's stack just before the continuation's wasm body) reaches the wasm body via `VMFuncRef::array_call`, which goes through that trampoline:

`crates/wasmtime/src/runtime/vm/stack_switching/stack/unix.rs:298`:

```rust
unsafe extern "C" fn fiber_start(
    func_ref: *mut VMFuncRef,
    caller_vmctx: *mut VMContext,
    args: *mut VMHostArray<ValRaw>,
    return_value_count: u32,
) {
    ...
    VMFuncRef::array_call(func_ref, None, caller_vmctx, params_and_returns);
    ...
}
```
So the timeline of `VMStoreContext.last_wasm_entry_{sp,fp,trap_handler}` is:

- Host enters wasm via `array_call` on the parent stack. The trampoline writes `parent_sp`, `parent_fp`, `parent_trap_pc` to `VMStoreContext`.
- Parent wasm executes `resume`. Cranelift IR saves the parent's `last_wasm_entry_fp` into `parent_csi` (line 1366) and overwrites `VMStoreContext.last_wasm_entry_fp` with the resumed continuation's value (line 1367). `last_wasm_entry_sp` and `last_wasm_entry_trap_handler` are not touched here.
- `stack_switch` to the continuation's stack. `wasmtime_continuation_start` runs `fiber_start` → `VMFuncRef::array_call` → array trampoline. The trampoline writes `cont_sp`, `cont_fp`, `cont_trap_pc` to `VMStoreContext`.
- Continuation wasm runs and either suspends or returns normally; both paths land back in the parent's `resume` IR and reach code that calls `parent_csi.write_limits_to_vmcontext` (lines 1477 and 1586). Only `stack_limit` and `last_wasm_entry_fp` are restored.
- Parent wasm continues. `VMStoreContext.last_wasm_entry_sp` is still `cont_sp`. `VMStoreContext.last_wasm_entry_trap_handler` is still `cont_trap_pc`.
- Parent wasm traps. The signal handler in `signals.rs:163-185` calls `info.test_if_trap(...)`, which calls `set_jit_trap` followed by `entry_trap_handler` (`traphandlers.rs:953-961`):

  ```rust
  pub(crate) fn entry_trap_handler(&self) -> Handler {
      unsafe {
          let vm_store_context = self.vm_store_context.get().as_ref();
          let fp = *vm_store_context.last_wasm_entry_fp.get();
          let sp = *vm_store_context.last_wasm_entry_sp.get();
          let pc = *vm_store_context.last_wasm_entry_trap_handler.get();
          Handler { pc, sp, fp }
      }
  }
  ```

  This returns `Handler { pc: cont_trap_pc, sp: cont_sp, fp: parent_fp }`: three values from two different stacks.
- `store_handler_in_ucontext` writes those into the kernel's `ucontext`, so the kernel resumes the process with `RSP=cont_sp`, `RBP=parent_fp`, `RIP=cont_trap_pc`.
The net effect is a longjmp to a PC in the continuation's array trampoline exception block, but with RBP from a different stack. Pushes/pops via RSP go to the continuation's stack while local-variable accesses via `[RBP]` go to the parent's. In the simplest case the very first such access (or the trampoline's epilogue pop) faults, which is what the reproducer below exhibits.
## Reproducer

`repro.wast` (preferred form):

```wast
;;! stack_switching = true
;;! exceptions = true
;;! function_references = true
(module
  (type $ft (func))
  (tag $t (type $ft))
  (type $ct (cont $ft))
  (func $callee (suspend $t))
  (elem declare func $callee)
  (func (export "go")
    (local $k (ref null $ct))
    (local.set $k (cont.new $ct (ref.func $callee)))
    (block $h (result (ref null $ct))
      (resume $ct (on $t $h) (local.get $k))
      (unreachable)
    )
    (drop)
    (unreachable)
  )
)
(assert_trap (invoke "go") "unreachable")
```

A standalone WAT (`repro.wat`) with the same body is also included for running directly with the CLI.
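For embedders, the crash should also be reproducible through the Rust API. A minimal sketch, assuming a wasmtime build with stack switching compiled in, the `anyhow` crate, and the reproducer body inlined as a string (`REPRO_WAT` is a placeholder, not a file from this report):

```rust
use wasmtime::{Config, Engine, Instance, Module, Store};

const REPRO_WAT: &str = "..."; // the `go` module from the reproducer above

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    config.wasm_function_references(true);
    config.wasm_exceptions(true);
    config.wasm_stack_switching(true);
    let engine = Engine::new(&config)?;
    let module = Module::new(&engine, REPRO_WAT)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let go = instance.get_typed_func::<(), ()>(&mut store, "go")?;
    // Expected: Err(_) carrying a wasm `unreachable` trap.
    // Actual (with this bug): the process dies with SIGSEGV before
    // `call` can return.
    let result = go.call(&mut store, ());
    assert!(result.is_err());
    Ok(())
}
```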
## Observed behavior

Build and run:

```
$ cargo build --release -p wasmtime-cli
$ ./target/release/wasmtime run \
    -W stack-switching=y -W exceptions=y -W function-references=y \
    --invoke go reports/001-stack-switching-stale-trap-handler/repro.wat
$ echo $?
139
```

Exit code 139 = 128 + 11 = SIGSEGV. The wasmtime CLI normally reports a `wasm trap: unreachable` message and exits with 134 (as the two control cases below do); instead, on this input wasmtime is killed by the kernel.
### Control 1: `unreachable` without stack switching

```wat
(module
  (func (export "go") (unreachable))
)
```

→ exit 134, `wasm trap: wasm unreachable instruction executed`. ✓ correct.
### Control 2: same shape, continuation returns instead of suspending

```wat
(module
  (type $ft (func))
  (tag $t (type $ft))
  (type $ct (cont $ft))
  (func $callee) ;; just returns, no suspend
  (elem declare func $callee)
  (func (export "go")
    (local $k (ref null $ct))
    (local.set $k (cont.new $ct (ref.func $callee)))
    (block $h (result (ref null $ct))
      (resume $ct (on $t $h) (local.get $k))
      (unreachable)
    )
    (drop)
  )
)
```

→ exit 139, SIGSEGV. ✓ Confirms the bug also fires on the continuation-returns path (the `return_block` at `instructions.rs:1573-1602` has the same omission as the suspend block).
## Suggested fix

Two related places need to change in concert:

- Add the missing fields to `VMStackLimits`:

  ```rust
  #[repr(C)]
  #[derive(Debug, Default, Clone)]
  pub struct VMStackLimits {
      pub stack_limit: usize,
      pub last_wasm_entry_fp: usize,
      pub last_wasm_entry_sp: usize,
      pub last_wasm_entry_trap_handler: usize,
  }
  ```

  …with corresponding entries in `wasmtime-environ`'s `VMOffsets` for the new fields, and an updated `VMStackLimits::with_stack_limit`.

- Extend `write_limits_to_vmcontext` (`instructions.rs:721`) and `load_limits_from_vmcontext` (`instructions.rs:764`) to copy all four fields, mirroring the existing two-field block (see the sketch after this list).
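A minimal sketch of the extended copy in `write_limits_to_vmcontext`, mirroring the two-field block quoted earlier. The `vmstack_limits_last_wasm_entry_sp` / `vmstack_limits_last_wasm_entry_trap_handler` offset accessors are hypothetical; they would be added alongside the new struct fields:

```rust
// Hypothetical offset accessors for the two new VMStackLimits fields:
let last_wasm_entry_sp_offset = env.offsets.ptr.vmstack_limits_last_wasm_entry_sp();
let last_wasm_entry_trap_handler_offset =
    env.offsets.ptr.vmstack_limits_last_wasm_entry_trap_handler();
// Copy them into VMStoreContext exactly like stack_limit and
// last_wasm_entry_fp are copied today:
copy_to_vm_runtime_limits(
    last_wasm_entry_sp_offset,
    pointer_size.vmstore_context_last_wasm_entry_sp(),
);
copy_to_vm_runtime_limits(
    last_wasm_entry_trap_handler_offset,
    pointer_size.vmstore_context_last_wasm_entry_trap_handler(),
);
```

`load_limits_from_vmcontext` needs the mirror-image copies in the other direction.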
The `last_wasm_exit_*` pair is already saved/restored as part of `CallThreadState::with_old_state` for host-driven fiber suspensions (`traphandlers.rs:680-693`) but not for stack-switching `suspend`/`resume`, so the same fix shape may be needed for those fields if a host-callable continuation can suspend with host frames in between (out of scope for this report; flagged as a follow-up to investigate).
## Severity / impact assessment

- On unix + x86_64 with `wasm_stack_switching(true)`, any guest module that performs a `resume` and then traps on the parent stack will crash the embedder with SIGSEGV (or worse, given non-deterministic stack contents).
- Stack switching is gated behind a config flag and is currently 🚧 in the stability matrix, so this is not a tier-1 security issue today. It is a guest-controllable host crash and would become a security issue the moment stack switching is promoted to tier 1 (or enabled by default in any embedder).
- The fix is local and additive (extend the saved/restored set); it does not change observable wasm semantics.