Stackalloc localloc conditional2 #127980
…onditional2

    # Conflicts:
    #   src/coreclr/jit/codegenarm.cpp
    #   src/coreclr/jit/codegenarm64.cpp
    #   src/coreclr/jit/codegenloongarch64.cpp
    #   src/coreclr/jit/codegenriscv64.cpp
    #   src/coreclr/jit/codegenxarch.cpp
    #   src/coreclr/jit/gentree.cpp
    #   src/coreclr/jit/gentree.h
    #   src/coreclr/jit/jitconfigvalues.h
    #   src/coreclr/jit/lsraarm.cpp
    #   src/coreclr/jit/lsraarm64.cpp
    #   src/coreclr/jit/lsraloongarch64.cpp
    #   src/coreclr/jit/lsrariscv64.cpp
    #   src/coreclr/jit/lsraxarch.cpp
    #   src/coreclr/jit/morph.cpp
    #   src/coreclr/jit/objectalloc.cpp
    #   src/coreclr/jit/objectalloc.h
The merge of upstream/main into StackallocLocallocConditional2 left the body of MorphNewArrNodeIntoStackAlloc in HEAD's old form (taking GenTree* len) while the header was updated to main's new signature (unsigned length, unsigned blockSize, returning unsigned int). Replace the body's signature and preamble with main's version; the rest of the function (including the return lclNum already added during the merge) is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The five other codegen backends (xarch, arm, arm64, loongarch64, riscv64) zero LCLHEAP allocations when either info.compInitMem is set or the GTF_LCLHEAP_MUSTINIT flag is present on the LCLHEAP node. The wasm backend was missing the flag check, so an LCLHEAP marked MUSTINIT (e.g. the runtime length stack-array path that flows through helperexpansion.cpp) would not be zeroed when compInitMem is false. Validated by building clr.wasmjit subset. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce an inline helper on Compiler that returns true iff a given LCLHEAP node must zero its allocation, encapsulating the common 'info.compInitMem || (tree->gtFlags & GTF_LCLHEAP_MUSTINIT)' check used by every codegen and LSRA backend. Replace the 12 inlined occurrences across xarch, arm, arm64, loongarch64, riscv64, and wasm with calls to the helper. Validated by building clr.jit and clr.wasmjit subsets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
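The helper described above can be sketched as follows. This is a minimal, self-contained illustration of the centralized check, not the actual CoreCLR source; the `GenTree`/`Compiler` types here are pared-down stand-ins, and the flag's bit value is a placeholder.

```cpp
#include <cassert>

// Stand-in for the JIT's node flag bits; the actual bit value is a placeholder.
typedef unsigned GenTreeFlags;
const GenTreeFlags GTF_LCLHEAP_MUSTINIT = 0x1;

// Pared-down stand-ins for the real JIT types.
struct GenTree
{
    GenTreeFlags gtFlags = 0;
};

struct CompilerInfo
{
    bool compInitMem = false; // method-level init-mem semantics
};

struct Compiler
{
    CompilerInfo info;

    // True iff the given LCLHEAP node must zero its allocation:
    // either the whole method has init-mem semantics, or this
    // particular node was marked MUSTINIT.
    bool gtMustZeroLocalloc(GenTree* tree) const
    {
        return info.compInitMem || ((tree->gtFlags & GTF_LCLHEAP_MUSTINIT) != 0);
    }
};
```

Centralizing the check means a future change to the zeroing policy touches one place instead of a dozen backend files.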
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR extends CoreCLR JIT object stack allocation for newarr to support cases where the array length isn’t a compile-time constant by routing allocation through a localloc-based expansion (with a runtime size check that can fall back to heap allocation). It also adds plumbing to ensure such locallocs are zero-initialized even when the method doesn’t have init-mem semantics.
Changes:
- Add an objectalloc “localloc” path for stack-allocated arrays with non-constant (and optionally in-loop) lengths, controlled by new JIT config switches.
- Teach stack-array helper expansion to generate a runtime size computation + conditional localloc vs heapalloc split, using a new well-known arg for element size.
- Introduce `GTF_LCLHEAP_MUSTINIT` and propagate "must zero" semantics through LSRA and codegen via `Compiler::gtMustZeroLocalloc`.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/coreclr/jit/objectalloc.h | Adds state/config fields and a localloc morph helper declaration; extends stack-allocation eligibility API with lengthKnown. |
| src/coreclr/jit/objectalloc.cpp | Implements localloc morphing for runtime-sized (and in-loop) newarr, and adds an eligibility guard for runtime-sized arrays with GC elements. |
| src/coreclr/jit/morph.cpp | Adds display support for new WellKnownArg::StackArrayElemSize. |
| src/coreclr/jit/lsraxarch.cpp | Updates LCLHEAP reg/zero-init decisions to use gtMustZeroLocalloc / GTF_LCLHEAP_MUSTINIT. |
| src/coreclr/jit/lsrariscv64.cpp | Switches localloc “needs init” checks to gtMustZeroLocalloc. |
| src/coreclr/jit/lsraloongarch64.cpp | Switches localloc “needs init” checks to gtMustZeroLocalloc. |
| src/coreclr/jit/lsraarm64.cpp | Switches localloc “needs init” checks to gtMustZeroLocalloc. |
| src/coreclr/jit/lsraarm.cpp | Switches localloc “needs init” checks to gtMustZeroLocalloc. |
| src/coreclr/jit/jitmetadatalist.h | Adds a new JIT metric counter for localloc-allocated arrays. |
| src/coreclr/jit/jitconfigvalues.h | Adds config knobs for enabling localloc-based stack array allocation and permitting it in loops. |
| src/coreclr/jit/helperexpansion.cpp | Implements runtime-sized stack array expansion into size-check + localloc/heapalloc CFG split, and header initialization for localloc case. |
| src/coreclr/jit/gentree.h | Adds GTF_LCLHEAP_MUSTINIT and WellKnownArg::StackArrayElemSize. |
| src/coreclr/jit/gentree.cpp | Adds formatting for the new well-known arg in arg debug messages. |
| src/coreclr/jit/compiler.h | Adds Compiler::gtMustZeroLocalloc helper to centralize localloc zero-init requirements. |
| src/coreclr/jit/codegenxarch.cpp | Uses gtMustZeroLocalloc in localloc codegen, and records localloc usage. |
| src/coreclr/jit/codegenwasm.cpp | Uses gtMustZeroLocalloc in localloc codegen. |
| src/coreclr/jit/codegenriscv64.cpp | Uses gtMustZeroLocalloc in localloc codegen. |
| src/coreclr/jit/codegenloongarch64.cpp | Uses gtMustZeroLocalloc in localloc codegen. |
| src/coreclr/jit/codegenarm64.cpp | Uses gtMustZeroLocalloc in localloc codegen. |
| src/coreclr/jit/codegenarm.cpp | Uses gtMustZeroLocalloc in localloc codegen. |
| src/coreclr/jit/codegen.h | Adds a genLocallocUsed flag to CodeGen state. |
basicBlockHasBackwardJump and basicBlockInHandler in MorphAllocObjNodes are no longer used after the morph loop body was refactored into MorphAllocObjNodeHelperArr (which queries the block flags directly). MSVC tolerates this via /wd4189; clang/gcc on Linux/Mac warn. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pointer-size round-up in fgExpandStackArrayAllocation was using (size + TARGET_POINTER_SIZE) & ~(TARGET_POINTER_SIZE - 1) which over-allocates by one pointer when the size is already aligned, and can push already-aligned sizes over the runtime stack threshold. The standard align-up formula is (size + TPS - 1) & ~(TPS - 1). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
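The off-by-one can be seen directly by comparing the two formulas. This is an illustrative sketch (assuming an 8-byte `TARGET_POINTER_SIZE`, i.e. a 64-bit target), not the actual JIT code:

```cpp
#include <cassert>
#include <cstddef>

const size_t TARGET_POINTER_SIZE = 8; // assuming a 64-bit target

// The old formula: over-allocates by a full pointer when the size
// is already pointer-aligned.
size_t alignUpOld(size_t size)
{
    return (size + TARGET_POINTER_SIZE) & ~(TARGET_POINTER_SIZE - 1);
}

// The standard align-up formula: rounds up only when needed.
size_t alignUpFixed(size_t size)
{
    return (size + TARGET_POINTER_SIZE - 1) & ~(TARGET_POINTER_SIZE - 1);
}
```

For an unaligned size such as 13, both formulas yield 16; for an already-aligned size such as 16, the old formula yields 24 while the fixed one correctly returns 16.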
Pre-merge HEAD admitted any 2-arg newarr helper as OAT_NEWARR and filtered constant-vs-variable inside the morph loop. Main's refactor extracted this into AllocationKind() and added an IsCnsIntOrI() gate as cleanup (a no-op at the time because no consumer existed for variable lengths). The merge inherited that gate, making the new m_UseLocalloc dispatch in MorphAllocObjNodeHelperArr unreachable. Relax the gate to also admit variable-length newarr when localloc is enabled, preserving main's behavior when localloc is off. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…king

When MorphAllocObjNodeHelperArr dispatches a newarr to MorphNewArrNodeIntoLocAlloc, fgExpandStackArrayAllocation still emits a runtime heap fallback (the size check / heap helper path). The result local can therefore hold either a stack pointer or a real heap object reference, and must remain GC-reportable for the lifetime of the allocation. Previously MorphAllocObjNode unconditionally added the local to both m_PossiblyStackPointingPointers and m_DefinitelyStackPointingPointers, which caused it to be retyped to TYP_I_IMPL (a raw pointer the GC ignores); for the heap-fallback path that creates a GC hole. Plumb an m_definitelyStackPointing flag on AllocationCandidate (default true). Both localloc dispatch sites now clear it, so the local stays in m_PossiblyStackPointingPointers only, retains TYP_BYREF, and is reported by the GC as an interior pointer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous size guard compared the multiply-and-add result (elemSize * length + base) against the runtime stack-alloc limit, using a signed compare on TYP_INT against operands typed TYP_I_IMPL. For a negative length, the I_IMPL multiply wraps to a hugely negative value; the addition wraps further; the signed compare can then pass, and localloc allocates a tiny buffer for what the runtime believes is a huge array, corrupting the stack instead of throwing OverflowException. A length near INT32_MAX fails the same way through plain elemSize * length overflow.

Replace the post-multiply check with a JIT-time-precomputed, length-based unsigned compare:

    maxSafeLength = (stackLimit - base - align8Pad) / elemSize
                    - 1 if elemSize is not pointer-aligned (round-up slack)

    if ((uint)length > maxSafeLength) goto heapallocBlock;

This catches negative lengths (which look huge in unsigned space) and near-INT32_MAX overflows in a single compare. The intermediate totalSize stays where it is; even if it wraps before the guard, it is only consumed in locallocBlock, which we only enter when the length is provably safe. The heap fallback runs the original CORINFO_HELP_NEWARR_1_* helper, which raises OverflowException for negative lengths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
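The guard above can be sketched in isolation. The constants here (`stackLimit`, `arrayBase`, `align8Pad`) are illustrative placeholders, not CoreCLR's actual threshold or header sizes; the point is how a single unsigned compare rejects both negative and overflow-prone lengths:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative values only; CoreCLR's actual limits differ.
const uint64_t stackLimit = 8192; // runtime stack-alloc limit (assumed)
const uint64_t arrayBase  = 24;   // array header size (assumed)
const uint64_t align8Pad  = 4;    // worst-case alignment slack (assumed)

// JIT-time: the largest length that provably fits under the limit.
uint64_t MaxSafeLength(uint64_t elemSize)
{
    uint64_t maxLen = (stackLimit - arrayBase - align8Pad) / elemSize;
    if (elemSize % 8 != 0) // elemSize not pointer-aligned: reserve round-up slack
    {
        maxLen -= 1;
    }
    return maxLen;
}

// Runtime: one unsigned compare. Negative lengths reinterpret as huge
// unsigned values, so they take the heap path (and its OverflowException)
// without any multiply that could wrap.
bool takesHeapFallback(int32_t length, uint64_t elemSize)
{
    return (uint32_t)length > MaxSafeLength(elemSize);
}
```

Because the division happens once at JIT time, the runtime cost is a single compare-and-branch on the length, with no overflow-prone arithmetic on the hot path.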
When the newarr helper's length argument is TYP_LONG (some helper variants), constructing GT_GT with TYP_INT op and a TYP_INT limit fires the codegen assert `genTypeSize(type) >= max(genTypeSize(op1Type), genTypeSize(op2Type))`. Use `genActualType(lengthArg)` for the icon and the comparison so the operands match. The unsigned compare semantics are preserved via GTF_UNSIGNED, and `maxSafeLength` is in-range for both INT and LONG. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Modeled on Delegates.cs: five [Fact] tests covering the localloc dispatch path for variable-length newarr.

* TestSmall - non-escaping variable-length int[8], expects stack alloc.
* TestLarge - variable-length int[10000], expects heap fallback.
* TestNegative, TestIntMin - negative lengths must throw OverflowException through the heap helper (validates the JIT-time length guard).
* TestHuge - long[int.MaxValue]: elem * length overflows; the helper must throw without stack corruption.

Verified the negative-length tests fail without the length guard and pass with it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The localloc/heapalloc dispatch path assumes the original NEWARR call's result is consumed by a STORE_LCL_VAR; the heap-fallback block stores its call result into that same local. When the result is unused (e.g. DCE removed the consumer because escape analysis already retyped the call to byref and the destination became dead), `(*callUse)->gtNext` is null and the assert/dereference at the heap-store site AVs. Bail out of the dispatch when the call is not the data-source of a STORE_LCL_VAR root. The call is dead in that case and a later DCE pass will drop it. Caught 53 c0000005 failures across coreclr_tests.run.windows.x64.checked SPMI replay. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
What's here now should be pretty close to being functional. Whether it is interesting or not is another question. The size impact is fairly large per opportunity, and this will likely look like a perf loss when benchmarking, as the zeroing cost is now paid directly in the benchmark method instead of being paid somewhere else that the benchmark engine doesn't see. Extra SPMI misses seem to indicate the true impact is somewhat larger. Some ideas: