
Stackalloc localloc conditional2 #127980

Draft

AndyAyersMS wants to merge 31 commits into dotnet:main from AndyAyersMS:StackallocLocallocConditional2

Conversation

@AndyAyersMS (Member)

No description provided.

AndyAyersMS and others added 22 commits February 4, 2025 17:04
…onditional2

# Conflicts:
#	src/coreclr/jit/codegenarm.cpp
#	src/coreclr/jit/codegenarm64.cpp
#	src/coreclr/jit/codegenloongarch64.cpp
#	src/coreclr/jit/codegenriscv64.cpp
#	src/coreclr/jit/codegenxarch.cpp
#	src/coreclr/jit/gentree.cpp
#	src/coreclr/jit/gentree.h
#	src/coreclr/jit/jitconfigvalues.h
#	src/coreclr/jit/lsraarm.cpp
#	src/coreclr/jit/lsraarm64.cpp
#	src/coreclr/jit/lsraloongarch64.cpp
#	src/coreclr/jit/lsrariscv64.cpp
#	src/coreclr/jit/lsraxarch.cpp
#	src/coreclr/jit/morph.cpp
#	src/coreclr/jit/objectalloc.cpp
#	src/coreclr/jit/objectalloc.h
The merge of upstream/main into StackallocLocallocConditional2 left the body
of MorphNewArrNodeIntoStackAlloc in HEAD's old form (taking GenTree* len)
while the header was updated to main's new signature (unsigned length,
unsigned blockSize, returning unsigned int). Replace the body's signature
and preamble with main's version; the rest of the function (including the
return lclNum already added during the merge) is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The five other codegen backends (xarch, arm, arm64, loongarch64, riscv64)
zero LCLHEAP allocations when either info.compInitMem is set or the
GTF_LCLHEAP_MUSTINIT flag is present on the LCLHEAP node. The wasm backend
was missing the flag check, so an LCLHEAP marked MUSTINIT (e.g. the runtime
length stack-array path that flows through helperexpansion.cpp) would not
be zeroed when compInitMem is false.

Validated by building clr.wasmjit subset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce an inline helper on Compiler that returns true iff a given
LCLHEAP node must zero its allocation, encapsulating the common
'info.compInitMem || (tree->gtFlags & GTF_LCLHEAP_MUSTINIT)' check used
by every codegen and LSRA backend. Replace the 12 inlined occurrences
across xarch, arm, arm64, loongarch64, riscv64, and wasm with calls to
the helper.

Validated by building clr.jit and clr.wasmjit subsets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
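As a rough sketch, the centralized check described in this commit might look like the following. The type and flag definitions here are stand-ins (the real GenTree, flag bit value, and Compiler declarations live in the JIT's gentree.h and compiler.h); only the shape of the `info.compInitMem || GTF_LCLHEAP_MUSTINIT` check is taken from the commit text.

```cpp
#include <cassert>

// Stand-in declarations; the real ones live in the JIT headers.
using GenTreeFlags = unsigned;
constexpr GenTreeFlags GTF_LCLHEAP_MUSTINIT = 0x400; // hypothetical bit value

struct GenTree
{
    GenTreeFlags gtFlags = 0;
};

struct Compiler
{
    struct
    {
        bool compInitMem = false;
    } info;

    // True iff the given LCLHEAP node must zero its allocation.
    bool gtMustZeroLocalloc(GenTree* tree) const
    {
        return info.compInitMem || ((tree->gtFlags & GTF_LCLHEAP_MUSTINIT) != 0);
    }
};
```

Centralizing the check means a backend cannot forget one of the two conditions, which is exactly the bug the previous (wasm) commit fixed.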
Copilot AI review requested due to automatic review settings May 8, 2026 23:54
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 8, 2026
@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment

Pull request overview

This PR extends CoreCLR JIT object stack allocation for newarr to support cases where the array length isn’t a compile-time constant by routing allocation through a localloc-based expansion (with a runtime size check that can fall back to heap allocation). It also adds plumbing to ensure such locallocs are zero-initialized even when the method doesn’t have init-mem semantics.

Changes:

  • Add an objectalloc “localloc” path for stack-allocated arrays with non-constant (and optionally in-loop) lengths, controlled by new JIT config switches.
  • Teach stack-array helper expansion to generate a runtime size computation + conditional localloc vs heapalloc split, using a new well-known arg for element size.
  • Introduce GTF_LCLHEAP_MUSTINIT and propagate “must zero” semantics through LSRA and codegen via Compiler::gtMustZeroLocalloc.
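The runtime shape of that size-check + localloc/heapalloc split can be sketched in plain C++. All names and constants below are illustrative: the JIT emits this as IR across basic blocks, `kMaxSafeLength` stands in for the JIT-time-precomputed limit, `stackBuf` stands in for the localloc'd region, and the real heap helper (unlike `calloc`) throws OverflowException for negative lengths.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustrative limit; the real value is precomputed at JIT time.
const uint32_t kMaxSafeLength = 512;

int32_t* AllocIntArray(int32_t length, bool* onStack, void* stackBuf)
{
    // Single unsigned compare: negative lengths look huge in unsigned
    // space and take the heap path, where the real helper would throw.
    if ((uint32_t)length > kMaxSafeLength)
    {
        *onStack = false;
        return (int32_t*)calloc((size_t)length, sizeof(int32_t));
    }

    // Localloc path: must be zeroed even when the method lacks
    // init-mem semantics (the GTF_LCLHEAP_MUSTINIT case).
    *onStack = true;
    memset(stackBuf, 0, (size_t)length * sizeof(int32_t));
    return (int32_t*)stackBuf;
}
```

Because either branch can produce the result, the destination local must stay GC-reportable as an interior pointer, which is what the later m_definitelyStackPointing fix in this PR addresses.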

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Summary per file:
src/coreclr/jit/objectalloc.h Adds state/config fields and a localloc morph helper declaration; extends stack-allocation eligibility API with lengthKnown.
src/coreclr/jit/objectalloc.cpp Implements localloc morphing for runtime-sized (and in-loop) newarr, and adds an eligibility guard for runtime-sized arrays with GC elements.
src/coreclr/jit/morph.cpp Adds display support for new WellKnownArg::StackArrayElemSize.
src/coreclr/jit/lsraxarch.cpp Updates LCLHEAP reg/zero-init decisions to use gtMustZeroLocalloc / GTF_LCLHEAP_MUSTINIT.
src/coreclr/jit/lsrariscv64.cpp Switches localloc “needs init” checks to gtMustZeroLocalloc.
src/coreclr/jit/lsraloongarch64.cpp Switches localloc “needs init” checks to gtMustZeroLocalloc.
src/coreclr/jit/lsraarm64.cpp Switches localloc “needs init” checks to gtMustZeroLocalloc.
src/coreclr/jit/lsraarm.cpp Switches localloc “needs init” checks to gtMustZeroLocalloc.
src/coreclr/jit/jitmetadatalist.h Adds a new JIT metric counter for localloc-allocated arrays.
src/coreclr/jit/jitconfigvalues.h Adds config knobs for enabling localloc-based stack array allocation and permitting it in loops.
src/coreclr/jit/helperexpansion.cpp Implements runtime-sized stack array expansion into size-check + localloc/heapalloc CFG split, and header initialization for localloc case.
src/coreclr/jit/gentree.h Adds GTF_LCLHEAP_MUSTINIT and WellKnownArg::StackArrayElemSize.
src/coreclr/jit/gentree.cpp Adds formatting for the new well-known arg in arg debug messages.
src/coreclr/jit/compiler.h Adds Compiler::gtMustZeroLocalloc helper to centralize localloc zero-init requirements.
src/coreclr/jit/codegenxarch.cpp Uses gtMustZeroLocalloc in localloc codegen, and records localloc usage.
src/coreclr/jit/codegenwasm.cpp Uses gtMustZeroLocalloc in localloc codegen.
src/coreclr/jit/codegenriscv64.cpp Uses gtMustZeroLocalloc in localloc codegen.
src/coreclr/jit/codegenloongarch64.cpp Uses gtMustZeroLocalloc in localloc codegen.
src/coreclr/jit/codegenarm64.cpp Uses gtMustZeroLocalloc in localloc codegen.
src/coreclr/jit/codegenarm.cpp Uses gtMustZeroLocalloc in localloc codegen.
src/coreclr/jit/codegen.h Adds a genLocallocUsed flag to CodeGen state.

AndyAyersMS and others added 2 commits May 8, 2026 19:31
basicBlockHasBackwardJump and basicBlockInHandler in MorphAllocObjNodes
are no longer used after the morph loop body was refactored into
MorphAllocObjNodeHelperArr (which queries the block flags directly).
MSVC tolerates this via /wd4189; clang/gcc on Linux/Mac warn.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pointer-size round-up in fgExpandStackArrayAllocation was using
(size + TARGET_POINTER_SIZE) & ~(TARGET_POINTER_SIZE - 1) which
over-allocates by one pointer when the size is already aligned, and
can push already-aligned sizes over the runtime stack threshold.

The standard align-up formula is (size + TPS - 1) & ~(TPS - 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
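The two formulas can be compared directly (TPS stands in for TARGET_POINTER_SIZE, 8 on a 64-bit target):

```cpp
#include <cassert>
#include <cstdint>

// TPS stands in for TARGET_POINTER_SIZE on a 64-bit target.
const uint64_t TPS = 8;

// Old form: always adds a full pointer, so an already-aligned size
// grows by TPS (e.g. 16 -> 24).
uint64_t alignUpOld(uint64_t size)
{
    return (size + TPS) & ~(TPS - 1);
}

// Standard align-up: rounds up only when size is not already aligned.
uint64_t alignUp(uint64_t size)
{
    return (size + TPS - 1) & ~(TPS - 1);
}
```

For unaligned sizes the two agree; they diverge exactly on the already-aligned inputs that the commit message says were being pushed over the stack threshold.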
AndyAyersMS and others added 7 commits May 8, 2026 19:32
Pre-merge HEAD admitted any 2-arg newarr helper as OAT_NEWARR and
filtered constant-vs-variable inside the morph loop. Main's refactor
extracted this into AllocationKind() and added an IsCnsIntOrI() gate
as cleanup (a no-op at the time because no consumer existed for
variable lengths). The merge inherited that gate, making the new
m_UseLocalloc dispatch in MorphAllocObjNodeHelperArr unreachable.

Relax the gate to also admit variable-length newarr when localloc is
enabled, preserving main's behavior when localloc is off.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…king

When MorphAllocObjNodeHelperArr dispatches a newarr to
MorphNewArrNodeIntoLocAlloc, fgExpandStackArrayAllocation still emits a
runtime heap
fallback (the size check / heap helper path). The result local can
therefore hold either a stack pointer or a real heap object reference,
and must remain GC-reportable for the lifetime of the allocation.

Previously MorphAllocObjNode unconditionally added the local to both
m_PossiblyStackPointingPointers and m_DefinitelyStackPointingPointers,
which causes it to be retyped to TYP_I_IMPL (a raw pointer the GC
ignores). For the heap-fallback path that creates a GC hole.

Plumb an m_definitelyStackPointing flag on AllocationCandidate (default
true). Both localloc dispatch sites now clear it so the local stays in
m_PossiblyStackPointingPointers only, retains TYP_BYREF, and is
reported by the GC as an interior pointer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous size guard compared the multiply-and-add result
(elemSize * length + base) against the runtime stack-alloc limit, using
a signed compare on TYP_INT against operands typed TYP_I_IMPL. For
negative length, the I_IMPL multiply wraps to a hugely negative value;
the addition wraps further; the signed compare can then pass and
localloc allocates a tiny buffer for what the runtime believes is a
huge array, corrupting the stack instead of throwing OverflowException.
A length near INT32_MAX fails the same way through plain elemSize *
length overflow.

Replace the post-multiply check with a JIT-time-precomputed
length-based unsigned compare:

  maxSafeLength = (stackLimit - base - align8Pad) / elemSize
                  - 1 if elemSize is not pointer-aligned (round-up slack)
  if ((uint)length > maxSafeLength) goto heapallocBlock;

This catches negative lengths (which look huge in unsigned space) and
near-INT32_MAX overflows in a single compare. The intermediate
totalSize stays where it is; even if it wraps before the guard, it is
only consumed in locallocBlock, which we only enter when the length is
provably safe. The heap fallback runs the original CORINFO_HELP_NEWARR_1_*
helper, which raises OverflowException for negative lengths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
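Numerically, the guard from this commit can be sketched as follows. The constants are illustrative, not the runtime's actual values; only the precompute-then-single-unsigned-compare structure is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants; the real values come from the runtime/JIT.
const uint64_t kStackLimit = 8192; // hypothetical stack-alloc limit
const uint64_t kBase       = 16;   // hypothetical array header size
const uint64_t kAlign8Pad  = 0;
const uint64_t kPtrSize    = 8;

// JIT-time precomputation: the largest length whose total allocation,
// including round-up slack, provably fits under the limit.
uint64_t MaxSafeLength(uint64_t elemSize)
{
    uint64_t maxLen = (kStackLimit - kBase - kAlign8Pad) / elemSize;
    if (elemSize % kPtrSize != 0)
    {
        maxLen -= 1; // leave room for pointer-size round-up
    }
    return maxLen;
}

// Runtime guard: one unsigned compare rejects negative lengths (huge in
// unsigned space) and lengths whose byte size would overflow.
bool TakesHeapPath(int32_t length, uint64_t elemSize)
{
    return (uint32_t)length > MaxSafeLength(elemSize);
}
```

The key point is that no runtime multiply feeds the comparison, so no overflow can occur before the guard decides which path to take.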
When the newarr helper's length argument is TYP_LONG (some helper
variants), constructing GT_GT with TYP_INT op and a TYP_INT limit
fires the codegen assert
`genTypeSize(type) >= max(genTypeSize(op1Type), genTypeSize(op2Type))`.

Use `genActualType(lengthArg)` for the icon and the comparison so
the operands match. The unsigned compare semantics are preserved via
GTF_UNSIGNED, and `maxSafeLength` is in-range for both INT and LONG.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Modeled on Delegates.cs: five [Fact] tests covering the localloc
dispatch path for variable-length newarr.

* TestSmall  - non-escaping variable-length int[8], expects stack alloc.
* TestLarge  - variable-length int[10000], expects heap fallback.
* TestNegative, TestIntMin - negative lengths must throw OverflowException
  through the heap helper (validates the JIT-time length guard).
* TestHuge   - long[int.MaxValue]: elem * length overflows; helper must
  throw without stack corruption.

Verified the negative-length tests fail without the length guard and
pass with it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The localloc/heapalloc dispatch path assumes the original NEWARR call's
result is consumed by a STORE_LCL_VAR; the heap-fallback block stores
its call result into that same local. When the result is unused (e.g.
DCE removed the consumer because escape analysis already retyped the
call to byref and the destination became dead), `(*callUse)->gtNext`
is null and the assert/dereference at the heap-store site AVs.

Bail out of the dispatch when the call is not the data-source of a
STORE_LCL_VAR root. The call is dead in that case and a later DCE pass
will drop it.

Caught 53 c0000005 failures across coreclr_tests.run.windows.x64.checked
SPMI replay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@AndyAyersMS (Member, Author)

What's here now should be pretty close to being functional. Whether it is interesting or not is another question. The size impact is fairly large per opportunity, and this will likely look like a perf loss when benchmarking, as the zeroing cost is now paid directly in the benchmark method instead of being paid somewhere else that the benchmark engine doesn't see.

Extra SPMI misses seem to indicate the true impact is somewhat larger.

Some ideas:

  • Coalesce adjacent or control-equivalent locallocs into one (we could do this now anyway; may be worth prototyping)
  • Use a "local subroutine" for the allocation/zeroing (also doesn't really depend on this PR)
  • Split out the "within loops" case and consider it separately (and use a better "in loop" detector; we could run true loop finding, for instance).
  • Try to prove zeroing isn't needed (again, not really dependent on this PR). Likely hard but maybe not impossible.

