Workaround: route NuGet RestoreTask to transient TaskHost in server or mt modes#13660
Workaround: route NuGet RestoreTask to transient TaskHost in server or mt modes#13660OvesN wants to merge 8 commits intodotnet:mainfrom
Conversation
Workaround for static singleton state issues in NuGet RestoreTask (e.g., PluginManager, EnvironmentWrapper) that persist across builds when running in sidecar TaskHost processes. When /mt mode or MSBuild server (MSBUILDUSESERVER=1) is active, RestoreTask is now forced to run in a transient (non-sidecar) TaskHost that terminates after execution, ensuring all static state is cleaned up. Changes: - TaskRouter: Add IsKnownProblematicTask() to identify tasks by full name - AssemblyTaskFactory: Force transient TaskHost for problematic tasks - Tests: Add unit and integration tests for the workaround Fixes dotnet#13315 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…alue of MSBUILDUSESERVER. Add logging for task host spawning details.
9793967 to
a1ce582
Compare
|
/review |
|
✅ Expert Code Review (command) completed successfully! |
There was a problem hiding this comment.
Pull request overview
Implements a workaround to isolate NuGet’s RestoreTask (which relies on process-wide static singletons) by forcing it onto a transient TaskHost when running under MSBuild Server mode or /mt, preventing static state from leaking across builds/invocations.
Changes:
- Add an internal “original server mode” env var (
_MSBUILDORIGINALUSESERVER) to preserve server-mode detection through environment snapshotting. - Route allow-listed “known problematic” tasks (currently
NuGet.Build.Tasks.RestoreTask) to a non-sidecar (transient) TaskHost in/mtor server mode. - Add TaskHost diagnostic logging and integration tests validating routing + per-invocation process isolation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Framework/Traits.cs | Adds _MSBUILDORIGINALUSESERVER env var name and trait for detecting server-mode launch context. |
| src/Build/BackEnd/Components/Communications/NodeLauncher.cs | Stashes/restores the original MSBUILDUSESERVER value into the new internal env var around child process creation. |
| src/Build/BackEnd/Node/OutOfProcServerNode.cs | Preserves the internal env var across SetEnvironment(...) so server node can still detect original server-mode intent. |
| src/Build/BackEnd/Components/RequestBuilder/TaskRouter.cs | Introduces allow-list logic to identify “known problematic” tasks by full type name. |
| src/Build/Instance/TaskFactories/AssemblyTaskFactory.cs | Forces problematic tasks to run in TaskHost and disables sidecar reuse in /mt or server mode. |
| src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcTaskHost.cs | Extends host acquisition to report host PID / creation status for diagnostics. |
| src/Build/Instance/TaskFactories/TaskHostTask.cs | Logs per-invocation TaskHost details (PID, reuse, etc.) into the build log/binlog. |
| src/Build.UnitTests/BackEnd/TaskRouter_IntegrationTests.cs | Adds integration tests for problematic-task routing and “fresh process per invocation” behavior. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Expert Review — 24-Dimension Analysis
Summary
| # | Dimension | Verdict |
|---|---|---|
| 1 | Backward Compatibility | ✅ LGTM (test-only) |
| 2 | ChangeWave Discipline | ✅ LGTM (test-only) |
| 3 | Performance | ✅ LGTM (test-only) |
| 4 | Allocation Awareness | ✅ LGTM (test-only) |
| 5 | Test Coverage | ✅ LGTM — covers MT mode, server mode, fresh-process guarantee, and no-workaround fallback |
| 6 | Error Message Quality | ✅ LGTM (test-only) |
| 7 | Logging Fidelity | ✅ LGTM (test-only) |
| 8 | String Comparison | ✅ LGTM — regex matches ASCII digits only, ShouldContain is ordinal |
| 9 | API Surface | ✅ LGTM (test-only) |
| 10 | Target Authoring | ✅ LGTM (test-only) |
| 11 | Cross-Platform | ✅ LGTM — uses Path.Combine, Process.GetCurrentProcess().Id, standard APIs |
| 12 | Code Simplification | ✅ LGTM — boilerplate duplication is acceptable for test readability |
| 13 | Concurrency | ✅ LGTM — xunit.runner.json enforces parallelizeTestCollections: false / maxParallelThreads: 1 |
| 14 | Naming Precision | ✅ LGTM — names are descriptive and consistent with existing tests |
| 15 | SDK Integration | ✅ LGTM (test-only) |
| 16 | Evaluation Model | ✅ LGTM (test-only) |
| 17 | Correctness | ✅ LGTM — fake RestoreTask FullName matches TaskRouter.IsKnownProblematicTask check; Build() overload is self-contained |
| 18 | Documentation Accuracy | 📝 NIT — see inline comment |
| 19 | Dependency Management | ✅ LGTM (test-only) |
| 20 | Scope Discipline | ✅ LGTM — tests + comment cleanup in the same PR is reasonable |
| 21 | Security | ✅ LGTM (test-only) |
| 22 | Build Infrastructure | ✅ LGTM (test-only) |
| 23 | Binary Log Compatibility | ✅ LGTM (test-only) |
| 24 | Error Handling | ✅ LGTM (test-only) |
Findings: 1 NIT
One nit on the server-mode test's EnableNodeReuse comment — // Load-bearing: see the /mt counterpart. is a fragile cross-reference that could become a dead pointer. Consider duplicating the 1-line explanation inline. Details in the inline comment.
The comment cleanup is well-executed overall: removed XML docs restated what the method name already says, // Arrange/// Act/// Assert markers were noise for these straightforward tests, and the condensed // Load-bearing: comments on the MT test preserve the non-obvious reasoning. The tests themselves are correct and well-structured.
Note
🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
- #13660
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved". - #13660
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
github:
min-integrity: approved # merged | approved | unapproved | noneGenerated by Expert Code Review (command) for issue #13660 · ● 9.7M
…OvesN/msbuild into workaround-restore-issues-in-mt
| /// This is a temporary workaround until the task authors fix their static state issues. | ||
| /// See https://github.com/dotnet/msbuild/issues/13315 | ||
| /// </summary> | ||
| private static readonly string[] s_knownProblematicTaskNames = |
There was a problem hiding this comment.
There was a problem hiding this comment.
or in this case just const string if it's one problematic task...
Fixes #13315
Context
NuGet's RestoreTask holds static singletons (PluginManager, EnvironmentVariableWrapper) that assume
one invocation per process. Two MSBuild modes break that assumption:
builds back-to-back.
In both cases NuGet's static state survives past its intended scope.
This PR routes RestoreTask to a transient TaskHost (nodeReuse=false) when either mode is active, so
the spawned MSBuild.exe exits after Execute() and statics die with it
Changes Made
Why the original attempt didn't work
Original commit.
The server-mode trigger read
Traits.Instance.UseMSBuildServer, which checksMSBUILDUSESERVER. That env var is0/nullin the worker process where tasks run,stripped by:
NodeLauncher.DisableMSBuildServerzeroes it before spawning the Server child(recursion guard).
OutOfProcServerNode.HandleServerNodeBuildCommandoverwrites the server's env fromthe client's snapshot, which doesn't include MSBuild internals.
Net effect: the workaround branch never fired.
What this PR does
Traits.OriginalUseMSBuildServerEnvVarName(_MSBUILDORIGINALUSESERVER).NodeLauncher.DisableMSBuildServerstashes the originalMSBUILDUSESERVERvalue intoit;
OutOfProcServerNodere-applies it after the env-snapshot restore. Exposed asTraits.Instance.WasLaunchedInMSBuildServerMode.TaskRouter.IsKnownProblematicTask— currentlyNuGet.Build.Tasks.RestoreTask.AssemblyTaskFactory.CreateTaskInstance— when the task is on theallow-list AND (
/mtmode ORWasLaunchedInMSBuildServerMode), forceuseTaskFactory = trueandforceTransientTaskHost = true.Diagnostic logging added
Per-invocation TaskHost diagnostics in
TaskHostTask.Execute(low importance, capturedin binlog) — complements the existing
ExecutingTaskInTaskHostmessage. RecordsProcessId,ParentProcessId,NewNodeContext,IsSidecar,NodeReuseEffective.Useful when investigating any TaskHost-routing question.
Testing
Unit tests in
src/Build.UnitTests/BackEnd/TaskRouter_IntegrationTests.cs:Manual end-to-end with
dotnet restore App.csproj /bl:r.binlog:DOTNET_CLI_USE_MSBUILD_SERVER=1(server mode) and again with/mt, opened thebinlog and confirmed two
TaskHost details for task "RestoreTask"lines withIsSidecar=False,NodeReuseEffective=False, and differentProcessIdbetweenmultiple dotnet restore invocations.
Notes