
feat(mcp): per-server code_mode_run_js + multi-graph MCP demo #2840

Draft
asoorm wants to merge 10 commits into main from ahmet/mcp-demo-multi-graph

Conversation


@asoorm asoorm commented May 7, 2026

Summary

  • Adds a per-sub-server code_mode_run_js MCP tool that auto-registers on every MultiServer collection with at least one operation loaded. Each server gets its own V8 sandbox bound to its upstream URL; tools.<OperationName>(vars) dispatches via the existing op catalog (no Yoko, no storage backend, no session state). See the sketch after this list.
  • Slim, scoped description: variable shapes and return types are NOT duplicated — they live on the per-op MCP tools that already publish them. Description lists bound names + sandbox/usage rules + return-value discipline. ~3.6 KB per server vs the 7 KB TS-bundle version.
  • Includes the merge of feat(router): add Code Mode MCP server #2825 (code-mode-v2) so the standalone code-mode server, sandbox, harness, and tsgen packages are available; this PR's integration reuses sandbox/harness/tsgen but does NOT enable the standalone code-mode server (code_mode.enabled defaults to false).
  • Watcher logs now make the broadcast explicit: "MCP tool refresh broadcast to connected clients (tools/list_changed)" with total_tools count, plus "...reloading tools and notifying connected clients" on the watcher tick.
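
For illustration, a sketch of the kind of payload the new tool accepts, assuming two hypothetical bound operations ListThings and GetThing (the real names come from each server's per-op tools):

// Hypothetical code_mode_run_js payload: one async-arrow root composing
// two bound operations in a single sandbox call instead of two MCP round-trips.
async () => {
  const list = await tools.ListThings({ first: 5 });
  const details = await Promise.all(
    list.things.map((t) => tools.GetThing({ id: t.id }))
  );
  // Return only what the caller needs, not the raw upstream payloads.
  return details.map((d) => d.thing.name);
}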

Status

Draft — primarily here so CI can produce a Docker image for the conference demo. Not intended to merge as-is:

  • Carries the full code-mode-v2 PR diff (~+15.7K lines) on top of the multi-graph commits; the integration commit itself is small (+200 lines under pkg/mcpserver/).
  • Compile fixes were applied to the merged code (yoko.Searcher interface, opsFragment rename, fakeYoko test stub) — see commit 56a3a393f for details.
  • The standalone code-mode server (Jens's tool) and the per-server integration (this work) are independent paths and should probably ship as separate PRs once feat(router): add Code Mode MCP server #2825 lands.

Commits on this branch (vs main)

  • 28f10365f refactor(mcp): slim code_mode_run_js description, scope to bound names
  • eadb47be1 chore(mcp): clarify reload logs as client-broadcast notifications
  • 56a3a393f feat(mcp): per-server code_mode_run_js tool, merging code-mode-v2
  • 1720a9260 feat(mcp): demo multi-graph MCP server with hot-reloading operations
  • 9088da4ed chore: wip multi-collection mcp
  • plus 5 commits from the merged feat(router): add Code Mode MCP server #2825

Test plan

  • CI green on router unit tests + router-tests
  • Docker image builds successfully (the reason this draft exists)
  • Smoke test on conference-demo config: tools/list on each sub-server returns code_mode_run_js plus the per-op tools
  • code_mode_run_js against /countries returns just a number for "how many countries speak Spanish" (return-value discipline); see the sketch after this list
  • File-watcher hot-reload emits the new "broadcast to connected clients" log line
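
A sketch of that /countries check, with the operation and field names assumed for illustration rather than taken from the actual catalog:

// Hypothetical payload: compute in the sandbox, return only the number.
// ListCountries and its result shape are assumptions, not the real op catalog.
async () => {
  const res = await tools.ListCountries({});
  return res.countries.filter((c) =>
    c.languages.some((l) => l.name === "Spanish")
  ).length;
}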

Summary by CodeRabbit

Code Mode Support & Demo Environments

  • New Features
    • Added Code Mode sandbox for executing JavaScript/TypeScript operations with GraphQL integration.
    • Introduced MCP tools for discovering GraphQL operations and executing custom code in the sandbox.
    • Added named operations storage with session state management and mutation approval workflows.
    • New demo environments demonstrating Code Mode usage with local federation and external supergraph support.

asoorm and others added 10 commits May 1, 2026 17:33
Adds a second MCP server next to the existing per-operation one,
exposing two generic tools instead of one-tool-per-operation:

- code_mode_search_tools: takes prompts, generates GraphQL ops,
  registers them in the session catalog, returns their TS signatures.
- code_mode_run_js: runs an async arrow against the session catalog
  inside a JS sandbox (V8 isolate), with tools.<name>(vars) bound
  to the registered ops.

Includes:
- router/internal/codemode: harness, sandbox, server, storage, tsgen,
  yoko (query-gen client), calltrace, observability.
- proto/wg/cosmo/code_mode/yoko/v1: Yoko query-gen Connect API.
- router/pkg/config: code_mode config block (auth, sandbox limits,
  query-generation endpoint, named-ops, mutation approval).
- demo/code-mode: local federation + Yoko mock + start scripts +
  MCP client config snippets (Claude Code, Claude Desktop, Codex).
- demo/code-mode-connect: alternate demo against an external yoko
  Connect supergraph (set YOKO_DIR).
- router-tests: end-to-end named-ops integration test.
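
Roughly, the two-tool flow described above looks like this (the prompt, generated signature, and operation name here are illustrative only, not real output):

// 1) code_mode_search_tools takes prompts such as "employees by department",
//    registers the generated op in the session catalog, and returns a TS
//    signature along the lines of:
//    tools.EmployeesByDepartment(vars: { department: string }): Promise<...>
//
// 2) code_mode_run_js then runs an async arrow against that catalog:
async () => {
  const r = await tools.EmployeesByDepartment({ department: "Engineering" });
  return r.employees.length;
}
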
Previously a single non-serializable leaf (BigInt, function, symbol,
undefined, circular ref) caused the sandbox to drop the entire return
value and surface a NotSerializable error. Multi-source aggregations
that put raw federation errors or upstream payloads into a debug field
lost the rest of an expensive run.

The validator now mutates each bad leaf to the sentinel string
"<<non-serializable: KIND>>" and records {path, kind} in a new warnings
array on the response envelope. The successful result still flows
through. NotSerializable now only fires when JSON.stringify itself
throws after sanitization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
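
A minimal before/after sketch of that behavior (the path format and kind string are assumptions based on the description above):

// Sandbox code returns a value with one non-serializable leaf (a BigInt):
async () => ({ total: 42, debug: { raw: 10n } })

// Before: the whole return value was dropped with a NotSerializable error.
// After: the bad leaf becomes a sentinel and the envelope records a warning:
// {
//   "result": { "total": 42, "debug": { "raw": "<<non-serializable: bigint>>" } },
//   "warnings": [{ "path": "debug.raw", "kind": "bigint" }]
// }
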
…down

Moves the long inline tool/resource description string constants out of server.go into a new internal/codemode/server/descriptions sub-package, where each description lives in its own .md file embedded via go:embed. Lets the prose be edited as readable markdown without touching Go source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The proxy now retries the initial dial with exponential backoff and
supervises the live session via session.Wait(); when the upstream
disconnects it reconnects transparently so downstream clients (Claude
Desktop, etc.) keep working across router restarts. KeepAlive pings make
dead connections surface quickly instead of waiting for the next call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eState.wg

WriteTimeout was 30s while executeTimeout is 120s, so net/http would cut
off legitimately long code_mode_run_js responses mid-flight. Lift the
write deadline above the configured execute timeout and switch the
listener to ReadHeaderTimeout so a body upload doesn't share the 30s
budget either. The unused sync.WaitGroup on executeState (and its
matching state.wg.Wait() in Execute) was a no-op leftover; remove both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the building blocks for a demo router that turns any GraphQL graph
into MCP tools, hosts multiple operation collections side by side, and
hot-reloads tools when .graphql files change on disk.

- pkg/mcpserver/introspection.go: fetch upstream SDL via standard
  introspection so a collection can target any GraphQL endpoint without
  a local schema file.
- pkg/mcpserver/multi_server.go: mount many GraphQLSchemaServer handlers
  on one shared listener, each with its own path, operations dir, and
  optional upstream override.
- pkg/mcpserver/operations_watcher.go: per-collection directory watcher
  that picks up added/modified/removed .graphql files and triggers a
  tools reload after a settling tick.
- mcp.config.yaml: working demo config wiring four public graphs
  (anilist, countries, rickandmorty, swapi) as separate MCP servers.
Merges code-mode-v2 (#2825) and adds a per-sub-server code_mode_run_js
MCP tool that lets an LLM compose multiple operations in one V8-sandboxed
JS call instead of N MCP round-trips.

The new tool is auto-registered on every MultiServer sub-server that has
at least one operation loaded. Each sub-server gets its own sandbox bound
to its upstream URL; tools.<name>(vars) dispatches via the existing op
catalog (no Yoko, no storage backend, no session state).

Reuses the codemode/sandbox, codemode/harness, and codemode/tsgen packages
from the merged PR. Description bundled inline includes the typed TS
catalog plus shape constraints (single async-arrow root, no top-level
await, no IIFE, sandbox console disabled, etc.) so the LLM lands on the
right syntax first try.

Compile fixes from the merge:
- Add EnsureIndexed(ctx) error to yoko.Searcher and *yoko.Client
- Rename opsFragment -> newOpsFragment to match call sites/tests
- Update fakeYoko test fake with EnsureIndexed counter
- Resolve mcpServer field conflict in router_config.go (keep
  MultiServer type, add codeModeServer alongside)
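
A sketch of the shapes the bundled description steers toward and away from (tools.SomeOp stands in for a bound operation name; the rejected lines paraphrase the constraints above):

// Accepted: a single async-arrow root; awaits live inside the body.
async () => {
  const a = await tools.SomeOp({ id: "1" });
  return a.someField;
}

// Rejected shapes, per the constraints above:
// await tools.SomeOp({ id: "1" })      // top-level await, no arrow root
// (async () => { /* ... */ })()        // IIFE instead of a bare arrow
// console.log("debugging")             // sandbox console is disabled
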
The watcher and applyToolDiff already logged on every reload, but the
messages didn't make it clear that a tools/list_changed notification
was being broadcast to connected MCP clients. Two tweaks:

- operations_watcher: "...reloading tools and notifying connected clients"
- applyToolDiff: "MCP tool refresh broadcast to connected clients
  (tools/list_changed)" with a total_tools count; the no-change branch
  now says "no notification sent to clients" so the distinction between
  "ticked, nothing useful happened" and "ticked, clients notified" is
  visible at a glance.
The previous description duplicated the per-op MCP tools' inputSchema info
as an inline TypeScript bundle, and used hardcoded example operation
names (SearchAnime, GetA/B, GetCountries) that don't necessarily exist on
every server. The model could read those examples as available bindings,
producing cross-server confusion (e.g. tools.SearchAnime appearing in
the countries server's description).

Drop the TS catalog and the foreign-named examples. The description now:

- Lists the actual bound operation names on this server, nothing more
- Points the model at the per-op MCP tools (and get_operation_info) for
  variable shapes and return types — single source of truth, no dupe
- Uses generic placeholder names (tools.SomeOp) in syntax examples
- Trims to ~half the prior length (3.6 KB vs 7.0 KB) by removing the
  redundant "WHEN TO USE" example list, since RETURN-VALUE DISCIPLINE
  already covers the same ground

coderabbitai Bot commented May 7, 2026


Walkthrough

Adds a Code Mode MCP server with JS sandbox/harness, storage (memory/Redis), Yoko proto/client/mock, router integration (multi-server), configs/demos/scripts, schema/config updates, and comprehensive tests.

Changes

Code Mode Stack

| Layer / File(s) | Summary |
| --- | --- |
| Core config/proto: proto/.../yoko.proto, router/pkg/config/..., router/pkg/schemaloader/loader.go | Defines Yoko API; extends router MCP config (code_mode, servers), defaults and schema; strips operation descriptions when loading. |
| Router integration: router/core/*, router/pkg/mcpserver/* | Starts Code Mode server, builds MCP multi-server, introspects/loads SDL, watches ops, reloads Code Mode on graph rebuild. |
| Sandbox/harness: router/internal/codemode/harness/*, router/internal/codemode/sandbox/*, router/internal/codemode/observability/* | Implements transpile/shape-check, execution, envelopes, validation, host bridge, telemetry. |
| Server and approval: router/internal/codemode/server/* | Adds search/execute tools, persisted resource, lifecycle, elicitation-based approval, descriptions, session handling. |
| Storage/TS generation: router/internal/codemode/storage/*, router/internal/codemode/tsgen/* | Memory/Redis backends, naming, interfaces, and TypeScript bundle/fragment rendering. |
| Yoko client/mock: router/internal/codemode/yoko/*, demo/code-mode/yoko-mock/* | Client with indexing/search; mock service with schema and tests. |
| Proxy and demos: demo/code-mode*/**, .gitignore, Makefile | MCP stdio proxy; code-mode demos/configs/scripts; make targets. |
| Tests: router-tests/*, many *_test.go | End-to-end and unit tests for named ops, handlers, storage, tsgen, observability, lifecycle. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • wundergraph/cosmo#2636 — Touches MCP server behavior and tool surface; overlaps with multi-server and tooling changes.
  • wundergraph/cosmo#2823 — Adjusts router core graph mux/reload logic related to lifecycle used here.


github-actions Bot commented May 7, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-7f4d02d9db82bc74d084f28fb69e585822d6d45c-nonroot


codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.98%. Comparing base (9780ce7) to head (28f1036).
⚠️ Report is 22 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2840       +/-   ##
===========================================
- Coverage   64.32%   41.98%   -22.35%     
===========================================
  Files         311      275       -36     
  Lines       44308    29500    -14808     
  Branches     4767        0     -4767     
===========================================
- Hits        28502    12386    -16116     
+ Misses      15783    15727       -56     
- Partials       23     1387     +1364     
| Files with missing lines | Coverage Δ |
| --- | --- |
| router/core/errors.go | 72.07% <ø> (ø) |
| router/core/factoryresolver.go | 60.65% <ø> (ø) |
| router/core/flushwriter.go | 71.79% <ø> (ø) |
| router/core/graph_server.go | 68.45% <ø> (ø) |
| router/core/graphql_handler.go | 62.71% <ø> (ø) |
| router/core/header_rule_engine.go | 35.62% <ø> (ø) |
| router/core/plan_generator.go | 0.00% <ø> (ø) |
| router/core/router.go | 51.72% <ø> (ø) |
| router/core/router_config.go | 88.12% <ø> (ø) |
| router/core/transport.go | 75.62% <ø> (ø) |

... and 48 more

... and 528 files with indirect coverage changes



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 15

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/pkg/config/config.schema.json (1)

2777-2885: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Per-server OAuth configs bypass the current schema guardrails.

The if/then at the end of mcp only validates the top-level oauth block. servers[].oauth.enabled: true currently passes schema validation without requiring jwks, and it also never triggers the shared mcp.server.base_url requirement needed for OAuth discovery. That lets broken multi-server auth configs through validation and pushes the failure to runtime.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/config/config.schema.json` around lines 2777 - 2885, The schema's
existing if/then only enforces OAuth requirements for the top-level mcp.oauth,
but does not validate per-server OAuth blocks (servers[].oauth), so
servers[].oauth.enabled: true can bypass requiring jwks and the shared
mcp.server.base_url; update the schema to add a conditional that detects any
servers array entry with oauth.enabled === true (e.g., an "if" that uses
"properties": {"servers": {"items": {"properties": {"oauth": {"properties":
{"enabled": {"const": true}},"required":["enabled"]}}}}}) and a corresponding
"then" that requires that the server's oauth.jwks has minItems:1 and that the
top-level mcp.server.base_url is present (require "jwks" under that server
object and add "server": {"required":["base_url"]} to the overall then),
ensuring servers[].oauth.jwks and mcp.server.base_url are validated when any
per-server oauth.enabled is true; reference the "servers" array,
"servers[].oauth.enabled", "servers[].oauth.jwks", and top-level
"server.base_url" symbols when making the change.
🟡 Minor comments (14)
router/pkg/config/config.schema.json-2789-2792 (1)

2789-2792: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Enforce the leading slash that servers[].path promises.

The description says this must be a mount path like /mcp, but the schema accepts any string right now. Adding a simple ^/ pattern will catch invalid paths before startup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/config/config.schema.json` around lines 2789 - 2792, The
servers[].path property currently allows any string but the description requires
a leading slash; update the JSON Schema for the "path" property (servers[].path)
to include a pattern ensuring it starts with "/" (e.g., add "pattern": "^/") so
invalid mount paths are rejected by schema validation; keep the existing
"description" and "type": "string" and, if present, add a clear "pattern"
error-friendly message or ensure existing validation tooling surfaces the
pattern failure.
router/pkg/mcpserver/introspection.go-110-112 (1)

110-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Limit the non-2xx body read before truncating it.

io.ReadAll(resp.Body) still buffers the entire error response even though only 256 characters are reported. A large HTML/proxy error page can make startup allocate far more memory than needed here.

Suggested fix
-		raw, _ := io.ReadAll(resp.Body)
+		raw, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
 		return "", fmt.Errorf("introspect %s: status %d: %s", url, resp.StatusCode, truncate(string(raw), 256))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/introspection.go` around lines 110 - 112, The code reads
the entire resp.Body into memory before truncating, which can allocate huge
buffers for large error pages; change the read to a bounded read (e.g., use
io.LimitReader on resp.Body limited to 256 or 257 bytes) so only the portion
needed for truncate(...) is buffered, then pass that limited content into
truncate() and the error message; update the error branch that checks
resp.StatusCode in introspection.go to use the bounded read against resp.Body
(reference resp, resp.Body and truncate).
router/internal/codemode/calltrace/calltrace.go-85-91 (1)

85-91: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Enabled should treat nil/empty FileRecorder as disabled.

A typed-nil *FileRecorder or empty path currently reports enabled, even though recording is effectively off.

Suggested fix
 func Enabled(recorder Recorder) bool {
-	switch recorder.(type) {
+	switch r := recorder.(type) {
 	case nil, NopRecorder, *NopRecorder:
 		return false
+	case *FileRecorder:
+		return r != nil && r.path != ""
 	default:
 		return true
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/calltrace/calltrace.go` around lines 85 - 91, The
Enabled(recorder Recorder) function currently returns true for a typed-nil
*FileRecorder or a FileRecorder with an empty path; change it to treat those as
disabled by first checking if recorder == nil, then attempt a type assertion to
*FileRecorder (or FileRecorder) and, if that type matches, return false when the
asserted pointer is nil or when its path field (e.g., r.path or r.Path) is
empty; keep the existing NopRecorder checks for other recorder types and
otherwise return true. Ensure you reference Enabled and the
Recorder/FileRecorder types and inspect the FileRecorder.path field when
deciding enabled/disabled.
router/internal/codemode/calltrace/calltrace.go-82-82 (1)

82-82: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t ignore file write failures.

Discarding Write results can silently lose trace entries.

Suggested fix
-	_, _ = file.Write(line)
+	n, err := file.Write(line)
+	if err != nil || n != len(line) {
+		return
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/calltrace/calltrace.go` at line 82, The file write
result is being ignored (_, _ = file.Write(line)), which can drop trace entries
silently; change this to capture the return values (n, err := file.Write(line))
and handle errors from file.Write inside the enclosing function in calltrace.go
(e.g., return the error up the call chain or log it via the package's
logger/standard error) and ensure you also handle short writes (n != len(line))
appropriately by retrying or treating it as an error so trace writes are never
silently lost.
demo/code-mode/mcp-configs/claude.desktop.json-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ensure the placeholder path is documented in the README.md.

/ABSOLUTE/PATH/TO/cosmo/... requires manual substitution by each user. If the README doesn't already call this out, add a note explaining where the binary lives and how to build/locate it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/mcp-configs/claude.desktop.json` at line 4, The "command"
field in claude.desktop.json contains a placeholder path (the value starting
with "/ABSOLUTE/PATH/TO/cosmo/...") that must be manually substituted by users;
update README.md to document this: state that the user must replace that
placeholder in the "command" key with the actual mcp-stdio-proxy binary
location, describe how to build or where to find the mcp-stdio-proxy binary
(brief build steps or typical install/output directory), and show one short
clear instruction on how to locate the built binary so users can set the correct
path.
router/mcp.config.yaml-73-82 (1)

73-82: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded personal home-directory paths won't work outside the author's machine (or in a Docker build).

Every file_system storage provider path is anchored to /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.... With watch: true on each server, the router will attempt to watch these paths on startup — on any other host (including a Docker container) this will fail or silently produce no-op watchers.

Consider one of:

  • Replace with paths relative to the repo root or to $PWD (e.g., ./demo/code-mode/operations/<name>).
  • Parameterise via an env-var overlay or a --config-override flag.
  • Add a prominent # TODO: replace with your local path comment so the next person isn't surprised.
📝 Suggested placeholder form
 storage_providers:
   file_system:
     - id: rickandmorty
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/rickandmorty
+      path: /path/to/conference-mcp-demo/.operations/rickandmorty   # TODO: update for your environment
     - id: countries
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/countries
+      path: /path/to/conference-mcp-demo/.operations/countries       # TODO: update for your environment
     - id: anilist
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/anilist
+      path: /path/to/conference-mcp-demo/.operations/anilist         # TODO: update for your environment
     - id: swapi
-      path: /Users/asoorm/go/src/github.com/wundergraph/conference-mcp-demo/.operations/swapi
+      path: /path/to/conference-mcp-demo/.operations/swapi           # TODO: update for your environment
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/mcp.config.yaml` around lines 73 - 82, The
storage_providers.file_system entries currently use hardcoded absolute
home-directory paths (ids: rickandmorty, countries, anilist, swapi) which will
fail on other machines or containers; update these to use relative paths (e.g.,
./... relative to the repo root) or substitute an environment-variable-based
placeholder (e.g., ${OPERATIONS_DIR}/<name>) so paths can be overridden at
runtime, and add a short TODO comment above the file_system block indicating
that implementers must set OPERATIONS_DIR or adjust paths before running.
demo/code-mode/mcp-configs/README.md-113-117 (1)

113-117: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Documentation contradicts the actual config value for require_mutation_approval.

Line 115 states the demo config sets require_mutation_approval: false, but demo/code-mode/router-config.yaml Line 42 sets it to true. This will mislead anyone testing prompt 04-mutation-not-approved.txt — the comment says to skip it only for approval-rejection demos, but with true it will actually enforce approval.

📝 Proposed fix
-The historical prompt name mentions "not approved", but the demo config sets `require_mutation_approval: false` in `demo/code-mode/router-config.yaml`.
-That means this prompt is not declined by operator approval in the default demo; it should run like a normal mutation if the mock can generate the operation.
-Skip this prompt when you specifically need to demonstrate approval rejection.
+The demo config sets `require_mutation_approval: true` in `demo/code-mode/router-config.yaml`,
+so this prompt will trigger the mutation-approval flow. If you want to skip approval gating,
+set `require_mutation_approval: false` before running the demo.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/mcp-configs/README.md` around lines 113 - 117, The README
incorrectly states the demo config sets require_mutation_approval: false while
the actual router config (demo/code-mode/router-config.yaml) sets
require_mutation_approval: true; update either the README text or the config so
they match: either change the README line referencing require_mutation_approval
to state "true" and explain that sample-prompts/04-mutation-not-approved.txt
will be declined by default, or change the router-config.yaml
require_mutation_approval setting to false if the intent is to not enforce
approval by default; ensure references to
sample-prompts/04-mutation-not-approved.txt and the require_mutation_approval
setting are consistent.
demo/code-mode/Makefile-3-4 (1)

3-4: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

$(wildcard ../cli/.env) checks the wrong path and silently disables env-file loading.

Make expands $(wildcard ...) relative to its own working directory (demo/code-mode/), so ../cli/.env resolves to demo/cli/.env — not the repo-root cli/.env. The conditional is therefore always false, and the --env-file flag is never appended even when cli/.env exists.

The --env-file ../cli/.env string embedded in the shell command is fine as-is: the recipe runs under cd .. so the shell sees demo/../cli/.env = cli/.env from the root.

Only the wildcard guard needs the corrected path:

🐛 Proposed fix
-wgc_env_arg = $(if $(wildcard ../cli/.env),--env-file ../cli/.env,)
+wgc_env_arg = $(if $(wildcard ../../cli/.env),--env-file ../cli/.env,)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/Makefile` around lines 3 - 4, The Makefile's conditional uses
$(wildcard ../cli/.env) relative to demo/code-mode/, which points at
demo/cli/.env and is therefore false; update the guard so it checks the
repo-root file by using the correct relative path ../../cli/.env (i.e., change
the wildcard in the wgc_env_arg definition), leaving the wgc_router command
(--env-file ../cli/.env) unchanged because the recipe runs from the parent dir;
this fixes the conditional so --env-file is appended when cli/.env exists.
router/core/graph_server.go-1403-1410 (1)

1403-1410: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Both Code Mode error branches reuse the "failed to reload MCP server" message, shadowing the adjacent s.mcpServer log.

Both branches emit "failed to reload MCP server" but they concern s.codeModeServer, not s.mcpServer (which is already handled with the same message just above at lines 1398–1402). Any log-based alert or grep for that message will conflate the two. The first branch also double-wraps: the outer log message already implies a reload failure, making the fmt.Errorf("failed to print Code Mode schema SDL: %w", printErr) wrapper redundant.

🔧 Proposed fix
-		if printErr != nil {
-			s.logger.Error("failed to reload MCP server", zap.Error(fmt.Errorf("failed to print Code Mode schema SDL: %w", printErr)))
-		} else if mErr := s.codeModeServer.Reload(executor.ClientSchema, sdl); mErr != nil {
-			s.logger.Error("failed to reload MCP server", zap.Error(mErr))
-		}
+		if printErr != nil {
+			s.logger.Error("failed to print Code Mode schema SDL", zap.Error(printErr))
+		} else if mErr := s.codeModeServer.Reload(executor.ClientSchema, sdl); mErr != nil {
+			s.logger.Error("failed to reload Code Mode server", zap.Error(mErr))
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/core/graph_server.go` around lines 1403 - 1410, The log messages for
the code-mode reload use the wrong text and double-wrap errors: update the two
s.codeModeServer error branches so they emit a distinct message (e.g., "failed
to reload Code Mode server") instead of "failed to reload MCP server", and
simplify the first branch to log the original printErr directly (no extra
fmt.Errorf wrapping). Locate the block using
astprinter.PrintString(executor.ClientSchema) and s.codeModeServer.Reload(...)
and change the s.logger.Error calls to use the new message and pass the original
error values (printErr and mErr) so logs clearly distinguish code-mode failures
from s.mcpServer failures.
router/internal/codemode/server/observability_handler_test.go-50-58 (1)

50-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard span indexing before accessing element 0.

Line 57 and Line 101 access [0] before asserting length; if span emission regresses, the test panics instead of failing with a clear assertion.

Suggested hardening
- assert.Equal(t, []tracetest.SpanStub{{
+ gotStubs := normalizedSpanStubs(traces.Ended())
+ require.Len(t, gotStubs, 1)
+ assert.Equal(t, []tracetest.SpanStub{{
   Name:     "MCP Code Mode - Search",
   SpanKind: trace.SpanKindServer,
   Attributes: []attribute.KeyValue{
     attribute.String("mcp.tool", "code_mode_search_tools"),
     attribute.String("mcp.status", "success"),
   },
-  InstrumentationLibrary: normalizedSpanStubs(traces.Ended())[0].InstrumentationLibrary,
- }}, normalizedSpanStubs(traces.Ended()))
+  InstrumentationLibrary: gotStubs[0].InstrumentationLibrary,
+ }}, gotStubs)

Also applies to: 94-102

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/observability_handler_test.go` around lines
50 - 58, The test indexes normalizedSpanStubs(traces.Ended())[0] before
asserting the slice length, which can cause a panic if no spans are emitted;
change the test to first capture the slice into a variable (e.g., spans :=
normalizedSpanStubs(traces.Ended())), assert its length with
require.Len/require.Greater/ assert.LenEqual (or require.NotEmpty) to ensure at
least one element, then access spans[0] when constructing the expected
tracetest.SpanStub for comparison with assert.Equal; apply the same guard for
other occurrences (the second access around spans[94-102] / similar sites).
router/internal/codemode/sandbox/preamble.go-20-27 (1)

20-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make splice marker detection whitespace/line-ending tolerant.

Line 23 requires an exact match; CRLF or incidental spacing will miss the marker and return 1, breaking user-code line mapping.

Suggested fix
- if line == spliceComment {
+ if strings.TrimSpace(line) == spliceComment {
    return i + 2
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/preamble.go` around lines 20 - 27, The
splice marker detection in userCodeStartLine currently requires an exact line
match and fails on CRLF or incidental spacing; update the check in
userCodeStartLine to compare a trimmed version of each line (e.g., use
strings.TrimSpace(line)) against spliceComment so CR ("\r") and extra whitespace
won't prevent detection, leaving the rest of the function behavior (including
the return i+2) unchanged.
demo/code-mode/yoko-mock/main_test.go-129-132 (1)

129-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Windows stub misses the Search (resume) behavior.

Line 130–Line 132 only emulate Index output. Search-path tests depend on writing resumeMessage to the -o file, so Windows runs won’t reflect the same behavior.

Suggested direction
- // Minimal Windows fallback — only Index path is exercised in CI on Unix.
- script = "@echo off\r\ntype \"" + indexFile + "\"\r\n"
+ // Keep parity with Unix stub: detect "resume" and honor "-o <FILE>".
+ // (or explicitly skip Search tests on Windows if parity is intentionally unsupported)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/yoko-mock/main_test.go` around lines 129 - 132, The Windows
branch in the runtime.GOOS == "windows" stub only emits the Index output and
omits the Search/resume behavior expected by tests; update the Windows fake
script (the script variable built in that branch) to also write the
resumeMessage to the -o output file path (the same path the non-Windows shell
writes to) when invoked with the Search/resume code path so tests that rely on
resumeMessage being written to the output file will pass; locate the script
construction near the indexFile and resumeMessage variables and add the
conditional output logic mirroring the non-Windows branch.
router/internal/codemode/server/descriptions/search_tool.md-18-18 (1)

18-18: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed quoting in the filter example.

The sentence currently has broken nested quotes, which makes the instruction ambiguous. Rewrite to keep each example quoted cleanly.

✏️ Proposed wording
-- Any required filters/arguments but never specific values ("employee by id - not "employee 123", "employee filtered by department name" - not "employee in department 'Engineering'").
+- Any required filters/arguments but never specific values ("employee by id" — not "employee 123"; "employee filtered by department name" — not "employee in department 'Engineering'").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/descriptions/search_tool.md` at line 18, Fix
the malformed nested quotation in the example sentence that currently reads: Any
required filters/arguments but never specific values ("employee by id - not
"employee 123", "employee filtered by department name" - not "employee in
department 'Engineering'"). Rewrite it so each example uses consistent,
non-conflicting quotes (for example: "employee by id" — not "employee 123";
"employee filtered by department name" — not "employee in department
'Engineering'") to remove ambiguity and broken nesting.
demo/code-mode/start.sh-79-102 (1)

79-102: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

start_background does not use exec, unlike start_background_root.

In start_background_root (lines 92-102), the command uses exec "$@" which replaces the subshell. However, start_background (lines 79-90) uses "$@" directly without exec. This means --down will signal the subshell rather than the go run process directly.

For go run commands this may work because the subshell exits when the child exits, but it's inconsistent with the pattern in start_background_root and the comment in code-mode-connect/start.sh (lines 88-90) that explains why exec is needed.

♻️ Suggested fix: Use exec for consistency
 start_background() {
   local name="$1"
   local cwd="$2"
   shift 2

   echo "Starting $name"
   (
     cd "$cwd"
-    "$@"
+    exec "$@"
   ) > "$LOG_DIR/$name.log" 2>&1 &
   append_pid "$name" "$!"
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/start.sh` around lines 79 - 102, The start_background function
launches the command in a subshell but omits exec, causing signals (like --down)
to target the subshell rather than the actual child process; make
start_background mirror start_background_root by using exec "$@" inside the
subshell so the subshell is replaced by the child process (preserve the existing
cd "$cwd", redirection to "$LOG_DIR/$name.log", backgrounding and append_pid
"$name" "$!").
🧹 Nitpick comments (22)
router/internal/codemode/observability/logging.go (1)

18-25: ⚡ Quick win

Use warn/error levels for the failure helpers.

These are real failure paths, so emitting them at Info makes them invisible to the log filters and alerts that usually watch Warn/Error. Keeping lifecycle events at Info is fine, but the failure helpers should log at a higher severity.

Suggested fix
-	logger.Info("code mode transpile failure",
+	logger.Warn("code mode transpile failure",
 		zap.String("session_id", sessionID),
 		zap.String("diagnostic", diagnostic),
 	)
-	logger.Info("code mode tool invocation failure",
+	logger.Error("code mode tool invocation failure",
 		zap.String("session_id", sessionID),
 		zap.String("op_name", opName),
 		zap.Error(err),
 	)

Also applies to: 39-47

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/observability/logging.go` around lines 18 - 25, The
LogTranspileFailure helper is logging actual failures at Info level; change its
call from logger.Info to logger.Error (preserving the message and zap.String
fields) so failures emit at error severity, and apply the same change to the
other failure helper functions in this file (the other transpile/failure helpers
near LogTranspileFailure) so all real failure paths use logger.Error rather than
Info while keeping the nil logger short‑circuit and existing field structure
intact.
router/internal/codemode/calltrace/calltrace_test.go (1)

35-46: ⚡ Quick win

Expand coverage for recorder state semantics.

Please add assertions for request/response direction (if kind is added) and Enabled behavior for (*FileRecorder)(nil) / empty-path recorder to prevent regressions.

Also applies to: 49-51

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/calltrace/calltrace_test.go` around lines 35 - 46,
Update the test to assert recorder state semantics: extend the existing
assertions on []Record to check the Record.kind (or direction) values for
request vs response entries once a kind field is added, and add cases that
verify Enabled() returns false for (*FileRecorder)(nil) and for a FileRecorder
configured with an empty path (and true for a normal recorder) to prevent
regressions; target the Record type assertions and the FileRecorder.Enabled
method in the calltrace tests (e.g., in the Test that currently compares
records) and add short subtests asserting Enabled() behavior for nil and
empty-path receivers.
router/internal/codemode/sandbox/semaphore.go (1)

1-16: 💤 Low value

LGTM — canonical Go buffered-channel semaphore, correctly implemented.

acquire fills a slot (sends), release drains one (receives); ctx.Done() exits cleanly on cancellation. One non-obvious contract worth documenting (or enforcing via defer): release blocks indefinitely if called without a preceding successful acquire. Callers should always use defer s.release() immediately after a successful acquire call to avoid goroutine leaks on any future refactor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/semaphore.go` around lines 1 - 16, The
semaphore is correct but lacks a contract warning: add a short doc comment on
the Sandbox type and the acquire/release methods (referencing Sandbox.acquire
and Sandbox.release and the s.sem channel) stating that release will block if
called without a successful acquire and callers must call release only after a
successful acquire, recommending the canonical pattern “if err :=
s.acquire(ctx); err == nil { defer s.release() }”. Optionally include a small
usage example in the comment to show the defer pattern to prevent goroutine
leaks.
router/core/router_config.go (1)

338-342: 💤 Low value

Consider adding code_mode telemetry to Usage().

The MCP usage block (Lines 338–342) tracks MCP settings but omits the new Code Mode server. Adding usage["code_mode_enabled"] = c.mcp.CodeMode.Enabled here would keep the usage telemetry consistent with the new feature surface.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/core/router_config.go` around lines 338 - 342, The Usage() MCP
telemetry block is missing the new Code Mode flag; inside the Usage() method
where usage["mcp"] and related keys are set (the c.mcp.* entries), add a new
entry usage["code_mode_enabled"] = c.mcp.CodeMode.Enabled so the Code Mode
server state is reported alongside mcp, mcp_enable_arbitrary_operations,
mcp_exclude_mutations and mcp_expose_schema.
router/internal/codemode/storage/memory_backend_test.go (1)

215-222: ⚡ Quick win

Prefer WaitGroup.Go over manual Add/Done goroutine wiring.

Line 219–Line 222 can be simplified and made less error-prone with wg.Go(...) in this codebase.

Suggested refactor
 for i := range goroutines {
-  wg.Add(1)
-  go func(i int) {
-    defer wg.Done()
+  i := i
+  wg.Go(func() {
     _, err := backend.Append(ctx, "shared", []SessionOp{{
       Name:        "shared-op",
       Body:        fmt.Sprintf("query Shared%d { shared%d }", i, i),
       Kind:        OperationKindQuery,
       Description: fmt.Sprintf("Shared %d", i),
     }})
     errs <- err
-  }(i)
+  })
 }

Based on learnings: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) and avoid manual wg.Add(1) + go func(){ defer wg.Done() }().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/memory_backend_test.go` around lines 215 -
222, Replace the manual wg.Add(1) + go func(...){ defer wg.Done() ... }()
pattern with the sync.WaitGroup.Go method to simplify and reduce
error-proneness: change the goroutine spawn that calls backend.Append(ctx,
"shared", []SessionOp{{...}}) and writes to errs so that you call wg.Go(func() {
_, err := backend.Append(...); errs <- err }) (keeping the same use of
goroutines slice, errs channel, ctx, backend.Append and SessionOp) and remove
the explicit Add/Done calls.
demo/code-mode/mcp-stdio-proxy/main.go (1)

198-211: 💤 Low value

Consider: Inconsistent error handling between tools and resources.

Tool handlers (lines 177-195) wrap errors using SetError and return the result, while resource handlers (lines 200-210) return errors directly. This may be intentional based on MCP semantics, but verify that returning raw errors from resource handlers is the desired behavior for downstream clients.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/mcp-stdio-proxy/main.go` around lines 198 - 211, The resource
handler passed to localServer.AddResource currently returns errors directly from
upstream.awaitSession and session.ReadResource (in the closure around
upstream.awaitSession and session.ReadResource), which is inconsistent with the
tool handlers that use SetError on the result and return the result object;
update the resource handler to mirror the tool-handler pattern: on failure
create a mcp.ReadResourceResult, call SetError with a descriptive message that
includes req.Params.URI and the underlying error, and return the result (nil
error), so downstream clients receive errors via the MCP result object rather
than raw errors.
router/internal/codemode/yoko/client_test.go (1)

321-336: 💤 Low value

Consider using sync.WaitGroup.Go for cleaner goroutine management.

Based on learnings, Go 1.25+ prefers wg.Go(func() { ... }) over manual wg.Add(1) + go func() { defer wg.Done(); ... }() patterns. This applies to lines 321-336 and 364-379.

♻️ Example refactor
-	var wg sync.WaitGroup
-	wg.Add(2)
-	results := make([]*yokov1.SearchResponse, 2)
-	errs := make([]error, 2)
-	go func() {
-		defer wg.Done()
-		results[0], errs[0] = client.Search(context.Background(), "session-1", []string{"first"})
-	}()
-	<-indexStarted
-	go func() {
-		defer wg.Done()
-		results[1], errs[1] = client.Search(context.Background(), "session-2", []string{"second"})
-	}()
+	var wg sync.WaitGroup
+	results := make([]*yokov1.SearchResponse, 2)
+	errs := make([]error, 2)
+	wg.Go(func() {
+		results[0], errs[0] = client.Search(context.Background(), "session-1", []string{"first"})
+	})
+	<-indexStarted
+	wg.Go(func() {
+		results[1], errs[1] = client.Search(context.Background(), "session-2", []string{"second"})
+	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/yoko/client_test.go` around lines 321 - 336, Replace
the manual wg.Add(1)+go+defer wg.Done() pattern with sync.WaitGroup.Go to
simplify goroutine management: remove wg.Add(2) and instead call wg.Go twice,
each wrapping the existing goroutine bodies that call client.Search, storing
into results[i] and errs[i]; ensure you capture the loop/index by assigning a
local index variable (e.g., i := 0/1) inside each Go closure so results and errs
are written to the correct slot; apply the same change to the second pair of
goroutines around the client.Search usages (the block using indexStarted and
releaseIndex), keeping the time.Sleep and close(releaseIndex) logic unchanged.
router/internal/codemode/server/search_handler_test.go (1)

300-327: ⚡ Quick win

Consider using sync.WaitGroup.Go for cleaner goroutine management.

The manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() pattern can be simplified using wg.Go(func()) in Go 1.25+, which manages Add/Done automatically. This applies to multiple test functions in this file.

Example refactor
-		wg.Add(1)
-		go func() {
-			defer wg.Done()
+		wg.Go(func() {
 			result, err := srv.handleSearch(ctx, searchToolRequest(t, "session-1", map[string]any{
 				"prompts": []string{"orders", "customers"},
 			}))
 			require.NoError(t, err)
 			results[0] = result
-		}()
+		})

Based on learnings: "In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/search_handler_test.go` around lines 300 -
327, Replace the manual wg.Add(1); go func() { defer wg.Done(); ... }() pattern
with sync.WaitGroup.Go to simplify goroutine management: for the two goroutines
that invoke srv.handleSearch (which use searchToolRequest and write into
results), call wg.Go(func() { ... }) so you no longer need explicit wg.Add or
defer wg.Done; keep the same body that calls srv.handleSearch with ctx and the
"session-1" prompts, assigns into results[0]/results[1], and asserts NoError,
and leave the rest of the test (waiting on searcher.block, close, wg.Wait, and
assertions on searcher.callCount and results equality) unchanged.
router/internal/codemode/storage/redis_backend_test.go (1)

182-207: ⚡ Quick win

Consider using sync.WaitGroup.Go for the concurrent test.

Per repository conventions for Go 1.25+, the manual WaitGroup pattern can be simplified:

Suggested refactor
 	var wg sync.WaitGroup
 	errs := make(chan error, goroutines)
 	for i := 0; i < goroutines; i++ {
-		wg.Add(1)
-		go func(worker int) {
-			defer wg.Done()
+		wg.Go(func(worker int) func() {
+			return func() {
 			ops := make([]SessionOp, 0, opsPerGoroutine)
 			for j := 0; j < opsPerGoroutine; j++ {
 				ops = append(ops, SessionOp{Name: fmt.Sprintf("op_%02d_%02d", worker, j), Body: "query { ok }", Kind: OperationKindQuery})
 			}
 			_, err := backend.Append(ctx, "session-1", ops)
 			errs <- err
-		}(i)
+			}
+		}(i))
 	}

Based on learnings: "In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/redis_backend_test.go` around lines 182 -
207, The test uses the manual WaitGroup pattern (wg.Add, go func, defer wg.Done)
to launch goroutines; update it to use sync.WaitGroup.Go for Go 1.25+ by
replacing the wg.Add + go { defer wg.Done(); ... } pattern with wg.Go(func() {
... }) to spawn each worker; keep the errs channel and the Append call
(backend.Append) and ensure the loop worker variable is captured correctly (pass
worker into the closure or use the loop variable safely) so opsPerGoroutine,
goroutines, errs, backend.Append and backend.opsKey("session-1") behavior remain
unchanged.
router/internal/codemode/sandbox/errors.go (1)

179-200: 💤 Low value

Consider validating unknown Base64 characters in VLQ decoding.

If segment contains a character not in base64VLQ, the map lookup silently returns 0, potentially producing incorrect decoded values. Since source maps should always contain valid Base64 VLQ, this is unlikely to cause issues in practice.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/errors.go` around lines 179 - 200, The
decodeVLQSegment function silently treats unknown Base64 characters as zero due
to direct map lookup from base64VLQ; change the lookup to check presence (e.g.,
value, ok := base64VLQ[r]) and return a descriptive error when ok is false,
referencing the offending rune/character, so callers can fail fast instead of
producing incorrect decoded values; update any callers/tests if they expect nil
error on bad input and ensure the error mentions decodeVLQSegment, the invalid
character, and that it violates VLQ/Base64 input.
router/internal/codemode/server/lifecycle.go (1)

173-182: 💤 Low value

Redundant header clone.

req.Clone(ctx) already deep-copies request headers. The explicit cloned.Header = req.Header.Clone() on line 179 is unnecessary.

♻️ Proposed simplification
 func (t staticBearerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
 	next := t.next
 	if next == nil {
 		next = http.DefaultTransport
 	}
 	cloned := req.Clone(req.Context())
-	cloned.Header = req.Header.Clone()
 	cloned.Header.Set("Authorization", "Bearer "+t.token)
 	return next.RoundTrip(cloned)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/lifecycle.go` around lines 173 - 182, The
RoundTrip implementation in staticBearerRoundTripper redundantly clones headers
after calling req.Clone; remove the extra cloned.Header = req.Header.Clone()
line so the code uses the headers already copied by req.Clone(req.Context()),
then set the Authorization header on the cloned request (cloned.Header.Set(...))
and call next.RoundTrip(cloned) as before; this touches the
staticBearerRoundTripper.RoundTrip function only.
router/internal/codemode/sandbox/sandbox_test.go (1)

623-635: ⚡ Quick win

Use wg.Go instead of manual wg.Add/go func pattern.

Per repository guidance for Go 1.25+, prefer sync.WaitGroup.Go(func()) to let WaitGroup manage Add/Done automatically.

♻️ Suggested fix
 	var wg sync.WaitGroup
 	for range 5 {
-		wg.Add(1)
-		go func() {
-			defer wg.Done()
+		wg.Go(func() {
 			_, err := s.Execute(context.Background(), ExecuteRequest{
 				SessionID: "s1",
 				ToolNames: []string{"ping"},
 				WrappedJS: `async () => await tools.ping()`,
 			})
 			assert.NoError(t, err)
-		}()
+		})
 	}

Based on learnings: "In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/sandbox_test.go` around lines 623 - 635, The
test uses manual wg.Add(1) and launching a goroutine that defers wg.Done;
replace that pattern by calling wg.Go(func() { ... }) so the WaitGroup manages
Add/Done automatically; update the loop that calls s.Execute with ExecuteRequest
(SessionID "s1", ToolNames ["ping"], WrappedJS `async () => await tools.ping()`)
to use wg.Go, removing the explicit wg.Add and defer wg.Done lines and keeping
the assert.NoError(t, err) inside the function.
router/internal/codemode/harness/envelope.go (1)

63-73: 💤 Low value

Potential stack overflow on deeply nested error causes.

The cloneErrorEnvelope function recursively clones the Cause chain without depth limiting. While the sandbox itself truncates cause chains at depth 5 (per TestExecuteErrorCauseChainTruncatesAfterDepthFive), if a malformed or externally constructed ErrorEnvelope with a deeper chain is passed, this could cause a stack overflow.

🛡️ Optional: Add depth guard to match sandbox behavior
-func cloneErrorEnvelope(err *ErrorEnvelope) *ErrorEnvelope {
+func cloneErrorEnvelope(err *ErrorEnvelope) *ErrorEnvelope {
+	return cloneErrorEnvelopeWithDepth(err, 0)
+}
+
+func cloneErrorEnvelopeWithDepth(err *ErrorEnvelope, depth int) *ErrorEnvelope {
 	if err == nil {
 		return nil
 	}
+	var cause *ErrorEnvelope
+	if depth < 5 {
+		cause = cloneErrorEnvelopeWithDepth(err.Cause, depth+1)
+	}
 	return &ErrorEnvelope{
 		Name:    err.Name,
 		Message: err.Message,
 		Stack:   err.Stack,
-		Cause:   cloneErrorEnvelope(err.Cause),
+		Cause:   cause,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/harness/envelope.go` around lines 63 - 73,
cloneErrorEnvelope currently recurses unbounded through ErrorEnvelope.Cause and
can overflow on deeply nested/malformed chains; modify cloneErrorEnvelope (or
create a helper) to enforce a maximum depth (match sandbox behavior, e.g., depth
limit 5) when cloning Cause chains—stop cloning further causes once the depth
limit is reached (return nil or a truncated marker) to prevent stack overflow
while keeping Name/Message/Stack for cloned nodes; reference the
cloneErrorEnvelope function and the ErrorEnvelope.Cause field when making the
change.
router/internal/codemode/server/search_handler.go (2)

56-61: 💤 Low value

Singleflight result type assertion without error handling.

The singleflight's Do returns (any, error, bool) but the error is ignored with _. If handleSearchStateful were to return an error in the future, this would panic on the type assertion.

Currently handleSearchStateful always returns *mcp.CallToolResult and never errors (the function signature returns only a result), so this is safe. However, the pattern is fragile if the function signature changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/search_handler.go` around lines 56 - 61: The
code ignores the error from s.searchGroup.Do and directly type-asserts the
result to *mcp.CallToolResult, which can panic if Do returns an error or a
different type; update the call to capture the error and the shared bool: value,
err, _ := s.searchGroup.Do(key, func() (any, error) { return
s.handleSearchStateful(ctx, sessionID, prompts), nil }); if err != nil { return
nil, err } then safely type-assert value to *mcp.CallToolResult (checking the
assertion and returning an error if it fails) so
searchSingleFlightKey/s.searchGroup.Do/handleSearchStateful use a robust pattern
that won’t panic if signatures change.

224-248: 💤 Low value

extractGraphQLVariablesBlock may return incorrect results for complex operation syntax.

The function uses simple byte scanning without proper GraphQL parsing. While it handles nested parentheses, it doesn't account for:

  • Comments containing parentheses: GraphQL # comments run to end of line, e.g. query Foo # ( followed by { ... } on the next line
  • String literals in directives: query Foo @deprecated(reason: "(old)") { ... }

However, since this extracts from generated operation bodies (presumably well-formed), these edge cases may not occur in practice.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/search_handler.go` around lines 224 - 248,
extractGraphQLVariablesBlock uses a naive byte scan and can be tripped by
parentheses inside comments or string literals; update
extractGraphQLVariablesBlock to properly skip GraphQL comments and string/
block-string literals when scanning for the variable parentheses or, better,
replace this ad-hoc scanner with a real GraphQL parser/tokenizer (e.g.,
github.com/graphql-go/graphql or any existing project tokenizer) to locate the
operation variable definitions; specifically, implement scanning that recognizes
and ignores # line comments and both "..." and """...""" string literals
(or call a tokenizer) before treating '(' and ')' as structural tokens so the
function reliably returns the variable block.
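
If the scanner stays, a rough sketch of comment- and string-aware scanning in Go (the function name, return contract, and strings import are illustrative, not the project's API):

func extractVariablesBlock(op string) string {
	depth, start := 0, -1
	for i := 0; i < len(op); i++ {
		switch op[i] {
		case '#': // GraphQL comment: opaque to end of line
			for i < len(op) && op[i] != '\n' {
				i++
			}
		case '"': // string or block string: opaque up to its closing quote(s)
			if strings.HasPrefix(op[i:], `"""`) {
				end := strings.Index(op[i+3:], `"""`)
				if end < 0 {
					return ""
				}
				i += 3 + end + 2
			} else {
				for i++; i < len(op) && op[i] != '"'; i++ {
					if op[i] == '\\' {
						i++ // skip escaped character
					}
				}
			}
		case '(':
			if depth == 0 {
				start = i
			}
			depth++
		case ')':
			depth--
			if depth == 0 && start >= 0 {
				return op[start : i+1]
			}
		case '{': // selection set reached before any '(': no variable block
			if depth == 0 {
				return ""
			}
		}
	}
	return ""
}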
router/internal/codemode/sandbox/sandbox_preamble.js (1)

45-55: ⚡ Quick win

Use strict null checks in the helper functions.

notNull and compact currently rely on == / !=. This repo's JS guidelines require strict equality, so these checks should be expanded explicitly to === null || === undefined and the corresponding !== forms.

Suggested fix
 globalThis.notNull = (v, msg) => {
-  if (v == null) throw new Error(msg ?? "notNull: value was null/undefined");
+  if (v === null || v === undefined) throw new Error(msg ?? "notNull: value was null/undefined");
   return v;
 };
 globalThis.compact = (v) => {
-  if (Array.isArray(v)) return v.map(compact).filter((x) => x != null);
+  if (Array.isArray(v)) return v.map(compact).filter((x) => x !== null && x !== undefined);
   if (v && typeof v === "object") {
     const out = {};
     for (const k in v) {
       const c = compact(v[k]);
-      if (c != null) out[k] = c;
+      if (c !== null && c !== undefined) out[k] = c;
     }
     return out;
   }

As per coding guidelines "Always use strict equality (=== and !==) instead of loose equality (== and !=)."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/sandbox_preamble.js` around lines 45 - 55:
Update the helper functions to use strict null/undefined checks: in
globalThis.notNull replace the loose check (v == null) with an explicit (v ===
null || v === undefined); in globalThis.compact change the Array branch filter
to use (x !== null && x !== undefined) instead of (x != null), and change the
object guard from (v && typeof v === "object") to (v !== null && v !== undefined
&& typeof v === "object"); likewise when assigning c check c !== null && c !==
undefined before setting out[k]. Keep the same return/throw behavior and
function names (globalThis.notNull and globalThis.compact).
router/internal/codemode/storage/redis_backend.go (1)

345-350: ⚡ Quick win

Non-atomic TTL setting could leave keys without expiry.

The setWithTTL method performs Set and Expire as separate operations. If the process crashes or the Redis connection fails between these calls, the key could remain without a TTL. Consider passing b.sessionTTL directly to Set so the key and its expiry are applied in a single atomic command.

🔧 Suggested fix
 func (b *RedisBackend) setWithTTL(ctx context.Context, key string, value []byte) error {
-	if err := b.client.Set(ctx, key, value, 0).Err(); err != nil {
-		return err
-	}
-	return b.client.Expire(ctx, key, b.sessionTTL).Err()
+	return b.client.Set(ctx, key, value, b.sessionTTL).Err()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/redis_backend.go` around lines 345 - 350:
The current setWithTTL in RedisBackend calls Set and then Expire separately
which can leave keys without TTL if interrupted; change it to perform an atomic
set-with-ttl by passing b.sessionTTL as the expiration argument to b.client.Set
(use the client Set overload that accepts a time.Duration/expiration or the EX
option), remove the separate b.client.Expire call, and propagate any error from
the single Set call so key creation and TTL are applied in one atomic operation.
router/internal/codemode/sandbox/sandbox.go (1)

111-143: 💤 Low value

Dead code: MaxConcurrent validation after defaults.

The check on lines 113-115 will never trigger an error because withDefaults (called on line 112) already sets MaxConcurrent to defaultMaxConcurrent (4) if it's <= 0. The validation is defensive but unreachable.

🧹 Consider removing dead validation
 func New(cfg Config) (*Sandbox, error) {
 	cfg = withDefaults(cfg)
-	if cfg.MaxConcurrent <= 0 {
-		return nil, errors.New("sandbox max concurrent must be positive")
-	}
 	if cfg.StorageLookup == nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/sandbox.go` around lines 111 - 143: The
MaxConcurrent validation in New is dead because withDefaults already enforces
cfg.MaxConcurrent > 0 (setting it to defaultMaxConcurrent), so remove the
redundant check (the if block that returns an error) to eliminate unreachable
code; update or remove any related error message handling in New and keep
withDefaults as the single source of truth for defaulting/validation of
MaxConcurrent.
router/internal/codemode/sandbox/execute.go (1)

153-181: ⚖️ Poor tradeoff

Potential goroutine leak in awaitWithContext is bounded but worth noting.

The goroutine spawned on line 159 may not return if value.Await() hangs even after rt.Close(). The 100ms timeout on line 177 abandons the goroutine without cleanup. While this is documented as acceptable (bounded by GC and per-call freshness), consider adding a metric or log for monitoring abandoned await goroutines in production.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/execute.go` around lines 153 - 181: In
awaitWithContext, when the ctx.Done() path abandons the spawned goroutine (the
one calling value.Await()) after the 100ms timeout, add an observability signal:
emit a metric and/or log an error including context (e.g., indicate awaiting
promise abandoned, include the promise/value identity and ctx.Err()) so
production can monitor these abandoned await goroutines; update the code around
the ctx.Done() branch (referencing awaitWithContext, the done channel,
value.Await(), and rt.Close()) to increment a counter and/or call
processLogger.Errorf(...) before returning nil, ctx.Err().
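
A small illustration of that signal; the counter, its registration, and the logger plumbing here are assumptions rather than existing fields:

import (
	"context"

	"go.opentelemetry.io/otel/metric"
	"go.uber.org/zap"
)

// recordAbandonedAwait sketches what the ctx.Done() branch could call before
// returning: bump a counter and log, so abandoned goroutines become visible.
func recordAbandonedAwait(ctx context.Context, logger *zap.Logger, abandoned metric.Int64Counter) {
	abandoned.Add(ctx, 1)
	logger.Warn("abandoned await goroutine after context cancellation", zap.Error(ctx.Err()))
}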
router/internal/codemode/storage/memory_backend.go (1)

258-280: ⚖️ Poor tradeoff

Bundle truncation could be expensive for large session op lists.

The renderCapped function iteratively removes operations from the end and re-renders until the bundle fits within maxBundleBytes. For sessions with many operations and an expensive renderer, this could result in O(n) render calls.

Consider a binary search approach if performance becomes an issue, though for typical session sizes this linear approach is likely acceptable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/storage/memory_backend.go` around lines 258 - 280:
The renderCapped function currently re-renders in a linear loop removing one op
at a time (O(n) renders); change it to a binary-search style search over the
number of ops to keep to reduce render calls: keep the initial full-render check
using b.renderer.Render(ctx, ops, b.Schema()), then if too large perform a
binary search on the range [0, len(ops)] calling b.renderer.Render(ctx,
ops[:mid], b.Schema()) to find the largest mid such that len(truncated) <=
b.maxBundleBytes, preserving ctx.Err() checks and error propagation; update
logic in renderCapped (and references to b.renderer.Render, b.maxBundleBytes,
ops) so it returns the best truncated bundle or an empty string on failure.
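
Should the linear pass ever matter, a sketch of the binary-search variant (the generic T stands in for the project's operation type; render abstracts b.renderer.Render(ctx, ops[:k], b.Schema()); assumes rendered size grows with op count and that an empty prefix always fits):

// findLargestFit returns the length of the largest prefix of ops whose
// rendered bundle fits in maxBytes, using O(log n) renders instead of O(n).
func findLargestFit[T any](ops []T, maxBytes int, render func([]T) (string, error)) (int, error) {
	lo, hi := 0, len(ops)
	for lo < hi {
		mid := (lo + hi + 1) / 2 // bias up so the loop always progresses
		bundle, err := render(ops[:mid])
		if err != nil {
			return 0, err
		}
		if len(bundle) <= maxBytes {
			lo = mid // prefix of length mid fits; try keeping more ops
		} else {
			hi = mid - 1 // too big; keep fewer ops
		}
	}
	return lo, nil
}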
router/internal/codemode/sandbox/host.go (1)

168-177: 💤 Low value

GraphQL success determination may be misleading.

The success attribute is set based on HTTP status code (resp.StatusCode < 400), but GraphQL APIs typically return errors in the response body with a 200 OK status. The graphQLErrors check on lines 169-172 logs errors but doesn't influence the codemode.op.success attribute. This could cause misleading observability data where a request with GraphQL errors is marked as successful.

Consider also checking for GraphQL errors when determining success:

🔧 Suggested improvement
 	result := normalizeGraphQLResponse(resp.StatusCode, respBody)
-	if errorsJSON := graphQLErrors(result); errorsJSON != "" {
+	errorsJSON := graphQLErrors(result)
+	hasGraphQLErrors := errorsJSON != ""
+	if hasGraphQLErrors {
 		span.SetAttributes(attribute.String("codemode.graphql.errors", errorsJSON))
 		observability.LogToolInvocationFailure(s.cfg.Logger, state.req.SessionID, name, fmt.Errorf("graphql errors: %s", errorsJSON))
 	}
-	span.SetAttributes(attribute.Bool("codemode.op.success", resp.StatusCode < 400))
+	span.SetAttributes(attribute.Bool("codemode.op.success", resp.StatusCode < 400 && !hasGraphQLErrors))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/sandbox/host.go` around lines 168 - 177: The
codemode.op.success attribute is currently set only from HTTP status
(resp.StatusCode < 400) which ignores GraphQL-level errors; update the logic in
the block using normalizeGraphQLResponse and graphQLErrors so success is false
if either resp.StatusCode >= 400 or graphQLErrors(result) returns a non-empty
string: compute errorsJSON := graphQLErrors(result) first, set
span.SetAttributes(attribute.Bool("codemode.op.success", resp.StatusCode < 400
&& errorsJSON == "")), and keep the existing
observability.LogToolInvocationFailure calls to include errorsJSON and non-2xx
statuses (use the existing names normalizeGraphQLResponse, graphQLErrors,
span.SetAttributes, and observability.LogToolInvocationFailure).
router/internal/codemode/server/server.go (1)

325-344: 💤 Low value

Cross-origin protection is bypassed for all paths.

The AddInsecureBypassPattern("/{path...}") on line 327 effectively disables cross-origin protection for the entire MCP server. While this may be intentional for MCP client compatibility, it should be documented or made configurable for production deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/server/server.go` around lines 325 - 344: The
handler currently unconditionally calls AddInsecureBypassPattern("/{path...}")
which disables CrossOriginProtection for all paths; update Server.handler to
only add that insecure bypass when explicitly enabled via a config flag or
environment-controlled field (e.g., add a boolean like
Server.allowInsecureMCPBypass or use an existing config field) and otherwise
omit the AddInsecureBypassPattern call (or restrict it to a specific safe
pattern); reference the CrossOriginProtection creation and the call to
AddInsecureBypassPattern in Server.handler and make the behavior configurable
and documented for production use.
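
A sketch of the opt-in gating, using Go 1.25's http.CrossOriginProtection (the allowInsecureMCPBypass field is hypothetical):

protection := http.NewCrossOriginProtection()
// Default protection stays active unless the operator explicitly opts out,
// e.g. for MCP clients that cannot send acceptable Origin headers.
if s.allowInsecureMCPBypass { // hypothetical config-backed field
	protection.AddInsecureBypassPattern("/{path...}")
}
handler := protection.Handler(mux)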

Comment on lines +1 to +22
module github.com/wundergraph/cosmo/demo/code-mode/yoko-mock

go 1.25.0

require (
	connectrpc.com/connect v1.19.1
	github.com/dgraph-io/ristretto/v2 v2.4.0
	github.com/stretchr/testify v1.11.1
	github.com/wundergraph/cosmo/router v0.0.0
	google.golang.org/protobuf v1.36.10
)

require (
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/pmezard/go-difflib v1.0.0 // indirect
	golang.org/x/sys v0.40.0 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)

replace github.com/wundergraph/cosmo/router => ../../../router

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Explicitly pin go.opentelemetry.io/otel/sdk to >= v1.43.0 to avoid PATH-hijacking CVEs.

go.opentelemetry.io/otel/sdk versions v1.21.0–v1.39.0 are vulnerable to PATH hijacking (arbitrary code execution) on macOS/Darwin; the first patch was v1.40.0. A follow-up incomplete fix (CVE-2026-39883) left BSD platforms vulnerable through v1.42.x; upgrading to v1.43.0 or above fully remediates both.

This module's github.com/wundergraph/cosmo/router dependency (replaced with the local path) brings in google.golang.org/grpc, which can transitively resolve a vulnerable otel/sdk version unless explicitly overridden here.

🛡️ Proposed fix
 require (
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/davecgh/go-spew v1.1.1 // indirect
 	github.com/dustin/go-humanize v1.0.1 // indirect
 	github.com/pmezard/go-difflib v1.0.0 // indirect
+	go.opentelemetry.io/otel/sdk v1.43.0 // indirect
 	golang.org/x/sys v0.40.0 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )

Based on learnings: "In every module go.mod, explicitly pin go.opentelemetry.io/otel/sdk to v1.40.0 or later. Do not rely on transitive resolution for this dependency."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/yoko-mock/go.mod` around lines 1 - 22: The go.mod currently
relies on transitive versions that may pull a vulnerable
go.opentelemetry.io/otel/sdk; explicitly add a require entry for
go.opentelemetry.io/otel/sdk v1.43.0 (or later) in the existing require block to
pin the safe version, then run go mod tidy to update the go.sum and vendor
state; locate the module declaration and the require blocks in this file (module
github.com/wundergraph/cosmo/demo/code-mode/yoko-mock and the existing
require(...) section) and insert the new require line alongside the other direct
dependencies.

Comment on lines +157 to +186
s.mu.Lock()
if existing, ok := s.schemas[id]; ok {
	s.mu.Unlock()
	existing.mu.RLock()
	existingSession := existing.sessionID
	existing.mu.RUnlock()
	log.Printf("Index schema_id=%s reused dir=%s session_id=%s", id, existing.dir, existingSession)
	return connect.NewResponse(&yokov1.IndexResponse{SchemaId: id}), nil
}
s.mu.Unlock()

dir, err := os.MkdirTemp("", "yoko-schema-"+id+"-")
if err != nil {
	return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("create schema temp dir: %w", err))
}
if err := os.WriteFile(filepath.Join(dir, "schema.graphql"), []byte(schemaSDL), 0o600); err != nil {
	_ = os.RemoveAll(dir)
	return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("write schema.graphql: %w", err))
}

sessionID, err := s.runCodexIndex(ctx, dir)
if err != nil {
	_ = os.RemoveAll(dir)
	return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("codex pre-warm: %w", err))
}

entry := &schemaEntry{dir: dir, sessionID: sessionID}
s.mu.Lock()
s.schemas[id] = entry
s.mu.Unlock()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Serialize first-time indexing per schema ID.

Two concurrent Index calls for the same SDL can both miss s.schemas[id], each create a temp dir and Codex session, and then race on s.schemas[id] = entry. The later write wins, which breaks the idempotent contract and leaks the loser’s temp dir/session state. Re-check before storing or gate this path with a per-schemaID singleflight.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/code-mode/yoko-mock/main.go` around lines 157 - 186: Concurrent Index
calls can race creating temp dirs and sessions; after creating dir and session
via os.MkdirTemp and s.runCodexIndex you must re-check s.schemas[id] (and/or use
a singleflight) before storing to avoid leaking the loser’s dir/session. Modify
the code around schemaEntry creation: acquire s.mu, check if entry already
exists (s.schemas[id]) and if so, release lock, clean up the newly created
dir/session (remove dir and close/terminate session if applicable), and return
the existing schema response; otherwise set s.schemas[id]=entry and release the
lock. Ensure cleanup handles both the temp dir and any resources created by
runCodexIndex when abandoning the new entry.
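
A sketch of the re-check-before-store variant: after building the new entry, take the lock again and keep whichever entry won, discarding ours otherwise (session teardown is elided; whether a runCodexIndex session needs explicit cleanup is left open here as an assumption):

entry := &schemaEntry{dir: dir, sessionID: sessionID}
s.mu.Lock()
if _, ok := s.schemas[id]; ok {
	s.mu.Unlock()
	_ = os.RemoveAll(dir) // we lost the race; drop our temp dir
	return connect.NewResponse(&yokov1.IndexResponse{SchemaId: id}), nil
}
s.schemas[id] = entry
s.mu.Unlock()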

Comment on lines +64 to +66
			employee.Tag = tag
			employee.UpdatedAt = time.Now().String()
			return employee, nil

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid returning the shared in-memory employee pointer.

At line 66, returning employee leaks internal mutable state after unlocking, which can lead to concurrent read/write races and cross-request aliasing. Keep the in-place update, but return a copy.

Proposed fix
 		if id == employee.ID {
 			employee.Tag = tag
 			employee.UpdatedAt = time.Now().String()
-			return employee, nil
+			out := *employee
+			if employee.Details != nil {
+				d := *employee.Details
+				out.Details = &d
+			}
+			return &out, nil
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@demo/pkg/subgraphs/employees/subgraph/schema.resolvers.go` around lines 64 -
66: The code currently returns the shared in-memory pointer `employee`, leaking
internal mutable state; instead, after performing the in-place update (setting
`employee.Tag` and `employee.UpdatedAt`), create and return a copy to avoid
aliasing and races: make a new value (e.g., `copied := *employee`) and if
`Employee` contains reference fields (slices, maps, pointers) deep-copy those
(or allocate new slices/maps and copy elements) and return the address/value of
`copied` rather than the original `employee` pointer from the resolver function
in schema.resolvers.go.

Comment thread router-tests/go.mod
require (
	connectrpc.com/connect v1.19.1
	github.com/MicahParks/jwkset v0.11.0
	github.com/alicebob/miniredis/v2 v2.34.0

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking otel/sdk pins in router-tests/go.mod:"
rg -n 'go\.opentelemetry\.io/otel/sdk' --glob 'router-tests/go.mod'

echo
echo "Cross-module otel/sdk visibility:"
rg -n 'go\.opentelemetry\.io/otel/sdk' --glob '**/go.mod'

Repository: wundergraph/cosmo

Length of output: 1898


Update go.opentelemetry.io/otel/sdk to v1.40.0 or later in router-tests/go.mod.

The module currently pins go.opentelemetry.io/otel/sdk v1.39.0 (line 36), which is below the required minimum version. Additionally, the replace directive at line 213 downgrades this to v1.28.0, further violating the dependency pinning policy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router-tests/go.mod` at line 8: Update the go.mod to ensure the OpenTelemetry
SDK dependency meets the minimum required version: change the require entry for
go.opentelemetry.io/otel/sdk to v1.40.0 or later and remove or update any
replace directive that forces go.opentelemetry.io/otel/sdk to v1.28.0 (the
replace at the bottom of the file) so the module is not downgraded; after
updating, run `go mod tidy` to verify the final resolved version is v1.40.0+ and
that there are no remaining replace entries forcing older otel/sdk versions.

Comment thread router/go.mod
	golang.org/x/time v0.9.0
)

require github.com/tetratelabs/wazero v1.9.0 // indirect

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking otel/sdk pins in router/go.mod:"
rg -n 'go\.opentelemetry\.io/otel/sdk' --glob 'router/go.mod'

echo
echo "Checking all module go.mod files for otel/sdk pins:"
rg -n 'go\.opentelemetry\.io/otel/sdk' --glob '**/go.mod'

Repository: wundergraph/cosmo

Length of output: 1886


Update go.opentelemetry.io/otel/sdk to v1.40.0 or later in both require and replace directives.

Line 46 pins go.opentelemetry.io/otel/sdk to v1.39.0, and line 197 replaces it with v1.28.0—both below the required v1.40.0 minimum. While updating dependencies here, align these pins with the project policy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/go.mod` at line 95: Update the go.mod entries that pin
go.opentelemetry.io/otel/sdk: change the require directive currently at v1.39.0
and the replace directive currently at v1.28.0 so both use v1.40.0 or later;
locate and edit the lines referencing module name "go.opentelemetry.io/otel/sdk"
in the require and replace blocks and set their versions to at least v1.40.0 to
satisfy the project policy and keep them consistent.

Comment on lines +87 to +116
func renderBundleBlocks(blocks []string, omitted int) string {
	var b strings.Builder
	b.WriteString(graphQLErrorAlias)
	b.WriteByte('\n')
	b.WriteString(responseAlias)
	b.WriteString("\n\n")

	if len(blocks) == 0 {
		b.WriteString("declare const tools: {};")
	} else {
		b.WriteString("declare const tools: {\n")
		for i, block := range blocks {
			if i > 0 {
				b.WriteString("\n\n")
			}
			b.WriteString(indentBlock(block, " "))
		}
		b.WriteString("\n};")
	}

	b.WriteString("\n\n")
	b.WriteString(notNullHelper)
	b.WriteByte('\n')
	b.WriteString(compactHelper)

	if omitted > 0 {
		fmt.Fprintf(&b, "\n// truncated: %d ops omitted", omitted)
	}

	return b.String()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the bundle prelude in sync with the persisted-ops contract.

renderBundleBlocks no longer emits the "Known limitation: union and interface selections are typed as unknown." banner, but the persisted-bundle expectations in router-tests/code_mode_named_ops_test.go still include it. That means bundles rendered here will drift from the tested output contract.

Suggested fix
 const (
 	defaultMaxBundleBytes = 64 * 1024
 	graphQLErrorAlias     = "type GraphQLError = { message: string; path?: (string | number)[]; extensions?: Record<string, unknown> };"
 	responseAlias         = "type R<T> = Promise<{ data: T | null; errors?: GraphQLError[] }>;"
+	knownLimitationNote   = "// Known limitation: union and interface selections are typed as unknown."
 	notNullHelper         = "declare function notNull<T>(value: T | null | undefined, message?: string): T;"
 	compactHelper         = "declare function compact<T>(value: T): T;"
 )
@@
 	b.WriteString(graphQLErrorAlias)
 	b.WriteByte('\n')
 	b.WriteString(responseAlias)
-	b.WriteString("\n\n")
+	b.WriteByte('\n')
+	b.WriteString(knownLimitationNote)
+	b.WriteString("\n\n")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/internal/codemode/tsgen/tsgen.go` around lines 87 - 116,
renderBundleBlocks no longer emits the "Known limitation: union and interface
selections are typed as unknown." banner, causing a mismatch with the
persisted-ops contract; update renderBundleBlocks to prepend or include that
exact banner string (the same text used in persisted bundles/tests) in the
output alongside graphQLErrorAlias/responseAlias so rendered bundles match
router-tests/code_mode_named_ops_test.go expectations. Ensure the banner
placement and wording exactly match the contracted text and that it appears when
appropriate (i.e., in the same conditions the persisted-bundle expects) within
the function renderBundleBlocks.

Comment on lines +144 to +157
func (s *GraphQLSchemaServer) ensureCodeModeSandbox() (*sandbox.Sandbox, error) {
	if s.codeModeSandbox != nil {
		return s.codeModeSandbox, nil
	}
	sb, err := sandbox.New(sandbox.Config{
		RouterGraphQLEndpoint: s.routerGraphQLEndpoint,
		StorageLookup:         s.codeModeStorageLookup,
		Logger:                s.logger.With(zap.String("component", "code_mode_sandbox")),
	})
	if err != nil {
		return nil, fmt.Errorf("create code mode sandbox: %w", err)
	}
	s.codeModeSandbox = sb
	return sb, nil

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard sandbox lazy initialization against concurrent calls.

handleCodeModeRunJS can run in parallel, and this check-then-assign path lets multiple first requests create separate sandboxes before one overwrites s.codeModeSandbox. That leaks the loser and makes initialization nondeterministic. Please protect this with a mutex or sync.Once.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/code_mode.go` around lines 144 - 157: The lazy
initialization of s.codeModeSandbox in ensureCodeModeSandbox is racy when
handleCodeModeRunJS runs concurrently; protect it with a mutex-backed
double-check: add a field like codeModeMu sync.Mutex on GraphQLSchemaServer,
then in ensureCodeModeSandbox do the existing fast-path nil check, acquire
s.codeModeMu, re-check s.codeModeSandbox, create the sandbox (sb, err :=
sandbox.New(...)) only if still nil, assign s.codeModeSandbox = sb, release the
mutex, and return sb or the error—this fixes the race without changing the
external behavior (alternatively you may use sync.Once plus a stored initErr
field if you prefer Once semantics).
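
A sketch of the mutex-backed variant (codeModeMu is an assumed new field on GraphQLSchemaServer; holding the lock for the whole call also keeps the fast path safe under the Go memory model):

func (s *GraphQLSchemaServer) ensureCodeModeSandbox() (*sandbox.Sandbox, error) {
	s.codeModeMu.Lock()
	defer s.codeModeMu.Unlock()
	if s.codeModeSandbox != nil {
		return s.codeModeSandbox, nil
	}
	sb, err := sandbox.New(sandbox.Config{
		RouterGraphQLEndpoint: s.routerGraphQLEndpoint,
		StorageLookup:         s.codeModeStorageLookup,
		Logger:                s.logger.With(zap.String("component", "code_mode_sandbox")),
	})
	if err != nil {
		return nil, fmt.Errorf("create code mode sandbox: %w", err)
	}
	s.codeModeSandbox = sb
	return sb, nil
}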

Comment on lines +63 to +102
for _, h := range m.handlers {
	h.RegisterRoutes(mux)

	// Upstream-bound handlers carry their own SDL — load it now so their
	// tools are ready before any client connects. Supergraph-bound handlers
	// wait for the router's Reload(supergraphSchema, ...) call.
	if h.HasUpstreamSchema() {
		doc, err := parseSDL(h.upstreamSchemaSDL)
		if err != nil {
			return fmt.Errorf("mcp server %q: parse upstream SDL: %w", h.graphName, err)
		}
		if err := h.Reload(doc, nil); err != nil {
			return fmt.Errorf("mcp server %q: initial reload: %w", h.graphName, err)
		}
	}

	// Per-collection operations directory watcher: hot-reloads tools when
	// .graphql / .gql files are added, modified, or removed.
	// Supergraph-bound handlers without an initial Reload yet still benefit —
	// the watcher is no-op until the first Reload populates a schema, after
	// which it picks up file changes on the next tick.
	watchEnabled, interval := h.WatchSettings()
	if watchEnabled && h.OperationsDir() != "" {
		handler := h // capture loop variable for the callback
		err := WatchOperationsDir(handler.Context(), handler.OperationsDir(), interval, func() {
			if err := handler.ReloadOperations(); err != nil {
				m.logger.Warn("hot-reload of MCP operations failed",
					zap.String("name", handler.graphName),
					zap.String("path", handler.path),
					zap.Error(err))
				return
			}
			m.logger.Info("MCP operations hot-reloaded",
				zap.String("name", handler.graphName),
				zap.String("path", handler.path))
		}, m.logger.With(zap.String("mcp_server_name", handler.graphName)))
		if err != nil {
			return fmt.Errorf("mcp server %q: start operations watcher: %w", h.graphName, err)
		}
	}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tear down started watchers on partial startup failure.

Start launches per-handler watchers as it iterates. If a later handler then fails parseSDL, Reload, or watcher setup, Start returns an error while the earlier watchers keep running unless the caller remembers to call Stop on a failed start. Consider validating/priming all handlers first, or canceling any already-started handlers before returning the error.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/multi_server.go` around lines 63 - 102: Start currently
begins per-handler watchers as it loops and may return on a later error while
earlier watchers keep running; fix by either (A) priming/validating all handlers
before starting any watchers (run parseSDL and h.Reload for HasUpstreamSchema()
for every handler first), or (B) track started watcher resources and teardown on
error (maintain a startedHandlers/startedWatchCancels slice while iterating and
on any parseSDL/Reload/WatchOperationsDir error iterate it to call the
appropriate stop/cancel method on each handler or invoke m.Stop() to cleanly
stop already-started handlers) so no watcher remains running after Start returns
an error; use the existing symbols parseSDL, h.Reload, WatchOperationsDir, and
m.Stop/handler stop method names to locate and implement the changes.
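
A sketch of option (B), collecting cancel functions as watchers start and draining them on failure (the per-handler context wiring is assumed; interval, onChange, and logger stand in for the values built inside the loop):

var cancels []context.CancelFunc
fail := func(err error) error {
	for _, cancel := range cancels {
		cancel() // stop watchers started before the failure
	}
	return err
}
for _, h := range m.handlers {
	watchCtx, cancel := context.WithCancel(h.Context())
	if err := WatchOperationsDir(watchCtx, h.OperationsDir(), interval, onChange, logger); err != nil {
		cancel()
		return fail(fmt.Errorf("mcp server %q: start operations watcher: %w", h.graphName, err))
	}
	cancels = append(cancels, cancel)
}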

Comment on lines +69 to +73
				if pendingReload {
					pendingReload = false
					logger.Info("operations directory changed; reloading tools and notifying connected clients")
					onChange()
				}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard onChange against panics to keep the watcher alive.

A panic in onChange() will terminate the goroutine, silently disabling further reloads.

🛡️ Proposed hardening
 				if pendingReload {
 					pendingReload = false
 					logger.Info("operations directory changed; reloading tools and notifying connected clients")
-					onChange()
+					func() {
+						defer func() {
+							if r := recover(); r != nil {
+								logger.Error("operations reload callback panicked", zap.Any("panic", r))
+							}
+						}()
+						onChange()
+					}()
 				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/operations_watcher.go` around lines 69 - 73: The call to
onChange() inside the pendingReload block can panic and kill the watcher
goroutine; wrap the invocation in a panic-safe guard (e.g., defer/recover) so
the goroutine stays alive and future reloads still occur. Specifically, in the
pendingReload handling where logger.Info(...) and onChange() are called, invoke
onChange through a small safety wrapper or add a deferred recover around the
call that logs the recovered panic (including the panic value and stack) via
logger.Error and continues execution; do not change pendingReload semantics or
suppress the original logging.

Comment on lines +81 to +107
// fileFingerprint identifies a file's relevant state (modification time + size).
type fileFingerprint struct {
	modTime time.Time
	size    int64
}

// snapshotOperationFiles returns a map of path → fingerprint for every .graphql / .gql
// file under dir. Used by the watcher to detect added, removed, or modified operations.
func snapshotOperationFiles(dir string) (map[string]fileFingerprint, error) {
	out := map[string]fileFingerprint{}
	walkErr := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		ext := strings.ToLower(filepath.Ext(path))
		if ext != ".graphql" && ext != ".gql" {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return nil // skip unreadable entries; treat as if absent
		}
		out[path] = fileFingerprint{modTime: info.ModTime(), size: info.Size()}
		return nil

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

modTime + size fingerprints can miss real file edits.

If content changes without size change and filesystem timestamp granularity is coarse, changes can be missed and tools won’t reload.

🔍 More reliable fingerprinting option
 type fileFingerprint struct {
 	modTime time.Time
 	size    int64
+	hash    string
 }
...
 		info, err := d.Info()
 		if err != nil {
 			return nil // skip unreadable entries; treat as if absent
 		}
-		out[path] = fileFingerprint{modTime: info.ModTime(), size: info.Size()}
+		sum, err := hashFile(path)
+		if err != nil {
+			return nil
+		}
+		out[path] = fileFingerprint{modTime: info.ModTime(), size: info.Size(), hash: sum}
 		return nil
 	})

Also applies to: 115-122

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/mcpserver/operations_watcher.go` around lines 81 - 107: The
current fileFingerprint (modTime, size) in snapshotOperationFiles can miss
content-only edits; change the fingerprinting to include a content hash: update
the fileFingerprint type (or add a new field like digest string) and in
snapshotOperationFiles read the file contents (e.g., os.ReadFile) and compute a
stable hash (SHA256) to store alongside or instead of modTime/size; use that
digest when comparing snapshots (also update the other snapshot-using code
around the similar block at lines 115-122 to compare digests) so edits that
don't change size or have coarse mtime granularity are detected reliably.
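
The suggested diff above calls a hashFile helper that doesn't exist yet; a minimal version might look like this (hex-encoded SHA-256 over the file contents):

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
)

// hashFile returns the hex-encoded SHA-256 digest of the file at path.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}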
