Skip to content

fix(review-pr): schema mismatch, chunk size, and timeout diagnostics#202

Open
derekmisler wants to merge 2 commits intodocker:mainfrom
derekmisler:fix/review-pr-schema-chunk-timeout
Open

fix(review-pr): schema mismatch, chunk size, and timeout diagnostics#202
derekmisler wants to merge 2 commits intodocker:mainfrom
derekmisler:fix/review-pr-schema-chunk-timeout

Conversation

@derekmisler
Copy link
Copy Markdown
Contributor

Summary

Three targeted fixes for the runaway-drafter timeout seen in docker/sailor PR #754.

Failing run: https://github.com/docker/sailor/actions/runs/25577728034/job/75088929732

The root cause: the drafter sub-agent looped for 1799 s re-reading the same 4 files ~140× each because its enforced structured_output schema rejected every emit attempt, causing the model to keep doing more file reads instead of returning results.


Fix 1 — Schema mismatch in review-pr/agents/pr-review.yaml

Root cause of the infinite loop. The root agent's transfer_task delegation message did not explicitly document the drafter's enforced structured_output schema field names. When the LLM generated the task string it used title/body (natural GitHub review terminology) instead of the schema-required category/issue/details/in_diff. Every emit attempt was rejected by schema validation, causing the drafter to loop indefinitely reading the same source files ~140× each until SIGKILL at 1800 s.

Changes:

  • Add a CRITICAL block in the ## CRITICAL: How to delegate to the drafter section documenting the exact drafter response schema, with an explicit warning never to use title/body
  • Update step 4d (aggregate findings) to reference the correct field names
  • Update step 5 (parse drafter response) to explicitly name issue and details
  • Update step 9 (build inline comments) to list all correct finding fields

The drafter's structured_output schema itself is unchanged — it is the contract.


Fix 2 — Lower soft chunk target 1000 → 600 lines (review-pr/action.yml)

Chunk 1 of sailor PR #754 was 1229 lines of dense Rust snapshot/restore code — at the edge of what a drafter can process in one turn, amplifying the schema-loop impact. Lowering the soft target produces smaller, more manageable chunks.

  • Soft target: 1000 → 600 lines (split triggers when this is exceeded and directory changes)
  • Hard cap: 2000 lines (unchanged)
  • Updated the nearby comment and log echo to stay accurate

Fix 3 — Better exit-124 diagnostics (review-pr/action.yml)

When the agent timed out, the PR received a generic "PR Review Failed" comment and an unhelpful step summary. Users had no idea why it failed or what to do.

Changes:

  • Add a dedicated elif [ "$EXIT_CODE" = "124" ] branch before the generic non-zero exit handler
  • Post an actionable PR comment explaining the timeout and suggesting /review retry or splitting the PR
  • Add a TIMEOUT_NOTE (exit code 124, 1800 s limit, verbose log artifact reference) printed in the step summary
  • All other non-zero exit code behaviour is unchanged

Testing

  • pnpm build ✅ (all bundles compiled successfully)
  • pnpm lint — Biome and tsc pass; one pre-existing actionlint warning about node24 runner in test-e2e.yml (existed on main before this PR)
  • pnpm test — pre-existing DNS sandbox failure (unrelated to these changes, also reproducible on main)

Three targeted fixes for the runaway-drafter timeout seen in docker/sailor PR #754
(https://github.com/docker/sailor/actions/runs/25577728034/job/75088929732).

## Fix 1 — Schema mismatch in pr-review.yaml (root cause of infinite loop)

The root agent's transfer_task delegation message was not explicitly documenting
the drafter's enforced structured_output schema field names. When the LLM generated
the task string it used 'title'/'body' (natural GitHub review terminology) instead
of the schema-required 'category'/'issue'/'details'/'in_diff'. Every emit attempt
was rejected by schema validation, causing the drafter to loop indefinitely reading
the same source files ~140 times each until SIGKILL at 1800 s.

Changes:
- Add a CRITICAL block in the delegation section documenting the exact drafter
  response schema with explicit warning never to use title/body
- Update step 4d (aggregate) to reference the correct field names
- Update step 5 (parse drafter response) to name 'issue' and 'details' explicitly
- Update step 9 (build comments) to list the correct finding fields

## Fix 2 — Lower soft chunk target 1000 → 600 lines (review-pr/action.yml)

Chunk 1 of sailor PR #754 was 1229 lines of dense Rust snapshot/restore code —
at the edge of what a drafter can process in one turn, amplifying the schema-loop
impact. Lowering the soft target produces smaller, more manageable chunks.

The 2000-line hard cap is unchanged.

## Fix 3 — Better exit-124 diagnostics (review-pr/action.yml)

When the agent timed out the PR received a generic 'PR Review Failed' comment
and an unhelpful step summary. Users had no idea why or what to do next.

Changes:
- Add a dedicated elif branch for EXIT_CODE=124 before the generic non-zero handler
- Post an actionable PR comment explaining the timeout and suggesting /review retry
  or splitting the PR into smaller pieces
- Add a TIMEOUT_NOTE variable with exit code, 1800 s limit, and verbose log artifact
  reference, printed in the step summary after the STATUS line
Copy link
Copy Markdown
Contributor

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

All three fixes look correct and address the described root causes. The schema documentation addition in pr-review.yaml is accurate and will prevent the drafter loop. The 1000→600 line chunk-size reduction is straightforward. The timeout branch logic is sound. One minor cosmetic issue noted inline.

Comment thread review-pr/action.yml Outdated
Inside a double-quoted bash string, \' is not a valid escape sequence and
produces a literal backslash. The step summary was rendering
  (\'some-log-file.txt\')
instead of the intended
  (some-log-file.txt)

Drop the attempted single-quote wrapping and write () directly.
@derekmisler derekmisler enabled auto-merge (squash) May 9, 2026 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants