
fix: add rate-limit retry to PR creation and fallback issue paths#31244

Merged
pelikhan merged 6 commits into main from copilot/fix-github-app-rate-limit-exhaustion
May 9, 2026

Conversation

Contributor

Copilot AI commented May 9, 2026

When the GitHub App installation token quota hits 0/15,000, safe_outputs was silently dropping agent work: pulls.create() had no retry logic, and when it failed, the fallback issues.create() also failed immediately — no backoff, no recovery.

Changes

  • create_pull_request.cjs

    • Wrap githubClient.rest.pulls.create() with withRetry(RATE_LIMIT_RETRY_CONFIG) — 5 retries, exponential backoff (30s→60s→120s→240s→240s), honors X-RateLimit-Reset header
    • Wrap createFallbackIssue() with the same retry config internally (covering rate-limit and other transient errors), so all call sites (push-failed, permission-denied, general PR failure) are covered without repetition
    • When a 422 assignee error forces a retry without assignees, the stripped assignee names are persisted across all subsequent retry attempts and a [!NOTE] callout is appended to the fallback issue body so users know which assignees could not be set
  • push_to_pull_request_branch.cjs

    • Wrap manifest-protection fallback issues.create() with withRetry(RATE_LIMIT_RETRY_CONFIG)
  • create_pull_request.test.cjs

    • Four new tests: PR creation retry succeeds, PR exhausts retries and falls back to issue, fallback issue creation retries on rate limit, fallback issue body contains assignee-removed note on 422 error
// Before: direct call, no retry
const { data: pullRequest } = await githubClient.rest.pulls.create({ ... });

// After: rate-limit aware retry with exponential backoff
const { data: pullRequest } = await withRetry(
  () => githubClient.rest.pulls.create({ ... }),
  RATE_LIMIT_RETRY_CONFIG,
  `create pull request in ${repoParts.owner}/${repoParts.repo}`
);

RATE_LIMIT_RETRY_CONFIG was already used by create_issue.cjs; this brings the PR creation path to parity.
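For context, a helper with this shape might look like the sketch below. withRetry and RATE_LIMIT_RETRY_CONFIG live in error_recovery.cjs in the real codebase; the field names (baseDelayMs, maxDelayMs) and the exact backoff/reset logic here are assumptions for illustration, not the actual implementation.

```javascript
// Hedged sketch of a rate-limit-aware retry helper. The real helper in
// error_recovery.cjs uses core.warning and a shared isTransientError
// predicate; this stand-alone version approximates the described behavior:
// 5 retries, exponential backoff capped at 4 min, X-RateLimit-Reset honored.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const RATE_LIMIT_RETRY_CONFIG = {
  maxRetries: 5,
  baseDelayMs: 30000, // 30s, doubling each attempt: 30s -> 60s -> 120s -> 240s -> 240s
  maxDelayMs: 240000, // 4 min cap
  // Approximation of isTransientError: rate limits and server-side failures
  shouldRetry: error => error && (error.status === 403 || error.status === 429 || error.status >= 500),
};

async function withRetry(fn, config, context) {
  let lastError;
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === config.maxRetries || !config.shouldRetry(error)) throw error;
      // Prefer the server-provided reset time when present, else back off exponentially
      const reset = error.response?.headers?.["x-ratelimit-reset"];
      const backoff = Math.min(config.baseDelayMs * 2 ** attempt, config.maxDelayMs);
      const delayMs = reset ? Math.max(Number(reset) * 1000 - Date.now(), 0) : backoff;
      console.warn(`Retrying ${context} in ${delayMs}ms (attempt ${attempt + 1}/${config.maxRetries})`);
      await sleep(delayMs);
    }
  }
  throw lastError;
}
```

With this shape, wrapping any write call is a one-liner, which is why the same config covers both the PR and the fallback-issue paths.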

Copilot AI and others added 2 commits May 9, 2026 17:08
Wrap githubClient.rest.pulls.create() and createFallbackIssue() in
create_pull_request.cjs with RATE_LIMIT_RETRY_CONFIG (5 retries,
exponential backoff up to 4 min, X-RateLimit-Reset honored).

Wrap manifest-protection githubClient.rest.issues.create() in
push_to_pull_request_branch.cjs with the same retry config.

Add 3 vitest tests covering:
- PR creation retries on rate-limit and succeeds
- PR creation falls back to issue after all rate-limit retries exhausted
- Fallback issue creation retries on rate-limit and succeeds

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/03801e76-0170-4920-a610-9833e75261f3

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix GitHub App installation rate-limit exhaustion issues" to "fix: add rate-limit retry to PR creation and fallback issue paths" May 9, 2026
Copilot AI requested a review from gh-aw-bot May 9, 2026 17:13
@pelikhan pelikhan marked this pull request as ready for review May 9, 2026 17:21
Copilot AI review requested due to automatic review settings May 9, 2026 17:21
Contributor

Copilot AI left a comment


Pull request overview

Adds rate-limit-aware retry/backoff to GitHub PR/issue creation paths so safe-outputs work isn’t silently dropped when the GitHub App installation token quota is exhausted.

Changes:

  • Wrap pulls.create() in create_pull_request.cjs with withRetry(RATE_LIMIT_RETRY_CONFIG) to handle rate-limit/transient failures with exponential backoff.
  • Wrap fallback issue creation paths with the same retry config (including manifest-protection fallback issue creation in push_to_pull_request_branch.cjs).
  • Add Vitest coverage for PR creation retry success, retry exhaustion with fallback-to-issue, and retrying fallback issue creation.
Summary per file:

  • actions/setup/js/create_pull_request.cjs: Adds rate-limit retry to PR creation and wraps fallback issue creation with the shared retry config.
  • actions/setup/js/push_to_pull_request_branch.cjs: Adds rate-limit retry to manifest-protection fallback issue creation.
  • actions/setup/js/create_pull_request.test.cjs: Adds tests covering PR create retry behavior and fallback issue retry behavior.

Copilot's findings


  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment on lines +198 to +202
* Creates a fallback GitHub issue, retrying on rate-limit errors (with exponential back-off)
* and retrying without assignees if the API rejects them.
* This ensures fallback issue creation remains reliable even if an assignee username
* is invalid or the repository does not have that collaborator.
* is invalid, the repository does not have that collaborator, or the installation token
* quota is temporarily exhausted.
Comment on lines +229 to +233
if (isAssigneeError && assignees && assignees.length > 0) {
core.warning(`Fallback issue creation failed due to assignee error, retrying without assignees: ${getErrorMessage(error)}`);
const { assignees: _removed, ...payloadWithoutAssignees } = payload;
return await githubClient.rest.issues.create(payloadWithoutAssignees);
}
@github-actions github-actions Bot mentioned this pull request May 9, 2026
Contributor

@github-actions github-actions Bot left a comment


Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a targeted bug fix for silent quota-exhaustion failures, with accompanying regression tests.

Key Themes

  • Test → implementation coupling: The toHaveBeenCalledTimes(6) assertion in the fallback test hardcodes a magic number derived from RATE_LIMIT_RETRY_CONFIG.maxRetries. Importing the constant removes the coupling and keeps the test self-consistent when the config is tuned.
  • Retry callback scope: In createFallbackIssue, the assignee-fallback logic lives inside the withRetry callback. If the assignee-less call hits a rate-limit, withRetry restarts from the top — re-attempting with invalid assignees before recovering again. Functionally correct, but wastes one call per retry under quota pressure.
  • Missing test coverage in push_to_pull_request_branch.cjs: The manifest-protection issue path now has retry wiring, but no test exercises that path under a rate-limit scenario.

Positive Highlights

  • ✅ Clean, focused change — only the two write paths that were previously unprotected are touched
  • ✅ Descriptive context strings passed to withRetry make warnings easy to trace in logs
  • ✅ Three well-structured test scenarios with vi.useFakeTimers() correctly handling async retry delays
  • ✅ Existing assignee-retry behaviour is preserved correctly inside the new withRetry wrapper

Verdict

Observations only — no blocking issues. The fix addresses the real root cause and the test suite gives confidence in the happy and sad paths.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

  • #31244 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 5.9M

expect(result.fallback_used).toBe(true);
expect(result.issue_number).toBe(99);
// 1 initial + 5 retries = 6 total PR creation attempts (RATE_LIMIT_RETRY_CONFIG.maxRetries = 5)
expect(global.github.rest.pulls.create).toHaveBeenCalledTimes(6);
Contributor


[/tdd] This assertion hardcodes 6 (1 initial + 5 retries), coupling the test to RATE_LIMIT_RETRY_CONFIG.maxRetries. If the retry config changes, this test fails with a cryptic number mismatch.

Consider importing the constant:

const { RATE_LIMIT_RETRY_CONFIG } = require('./error_recovery.cjs');
// ...
expect(global.github.rest.pulls.create).toHaveBeenCalledTimes(RATE_LIMIT_RETRY_CONFIG.maxRetries + 1);

This makes the test self-documenting and resilient to config tuning.

if (isAssigneeError && assignees && assignees.length > 0) {
core.warning(`Fallback issue creation failed due to assignee error, retrying without assignees: ${getErrorMessage(error)}`);
const { assignees: _removed, ...payloadWithoutAssignees } = payload;
return await githubClient.rest.issues.create(payloadWithoutAssignees);
Contributor


[/diagnose] If the assignee-less issues.create call on this line throws a rate-limit error, withRetry will restart the entire callback — including the first issues.create(payload) call with assignees. The retry cycle will hit the 422 assignee error again, recover inline, then make a fresh attempt without assignees. This is correct but wastes an extra API call on every rate-limit retry.

If reducing call count under quota pressure matters, sanitize assignees before entering withRetry so all retries go directly to the clean payload:

const sanitizedAssignees = sanitizeFallbackAssignees(assignees);
const payload = { ..., ...(sanitizedAssignees.length > 0 && { assignees: sanitizedAssignees }) };
return withRetry(
  () => githubClient.rest.issues.create(payload),
  RATE_LIMIT_RETRY_CONFIG,
  `create fallback issue in ${repoParts.owner}/${repoParts.repo}`
);

Not blocking — current behaviour is functionally correct.

body: issueBody,
labels: ["agentic-workflows"],
});
const { data: issue } = await withRetry(
Contributor


[/tdd] The manifest-protection issue creation path now uses withRetry, but there are no new tests covering this code path's rate-limit behaviour. The only rate-limit tests are in create_pull_request.test.cjs.

A test for push_to_pull_request_branch.cjs that simulates a rate-limit error on the manifest-protection issues.create call would confirm this retry path works end-to-end and protect against regressions if the retry wiring is ever accidentally removed.
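A sketch of the suggested regression test, with hand-rolled stubs in place of vitest mocks so it runs standalone. createManifestProtectionIssue and the bare withRetry loop below are hypothetical stand-ins; the real test would use vi.fn() and vi.useFakeTimers() like the existing create_pull_request tests, and would exercise the actual retry-wrapped path in push_to_pull_request_branch.cjs.

```javascript
// Bare retry loop standing in for the real withRetry helper (no backoff,
// no rate-limit header handling) so the test idea is self-contained.
async function withRetry(fn, maxRetries) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try { return await fn(); } catch (error) { lastError = error; }
  }
  throw lastError;
}

// Hypothetical stand-in for the manifest-protection fallback issue path.
async function createManifestProtectionIssue(issuesCreate, payload) {
  return withRetry(() => issuesCreate(payload), 5);
}
```

The assertion of interest is that a first rate-limited issues.create call is followed by a successful retry, so the path recovers instead of failing fast.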

@github-actions
Contributor

github-actions Bot commented May 9, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Excellent — all new tests verify behavioral contracts

  • New/modified tests analyzed: 3
  • ✅ Design tests (behavioral contracts): 3 (100%)
  • ⚠️ Implementation tests (low value): 0 (0%)
  • Tests with error/edge cases: 3 (100%)
  • Duplicate test clusters: 0
  • Test inflation detected: ⚠️ Yes (175 test lines / 51 production lines ≈ 3.4:1)
  • 🚨 Coding-guideline violations: None

Test Classification Details

  • should retry PR creation on rate limit error and succeed (actions/setup/js/create_pull_request.test.cjs): ✅ Design. Verifies retry succeeds: result.success, pull_request_number, call count=2
  • should fall back to issue when PR creation fails after all rate-limit retries (actions/setup/js/create_pull_request.test.cjs): ✅ Design. Verifies fallback contract: fallback_used, issue_number, exhausted retries=6
  • should retry fallback issue creation on rate limit error and succeed (actions/setup/js/create_pull_request.test.cjs): ✅ Design. Verifies fallback path also retries on rate-limit: issue_number, call count=2

Flagged Tests — Requires Review

No tests are flagged for behavioral quality issues. All three tests assert on observable outputs (success flag, PR/issue number, fallback indicator) and exercise genuine error paths (rate-limit errors via mock chaining).

ℹ️ Test Inflation Note

175 new test lines were added against 51 new production lines (≈3.4:1 ratio, threshold 2:1). This is expected here: the tests require significant boilerplate (global.github, global.core, global.exec mock setup, fake-timer scaffolding, beforeEach/afterEach teardown) that dwarfs the production change. The underlying tests are high-quality, and the inflation comes from necessary test infrastructure, not padding.

ℹ️ Call-Count Assertions

All three tests use toHaveBeenCalledTimes(N) on external GitHub API mocks (pulls.create, issues.create). In a retry-logic context, call count is part of the behavioral contract — it verifies the system retried exactly the expected number of times rather than failing fast or retrying indefinitely. This is not an implementation-detail assertion.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests — no Go test files changed in this PR
  • 🟨 JavaScript (*.test.cjs): 3 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All three new tests verify the behavioral contract of the rate-limit retry and fallback logic with genuine error-path coverage.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25607131257

🧪 Test quality analysis by Test Quality Sentinel · ● 9.4M ·

Contributor

@github-actions github-actions Bot left a comment


✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 tests verify behavioral contracts for rate-limit retry and fallback logic with genuine error-path coverage.

@pelikhan
Collaborator

pelikhan commented May 9, 2026

@copilot review comments

… JSDoc

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 9, 2026

@copilot review comments

Both review comments have been addressed in commit "fix: persist stripped assignees across withRetry attempts and clarify JSDoc":

  1. JSDoc accuracy (198-202): Updated the comment from "retrying on rate-limit errors" to "retrying on rate-limit and other transient errors" to accurately reflect that RATE_LIMIT_RETRY_CONFIG.shouldRetry = isTransientError also covers 5xx responses and network timeouts.

  2. Assignee payload persistence (229-233): Replaced the local payloadWithoutAssignees copy with delete payload.assignees so the mutation persists across all subsequent withRetry attempts. Previously, each retry would start fresh with the original payload (still containing assignees), causing an extra 422 on every retry loop.
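The difference the second fix makes can be shown with a stripped-down model. withRetry below is a bare retry loop standing in for the real helper in error_recovery.cjs, and createFallbackIssue is a hypothetical reduction of the actual function; only the delete-versus-local-copy distinction matters here.

```javascript
// Stripped-down model: withRetry re-invokes the whole callback on failure,
// so a local payloadWithoutAssignees copy would be rebuilt from the original
// payload on every attempt. Mutating the shared payload with
// `delete payload.assignees` is what makes the removal stick across attempts.
async function withRetry(fn, maxRetries) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try { return await fn(); } catch (error) { lastError = error; }
  }
  throw lastError;
}

async function createFallbackIssue(issuesCreate, payload) {
  return withRetry(async () => {
    try {
      return await issuesCreate(payload);
    } catch (error) {
      if (error.status === 422 && payload.assignees && payload.assignees.length > 0) {
        // Mutation persists: later withRetry attempts see the cleaned payload
        // and skip the doomed with-assignees call entirely.
        delete payload.assignees;
        return await issuesCreate(payload);
      }
      throw error;
    }
  }, 5);
}
```

If the assignee-less call is then rate-limited, the next withRetry attempt goes straight to the clean payload instead of re-triggering the 422 first.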

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan May 9, 2026 19:11
@pelikhan
Collaborator

pelikhan commented May 9, 2026

@copilot insert a note in the issue that the assignees were removed when removing assignee

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 9, 2026

@copilot insert a note in the issue that the assignees were removed when removing assignee

Done in the latest commit. When a 422 assignee error triggers the retry-without-assignees path, the issue body now gets a [!NOTE] callout listing the removed assignees:

> [!NOTE]
> Assignees (user1, user2) could not be set on this issue due to an API error.

A new test covers this path and verifies the note appears in the second issues.create call.
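The callout-appending step described above might be factored as a small helper like the following. appendAssigneeRemovedNote is a hypothetical name, and the exact wording and placement in create_pull_request.cjs may differ.

```javascript
// Hypothetical helper: append a [!NOTE] callout to the fallback issue body
// listing the assignees that were stripped after a 422 assignee error.
function appendAssigneeRemovedNote(issueBody, removedAssignees) {
  return (
    `${issueBody}\n\n` +
    `> [!NOTE]\n` +
    `> Assignees (${removedAssignees.join(", ")}) could not be set on this issue due to an API error.`
  );
}
```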

@pelikhan pelikhan merged commit b2d754f into main May 9, 2026
@pelikhan pelikhan deleted the copilot/fix-github-app-rate-limit-exhaustion branch May 9, 2026 20:11


Development

Successfully merging this pull request may close these issues.

[aw-failures] P0 recurrence: GitHub App installation rate-limit exhaustion blocks safe_outputs (2026-05-08 ~16:46–17:04 UTC)

4 participants