Skip to content

perf(redis_admin): drop clear-by-filter preview load; reuse chart numbers#909

Open
thomasrockhu-codecov wants to merge 5 commits intomainfrom
redis-admin/clear-by-filter-skip-preview
Open

perf(redis_admin): drop clear-by-filter preview load; reuse chart numbers#909
thomasrockhu-codecov wants to merge 5 commits intomainfrom
redis-admin/clear-by-filter-skip-preview

Conversation

@thomasrockhu-codecov
Copy link
Copy Markdown
Contributor

@thomasrockhu-codecov thomasrockhu-codecov commented May 1, 2026

Why

The lazy preview added in #907 still costs seconds because the
count requires a streaming LRANGE scan of the full queue —
the very work the preview was supposed to defer. On the
deepest queues (200k–500k+ messages) the operator-facing
confirmation page can take 5–15 seconds to first byte.

But the operator already saw the bucket count and queue depth
on the chart that linked them here. The chart's per-row "Clear
queue" button is the only way to land on this page in normal
use, and the chart computed both numbers to render its own
row. So the confirmation page can extrapolate an "approximately
N messages (~Y% of queue)" callout from those passed-through
hints — no further Redis call required, and no async polling /
skeleton placeholder either.

What

  • Chart form (_frequency_chart.html): per-row "Clear
    queue" form now carries bucket_count, bucket_pct,
    total_visible, and total_depth as hidden inputs. Method
    is GET since these are non-secret display hints.
  • clear_by_filter_view GET: rip out all lazy-preview
    wiring (no LLEN, no streaming_celery_count, no preview
    cache, no background job, no sample materialization). Read
    the four hint params via new _coerce_int / _coerce_float
    helpers (so a hand-edited URL never 500s) and render the
    confirmation page synchronously.
  • clear_by_filter_view POST: all three actions
    (dry_run, clear_keep_one, clear_all) now uniformly
    spawn the chunked background-job worker and 302 to the
    progress page. The synchronous-dry_run-rerender shape is
    gone — the chunked worker walks the queue and computes the
    exact match count as it scans, regardless of whether the
    pass tombstones anything. Audit log lands at job completion
    under mode=chunked-dry-run for the dry-run case.
  • clear_by_filter.html: skeleton placeholders, JS
    include, and CSS include all deleted. Replaced with a static
    filter fieldset (queue, task, repo with admin link, commit
    with full-SHA tooltip) and an "Approximate impact"
    fieldset showing either the extrapolated count or
    "Approximate count not available — the chunked clear will
    compute the exact match count as it scans." Typed-
    confirmation pattern + three submits unchanged.
  • services.streaming_celery_count: deleted (only caller
    was the deleted preview surface; chunked worker uses
    _streaming_celery_clear directly with dry_run=True).
  • README: clear-by-filter/ URLs row + "chart-driven
    targeted clear" section rewritten to document the chart-hint
    passthrough, the synchronous-no-Redis-I/O GET, and the
    uniform 302-to-progress shape across all three actions.
  • Note on PR perf(redis_admin): lazy clear-by-filter preview (count off the request thread) #907: that PR's lazy-preview infrastructure
    (preview/skip-count routes, JS, CSS, cache helpers, the two
    REDIS_ADMIN_CLEAR_BY_FILTER_PREVIEW_* settings) was never
    merged to main, so there was nothing on the file system to
    delete beyond streaming_celery_count. If perf(redis_admin): lazy clear-by-filter preview (count off the request thread) #907 lands first,
    this PR will need a follow-up to drop those leftovers; until
    then the surface is already clean.

Approximate-count display

Rendered inline (no JS) when all four hints are present and
total_visible > 0:

Will tombstone approximately 78,500 messages (~37% of
queue at depth 211,052)

Falls back to:

Approximate count not available — the chunked clear will
compute the exact match count as it scans.

…when any hint is missing / non-numeric / divides by zero.

Tests

CeleryBrokerClearByFilterViewTest rewritten as a contract
suite over the new view shape:

  • GET with chart hints renders the approximate-count callout
    with the right numbers and makes zero Redis calls
    (asserted via a MagicMock proxy on
    redis_admin_conn.get_connection whose method_calls must
    be empty after the request).
  • GET without hints / with malformed hints / with zero
    total_visible falls back to "Approximate count not
    available", still no Redis calls.
  • GET resolves the core.Repository admin link when repoid
    is set.
  • POST clear_all / clear_keep_one / dry_run (with valid
    typed-confirm where required) each 302 to
    …/clear-by-filter/job/<uuid>/; start_celery_broker_clear_job
    is invoked once with the right dry_run / keep_one flags.
  • Typed-confirm gate still rejects destructive POSTs with a
    200 + error message.
  • Legacy action=confirm form shape still aliases through to
    the chunked-job path.
  • Refusal of unscoped clears + superuser gating preserved.
  • test_clear_by_filter_view_dry_run_preview_unchanged in
    test_celery_broker_clear_job.py renamed to
    …_dry_run_redirects_to_progress_page and updated to assert
    the new shape.

End-to-end exercise of the chunked worker (queue actually
drained, audit-log shape, keep_one survivor placement) lives
in test_celery_broker_clear_job.py (existing).

Migration note

This PR removes URL routes that #907 would have introduced
(clear-by-filter/preview/, clear-by-filter/skip-count/)
but those routes were never merged to main, so there's
nothing externally linkable to break. The clear-by-filter/
route itself is unchanged.

If #907 merges first, rebase this PR on top of it; the
leftover preview helpers / settings / static files / preview
test file that #907 introduces should then be deleted as part
of the rebase.

Test plan

  • uv run ruff check redis_admin/
  • uv run ruff format --check redis_admin/
  • pytest -q apps/codecov-api/redis_admin/tests/ (294
    passed, run inside the api container which provides
    Postgres for the Django ORM tests)
  • Manual smoke once deployed: open the celery_broker
    drill-down on a deep queue, click "Clear queue…" on a
    bucket, confirm the page renders sub-second with the
    approximate-count callout.

Made with Cursor


Note

Medium Risk
Changes the celery broker admin clear-by-filter flow and how destructive/dry-run actions are executed (now always asynchronous), which could affect operator workflows and queue-clearing behavior if the new job-spawn path misfires. Risk is mitigated by preserving typed-confirmation gates, legacy action=confirm compatibility, and expanded tests around zero-Redis GET and redirects.

Overview
Speeds up the celery broker clear-by-filter operator flow by making the confirmation page fully synchronous with zero Redis I/O, reusing count/depth hints passed from the frequency chart to show an approximate impact instead of doing a live scan.

Updates the chart’s per-bucket Clear queue… action to GET the confirmation URL with bucket_count/bucket_pct/total_visible/total_depth hints, and simplifies the confirmation template to a static filter summary (including a repo admin link) plus the approximate-impact callout.

Unifies POST handling so all three actions (dry_run, clear_keep_one, clear_all) now spawn the existing chunked background clear job and redirect to the progress page; removes the now-unused streaming_celery_count helper and updates docs/tests accordingly.

Reviewed by Cursor Bugbot for commit 25d9949. Bugbot is set up for automated code reviews on this repo. Configure here.

…bers

Render the clear-by-filter confirmation page synchronously from the
display hints the frequency chart already has (bucket count, bucket
share, sampled-window total, queue depth) instead of doing a streaming
LRANGE scan on the request thread. Operators confirm based on the
identity-defining (queue, task, repoid, commit) filter, not on a
precise count, so the approximation "will tombstone ~X (~Y% of queue
at depth Z)" is sufficient for the human in the loop. The chunked
background-job worker that ALL three actions (dry-run / keep-one /
clear-all) fan out to still computes the exact match count as it
walks, so the audit log entry stays precise.

Why: the previous synchronous streaming-count walk held the gunicorn
worker for many seconds per click on a 200-500k-deep queue, and a
separate lazy-preview attempt (PR #907) still costs network + cache
round-trips per render. The chart already has these numbers — we just
needed to pass them through.

Chart form (`_frequency_chart.html`):

- Switched from POST to GET (the four `bucket_*` / `total_*` hints
  are display-only, non-secret, idempotent, bookmarkable; CSRF is
  unnecessary).
- Added four hidden inputs: `bucket_count`, `bucket_pct`,
  `total_depth`, `total_visible`.

Confirmation view (`clear_by_filter_view`):

- GET path now does ZERO Redis I/O. Removed the queryset materialise
  + `_CLEAR_BY_FILTER_SAMPLE_SIZE` slice, the `streaming_celery_count`
  call, and the synthetic-target `_substitute_filter_any` wiring (the
  view never calls `celery_broker_clear` directly anymore so the
  synthetic target is moot — the chunked job builds its own filter
  from the URL params).
- Added `_coerce_int` / `_coerce_float` helpers for graceful URL-hint
  parsing; missing / malformed hints fall back to "Approximate count
  not available" rather than 500.
- All three POST actions (`dry_run`, `clear_keep_one`, `clear_all`)
  now uniformly fan out to `start_celery_broker_clear_job` and 302
  to the progress page. Dry-run no longer keeps the synchronous
  shape — the chunked worker reports the exact count there too.

Confirmation template (`clear_by_filter.html`):

- Replaced the "Matched: N message(s)" header + sample-targets table
  with a static filter-rows fieldset (queue / task / repo-with-link /
  truncated-commit-with-tooltip) and an approximate-count callout.
- Callout shape: "Will tombstone approximately X messages (~Y% of
  the queue at depth ~Z)" when all four hints are present, or
  "Approximate count not available — the chunked clear will compute
  the exact match count as it scans" otherwise.
- Kept the typed-confirmation pattern and the three submit actions
  exactly as-is; removed the conditional rendering that hid
  `clear_keep_one` for sub-2 buckets (we no longer know the count
  on render).

Migration note: the chart-form method change (POST → GET) means
operators with stale tabs may submit one final POST after this
deploys. The view continues to read scope params from POST when
present so those POSTs still work.
`streaming_celery_count` was a thin dry-run wrapper over
`_streaming_celery_clear` whose only caller was the synchronous
preview path in `clear_by_filter_view`. With the preview now
rendered from chart-supplied display hints (no Redis I/O at all),
the helper is dead code: the chunked clear job calls
`_streaming_celery_clear` directly with `dry_run=True` for its
own count pass, and per-message clear paths use materialised
`CeleryBrokerQueue` rows instead of a streaming triple.

Also tightens the `_FilterWildcard` / `_substitute_filter_any`
docstrings to reflect the single remaining call site
(`_run_celery_broker_clear_job_body`).
Replaces the destructive-action assertions in
`CeleryBrokerClearByFilterViewTest` with a focused contract suite
that pins the new view's behaviour:

* GET with chart hints renders the approximate-count callout and
  makes ZERO Redis calls (asserted via a `MagicMock` proxy on
  `redis_admin_conn.get_connection` whose `method_calls` must be
  empty after the request).
* GET without hints renders "Approximate count not available",
  still no Redis calls.
* GET with malformed hints (non-numeric, divide-by-zero) coerces to
  `None` via `_coerce_int` / `_coerce_float` and falls back to the
  "not available" branch instead of raising.
* GET resolves the `core.Repository` admin link when `repoid` is
  set so the chart's repo cell and the confirmation page link
  round-trip to the same URL.
* All three POST actions (`clear_all`, `clear_keep_one`, `dry_run`)
  return 302 to `…/clear-by-filter/job/<uuid>/` (the chunked
  background-job worker handles the actual clearing).
* Typed-confirm gate still applies to both destructive actions.
* Legacy `action=confirm` shim still 302s to the progress page.
* Refusal of unscoped clears keeps the per-bucket flow non-
  overlapping with `clear_by_scope`.
* Superuser gating preserved.

End-to-end exercise of the chunked worker (queue actually drained,
audit-log shape, `keep_one` survivor placement, etc.) lives in
`test_celery_broker_clear_job.py`. The destructive-mutation tests
that used to live on `CeleryBrokerClearByFilterViewTest` were
racing the chunked worker thread anyway -- this PR resolves that
race by shifting their coverage to the dedicated chunked-job
test file (where the existing `_wait_for_job` helper waits for
worker completion).

Also drops the two `streaming_celery_count` tests
(`test_streaming_celery_count_returns_full_queue_match_count`,
`test_streaming_celery_count_wildcard_task_name_matches_any_task`)
because the helper itself is gone -- the chunked-job worker calls
`_streaming_celery_clear` directly, and the underlying wildcard
semantic is already covered by
`test_celery_broker_clear_synthetic_target_with_wildcard_task_name_drains_all_matches`.

`test_clear_by_filter_view_dry_run_preview_unchanged` in
`test_celery_broker_clear_job.py` updated to
`test_clear_by_filter_view_dry_run_redirects_to_progress_page`:
asserts the new 302-to-progress shape and that
`start_celery_broker_clear_job` is invoked with `dry_run=True`.
Updates the URLs table row for `clear-by-filter/` and the
"chart-driven targeted clear" section to reflect that the
confirmation page now:

* Renders fully synchronously with **zero Redis I/O** on GET.
* Receives `bucket_count` / `bucket_pct` / `total_visible` /
  `total_depth` from the chart's per-row form, extrapolates
  `count ≈ bucket_count / total_visible * total_depth`, and
  surfaces a "approximately N (~Y% of queue at depth Z)"
  callout.
* Falls back to "Approximate count not available" when the
  hints are absent or malformed (hand-crafted URL).
* Spawns the chunked worker for **all three** POST actions
  (`dry_run` included) -- the destructive 200-render shape is
  gone; everything 302s to the progress page.

Also amends the per-bucket "Clear queue…" bullet under
"Frequency chart" to mention the new hint passthrough so the
docs explain why the page renders instantly even when the
queue is hundreds of thousands deep.
Two trivial whitespace nits surfaced by `ruff format --check`:
the helper-block separator after `_coerce_float` and an inlined
`reverse(...)` call in `clear_by_filter_view`. No semantic
change.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 25d9949. Configure here.

when match_count >= 2. Clears every match EXCEPT the
lowest-index one; "drop duplicate retries but leave one
running."
* `clear_all` — destructive (red text). Clears every match.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POST form drops chart hints on confirmation failure

Medium Severity

The POST form carries queue_name, task_name, repoid, and commitid as hidden inputs but omits the four chart-hint fields (bucket_count, bucket_pct, total_visible, total_depth). When a destructive POST fails typed confirmation, params is request.POST, so _coerce_int(params.get("bucket_count")) returns None for all four hints. The approx dict becomes None and the template falls back to "Approximate count not available" — even though the operator just saw the approximate count on the initial GET. The operator loses impact context at exactly the moment they need to re-confirm.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 25d9949. Configure here.

@sentry
Copy link
Copy Markdown
Contributor

sentry Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.90%. Comparing base (6fb832f) to head (25d9949).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #909   +/-   ##
=======================================
  Coverage   91.89%   91.90%           
=======================================
  Files        1316     1316           
  Lines       50586    50576   -10     
  Branches     1625     1625           
=======================================
- Hits        46485    46480    -5     
+ Misses       3795     3790    -5     
  Partials      306      306           
Flag Coverage Δ
apiunit 94.92% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codecov-notifications
Copy link
Copy Markdown

codecov-notifications Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

thomasrockhu-codecov added a commit that referenced this pull request May 2, 2026
…t numbers)

Conflict resolution notes:
- services.py docstrings for `_substitute_filter_any` and the
  chunked clear job: combined #908's "dry-run vs live-clear must
  agree" framing with #909's more general "rule survives a future
  field/truthiness change" framing. Both are accurate for the
  post-merge call sites; the merged wording covers both modes.
- test_celery_broker_queue.py: took #909's structure (no preview
  test class) wholesale because the preview surface is gone.
  #908's `_post_clear` thread-join helper is unnecessary in the
  view tests because the post-#909 view tests only assert on the
  302 redirect URL, not on post-worker queue state. End-to-end
  drain assertions live in `test_celery_broker_clear_job.py`,
  which #908 already wired up with a thread-join helper that
  survives this merge.
@thomasrockhu-codecov
Copy link
Copy Markdown
Contributor Author

Folded into #905; keeping this open as the per-feature review record.

thomasrockhu-codecov added a commit that referenced this pull request May 2, 2026
… maths

Two CI failures on the consolidated branch HEAD:

1. `test_streaming_clear_verify_before_lset_skips_drifted_entry`
   was a left-over from PR #899 that asserted the verify-before-
   LSET guard skipped drifted slots. PR #908 deletes that guard
   in `_streaming_celery_clear` (the whole point of the
   restoration), so the test now sees `total_lset == 1` for a
   drifted slot and fails. The merge picked up #909's copy of
   the test file (it was the conflict-resolution baseline) which
   re-introduced the test. Apply #908's deletion verbatim and
   keep its replacement comment in place.

2. `test_typed_confirm_failure_preserves_chart_hint_callout`
   asserted `round(78500 / 20000 * 211052) == 828381`. The
   correct value is 828379 (the docstring arithmetic was off-
   by-two). The test logic and the production behaviour are
   both correct; the expected literal is what was wrong.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants