feat(router): allow fallback cdn url to be specified#2807
feat(router): allow fallback cdn url to be specified#2807
Conversation
WalkthroughAdds CDN fallback support: config and schema gain fallback fields; a new IsCDNFallbackEligible helper decides when to retry; CDN clients (cache warmup, persisted operations, PQL manifest, router config) accept fallback endpoints, centralize HTTP helpers, and conditionally retry against fallbacks. Tests added for all paths. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2807 +/- ##
===========================================
+ Coverage 41.70% 59.61% +17.90%
===========================================
Files 791 239 -552
Lines 113325 26001 -87324
Branches 8768 0 -8768
===========================================
- Hits 47266 15501 -31765
+ Misses 65695 9003 -56692
- Partials 364 1497 +1133
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
router/core/cache_warmup_cdn.go (1)
77-92: Avoid treating a fallback miss as a successful warmup no-op.With a primary 503 and a fallback 404,
fetchOperationsJSONreturnsbody == nil, err == nil, soLoadItemsexits with(nil, nil). That hides the outage and silently skips warmup when the backup is stale. Prefer only accepting the fallback when it actually returns a body; otherwise keep the primary failure semantics.🔧 Suggested guard
- if err != nil && c.cdnFallbackURL != nil && httpclient.IsCDNFallbackEligible(resp, err) { + if err != nil && c.cdnFallbackURL != nil && httpclient.IsCDNFallbackEligible(resp, err) { + primaryErr := err log.Warn("Primary CDN failed, attempting fallback CDN", zap.Error(err), zap.String("fallback_url", c.cdnFallbackURL.String()), ) span.AddEvent("cdn.fallback", trace.WithAttributes( semconv.HTTPURL(c.cdnFallbackURL.String()), )) _, body, err = c.fetchOperationsJSON(ctx, log, c.cdnFallbackURL) + if err != nil { + return nil, fmt.Errorf("primary CDN failed: %w; fallback CDN failed: %v", primaryErr, err) + } + if body == nil { + return nil, fmt.Errorf("primary CDN failed: %w; fallback CDN returned no cache warmup config", primaryErr) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/core/cache_warmup_cdn.go` around lines 77 - 92, The current logic accepts a fallback response that returns body == nil && err == nil (e.g., primary 503 and fallback 404) and causes LoadItems to return (nil, nil); instead, change the fallback handling in the block around fetchOperationsJSON so that if the fallback fetch returns a nil body (even with nil err) you do not treat it as a successful no-op — preserve the primary failure semantics by returning the original primary error (or a new error indicating fallback miss) from LoadItems; update the code that calls c.fetchOperationsJSON (referencing c.cdnFallbackURL, fetchOperationsJSON, resp, err and LoadItems) to check for body == nil after the fallback call and return the primary err (or an explicit fallback-miss error) instead of proceeding to the nil success path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router/internal/persistedoperation/operationstorage/cdn/client.go`:
- Around line 111-126: The current logic replaces the original primary CDN error
with the fallback result even when the fallback returns a 404 (or other error),
causing transient primary failures to be masked as
PersistentOperationNotFoundError; change the flow in the block that calls
cdn.doPersistedOperation so you first save the primary error/resp (e.g.,
primaryErr := err, primaryResp := resp), call the fallback only when
cdn.cdnFallbackURL != nil && httpclient.IsCDNFallbackEligible(resp, err), then
inspect the fallback outcome and only assign err/resp/body from
cdn.doPersistedOperation if the fallback call succeeded; if the fallback returns
an error (including PersistentOperationNotFoundError) leave the original
primaryErr as the returned error but augment the returned error or log with
fallback context (fallback URL and fallback error) for observability; update the
function around cdn.cdnFallbackURL, httpclient.IsCDNFallbackEligible,
cdn.doPersistedOperation and add a regression test that simulates primary 503
and fallback 404 to assert the original primary error is returned (not a
persisted-operation miss).
In `@router/internal/persistedoperation/pqlmanifest/fetcher_test.go`:
- Around line 272-292: The test races on the plain bool fallbackCalled because
the HTTP handler goroutine writes it while the test goroutine reads it; change
fallbackCalled to a concurrency-safe flag (e.g. declare var fallbackCalled
atomic.Bool and import "sync/atomic") and in the fallback server handler call
fallbackCalled.Store(true) and in assertions use require.False(t,
fallbackCalled.Load()), and apply the same change to the other test at the noted
range (lines 295-315) so both handlers and assertions synchronize correctly.
In `@router/internal/persistedoperation/pqlmanifest/fetcher.go`:
- Around line 79-85: The current logic replaces the primary fetch error with the
fallback result even when the fallback fails; update the block that calls
f.doFetch(ctx, currentRevision, f.cdnFallbackURL) so that you only replace
resp/body/err with the fallback outcome if the fallback succeeded (e.g., HTTP
success or nil err); if the fallback returns an error or non-success status keep
the original primary err/resp/body and wrap or annotate that error with fallback
context (include f.cdnFallbackURL and the fallback error/status in the message)
so the primary failure (e.g., 503) is preserved; modify the handling around
httpclient.IsCDNFallbackEligible, f.doFetch, and the err/resp variables
accordingly and add a regression test exercising primary 503 followed by
fallback 404/401 to assert the returned error remains the primary outage
annotated with fallback info.
In `@router/pkg/config/config.go`:
- Around line 668-671: The schema validation fails for the new FallbackURL
fields—update config.schema.json so the top-level "cdn" object and the
"storage_providers.cdn" item schema include "fallback_url" as an allowed
property; specifically add "fallback_url" (string/uri as appropriate) to the
properties for the CDN schema entries that currently describe CDNConfiguration
and CDNStorageProvider so LoadConfig validation will accept cdn.fallback_url and
storage_providers.cdn[].fallback_url before unmarshalling.
---
Nitpick comments:
In `@router/core/cache_warmup_cdn.go`:
- Around line 77-92: The current logic accepts a fallback response that returns
body == nil && err == nil (e.g., primary 503 and fallback 404) and causes
LoadItems to return (nil, nil); instead, change the fallback handling in the
block around fetchOperationsJSON so that if the fallback fetch returns a nil
body (even with nil err) you do not treat it as a successful no-op — preserve
the primary failure semantics by returning the original primary error (or a new
error indicating fallback miss) from LoadItems; update the code that calls
c.fetchOperationsJSON (referencing c.cdnFallbackURL, fetchOperationsJSON, resp,
err and LoadItems) to check for body == nil after the fallback call and return
the primary err (or an explicit fallback-miss error) instead of proceeding to
the nil success path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 96dfcf76-26a7-43f6-b29d-5a76b1839569
📒 Files selected for processing (14)
router/core/cache_warmup_cdn.gorouter/core/cache_warmup_cdn_test.gorouter/core/graph_server.gorouter/core/init_config_poller.gorouter/core/router.gorouter/internal/httpclient/fallback.gorouter/internal/httpclient/fallback_test.gorouter/internal/persistedoperation/operationstorage/cdn/client.gorouter/internal/persistedoperation/operationstorage/cdn/client_test.gorouter/internal/persistedoperation/pqlmanifest/fetcher.gorouter/internal/persistedoperation/pqlmanifest/fetcher_test.gorouter/pkg/config/config.gorouter/pkg/routerconfig/cdn/client.gorouter/pkg/routerconfig/cdn/client_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
router/pkg/routerconfig/cdn/client.go (2)
177-180:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDrop the raw signature from the success log.
This logs a header-derived verification value on every successful poll. It adds little diagnostic value and unnecessarily persists CDN response metadata in info logs.
Suggested fix
cdn.logger.Info("Config signature validation successful", zap.String("federatedGraphID", cdn.federatedGraphID), - zap.String("signature", configSignature), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/routerconfig/cdn/client.go` around lines 177 - 180, The success log in the CDN polling flow currently emits the raw config signature (the zap.String("signature", configSignature) field in the cdn.logger.Info call inside the client code), which should be removed; update the cdn.logger.Info invocation in router/pkg/routerconfig/cdn/client.go (the block that logs "Config signature validation successful" in the CDN client) to omit the "signature" field—keep contextual fields like federatedGraphID only, or if you need the signature for debugging, log it at Debug level instead of Info.
130-141:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat fallback
304 Not Modifiedas a successful fallback result.
doGetRouterConfigrepresents304asconfigpoller.ErrConfigNotModified, so a primary transient failure followed by fallback304currently goes down the combined-error path. That turns an authoritative “no update” response into a poll failure.Suggested fix
- fallbackResp, fallbackBody, fallbackErr := cdn.doGetRouterConfig(ctx, version, cdn.cdnFallbackURL) - if fallbackErr == nil { + fallbackResp, fallbackBody, fallbackErr := cdn.doGetRouterConfig(ctx, version, cdn.cdnFallbackURL) + if fallbackErr == nil || errors.Is(fallbackErr, configpoller.ErrConfigNotModified) { resp, body, err = fallbackResp, fallbackBody, nil + if fallbackErr != nil { + err = fallbackErr + } } else { return nil, fmt.Errorf("primary CDN failed: %w; fallback CDN also failed: %v", primaryErr, fallbackErr) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/routerconfig/cdn/client.go` around lines 130 - 141, When handling a failed primary CDN in the cdn.doGetRouterConfig retry block, treat a fallbackErr equal to configpoller.ErrConfigNotModified as a successful fallback (no update) instead of an overall failure; specifically, after calling cdn.doGetRouterConfig(ctx, version, cdn.cdnFallbackURL) check if fallbackErr == nil OR fallbackErr == configpoller.ErrConfigNotModified and in either case set resp, body, err = fallbackResp, fallbackBody, nil (or clear err) so a 304 from the fallback is treated as success; update the branch that currently returns a combined error (using primaryErr and fallbackErr) to only return an error when fallbackErr is a real failure other than configpoller.ErrConfigNotModified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router/internal/persistedoperation/operationstorage/cdn/client.go`:
- Around line 111-126: When a fallback CDN succeeds the trace span still shows
Error because we never reset the span status; after the fallback call in the
cdn.doPersistedOperation branch (where primaryErr and fallbackErr are used and
cdn.cdnFallbackURL is invoked) add a call to span.SetStatus(codes.Ok, "") right
after confirming fallbackErr == nil so the span reflects success; ensure the
codes symbol from the tracing package is available in the file or imported if
missing.
In `@router/pkg/config/config.schema.json`:
- Around line 2334-2337: Update the JSON schema description for the
"fallback_url" property (and similarly for
storage_providers.cdn[*].fallback_url) to mention that the fallback CDN is used
not only on server errors (5xx) and rate limiting (429) but also on
transport/network/connection failures (e.g., timeouts, DNS or TCP errors);
modify the description string in config.schema.json to explicitly list these
trigger conditions so it accurately reflects the implementation.
---
Duplicate comments:
In `@router/pkg/routerconfig/cdn/client.go`:
- Around line 177-180: The success log in the CDN polling flow currently emits
the raw config signature (the zap.String("signature", configSignature) field in
the cdn.logger.Info call inside the client code), which should be removed;
update the cdn.logger.Info invocation in router/pkg/routerconfig/cdn/client.go
(the block that logs "Config signature validation successful" in the CDN client)
to omit the "signature" field—keep contextual fields like federatedGraphID only,
or if you need the signature for debugging, log it at Debug level instead of
Info.
- Around line 130-141: When handling a failed primary CDN in the
cdn.doGetRouterConfig retry block, treat a fallbackErr equal to
configpoller.ErrConfigNotModified as a successful fallback (no update) instead
of an overall failure; specifically, after calling cdn.doGetRouterConfig(ctx,
version, cdn.cdnFallbackURL) check if fallbackErr == nil OR fallbackErr ==
configpoller.ErrConfigNotModified and in either case set resp, body, err =
fallbackResp, fallbackBody, nil (or clear err) so a 304 from the fallback is
treated as success; update the branch that currently returns a combined error
(using primaryErr and fallbackErr) to only return an error when fallbackErr is a
real failure other than configpoller.ErrConfigNotModified.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a9fe466b-f430-4e1d-999e-8a4355690fa1
📒 Files selected for processing (9)
router/core/cache_warmup_cdn.gorouter/core/cache_warmup_cdn_test.gorouter/internal/persistedoperation/operationstorage/cdn/client.gorouter/internal/persistedoperation/operationstorage/cdn/client_test.gorouter/internal/persistedoperation/pqlmanifest/fetcher.gorouter/internal/persistedoperation/pqlmanifest/fetcher_test.gorouter/pkg/config/config.schema.jsonrouter/pkg/routerconfig/cdn/client.gorouter/pkg/routerconfig/cdn/client_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- router/pkg/routerconfig/cdn/client_test.go
- router/core/cache_warmup_cdn.go
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/pkg/routerconfig/cdn/client.go (1)
80-93:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
optsbefore dereferencing optional fields.
NewClientreadsopts.FallbackEndpointbefore any nil check. Passingnilforoptswill panic in this exported constructor.🔧 Suggested fix
func NewClient(endpoint string, token string, opts *Options) (routerconfig.Client, error) { if token == "" { return nil, errors.New("token is required for CDN config provider") } + + if opts == nil { + opts = &Options{} + } u, err := url.Parse(endpoint) if err != nil { return nil, fmt.Errorf("invalid CDN URL %q: %w", endpoint, err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/routerconfig/cdn/client.go` around lines 80 - 93, NewClient dereferences opts (opts.FallbackEndpoint) without guarding for nil; update NewClient to first check if opts == nil and either set opts = &Options{} or handle nil explicitly before accessing any fields (e.g., before parsing FallbackEndpoint), so all subsequent uses of opts (including FallbackEndpoint) are safe; reference the NewClient function and the opts.FallbackEndpoint field when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@router/pkg/routerconfig/cdn/client.go`:
- Around line 80-93: NewClient dereferences opts (opts.FallbackEndpoint) without
guarding for nil; update NewClient to first check if opts == nil and either set
opts = &Options{} or handle nil explicitly before accessing any fields (e.g.,
before parsing FallbackEndpoint), so all subsequent uses of opts (including
FallbackEndpoint) are safe; reference the NewClient function and the
opts.FallbackEndpoint field when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd74e327-e429-487d-ba0e-ba9453dc4068
📒 Files selected for processing (3)
router/internal/persistedoperation/operationstorage/cdn/client.gorouter/pkg/config/config.schema.jsonrouter/pkg/routerconfig/cdn/client.go
✅ Files skipped from review due to trivial changes (1)
- router/pkg/config/config.schema.json
Summary by CodeRabbit
New Features
Tests
Checklist
Open Source AI Manifesto
This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.