Fix ArtifactoryReferenceToken detector: tighten regex, emit unverified when no domain#4945
Conversation
Adds a detector for Together AI API keys (tgp_v1_ format). Verifies keys via GET /v1/models endpoint.
…us codes, deduplicate - Previously tokens with no co-located domain were silently dropped. Now an unverified result is always emitted so the token is never lost. - Switch to common.SaneHttpClient() consistent with all other detectors. - Properly drain response body with io.Copy(io.Discard) to avoid connection leaks. - Treat 403 as determinately invalid (was falling into error bucket). - Deduplicate keys and domains before processing to avoid duplicate results. - Extract verification into verifyGrafanaServiceAccount() helper.
…d when no domain
- Tighten regex from cmVmdGtu[a-zA-Z0-9]{54,60} to cmVmdGtuOj[a-zA-Z0-9]{54}
Reference tokens always decode to reftkn:01:<expiry>:<random> making
the full prefix cmVmdGtuOj (base64 of 'reftkn:') and total length
always exactly 64 chars. Variable length range was incorrect.
- Update keyword from cmVmdGtu to cmVmdGtuOj for more specific
pre-filtering and fewer false positives.
- Emit unverified result when no jfrog.io domain is found in the chunk.
Previously tokens without a co-located domain were silently dropped.
|
Akshara Sivaprasad seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
| tokenPat = regexp.MustCompile(`\b(cmVmdGtu[A-Za-z0-9]{56})\b`) | ||
| // Reference tokens are base64-encoded strings of "reftkn:01:<expiry>:<random>" | ||
| // Fixed format: prefix "cmVmdGtuOj" (base64 of "reftkn:") + exactly 54 base64 chars = 64 total | ||
| tokenPat = regexp.MustCompile(`\b(cmVmdGtuOj[a-zA-Z0-9]{54})\b`) |
There was a problem hiding this comment.
The Base64 encoding of reftkn: should be cmVmdGtuOg==, which doesn’t seem to match what’s included in this PR. Am I missing something here?
From what I can tell, the original implementation already accounts for this, as noted in the comments:
// Reference tokens are base64-encoded strings starting with "reftkn:01|<version>:<expiry>:<random>"
// The base64 encoding of "reftkn" is "cmVmdGtu", total length is always 64 charactersHave you encountered a different format in real-world usage?
There was a problem hiding this comment.
Tightened regex using observed token structure and exact-length matching to reduce overmatching and improve precision
Co-authored-by: asivaprasad09 <asivaprasad09@users.noreply.github.com>
Co-authored-by: asivaprasad09 <asivaprasad09@users.noreply.github.com>
Co-authored-by: asivaprasad09 <asivaprasad09@users.noreply.github.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8f8b539. Configure here.
| defaultClient = common.SaneHttpClient() | ||
|
|
||
| // Together AI API key pattern: tgp_v1_ followed by 43 characters (alphanumeric, underscore, hyphen) | ||
| keyPat = regexp.MustCompile(`\b(tgp_v1_[A-Za-z0-9_-]{43})\b`) |
There was a problem hiding this comment.
Trailing \b fails to match tokens ending with hyphen
Medium Severity
The regex \b(tgp_v1_[A-Za-z0-9_-]{43})\b includes - in the character class but uses \b (word boundary) at the end. Since - is a non-word character, any valid token whose 43-character suffix ends with - will fail to match when followed by whitespace, punctuation, or end-of-string — the most common contexts in secret scanning. About 1/64 of randomly generated keys would be silently missed.
Reviewed by Cursor Bugbot for commit 8f8b539. Configure here.


Summary
Problem
The existing
ArtifactoryReferenceTokendetector had two issues:Incorrect regex — pattern
cmVmdGtu[a-zA-Z0-9]{54,60}used a variable length range (54-60) which is wrong. Reference tokens always decode toreftkn:01:<expiry>:<random>making the full base64 prefixcmVmdGtuOj(encoding ofreftkn:) and total token length always exactly 64 chars.Silent drop — if no
*.jfrog.iodomain was found in the same chunk as the token, the detector emitted no result at all. Tokens stored separately from their domain were completely invisible.Changes
cmVmdGtu[a-zA-Z0-9]{54,60}→cmVmdGtuOj[a-zA-Z0-9]{54}(exact 64 chars, correct prefix)cmVmdGtu→cmVmdGtuOjfor more specific pre-filtering and fewer false positives*.jfrog.iodomain is found in the same chunk, verification is attempted as beforeBefore vs After
cmVmdGtu[a-zA-Z0-9]{54,60}cmVmdGtuOj[a-zA-Z0-9]{54}cmVmdGtucmVmdGtuOjTest plan
cmVmdGtuOjAxOjE4MDg1NDE4OTQ6VWxMVVpVQ1d2dkFISEEzR0EyUldjUUp1MjRk(decodes toreftkn:01:1808541894:...)demo.jfrog.io→ verification attempted, correctly returns unverified on 401Note
Medium Risk
Updates multiple secret detectors’ regexes and result emission/verification behavior, which can change detection coverage and verification network traffic patterns. Risk is moderate due to potential false positive/negative shifts and new external API verification calls.
Overview
Improves secret detection coverage and correctness across detectors. The
ArtifactoryReferenceTokendetector now matches the fixed 64-char base64 format (more specific prefix/keyword) and emits an unverified result when no*.jfrog.iodomain is present, instead of dropping the token.Hardens Grafana service account detection and verification. The
GrafanaServiceAccountregex is tightened, deduping of keys/domains is added, tokens are emitted unverified when no*.grafana.netdomain is found, and verification is refactored to treat403as verified (authenticated but unauthorized), with new unit tests.Adds Together AI API key scanning. Introduces a new
TogetherAIdetector (pattern + optional verification againstapi.together.xyz) and wires it into defaults andDetectorTypeproto/Go enum, with tests and benchmarks.Reviewed by Cursor Bugbot for commit 8f8b539. Bugbot is set up for automated code reviews on this repo. Configure here.