Skip to content

Fix line numbers for duplicate secrets within a chunk#4910

Open
amanfcp wants to merge 8 commits intomainfrom
INS-30
Open

Fix line numbers for duplicate secrets within a chunk#4910
amanfcp wants to merge 8 commits intomainfrom
INS-30

Conversation

@amanfcp
Copy link
Copy Markdown
Contributor

@amanfcp amanfcp commented Apr 22, 2026

Summary

When the same secret appears multiple times in a chunk, every finding was reported with the line number of the first occurrence. This fixes it so each finding reports the line where it actually sits.

Closes #2502

Root cause

FragmentLineOffset in pkg/engine/engine.go uses bytes.Cut(chunk.Data, secret), which always splits at the first match. Because each detectors.Result only carries the secret bytes (not its byte position in the chunk), every duplicate result went through this first-match lookup and inherited the same line number.

Duplicates survive to this function under normal scans. CleanResults deduplication only runs under --only-verified, or when a detector's ShouldCleanResultsIrrespectiveOfConfiguration() returns true. By default, any detector that emits one result per regex match hits the bug.

Fix

Localized to the engine. No detector changes required.

  1. pkg/detectors/detectors.go: add a private chunkOffset / chunkOffsetSet pair to Result with SetChunkOffset / ChunkOffset / HasChunkOffset accessors.
  2. pkg/engine/engine.go: add AssignDuplicateLineOffsets(chunk, results), which groups results by secret value, walks the chunk with bytes.Index to find each successive occurrence, and stamps the offset onto the corresponding result. Called once per result batch inside detectChunk, before results are dispatched. Unique secrets are skipped (zero overhead).
  3. FragmentLineOffset: gains a fast path. When result.HasChunkOffset() is true, compute the line directly from the pre-assigned offset and run the ignore-tag check against that occurrence's line. The original bytes.Cut logic is preserved as a fallback for any caller that didn't go through AssignDuplicateLineOffsets, so the change is backward compatible.
  4. Test fixtures: pkg/detectors/datadogapikey/datadogapikey_test.go and pkg/custom_detectors/custom_detectors_test.go compared detectors.Result values via cmp.Diff with IgnoreFields enumerating the then-existing unexported fields. Switched them to cmpopts.IgnoreUnexported(detectors.Result{}) so they tolerate the new private fields and are future-proof against similar additions.

A side effect worth calling out: the trufflehog:ignore check now runs per-occurrence instead of always inspecting the first occurrence's line. That's a real behavioral improvement and is covered explicitly by a test.

Why not fix it in detectors?

Threading byte offsets through every detector's FromData would touch 700+ implementations and require switching most of them from regex.FindAllStringSubmatch to FindAllStringIndex. The engine is the single place in the pipeline that has both the full chunk data and the full result batch, so a localized engine-level fix is the minimal correct change.

Performance

  • Result struct: 160 → 176 bytes (+16 bytes for the int64 + bool + alignment padding). Results are short-lived per-chunk, so there is no meaningful memory pressure.
  • AssignDuplicateLineOffsets: O(N) with one map allocation per call. The inner bytes.Index loop only runs for groups with duplicates.

End-to-end verification

Reproduced the reporter's scenario against a fresh filesystem scan using a CustomRegex detector and a file containing FAKE_SECRET_ABC123XYZ on lines 2, 5, and 8:

Binary Reported lines
baseline line=2, line=2, line=2
with this fix line=2, line=5, line=8

Test plan

  • TestFragmentLineOffset_DuplicateSecrets: regression test for the bug (fails without the fix, passes with it)
  • TestAssignDuplicateLineOffsets: unit test covering unique secrets, duplicates, and result ordering
  • TestFragmentLineOffset_DuplicateSecretsWithIgnoreTag: verifies per-occurrence trufflehog:ignore handling
  • Existing TestFragmentLineOffset and TestFragmentLineOffsetWithPrimarySecret* still pass (fallback path unchanged)
  • Full go test ./pkg/engine/... and go test ./pkg/detectors/... pass
  • Manual CLI verification against a CustomRegex detector matches the expected line numbers

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Medium Risk
Touches core engine result line-number assignment by adding per-result chunk offsets; incorrect offset calculation could shift reported lines or ignore-tag behavior, but changes are localized and well-covered by new tests.

Overview
Fixes incorrect line-number reporting when the same secret appears multiple times within a single chunk by precomputing and storing a per-result byte offset.

detectors.Result gains private chunk-offset state, detectChunk now calls AssignDuplicateLineOffsets before dispatch, and FragmentLineOffset uses the offset (when set) to compute the correct occurrence’s line and apply trufflehog:ignore per occurrence. Tests are added for duplicate-secret line offsets (including ignore-tag cases), and a couple detector tests switch to cmpopts.IgnoreUnexported(detectors.Result{}) to tolerate the new private fields.

Reviewed by Cursor Bugbot for commit 485455a. Bugbot is set up for automated code reviews on this repo. Configure here.

@amanfcp amanfcp requested a review from a team April 22, 2026 16:54
@amanfcp amanfcp requested review from a team as code owners April 22, 2026 16:54
@amanfcp amanfcp requested a review from a team April 22, 2026 16:54
Comment thread pkg/engine/engine.go Outdated
Copy link
Copy Markdown
Contributor

@mustansir14 mustansir14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this long overdue bug!

@amanfcp amanfcp requested a review from mustansir14 April 23, 2026 16:46
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c6ed135. Configure here.

Comment thread pkg/engine/engine.go Outdated
Comment thread pkg/detectors/detectors.go
@amanfcp amanfcp added the review/product-eng Team integrations reviewed, awaiting product-eng review label Apr 24, 2026
Copy link
Copy Markdown

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validated the fix end-to-end against a real internal repo (DCA/PreppyService preppy/modules/config.py) that had 4 identical [FAIL] values at lines 87, 95, 103, 111.

Before (upstream/main): all 4 reported as line 87
After (this PR): correctly reported as 87, 95, 103, 111

Tested with trufflehog filesystem --json --config=... --include-detectors=CustomRegex --no-verification. Both the synthetic case (3 identical secrets) and the real-world case pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/product-eng Team integrations reviewed, awaiting product-eng review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Line of code calculation is wrong for sequential identical secrets

4 participants