Go: Fix extractor to extract root internal test files#21826
Go: Fix extractor to extract root internal test files#21826AriehSchneier wants to merge 6 commits into
Conversation
When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go extractor was incorrectly skipping internal test files (package foo) at repository roots when the project contains nested test packages. Root Cause: The extractor selected package variants by longest ID string, but this heuristic fails when nested packages have tests. For a package like "github.com/go-git/go-git/v6", packages.Load returns multiple variants: 1. "github.com/go-git/go-git/v6" (19 files, production only) 2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]" (39 files, production + 20 root tests) ← Should select this 3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]" (19 files, test dependency) ← Was incorrectly selected (longest string) The old logic selected variant github#3 (76 chars) over github#2 (68 chars), causing 20 root test files to be missing from the database. Fix: Replace string length comparison with a better heuristic that prefers: 1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies 2. Packages with more Syntax nodes (more files to extract) 3. String length as a tiebreaker This ensures the extractor selects the variant with the most complete test coverage, particularly for root-level internal tests. Testing: - Added comprehensive unit tests covering the selection logic - Tests simulate the real-world go-git scenario - All tests pass Impact: Root-level external tests (package foo_test) were already extracted correctly. This fix ensures internal tests (package foo) at the root are now also extracted when they exist alongside nested test packages. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This test verifies that root internal test files (package foo, not foo_test) are correctly extracted when the repository has both: 1. Root-level internal tests (main_test.go with package main) 2. Nested packages with tests (nested/nested_test.go) This scenario reproduces the bug that was fixed: the old extractor would select the wrong package variant and miss root internal test files. The test ensures: - main_test.go (root internal test) is extracted - nested/nested_test.go (nested test) is extracted - All test functions from both files are present in the database This prevents regression of the bug fix. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
There is one check failing on CI. Below is the relevant part of the output. It seems you have to update |
|
By the way, the reasoning behind the current algorithm is given in this commit message. I don't think nested tests were accounted for. |
owen-mc
left a comment
There was a problem hiding this comment.
Thank you for this PR. It seems to fix a real problem in the go extractor. The quality of the PR is very high. I only have a few suggestions for minor improvements, plus there is the bazel file that needs to be updated, as noted in my earlier comment.
| if !strings.Contains(pkg.ID, " [") { | ||
| return false | ||
| } |
There was a problem hiding this comment.
Are these lines needed? The check below will give the same result. The only possible benefit I can see is efficiency, not constructing the string, but it doesn't seem like that calling strings.Contains instead is definitely more performant.
| if !strings.Contains(pkg.ID, " [") { | |
| return false | |
| } |
| // Build a map from package paths to their best IDs-- | ||
| // in the context of a `go test -c` compilation, we will see the same package more than | ||
| // once, with IDs like "abc.com/pkgname [abc.com/pkgname.test]" to distinguish the version | ||
| // that contains and is used by test code. | ||
| // For our purposes it is simplest to just ignore the non-test version, since the test | ||
| // version seems to be a superset of it. | ||
| longestPackageIds := make(map[string]string) | ||
| // We prefer the version with the most complete test coverage, which is typically: | ||
| // 1. The exact test package (e.g., "pkg [pkg.test]") over nested test dependencies | ||
| // 2. The package with the most Syntax nodes (most files to extract) | ||
| // 3. The longest ID string as a tiebreaker | ||
| bestPackageIds := make(map[string]*packages.Package) | ||
| packages.Visit(pkgs, nil, func(pkg *packages.Package) { | ||
| if longestIDSoFar, present := longestPackageIds[pkg.PkgPath]; present { | ||
| if len(pkg.ID) > len(longestIDSoFar) { | ||
| longestPackageIds[pkg.PkgPath] = pkg.ID | ||
| if bestSoFar, present := bestPackageIds[pkg.PkgPath]; present { | ||
| if isBetterPackage(pkg, bestSoFar) { | ||
| bestPackageIds[pkg.PkgPath] = pkg | ||
| } | ||
| } else { | ||
| longestPackageIds[pkg.PkgPath] = pkg.ID | ||
| bestPackageIds[pkg.PkgPath] = pkg | ||
| } | ||
| }) |
There was a problem hiding this comment.
It seems to me that this would be better as a separate function. Two pieces of evidence:
- It has a lengthy comment explaining what it does.
- There is a test which copy-pastes this code, which risks falling out of sync. Better to have a separate function which the test can call directly.
Note that it does not have to be an exported function, as I believe the test can still access it.
Generated by manually applying the output from CI's Gazelle check. This adds the go_test target for the new extractor_test.go file. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes based on code review: 1. Remove redundant strings.Contains check in isExactTestPackage The equality check on the next line handles both cases, making the early return unnecessary. 2. Extract package selection logic into selectBestPackages function This reduces code duplication and allows the test to call the actual implementation rather than copying the logic. 3. Add TestSelectBestPackages to test the new function Comprehensive test covering single packages, test vs production, exact vs nested tests, and multiple packages. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem
When
CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=trueis set, the Go extractor fails to extract internal test files (package foo) from repository roots when the project has nested test packages.Example Impact
For repositories like go-git with 25 root test files:
Root Cause
The extractor selected package variants by longest ID string. For a package like
github.com/go-git/go-git/v6,packages.Loadreturns multiple variants:github.com/go-git/go-git/v6(19 files, production only)github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test](39 files, production + 20 root tests) ← Should selectgithub.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test](19 files, test dependency) ← Was selected (76 chars > 68 chars)The old logic selected variant #3 (longest string) over #2, causing 20 root test files to be missing from the database.
Fix
Replace string length comparison with a better heuristic that prefers:
pkg [pkg.test]) over nested test dependenciesChanges
isExactTestPackage()helper to detect exact vs nested test packagesisBetterPackage()with improved selection logiclongestPackageIds→bestPackageIdsthroughoutTesting
Added comprehensive unit tests in
go/extractor/extractor_test.go:TestIsExactTestPackage: 5 test cases for exact vs nested detectionTestIsBetterPackage: 7 test cases covering all selection scenariosTestPackageSelectionRealWorld: Simulates the real-world go-git scenarioAll tests pass ✅
Impact
package foo_test): Already working ✓package foo): Now correctly extracted when nested test packages exist ✓This ensures the extractor selects package variants with the most complete test coverage, particularly benefiting projects with both root-level tests and nested test packages.