
Go: Fix extractor to extract root internal test files #21826

Open
AriehSchneier wants to merge 6 commits into github:main from AriehSchneier:fix/go-extractor-root-test-files

@AriehSchneier

Problem

When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go extractor fails to extract internal test files (package foo) from repository roots when the project has nested test packages.

Example Impact

For repositories like go-git with 25 root test files:

  • Before: Only 1 test file extracted
  • After: All 25 test files correctly extracted

Root Cause

The extractor selected package variants by longest ID string. For a package like github.com/go-git/go-git/v6, packages.Load returns multiple variants:

  1. github.com/go-git/go-git/v6 (19 files, production only)
  2. github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test] (39 files, production + 20 root tests) ← Should select
  3. github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test] (19 files, test dependency) ← Was selected (87 chars > 62 chars)

The old logic selected variant #3 (longest string) over #2, causing 20 root test files to be missing from the database.
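The length comparison can be reproduced with a few lines of Go. This is a sketch: the three IDs are copied from the list above, and longestID is a hypothetical helper that mimics the old longest-ID rule rather than the extractor's actual code. Counting characters gives 27, 62 and 87 for variants #1 to #3, so a pure length comparison picks the nested test-dependency variant.

```go
package main

import "fmt"

// variantIDs are the three IDs reported for github.com/go-git/go-git/v6
// in the example above.
var variantIDs = []string{
	"github.com/go-git/go-git/v6",
	"github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]",
	"github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]",
}

// longestID mimics the old selection rule: keep whichever ID is longest.
// Hypothetical helper for illustration only.
func longestID(ids []string) string {
	best := ""
	for _, id := range ids {
		if len(id) > len(best) {
			best = id
		}
	}
	return best
}

func main() {
	for i, id := range variantIDs {
		fmt.Printf("#%d: %d chars\n", i+1, len(id))
	}
	fmt.Println("selected:", longestID(variantIDs))
}
```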

Fix

Replace the string-length comparison with a heuristic that prefers, in order:

  1. Exact test packages (e.g., pkg [pkg.test]) over nested test dependencies
  2. More Syntax nodes (more files to extract)
  3. String length as a tiebreaker
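The preference order above can be sketched as a comparison function. The sketch assumes a stand-in pkgVariant type holding only the ID, PkgPath and Syntax fields of golang.org/x/tools/go/packages.Package, and assumes an exact test variant is recognized by the ID form "pkg [pkg.test]". The helper names follow the PR, but the bodies are illustrative rather than the PR's code.

```go
package main

import (
	"fmt"
	"go/ast"
)

// pkgVariant is a hypothetical stand-in for the three fields of
// golang.org/x/tools/go/packages.Package that the heuristic reads.
type pkgVariant struct {
	ID      string
	PkgPath string
	Syntax  []*ast.File
}

// isExactTestPackage reports whether the ID has the form "pkg [pkg.test]",
// i.e. the variant built for the package's own test binary.
func isExactTestPackage(p *pkgVariant) bool {
	return p.ID == p.PkgPath+" ["+p.PkgPath+".test]"
}

// isBetterPackage reports whether candidate should replace current.
func isBetterPackage(candidate, current *pkgVariant) bool {
	// 1. An exact test variant beats any other variant.
	candExact, curExact := isExactTestPackage(candidate), isExactTestPackage(current)
	if candExact != curExact {
		return candExact
	}
	// 2. More parsed files means more to extract.
	if len(candidate.Syntax) != len(current.Syntax) {
		return len(candidate.Syntax) > len(current.Syntax)
	}
	// 3. Tiebreaker: the longer ID wins, as under the old rule.
	return len(candidate.ID) > len(current.ID)
}

// files returns n nil placeholders so that len(Syntax) stands for a file count.
func files(n int) []*ast.File { return make([]*ast.File, n) }

func main() {
	const path = "github.com/go-git/go-git/v6"
	exact := &pkgVariant{path + " [" + path + ".test]", path, files(39)}
	nested := &pkgVariant{path + " [" + path + "/plumbing/format/packfile.test]", path, files(19)}
	fmt.Println(isBetterPackage(exact, nested)) // true
	fmt.Println(isBetterPackage(nested, exact)) // false
}
```

With the go-git variants, the exact test variant wins on the first rule, before file counts or ID lengths are compared.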

Changes

  • Added isExactTestPackage() helper to detect exact vs nested test packages
  • Added isBetterPackage() with improved selection logic
  • Renamed longestPackageIds → bestPackageIds throughout

Testing

Added comprehensive unit tests in go/extractor/extractor_test.go:

  • TestIsExactTestPackage: 5 test cases for exact vs nested detection
  • TestIsBetterPackage: 7 test cases covering all selection scenarios
  • TestPackageSelectionRealWorld: Simulates the real-world go-git scenario

All tests pass ✅

Impact

  • Root external tests (package foo_test): Already working ✓
  • Root internal tests (package foo): Now correctly extracted when nested test packages exist ✓
  • Nested test files: Continue to work as before ✓

This ensures the extractor selects package variants with the most complete test coverage, particularly benefiting projects with both root-level tests and nested test packages.

When CODEQL_EXTRACTOR_GO_OPTION_EXTRACT_TESTS=true is set, the Go
extractor was incorrectly skipping internal test files (package foo)
at repository roots when the project contains nested test packages.

Root Cause:
The extractor selected package variants by longest ID string, but this
heuristic fails when nested packages have tests. For a package like
"github.com/go-git/go-git/v6", packages.Load returns multiple variants:

1. "github.com/go-git/go-git/v6" (19 files, production only)
2. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6.test]"
   (39 files, production + 20 root tests) ← Should select this
3. "github.com/go-git/go-git/v6 [github.com/go-git/go-git/v6/plumbing/format/packfile.test]"
   (19 files, test dependency) ← Was incorrectly selected (longest string)

The old logic selected variant #3 (87 chars) over #2 (62 chars),
causing 20 root test files to be missing from the database.

Fix:
Replace string length comparison with a better heuristic that prefers:
1. Exact test packages (e.g., "pkg [pkg.test]") over nested dependencies
2. Packages with more Syntax nodes (more files to extract)
3. String length as a tiebreaker

This ensures the extractor selects the variant with the most complete
test coverage, particularly for root-level internal tests.

Testing:
- Added comprehensive unit tests covering the selection logic
- Tests simulate the real-world go-git scenario
- All tests pass

Impact:
Root-level external tests (package foo_test) were already extracted
correctly. This fix ensures internal tests (package foo) at the root
are now also extracted when they exist alongside nested test packages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@AriehSchneier AriehSchneier requested review from a team as code owners May 11, 2026 04:27
@github-actions github-actions Bot added the Go label May 11, 2026
@AriehSchneier AriehSchneier changed the title Fix Go extractor to extract root internal test files Go: Fix extractor to extract root internal test files May 11, 2026
This test verifies that root internal test files (package foo, not
foo_test) are correctly extracted when the repository has both:
1. Root-level internal tests (main_test.go with package main)
2. Nested packages with tests (nested/nested_test.go)

This scenario reproduces the bug that was fixed: the old extractor
would select the wrong package variant and miss root internal test
files.

The test ensures:
- main_test.go (root internal test) is extracted
- nested/nested_test.go (nested test) is extracted
- All test functions from both files are present in the database

This prevents regression of the bug fix.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@owen-mc
Contributor

owen-mc commented May 11, 2026

There is one check failing on CI; the relevant part of the output is below. It seems you have to update go/extractor/BUILD.bazel, or run the bazel command the check suggests (bazel run //go:gen), which will do it for you.

CI log excerpt (https://github.com/github/codeql/actions/runs/25651837545/job/75313667294?pr=21826#step:3:178):
INFO: Running command line: bazel-bin/go/gen rules_go+/go/tools/go_bin_runner/bin/go _main/go/gazelle _main/go/extractor/cli/go-gen-dbscheme/internal/go-gen-dbscheme_/internal/go-gen-dbscheme
Dbscheme written to file /home/runner/work/codeql/codeql/go/ql/lib/go.dbscheme.
running gazelle /home/runner/.cache/bazel/_bazel_runner/04f3f5922329ba796a4637999a443048/execroot/_main/bazel-out/k8-opt/bin/go/gazelle /home/runner/work/codeql/codeql/go/extractor
adding header to newly generated BUILD files
diff --git a/go/extractor/BUILD.bazel b/go/extractor/BUILD.bazel
index fbc53f20..23158e25 100644
--- a/go/extractor/BUILD.bazel
+++ b/go/extractor/BUILD.bazel
@@ -1,4 +1,4 @@
-load("@rules_go//go:def.bzl", "go_library")
+load("@rules_go//go:def.bzl", "go_library", "go_test")
 load("@rules_java//java:defs.bzl", "java_library")
 load("@rules_pkg//pkg:mappings.bzl", "pkg_files")
 
@@ -60,3 +60,10 @@ pkg_files(
     },
     visibility = ["//go:__pkg__"],
 )
+
+go_test(
+    name = "extractor_test",
+    srcs = ["extractor_test.go"],
+    embed = [":extractor"],
+    deps = ["@org_golang_x_tools//go/packages"],
+)
please run bazel run //go:gen
Error: Process completed with exit code 1.

@owen-mc
Contributor

owen-mc commented May 11, 2026

By the way, the reasoning behind the current algorithm is given in this commit message. I don't think nested tests were accounted for.

Contributor

@owen-mc owen-mc left a comment

Thank you for this PR. It seems to fix a real problem in the Go extractor. The quality of the PR is very high. I only have a few suggestions for minor improvements, plus the bazel file needs to be updated, as noted in my earlier comment.

Comment thread go/extractor/extractor.go Outdated
Comment on lines +68 to +70
if !strings.Contains(pkg.ID, " [") {
	return false
}
Contributor

Are these lines needed? The check below will give the same result. The only possible benefit I can see is efficiency, by not constructing the string, but it isn't clear that calling strings.Contains first is actually more performant.

Suggested change
if !strings.Contains(pkg.ID, " [") {
	return false
}
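The redundancy can be checked mechanically. Assuming the check below the guard compares the ID for equality with PkgPath + " [" + PkgPath + ".test]" (as the later commit message describes), any ID lacking " [" can never equal that string, so the early return never changes the result. withGuard and withoutGuard are hypothetical names for the two versions.

```go
package main

import (
	"fmt"
	"strings"
)

// withGuard is the version with the early strings.Contains return.
func withGuard(id, pkgPath string) bool {
	if !strings.Contains(id, " [") {
		return false
	}
	return id == pkgPath+" ["+pkgPath+".test]"
}

// withoutGuard keeps only the equality check. The right-hand side always
// contains " [", so an ID without it can never compare equal.
func withoutGuard(id, pkgPath string) bool {
	return id == pkgPath+" ["+pkgPath+".test]"
}

func main() {
	const p = "example.com/m"
	ids := []string{
		p,
		p + " [" + p + ".test]",
		p + " [" + p + "/sub.test]",
		p + ".test",
	}
	for _, id := range ids {
		fmt.Println(id, withGuard(id, p), withoutGuard(id, p))
	}
}
```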

Comment thread go/extractor/extractor.go Outdated
Comment on lines 194 to 211
 	// Build a map from package paths to their best IDs--
 	// in the context of a `go test -c` compilation, we will see the same package more than
 	// once, with IDs like "abc.com/pkgname [abc.com/pkgname.test]" to distinguish the version
 	// that contains and is used by test code.
 	// For our purposes it is simplest to just ignore the non-test version, since the test
 	// version seems to be a superset of it.
-	longestPackageIds := make(map[string]string)
+	// We prefer the version with the most complete test coverage, which is typically:
+	// 1. The exact test package (e.g., "pkg [pkg.test]") over nested test dependencies
+	// 2. The package with the most Syntax nodes (most files to extract)
+	// 3. The longest ID string as a tiebreaker
+	bestPackageIds := make(map[string]*packages.Package)
 	packages.Visit(pkgs, nil, func(pkg *packages.Package) {
-		if longestIDSoFar, present := longestPackageIds[pkg.PkgPath]; present {
-			if len(pkg.ID) > len(longestIDSoFar) {
-				longestPackageIds[pkg.PkgPath] = pkg.ID
+		if bestSoFar, present := bestPackageIds[pkg.PkgPath]; present {
+			if isBetterPackage(pkg, bestSoFar) {
+				bestPackageIds[pkg.PkgPath] = pkg
 			}
 		} else {
-			longestPackageIds[pkg.PkgPath] = pkg.ID
+			bestPackageIds[pkg.PkgPath] = pkg
 		}
 	})
Contributor

@owen-mc owen-mc May 11, 2026

It seems to me that this would be better as a separate function. Two pieces of evidence:

  • It has a lengthy comment explaining what it does.
  • There is a test which copy-pastes this code, which risks falling out of sync. Better to have a separate function which the test can call directly.

Note that it does not have to be an exported function, as I believe the test can still access it.
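A minimal sketch of such a standalone function, assuming a stand-in pkgVariant type (with a file count in place of Syntax) and a plain slice in place of packages.Visit. The comparison function is passed as a parameter only to keep the sketch self-contained; the selectBestPackages added in the PR may have a different signature. A _test.go file in the same package can call an unexported function like this directly.

```go
package main

import "fmt"

// pkgVariant is a hypothetical stand-in for packages.Package.
type pkgVariant struct {
	ID      string
	PkgPath string
	Files   int // stand-in for len(Syntax)
}

// selectBestPackages keeps one variant per PkgPath. The first variant seen
// for a path is kept until a later one satisfies better(candidate, current).
func selectBestPackages(pkgs []*pkgVariant, better func(candidate, current *pkgVariant) bool) map[string]*pkgVariant {
	best := make(map[string]*pkgVariant)
	for _, pkg := range pkgs {
		if cur, ok := best[pkg.PkgPath]; !ok || better(pkg, cur) {
			best[pkg.PkgPath] = pkg
		}
	}
	return best
}

func main() {
	pkgs := []*pkgVariant{
		{"a", "a", 1},
		{"a [a.test]", "a", 3},
		{"b", "b", 2},
	}
	moreFiles := func(c, cur *pkgVariant) bool { return c.Files > cur.Files }
	for path, p := range selectBestPackages(pkgs, moreFiles) {
		fmt.Println(path, "->", p.ID)
	}
}
```

Map iteration order in Go is unspecified, so the printed lines may appear in any order.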

AriehSchneier and others added 2 commits May 11, 2026 20:55
Generated by manually applying the output from CI's Gazelle check.
This adds the go_test target for the new extractor_test.go file.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes based on code review:

1. Remove redundant strings.Contains check in isExactTestPackage
   The equality check on the next line handles both cases, making
   the early return unnecessary.

2. Extract package selection logic into selectBestPackages function
   This reduces code duplication and allows the test to call the
   actual implementation rather than copying the logic.

3. Add TestSelectBestPackages to test the new function
   Comprehensive test covering single packages, test vs production,
   exact vs nested tests, and multiple packages.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@AriehSchneier AriehSchneier requested a review from owen-mc May 11, 2026 11:38
Comment thread go/ql/integration-tests/root-internal-tests/test.expected Outdated