Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions Documentation/git-backfill.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,10 @@ OPTIONS
+
You may also use commit-limiting options understood by
linkgit:git-rev-list[1] such as `--first-parent`, `--since`, or pathspecs.
+
Most `--filter=<spec>` options don't work with the purpose of
`git backfill`, but the `sparse:<oid>` filter is integrated to provide a
focused set of paths to download, distinct from the `--sparse` option.

SEE ALSO
--------
Expand Down
8 changes: 5 additions & 3 deletions Documentation/git-pack-objects.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -402,9 +402,11 @@ will be automatically changed to version `1`.
of filenames that cause collisions in Git's default name-hash
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:13:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
> index b78175fbe1..917045d5c3 100644
> --- a/Documentation/git-pack-objects.adoc
> +++ b/Documentation/git-pack-objects.adoc
> @@ -402,9 +402,9 @@ will be automatically changed to version `1`.
>  	of filenames that cause collisions in Git's default name-hash
>  	algorithm.
>  +
> -Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
> -`--use-bitmap-index` option will be ignored in the presence of
> -`--path-walk.`
> +Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
> +ignored in the presence of `--path-walk`. Whe `--path-walk` option
> +supports the `--filter=<spec>` form `blob:none`.

s/Whe/The/ ?

Thanks,
Taylor

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 5/11/2026 2:38 PM, Taylor Blau wrote:
> On Mon, May 11, 2026 at 06:13:02PM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
>> index b78175fbe1..917045d5c3 100644
>> --- a/Documentation/git-pack-objects.adoc
>> +++ b/Documentation/git-pack-objects.adoc
>> @@ -402,9 +402,9 @@ will be automatically changed to version `1`.
>>  	of filenames that cause collisions in Git's default name-hash
>>  	algorithm.
>>  +
>> -Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
>> -`--use-bitmap-index` option will be ignored in the presence of
>> -`--path-walk.`
>> +Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
>> +ignored in the presence of `--path-walk`. Whe `--path-walk` option
>> +supports the `--filter=<spec>` form `blob:none`.
> 
> s/Whe/The/ ?
Yes, of course. This typo will follow the remaining patches in the context
lines. Will fix in v4.

Thanks,
-Stolee

algorithm.
+
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:13:04PM +0000, Derrick Stolee via GitGitGadget wrote:
> [...] Blobs whose size cannot be determined (e.g. missing in a partial
> clone) are conservatively included, matching the existing filter
> behavior.

Makes sense, but...

> Notice that this inspection of object sizes requires the content to be
> present in the repository. The odb_read_object_info() call will download
> a missing blob on-demand.

... this says that we do download missing blobs on-demand. Should we be
(temporarily) disabling 'fetch_if_missing' for this phase, or using
odb_read_object_info_extended() with the OBJECT_INFO_SKIP_FETCH_OBJECT
bit set?

I don't know enough about 'git backfill' to know whether the current
behavior is more reasonable than the above suggestion, so please let me
know if I'm missing something here!

The rest looks good to me.

Thanks,
Taylor

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:13:07PM +0000, Taylor Blau via GitGitGadget wrote:
> @@ -534,6 +545,18 @@ static int setup_pending_objects(struct path_walk_info *info,
>  			free(tagged_blobs);
>  		}
>  	}
> +	if (tagged_trees) {
> +		if (tagged_trees->oids.nr) {
> +			const char *tagged_tree_path = "/tagged-trees";
> +			tagged_trees->type = OBJ_TREE;
> +			tagged_trees->maybe_interesting = 1;
> +			strmap_put(&ctx->paths_to_lists, tagged_tree_path, tagged_trees);
> +			push_to_stack(ctx, tagged_tree_path);
> +		} else {
> +			oid_array_clear(&tagged_trees->oids);
> +			free(tagged_trees);
> +		}
> +	}
>  	if (tags) {
>  		if (tags->oids.nr) {
>  			const char *tag_path = "/tags";

It looks like there is some prior art here for enumerating a sentinel
path for "/tags", but I am curious why we did the same for
directly-listed trees in the presence of --filter=tree:0.

Thanks,
Taylor

Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
`--use-bitmap-index` option will be ignored in the presence of
`--path-walk.`
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. Whe `--path-walk` option
supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
`tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter
types can be combined with the `combine:<spec>+<spec>` form.


DELTA ISLANDS
Expand Down
7 changes: 7 additions & 0 deletions Documentation/technical/api-path-walk.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,13 @@ commits.
applications could disable some options to make it simpler to walk
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:13:01PM +0000, Derrick Stolee via GitGitGadget wrote:
> We can tell that a path is part of the directly-referenced objects if its
> path name starts with '/' (other paths, including root trees never have this
> starting character). Create a path_is_for_direct_objects() to make this
> meaning clear, especially as we add more references in the future as we
> integrate the path-walk API with partial clone filter options.

I don't know that I have anything in the way of a better suggestion, but
I can't help but feel like the `path_is_for_direct_objects()` check is
somewhat brittle as-is.

I am not familiar enough with the path-walk.c internals to come up with
a good suggestion off the top of my head, but I figured I'd raise it
here in case you had thoughts on alternatives.
> diff --git a/path-walk.c b/path-walk.c
> index 6e426af433..59a7670c5b 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -248,6 +248,16 @@ static int add_tree_entries(struct path_walk_context *ctx,
>  	return 0;
>  }
>
> +/*
> + * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
> + * were directly requested by 'pending' objects rather than discovered during
> + * tree traversal.
> + */
> +static int path_is_for_direct_objects(const char *path)
> +{
> +	return path[0] == '/';

If we do end up keeping this approach, should we have a NULL check on
path itself here? I think that could even be an ASSERT(), since
something has gone wrong if we have a NULL at this point, but I'd rather
die by an assertion rather than a segfault here if so.

Thanks,
Taylor

the objects or to have fewer calls to `path_fn`.
+
Note that objects directly requested as pending objects (such as targets
of lightweight tags or other ref tips) are always emitted to `path_fn`,
even when the corresponding type flag is disabled. Only objects
discovered during the tree walk are subject to these type filters. This
ensures that objects specifically requested through the revision input
are never silently dropped.
+
While it is possible to walk only commits in this way, consumers would be
better off using the revision walk API instead.

Expand Down
8 changes: 5 additions & 3 deletions builtin/backfill.c
Original file line number Diff line number Diff line change
Expand Up @@ -96,9 +96,10 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
if (revs->explicit_diff_merges)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:13:03PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The 'git backfill' command uses the path-walk API in a critical way: it
> uses the objects output from the command to find the batches of missing
> objects that should be requested from the server. Unlike 'git
> pack-objects', we cannot fall back to another mechanism.
>
> The previous change added the path_walk_filter_compatible() method that
> we can reuse here. Use it during argument validation in cmd_backfill().

Makes sense.

> @@ -96,9 +96,8 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
>  	if (revs->explicit_diff_merges)
>  		die(_("'%s' cannot be used with 'git backfill'"),
>  		    "--diff-merges");
> -	if (revs->filter.choice)
> -		die(_("'%s' cannot be used with 'git backfill'"),
> -		    "--filter");
> +	if (!path_walk_filter_compatible(&revs->filter))
> +		die(_("cannot backfill with these filter options"));

I was going to suggest that we indicate the type of object filter which
was incompatible, but that gets a little tricky if the incompatible
filter is a child of a LOFC_COMBINE filter.

Resolving that does not seem worth our while, so I think that what you
wrote here is more than sufficient.

Thanks,
Taylor

die(_("'%s' cannot be used with 'git backfill'"),
"--diff-merges");
if (revs->filter.choice)
die(_("'%s' cannot be used with 'git backfill'"),
"--filter");
if (!path_walk_filter_compatible(&revs->filter))
die(_("cannot backfill with these filter options"));
if (revs->filter.blob_limit_value)
die(_("cannot backfill with blob size limits"));
}

static int do_backfill(struct backfill_context *ctx)
Expand All @@ -108,6 +109,7 @@ static int do_backfill(struct backfill_context *ctx)

if (ctx->sparse) {
CALLOC_ARRAY(info.pl, 1);
info.pl_sparse_trees = 1;
if (get_sparse_checkout_patterns(info.pl)) {
path_walk_info_clear(&info);
return error(_("problem loading sparse-checkout"));
Expand Down
23 changes: 13 additions & 10 deletions builtin/pack-objects.c
Original file line number Diff line number Diff line change
Expand Up @@ -4754,7 +4754,7 @@ static int add_objects_by_path(const char *path,
return 0;
}

static void get_object_list_path_walk(struct rev_info *revs)
static int get_object_list_path_walk(struct rev_info *revs)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
unsigned int processed = 0;
Expand All @@ -4777,8 +4777,9 @@ static void get_object_list_path_walk(struct rev_info *revs)
result = walk_objects_by_path(&info);
trace2_region_leave("pack-objects", "path-walk", revs->repo);

if (result)
die(_("failed to pack objects via path-walk"));
path_walk_info_clear(&info);

return result;
}

static void get_object_list(struct rev_info *revs, struct strvec *argv)
Expand Down Expand Up @@ -4841,8 +4842,13 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv)
fn_show_object = show_object;

if (path_walk) {
get_object_list_path_walk(revs);
} else {
if (get_object_list_path_walk(revs)) {
warning(_("failed to pack objects via path-walk"));
path_walk = 0;
}
}

if (!path_walk) {
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
mark_edges_uninteresting(revs, show_edge, sparse);
Expand Down Expand Up @@ -5177,7 +5183,7 @@ int cmd_pack_objects(int argc,

if (path_walk) {
const char *option = NULL;
if (filter_options.choice)
if (!path_walk_filter_compatible(&filter_options))
option = "--filter";
else if (use_delta_islands)
option = "--delta-islands";
Expand All @@ -5190,10 +5196,7 @@ int cmd_pack_objects(int argc,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> When 'git pack-objects' has the --path-walk option enabled, it uses a
> different set of revision walk parameters than normal. For once,

"once" -> "one" (or "instance")?

> --objects was previously assumed by the path-walk API and was not needed
> to be added. We also needed --boundary to allow discovering
> UNINTERESTING objects to use as delta bases.
>
> We will be updating the path-walk API soon to work with some filter
> options. However, the revision machinery will trigger a fatal error:
>
>   fatal: object filtering requires --objects
>
> The fix is easy: add the --objects option as an argument. This has no
> effect on the path-walk API but does simplify the revision option
> parsing for the objects filter.
>
> We can remove the comment about "removing" the options because they were
> never removed and instead not added. We still need to disable using
> bitmaps.

In the old code, there was a valid reason why bitmaps were not used
(i.e., "--objects" not enabled), but that no longer holds (i.e., now
we add "--objects" ourselves).  Do we need to give an updated
rationale to keep bitmap disabled?

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  builtin/pack-objects.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index dd2480a73d..4338962904 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -5190,10 +5190,7 @@ int cmd_pack_objects(int argc,
>  	}
>  	if (path_walk) {
>  		strvec_push(&rp, "--boundary");
> -		 /*
> -		  * We must disable the bitmaps because we are removing
> -		  * the --objects / --objects-edge[-aggressive] options.
> -		  */
> +		strvec_push(&rp, "--objects");
>  		use_bitmap_index = 0;
>  	} else if (thin) {
>  		use_internal_rev_list = 1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 5/3/2026 8:49 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> When 'git pack-objects' has the --path-walk option enabled, it uses a
>> different set of revision walk parameters than normal. For once,
> 
> "once" -> "one" (or "instance")?

Yes, "one". Sorry for the typo.

>> --objects was previously assumed by the path-walk API and was not needed
>> to be added. We also needed --boundary to allow discovering
>> UNINTERESTING objects to use as delta bases.
>>
>> We will be updating the path-walk API soon to work with some filter
>> options. However, the revision machinery will trigger a fatal error:
>>
>>   fatal: object filtering requires --objects
>>
>> The fix is easy: add the --objects option as an argument. This has no
>> effect on the path-walk API but does simplify the revision option
>> parsing for the objects filter.
>>
>> We can remove the comment about "removing" the options because they were
>> never removed and instead not added. We still need to disable using
>> bitmaps.
> 
> In the old code, there was a valid reason why bitmaps were not used
> (i.e., "--objects" not enabled), but that no longer holds (i.e., now
> we add "--objects" ourselves).  Do we need to give an updated
> rationale to keep bitmap disabled?

>>  	if (path_walk) {
>>  		strvec_push(&rp, "--boundary");
>> -		 /*
>> -		  * We must disable the bitmaps because we are removing
>> -		  * the --objects / --objects-edge[-aggressive] options.
>> -		  */
>> +		strvec_push(&rp, "--objects");
>>  		use_bitmap_index = 0;
>>  	} else if (thin) {
This old comment is perhaps confusing things. The important thing here
is to disable bitmaps with 'use_bitmap_index = 0;' (though perhaps not
for long [1]).

[1] https://lore.kernel.org/git/f50f8df01a9f216d5b4388b2fe4ff58077b574f3.1777853408.git.me@ttaylorr.com/

The path-walk API itself disables the objects walk for the revision
machinery in walk_objects_by_path():

	info->revs->blob_objects = info->revs->tree_objects = 0;

This allows the path-walk API to rely on the revision walk for a
_commits only_ walk and then have the path-walk API handle the trees
and blobs.

The reason we need to add "--objects" now is to allow for parsing the
"--filter" option without the revision logic complaining.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taylor Blau wrote on the Git mailing list (how to reply to this email):

On Mon, May 11, 2026 at 06:12:59PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> When 'git pack-objects' has the --path-walk option enabled, it uses a
> different set of revision walk parameters than normal. For once,

s/once/one/ ? Not sure.

> --objects was previously assumed by the path-walk API and was not needed

s/was not/did not/ ? Also not sure.

> ---
>  builtin/pack-objects.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)

Looks good.

Thanks,
Taylor

if (path_walk) {
strvec_push(&rp, "--boundary");
/*
* We must disable the bitmaps because we are removing
* the --objects / --objects-edge[-aggressive] options.
*/
strvec_push(&rp, "--objects");
use_bitmap_index = 0;
} else if (thin) {
use_internal_rev_list = 1;
Expand Down
Loading
Loading