-
Notifications
You must be signed in to change notification settings - Fork 186
pack-objects: integrate --path-walk and some --filter options #2101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: en/backfill-fixes-and-edges
Are you sure you want to change the base?
Changes from all commits
0840110
5cc6383
77329cf
50933cc
b2deb7f
da191e2
a1ab704
2360a5b
d9f5a98
c9efff0
b221ea4
16bd3c3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -402,9 +402,11 @@ will be automatically changed to version `1`. | |
| of filenames that cause collisions in Git's default name-hash | ||
| algorithm. | ||
| + | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Taylor Blau wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:13:04PM +0000, Derrick Stolee via GitGitGadget wrote:
> [...] Blobs whose size cannot be determined (e.g. missing in a partial
> clone) are conservatively included, matching the existing filter
> behavior.
Makes sense, but...
> Notice that this inspection of object sizes requires the content to be
> present in the repository. The odb_read_object_info() call will download
> a missing blob on-demand.
... this says that we do download missing blobs on-demand. Should we be
(temporarily) disabling 'fetch_if_missing' for this phase, or using
odb_read_object_info_extended() with the OBJECT_INFO_SKIP_FETCH_OBJECT
bit set?
I don't know enough about 'git backfill' to know whether the current
behavior is more reasonable than the above suggestion, so please let me
know if I'm missing something here!
The rest looks good to me.
Thanks,
TaylorThere was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Taylor Blau wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:13:07PM +0000, Taylor Blau via GitGitGadget wrote:
> @@ -534,6 +545,18 @@ static int setup_pending_objects(struct path_walk_info *info,
> free(tagged_blobs);
> }
> }
> + if (tagged_trees) {
> + if (tagged_trees->oids.nr) {
> + const char *tagged_tree_path = "/tagged-trees";
> + tagged_trees->type = OBJ_TREE;
> + tagged_trees->maybe_interesting = 1;
> + strmap_put(&ctx->paths_to_lists, tagged_tree_path, tagged_trees);
> + push_to_stack(ctx, tagged_tree_path);
> + } else {
> + oid_array_clear(&tagged_trees->oids);
> + free(tagged_trees);
> + }
> + }
> if (tags) {
> if (tags->oids.nr) {
> const char *tag_path = "/tags";
It looks like there is some prior art here for enumerating a sentinel
path for "/tags", but I am curious why we did the same for
directly-listed trees in the presence of --filter=tree:0.
Thanks,
Taylor |
||
| Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The | ||
| `--use-bitmap-index` option will be ignored in the presence of | ||
| `--path-walk.` | ||
| Incompatible with `--delta-islands`. The `--use-bitmap-index` option is | ||
| ignored in the presence of `--path-walk`. Whe `--path-walk` option | ||
| supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`, | ||
| `tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter | ||
| types can be combined with the `combine:<spec>+<spec>` form. | ||
|
|
||
|
|
||
| DELTA ISLANDS | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,6 +48,13 @@ commits. | |
| applications could disable some options to make it simpler to walk | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Taylor Blau wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:13:01PM +0000, Derrick Stolee via GitGitGadget wrote:
> We can tell that a path is part of the directly-referenced objects if its
> path name starts with '/' (other paths, including root trees never have this
> starting character). Create a path_is_for_direct_objects() to make this
> meaning clear, especially as we add more references in the future as we
> integrate the path-walk API with partial clone filter options.
I don't know that I have anything in the way of a better suggestion, but
I can't help but feel like the `path_is_for_direct_objects()` check is
somewhat brittle as-is.
I am not familiar enough with the path-walk.c internals to come up with
a good suggestion off the top of my head, but I figured I'd raise it
here in case you had thoughts on alternatives.
> diff --git a/path-walk.c b/path-walk.c
> index 6e426af433..59a7670c5b 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -248,6 +248,16 @@ static int add_tree_entries(struct path_walk_context *ctx,
> return 0;
> }
>
> +/*
> + * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
> + * were directly requested by 'pending' objects rather than discovered during
> + * tree traversal.
> + */
> +static int path_is_for_direct_objects(const char *path)
> +{
> + return path[0] == '/';
If we do end up keeping this approach, should we have a NULL check on
path itself here? I think that could even be an ASSERT(), since
something has gone wrong if we have a NULL at this point, but I'd rather
die by an assertion rather than a segfault here if so.
Thanks,
Taylor |
||
| the objects or to have fewer calls to `path_fn`. | ||
| + | ||
| Note that objects directly requested as pending objects (such as targets | ||
| of lightweight tags or other ref tips) are always emitted to `path_fn`, | ||
| even when the corresponding type flag is disabled. Only objects | ||
| discovered during the tree walk are subject to these type filters. This | ||
| ensures that objects specifically requested through the revision input | ||
| are never silently dropped. | ||
| + | ||
| While it is possible to walk only commits in this way, consumers would be | ||
| better off using the revision walk API instead. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -96,9 +96,10 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs) | |
| if (revs->explicit_diff_merges) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Taylor Blau wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:13:03PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The 'git backfill' command uses the path-walk API in a critical way: it
> uses the objects output from the command to find the batches of missing
> objects that should be requested from the server. Unlike 'git
> pack-objects', we cannot fall back to another mechanism.
>
> The previous change added the path_walk_filter_compatible() method that
> we can reuse here. Use it during argument validation in cmd_backfill().
Makes sense.
> @@ -96,9 +96,8 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
> if (revs->explicit_diff_merges)
> die(_("'%s' cannot be used with 'git backfill'"),
> "--diff-merges");
> - if (revs->filter.choice)
> - die(_("'%s' cannot be used with 'git backfill'"),
> - "--filter");
> + if (!path_walk_filter_compatible(&revs->filter))
> + die(_("cannot backfill with these filter options"));
I was going to suggest that we indicate the type of object filter which
was incompatible, but that gets a little tricky if the incompatible
filter is a child of a LOFC_COMBINE filter.
Resolving that does not seem worth our while, so I think that what you
wrote here is more than sufficient.
Thanks,
Taylor |
||
| die(_("'%s' cannot be used with 'git backfill'"), | ||
| "--diff-merges"); | ||
| if (revs->filter.choice) | ||
| die(_("'%s' cannot be used with 'git backfill'"), | ||
| "--filter"); | ||
| if (!path_walk_filter_compatible(&revs->filter)) | ||
| die(_("cannot backfill with these filter options")); | ||
| if (revs->filter.blob_limit_value) | ||
| die(_("cannot backfill with blob size limits")); | ||
| } | ||
|
|
||
| static int do_backfill(struct backfill_context *ctx) | ||
|
|
@@ -108,6 +109,7 @@ static int do_backfill(struct backfill_context *ctx) | |
|
|
||
| if (ctx->sparse) { | ||
| CALLOC_ARRAY(info.pl, 1); | ||
| info.pl_sparse_trees = 1; | ||
| if (get_sparse_checkout_patterns(info.pl)) { | ||
| path_walk_info_clear(&info); | ||
| return error(_("problem loading sparse-checkout")); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4754,7 +4754,7 @@ static int add_objects_by_path(const char *path, | |
| return 0; | ||
| } | ||
|
|
||
| static void get_object_list_path_walk(struct rev_info *revs) | ||
| static int get_object_list_path_walk(struct rev_info *revs) | ||
| { | ||
| struct path_walk_info info = PATH_WALK_INFO_INIT; | ||
| unsigned int processed = 0; | ||
|
|
@@ -4777,8 +4777,9 @@ static void get_object_list_path_walk(struct rev_info *revs) | |
| result = walk_objects_by_path(&info); | ||
| trace2_region_leave("pack-objects", "path-walk", revs->repo); | ||
|
|
||
| if (result) | ||
| die(_("failed to pack objects via path-walk")); | ||
| path_walk_info_clear(&info); | ||
|
|
||
| return result; | ||
| } | ||
|
|
||
| static void get_object_list(struct rev_info *revs, struct strvec *argv) | ||
|
|
@@ -4841,8 +4842,13 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv) | |
| fn_show_object = show_object; | ||
|
|
||
| if (path_walk) { | ||
| get_object_list_path_walk(revs); | ||
| } else { | ||
| if (get_object_list_path_walk(revs)) { | ||
| warning(_("failed to pack objects via path-walk")); | ||
| path_walk = 0; | ||
| } | ||
| } | ||
|
|
||
| if (!path_walk) { | ||
| if (prepare_revision_walk(revs)) | ||
| die(_("revision walk setup failed")); | ||
| mark_edges_uninteresting(revs, show_edge, sparse); | ||
|
|
@@ -5177,7 +5183,7 @@ int cmd_pack_objects(int argc, | |
|
|
||
| if (path_walk) { | ||
| const char *option = NULL; | ||
| if (filter_options.choice) | ||
| if (!path_walk_filter_compatible(&filter_options)) | ||
| option = "--filter"; | ||
| else if (use_delta_islands) | ||
| option = "--delta-islands"; | ||
|
|
@@ -5190,10 +5196,7 @@ int cmd_pack_objects(int argc, | |
| } | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> When 'git pack-objects' has the --path-walk option enabled, it uses a
> different set of revision walk parameters than normal. For once,
"once" -> "one" (or "instance")?
> --objects was previously assumed by the path-walk API and was not needed
> to be added. We also needed --boundary to allow discovering
> UNINTERESTING objects to use as delta bases.
>
> We will be updating the path-walk API soon to work with some filter
> options. However, the revision machinery will trigger a fatal error:
>
> fatal: object filtering requires --objects
>
> The fix is easy: add the --objects option as an argument. This has no
> effect on the path-walk API but does simplify the revision option
> parsing for the objects filter.
>
> We can remove the comment about "removing" the options because they were
> never removed and instead not added. We still need to disable using
> bitmaps.
In the old code, there was a valid reason why bitmaps were not used
(i.e., "--objects" not enabled), but that no longer holds (i.e., now
we add "--objects" ourselves). Do we need to give an updated
rationale to keep bitmap disabled?
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> builtin/pack-objects.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index dd2480a73d..4338962904 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -5190,10 +5190,7 @@ int cmd_pack_objects(int argc,
> }
> if (path_walk) {
> strvec_push(&rp, "--boundary");
> - /*
> - * We must disable the bitmaps because we are removing
> - * the --objects / --objects-edge[-aggressive] options.
> - */
> + strvec_push(&rp, "--objects");
> use_bitmap_index = 0;
> } else if (thin) {
> use_internal_rev_list = 1;There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Derrick Stolee wrote on the Git mailing list (how to reply to this email): On 5/3/2026 8:49 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> When 'git pack-objects' has the --path-walk option enabled, it uses a
>> different set of revision walk parameters than normal. For once,
>
> "once" -> "one" (or "instance")?
Yes, "one". Sorry for the typo.
>> --objects was previously assumed by the path-walk API and was not needed
>> to be added. We also needed --boundary to allow discovering
>> UNINTERESTING objects to use as delta bases.
>>
>> We will be updating the path-walk API soon to work with some filter
>> options. However, the revision machinery will trigger a fatal error:
>>
>> fatal: object filtering requires --objects
>>
>> The fix is easy: add the --objects option as an argument. This has no
>> effect on the path-walk API but does simplify the revision option
>> parsing for the objects filter.
>>
>> We can remove the comment about "removing" the options because they were
>> never removed and instead not added. We still need to disable using
>> bitmaps.
>
> In the old code, there was a valid reason why bitmaps were not used
> (i.e., "--objects" not enabled), but that no longer holds (i.e., now
> we add "--objects" ourselves). Do we need to give an updated
> rationale to keep bitmap disabled?
>> if (path_walk) {
>> strvec_push(&rp, "--boundary");
>> - /*
>> - * We must disable the bitmaps because we are removing
>> - * the --objects / --objects-edge[-aggressive] options.
>> - */
>> + strvec_push(&rp, "--objects");
>> use_bitmap_index = 0;
>> } else if (thin) {
This old comment is perhaps confusing things. The important thing here
is to disable bitmaps with 'use_bitmap_index = 0;' (though perhaps not
for long [1]).
[1] https://lore.kernel.org/git/f50f8df01a9f216d5b4388b2fe4ff58077b574f3.1777853408.git.me@ttaylorr.com/
The path-walk API itself disables the objects walk for the revision
machinery in walk_objects_by_path():
info->revs->blob_objects = info->revs->tree_objects = 0;
This allows the path-walk API to rely on the revision walk for a
_commits only_ walk and then have the path-walk API handle the trees
and blobs.
The reason we need to add "--objects" now is to allow for parsing the
"--filter" option without the revision logic complaining.
Thanks,
-StoleeThere was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Taylor Blau wrote on the Git mailing list (how to reply to this email): On Mon, May 11, 2026 at 06:12:59PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> When 'git pack-objects' has the --path-walk option enabled, it uses a
> different set of revision walk parameters than normal. For once,
s/once/one/ ? Not sure.
> --objects was previously assumed by the path-walk API and was not needed
s/was not/did not/ ? Also not sure.
> ---
> builtin/pack-objects.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
Looks good.
Thanks,
Taylor |
||
| if (path_walk) { | ||
| strvec_push(&rp, "--boundary"); | ||
| /* | ||
| * We must disable the bitmaps because we are removing | ||
| * the --objects / --objects-edge[-aggressive] options. | ||
| */ | ||
| strvec_push(&rp, "--objects"); | ||
| use_bitmap_index = 0; | ||
| } else if (thin) { | ||
| use_internal_rev_list = 1; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taylor Blau wrote on the Git mailing list (how to reply to this email):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derrick Stolee wrote on the Git mailing list (how to reply to this email):