Use git ls-files -s instead of ls-tree for full-tree enumeration #2013

Draft
tyrielv wants to merge 1 commit into microsoft:master from tyrielv:tyrielv/ls-files-optimization

tyrielv commented Jun 9, 2026

Summary

When no previous commit exists to diff against (sourceTreeSha == null), DiffHelper.PerformDiff runs git ls-tree -r -t HEAD to enumerate all blobs and trees. This walks every tree object, which is very slow on large repos (about 24s on a repo with ~2.5M files).

Replace with git ls-files -s, which reads the git index instead of walking tree objects. The index is already materialized in GVFS-mounted repos, making this significantly faster.

The optimization only applies when the target tree matches HEAD's tree (i.e., the index reflects the tree we need). This is always the case for gvfs prefetch, which resolves HEAD as its target (PrefetchVerb.LoadBlobPrefetchArgs → RevParse(HEAD)). For other callers, such as FastFetch force-checkout (which can target a non-HEAD commit), the code falls back to ls-tree to preserve correctness.
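The selection rule described above can be sketched as a small pure function. This is an illustrative Python sketch, not the PR's C# code: `choose_enumeration_command` and its parameters are hypothetical names, and passing the target tree SHA to `ls-tree` in the fallback is an assumption (the description only shows `ls-tree` being run against HEAD).

```python
from typing import List, Optional

# Index-based enumeration: only valid when the index reflects the target tree.
LS_FILES = ["git", "ls-files", "-s"]
# Tree-walking enumeration: valid for any tree-ish, but slower on large repos.
LS_TREE = ["git", "ls-tree", "-r", "-t"]


def choose_enumeration_command(
    target_tree_sha: str, head_tree_sha: Optional[str]
) -> List[str]:
    """Pick the enumeration command for the sourceTreeSha == null case.

    head_tree_sha is None when HEAD could not be resolved (hypothetical
    convention for this sketch); that case always falls back to ls-tree.
    """
    if head_tree_sha is not None and head_tree_sha.lower() == target_tree_sha.lower():
        return list(LS_FILES)
    return LS_TREE + [target_tree_sha]
```

A caller would still need to handle `git ls-files -s` failing at run time (for example, when no index exists yet) by retrying with the `ls-tree` command.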

Benchmark (repo with ~2.5M files)

| Approach | Time | Speedup |
| --- | --- | --- |
| git ls-tree -r -t HEAD (before) | ~24s | baseline |
| git ls-files -s (after) | ~6.5s | 3.7× |
| libgit2 in-process tree walk | ~8.2s | 2.9× |
| libgit2 in-process index read | ~12.9s | 1.9× |

I also benchmarked two libgit2 alternatives: an in-process recursive tree walk (2.9× faster than ls-tree) and an in-process index read (1.9× faster, held back by marshaling overhead). git ls-files -s was both the fastest and the simplest option.
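For reference, the speedup column is simply the baseline time divided by each alternative's time. A short Python check using the approximate timings reported above (the variable names are illustrative):

```python
# Approximate wall-clock times from the benchmark, in seconds.
baseline_s = 24.0  # git ls-tree -r -t HEAD
timings_s = {
    "git ls-files -s": 6.5,
    "libgit2 tree walk": 8.2,
    "libgit2 index read": 12.9,
}


def speedup(baseline: float, elapsed: float) -> float:
    """Speedup relative to the baseline, rounded to one decimal place."""
    return round(baseline / elapsed, 1)


speedups = {name: speedup(baseline_s, t) for name, t in timings_s.items()}
```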

Changes

  • GitProcess.cs — New LsFilesStaging() method that runs git ls-files -s
  • DiffTreeResult.cs — New ParseFromLsFilesStagingLine() parser for the <mode> <sha> <stage>\t<path> format
  • DiffHelper.cs — PerformDiff now uses ls-files -s when sourceTreeSha == null and targetTreeSha matches HEAD's tree (verified via libgit2). Falls back to ls-tree otherwise.
  • DiffTreeResultTests.cs — 7 new unit tests for the parser
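
To make the `<mode> <sha> <stage>\t<path>` format concrete, here is a minimal Python sketch of a parser for one line of `git ls-files -s` output. It is not the `ParseFromLsFilesStagingLine()` implementation from this PR; `StageEntry` and `parse_ls_files_stage_line` are hypothetical names. It assumes unquoted paths, so paths that git would quote (when `-z` is not used) are not handled.

```python
from typing import NamedTuple


class StageEntry(NamedTuple):
    mode: str   # e.g. "100644"
    sha: str    # object ID of the blob
    stage: int  # 0 for a normal entry, 1-3 during a merge conflict
    path: str   # path relative to the repository root


def parse_ls_files_stage_line(line: str) -> StageEntry:
    """Parse '<mode> <sha> <stage>\\t<path>' into a StageEntry."""
    # Split on the first tab only: the path itself may contain spaces.
    meta, tab, path = line.rstrip("\n").partition("\t")
    if not tab or not path:
        raise ValueError(f"missing tab-separated path: {line!r}")
    fields = meta.split(" ")
    if len(fields) != 3:
        raise ValueError(f"expected '<mode> <sha> <stage>': {meta!r}")
    mode, sha, stage = fields
    return StageEntry(mode, sha, int(stage), path)
```

Splitting on the tab before splitting on spaces is the key design choice here, since only the metadata fields are guaranteed to be space-free.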

Safety

  • TargetMatchesHeadTree() resolves HEAD's tree SHA via libgit2 and compares to the requested targetTreeSha. Only uses the index-based path when they match.
  • Falls back to ls-tree if the index is unavailable, HEAD can't be resolved, or the target differs from HEAD.
  • ls-files -s only returns file entries (not tree entries). Tree entries from ls-tree were only used for directory creation, which FlushStagedQueues handles from file paths anyway.
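
The last point relies on the directory set being recoverable from file paths alone. A minimal Python sketch of that idea follows; it does not show how FlushStagedQueues is implemented, and `directories_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def directories_for(paths: Iterable[str]) -> List[str]:
    """Return every ancestor directory implied by a set of file paths."""
    dirs = set()
    for path in paths:
        # .parents yields each ancestor, ending with "." for the repo root.
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)
```

For example, `directories_for(["a/b/c.txt", "a/d.txt", "README.md"])` yields `a` and `a/b`. Because git does not track empty directories, every tree reported by ls-tree contains at least one entry beneath it.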

tyrielv force-pushed the tyrielv/ls-files-optimization branch from 6976987 to b10a10b on June 9, 2026 20:33
When no previous commit exists to diff against (sourceTreeSha == null),
DiffHelper.PerformDiff previously ran 'git ls-tree -r -t HEAD' which walks
all tree objects. On a large repo with ~2.5M files, this takes ~24s.

Replace with 'git ls-files -s' which reads the index instead of walking
tree objects. Benchmarked at ~6.5s on the same repo — a 3.7x speedup.

The optimization is only applied when targetTreeSha matches HEAD's tree,
since ls-files reads the index (which reflects HEAD). When they differ
(e.g., FastFetch checking out a non-HEAD commit), falls back to ls-tree
to preserve correctness.

Also falls back to ls-tree if ls-files fails (e.g., index does not exist
on fresh git init before first checkout).

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
tyrielv force-pushed the tyrielv/ls-files-optimization branch from b10a10b to d4988aa on June 9, 2026 20:37