Description
It should be possible for various sub-types of RevCommit and RevWalk to access and use the information stored in the commit graph. This is mostly relevant for EGit's SWTWalk and SWTCommit. These types cannot currently use the information (see additional context for why) from the commit graph, meaning that for example:
- An incremental implementation of the topological sorting akin to git cannot use commit graph generation numbers to produce commits incrementally. The entire history must be processed first
- The Changed Path Bloom filter cannot be used and a TreeWalk is performed for every commit in
TreeRevFilter.include()
both leading to a bad performance of EGit's History View
Motivation
Dicussion #243 describes the performance problems we are currently facing in regards to EGits History View when viewing the changes of the currently selected resource. I am currently working on improving the performance of JGit's topological sorting by adding an incremental implementation similar to git itself (here).
During Development I noticed, that even though I generated a commit graph (git commit-graph write --changed-paths) and enabled commit graphs in the git config (core.commitGraph), my implementation of git's topological sorting algorithm (which uses generation numbers obtained from the commit graph) only ever saw Constants.COMMIT_GENERATION_UNKNOWN for getGeneration() (since the data from the commit graph is never parsed for SWTCommits).
Additionally the Changed Path Bloom filter was never used and a Tree Walk was performed for every commit.
Alternatives considered
One possibility is replacing the default implementation in RevCommit with concrete lookups in the commit graph, this has the following downsides:
- A new
CommitData object would be instantiated for every invocation of getGeneration() and getChangedPathFilter()
- Even if commit graph data is available, any
RevCommit that is not a RevCommitCG would still be parsed from the commit object instead of the commit graph (See RevCommit.parseCanonical() and RevCommitCG.parseInGraph())
My workaround for the incremental topological sorting algorithm is to query the RevWalk.commitGraph() if RevCommit.getGeneration() does not return a usable value: https://github.com/vi-eclipse/jgit/blob/06b19880b694b20a7e76f388345a80fc64243359/org.eclipse.jgit/src/org/eclipse/jgit/revwalk/TopoSortPendingGenerator.java#L287-L305
This approach is not possible for ChangedPathTreeFilter.shouldTreeWalk(), since it cannot access the package-private RevWalk.commitGraph()
Additional context
RevCommit currently offers two APIs that provide information stored in the Commit Graph:
getGeneration() for determining the Generation Number of a commit
getChangedPathFilter() for retrieving a bloom filter for the modified paths
RevCommit itself always returns Constants.COMMIT_GENERATION_UNKNOWN for getGeneration() and null for getChangedPathFilter(), these Methods are only overridden in RevCommitCG. RevCommitCG instances are created by a RevWalk when a position in the commit graph can be found for the desired commit.
|
private RevCommit createCommit(AnyObjectId id, int graphPos) { |
|
if (graphPos >= 0) { |
|
return new RevCommitCG(id, graphPos); |
|
} |
|
return new RevCommit(id); |
|
} |
However, this only applies when strictly using
RevWalk, when using a subtype of
RevWalk (i.e.
GitHistoryWalk), none of the information present in the commit graph is used, even though it could be used to optimize both topological sorting and filtering by Changed paths.
Subtypes of
RevCommit, which are located in a different package (such as
PlotCommit), cannot currently override and implement
getGeneration() and
getChangedPathFilter() because:
RevCommit.getGeneration() has package-private visibility
- Generation Numbers (and other information from the CommitData) for
RevCommitCG are parsed by overriding the RevCommit.parseCanonical() method, which also has package-private visibility
- Even if subtypes of
RevCommit override and implement getChangedPathFilter(), they cannot retrieve the Commit Graph via the RevWalk.commitGraph() method, since it has package-private visibility
Description
It should be possible for various sub-types of
RevCommitandRevWalkto access and use the information stored in the commit graph. This is mostly relevant for EGit'sSWTWalkandSWTCommit. These types cannot currently use the information (see additional context for why) from the commit graph, meaning that for example:TreeRevFilter.include()both leading to a bad performance of EGit's History View
Motivation
Dicussion #243 describes the performance problems we are currently facing in regards to EGits History View when viewing the changes of the currently selected resource. I am currently working on improving the performance of JGit's topological sorting by adding an incremental implementation similar to git itself (here).
During Development I noticed, that even though I generated a commit graph (
git commit-graph write --changed-paths) and enabled commit graphs in the git config (core.commitGraph), my implementation of git's topological sorting algorithm (which uses generation numbers obtained from the commit graph) only ever sawConstants.COMMIT_GENERATION_UNKNOWNforgetGeneration()(since the data from the commit graph is never parsed forSWTCommits).Additionally the Changed Path Bloom filter was never used and a Tree Walk was performed for every commit.
Alternatives considered
One possibility is replacing the default implementation in RevCommit with concrete lookups in the commit graph, this has the following downsides:
CommitDataobject would be instantiated for every invocation ofgetGeneration()andgetChangedPathFilter()RevCommitthat is not aRevCommitCGwould still be parsed from the commit object instead of the commit graph (SeeRevCommit.parseCanonical()andRevCommitCG.parseInGraph())My workaround for the incremental topological sorting algorithm is to query the
RevWalk.commitGraph()ifRevCommit.getGeneration()does not return a usable value: https://github.com/vi-eclipse/jgit/blob/06b19880b694b20a7e76f388345a80fc64243359/org.eclipse.jgit/src/org/eclipse/jgit/revwalk/TopoSortPendingGenerator.java#L287-L305This approach is not possible for
ChangedPathTreeFilter.shouldTreeWalk(), since it cannot access the package-privateRevWalk.commitGraph()Additional context
RevCommitcurrently offers two APIs that provide information stored in the Commit Graph:getGeneration()for determining the Generation Number of a commitgetChangedPathFilter()for retrieving a bloom filter for the modified pathsRevCommititself always returnsConstants.COMMIT_GENERATION_UNKNOWNforgetGeneration()andnullforgetChangedPathFilter(), these Methods are only overridden inRevCommitCG.RevCommitCGinstances are created by aRevWalkwhen a position in the commit graph can be found for the desired commit.jgit/org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RevWalk.java
Lines 1772 to 1777 in 42f6237
However, this only applies when strictly using
RevWalk, when using a subtype ofRevWalk(i.e.GitHistoryWalk), none of the information present in the commit graph is used, even though it could be used to optimize both topological sorting and filtering by Changed paths.Subtypes of
RevCommit, which are located in a different package (such asPlotCommit), cannot currently override and implementgetGeneration()andgetChangedPathFilter()because:RevCommit.getGeneration()has package-private visibilityRevCommitCGare parsed by overriding theRevCommit.parseCanonical()method, which also has package-private visibilityRevCommitoverride and implementgetChangedPathFilter(), they cannot retrieve the Commit Graph via theRevWalk.commitGraph()method, since it has package-private visibility