This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
Deconstructing a Git Branch — A Guided Tour (Part 4 of 5)
Today, we continue the saga of Deconstructing a Branch in Git. Last time out, I took a survey of graphical tools, and concluded that the only one likely to help in this particular situation is gitk. As I summarized,
Unfortunately, though it’s very powerful, it’s also quite overwhelming: there’s a lot of power here, but more investment in making it available than making it accessible.
So before I dove into the actual work, I spent some time figuring out how to find my way around. Also, as mentioned last week, these examples will use a repository full of nothing in particular, and you might find it useful to download the same repo (along with its GPG signature), so you can play along.
Where to begin?
As with many tools, you begin with gitk by establishing some context: a repository, a checkout, and a working directory. Most Git commands assume, at least by default, that they should operate on “the repo/checkout/working-directory right here.” If you’re using a general GUI tool like, say, TortoiseGit, “right here” means the directory contents shown on your screen. If you’re using the command-line tools, then you use your shell to “cd” into the appropriate spot.
Having established your “right here,” launch gitk. From the shell command line, just type “gitk” and press return. What you get is something like this:
You might first want to play around a bit with the window controls, and in particular notice that both the main horizontal divider and all the vertical ones can be dragged to lay the window out differently. Here’s the same window again, but with what might be a more accessible allocation of the space:
Now, what are all these sections?
- Area 1, in the upper-left corner, displays the version tree (what I call the “tree view”), including branches (places where the tree splits) and merges (places where two branches come together). It’s organized newest-on-top, and it contains all the history that led up to the particular files-and-versions that were “right here” when the tool was launched, and only that history: changes that haven’t contributed to your chosen “right here” aren’t there. This area will be the primary focus for the real work of branch deconstruction, and we’ll get back to it soon, but first let’s look the rest over.
- Area 2, in the upper right, is line-for-line matched with the versions listed in Area 1, showing the author and date of each change. You can use this as a supplemental control, by clicking on an author or date to select a version in the tree view.
- Area 3 shows the SHA1 hash ID of the selected version. When gitk is launched, that will be the HEAD of the branch you have checked out, but if you click around in the tree view a bit you’ll see this field change to follow (in fact, nearly everything in the display tracks the tree view in this way). If you’ve worked with Git at all, you’ve noticed these hashes are central to the user interface (and guessed that they’re central to the implementation as well).
- Area 4 is a tool that searches the commit logs. Try typing “conflict” into the large text field there. You’ll see that several rows in the tree view turn bold–the ones with “conflict” in their description. You can now click one of those to focus on that change, or use the “next” and “prev” buttons to select among the matching ones. So if you’re trying to find the commit that broke the log in feature, then you can just search for “break log in,” right? Well, maybe not … but you get the general drift. In my data set, the commit logs aren’t very informative, but try searching for “merge” or “conflict”.
- Area 5 is another search widget, but this one searches the text currently displayed in area 6. This won’t switch you to new versions, only navigate around in the display of a particular change. So, for example, once you’ve used the Area-4 search to find the change containing the log text “break log in,” you can use the Area-5 search to find the comment that was inserted into the text, “This breaks log in.” Or, in my data set, you might search each change to see what changes it made in a particular file. The files, in this repo, have names which are Greek letters spelled out in English: alpha, beta, gamma, delta, and so on, so try searching various changes for iota or mu or omega.
- Area 6 displays the patch that constitutes a particular change. There are some controls for how to display it, and how much of it to show. There’s also…
- Area 7, which chooses between having Area 6 do what I just said, or just having it display the contents of each file in the working copy, regardless of whether it happens to be affected by this change.
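The SHA1 IDs shown in Area 3 are not arbitrary serial numbers: Git derives each one from the content of the object it names. Here is a minimal sketch of that derivation in Python, assuming a repository that uses Git's traditional SHA-1 object format. The function name `git_object_id` is mine, purely for illustration; the example hashes an empty file ("blob"), and a commit's ID is computed the same way over the text of the commit object with the kind set to "commit".

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID Git assigns to an object of the given kind.

    Git hashes a small header ("<kind> <size in bytes>" plus a NUL byte)
    followed by the object's raw content.
    """
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The ID of an empty file; every SHA-1 Git repository agrees on this value.
empty_blob_id = git_object_id("blob", b"")
print(empty_blob_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on content, pasting one of the hashes from this post into the Area 3 field should land you on the same change in your copy of the repo as in mine.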
Where’d everybody go????
Let’s go back to Area 1, the tree view:
I’ll include the SHA1 hashes as annotations; if you click around in your own gitk and get lost, just paste these into the Area 3 ID field.
Line 1, SHA1 485e478cb2798cbde8e85b8857e7d0328832ddc6
Recalling that this is listed newest-first, and that our “right here” is branch br2, we can immediately understand:
- Why the first line is first (it’s the last change contained in the history)
- Why it’s labelled br2 (that’s the branch where the change was made, as you can see from the comment)
Line 2, SHA1 fe4336f41dd739e973941525470e3f9b13ce9210
The second change adds a slight nuance: it’s second because it’s the second-latest change contained in the history (it was merged to br2, our “right here”), and it’s labelled br1 because that’s where the change was made. But the nuance: this is not necessarily the latest change that has been made on br1: there may be later changes that have not yet made it to br2. They aren’t shown, because our “right here” is the HEAD of br2. Only changes that are contained in “right here” show up in this display.
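The rule "only changes contained in right here show up" can be sketched as a walk backwards through parent links. This is a toy model, not gitk's actual code: the commit names and the shape of the history below are invented for illustration, and the only assumption carried over from Git is that every commit records its parent commits.

```python
# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "m1": [],
    "m2": ["m1"],
    "b1a": ["m2"],               # made on br1
    "b1b": ["b1a"],              # made on br1 AFTER br1 was merged into br2
    "b2a": ["m2"],               # made on br2
    "b2merge": ["b2a", "b1a"],   # the merge of br1 into br2
}

# A branch is just a name pointing at one commit (its HEAD).
heads = {"master": "m2", "br1": "b1b", "br2": "b2merge"}


def reachable(start, parents):
    """Return every commit that can be reached from `start` via parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# What a display anchored at the HEAD of br2 would contain.
visible = reachable(heads["br2"], parents)
print(sorted(visible))
print("b1b" in visible)  # the later br1 change never reached br2
```

In this model `b1b` exists in the repository but is absent from the br2 view, which is exactly the nuance described for the br1 line above.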
Line 5, SHA1 b26237b43b7aca9f905576a2b925328450d87da3
Skipping ahead just a bit, look at the line with the “master” label. That’s a change that was made on the master branch, and it’s the latest master-branch change that’s present in our “right here” anchor point, which all makes it very familiar. But what is that little dot doing on the left-most line? I thought that was the br2 line! Is gitk confused?
Well, no. But it’s telling you either a bit more, or a bit less, of the truth than you were expecting to hear. There are two things contributing to this surprise display.
The “Branch” fantasy
We tend to think of a “branch” as something solid and persistent, a steady presence we can rely on. The truth, however, in any versioning system, is a bit different: the contents of “the branch” at any moment may consist of work that was originally done on many branches. This is the goal and point of merging, after all. There’s a conventional fiction inherent in ideas like “the changes contained in a branch.” While at any moment in time it’s unambiguous to talk about “the branch,” digging back into its past will lead you to a complex tree of changes (actually, a “Directed Acyclic Graph,” or DAG, but let’s not quibble). It may lead you into and out of many branches. The question “which change was made on which branch” is completely answerable, but it becomes very mystical to talk about “which change is on which branch”: a given change will often be on many branches. In Git terms: at any moment in time, the HEAD of any given branch is clear, but the branch-based identity of any given change may be elusive.
Most version control systems make some effort to represent the branch in some persistent way, which is nice for the humans trying to make sense of it all. But, in any version control system, the notion of a linear, stable, solid “branch” is a synthetic fantasy. And worse than that: in order to provide this comforting fantasy, branch-preserving versioning systems have to downplay or obscure some information, such as the original authorship of a given change: during a merge, a new “version” is created for the notion of “add this to the branch,” and the change comes to be “owned” by whoever did the merge, rather than whoever did the original work.
Git doesn’t work that way. A given change, regardless of its peregrinations through the branches, retains its original identity, authorship, date, and so on. This is very nice, from the perspective of properly attributing contributions to a large open-source project like Linux (the design target for Git). But from the standpoint of “people who think in terms of branches,” it gets a bit awkward. In this case, gitk is telling you the whole, unvarnished truth: There Ain’t No Such Things As Branches. But I could wish it would take the time to pretend there were ….
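To make that concrete, here is a small Python sketch of the idea, under loudly stated assumptions: the `Commit` class, the author names, and the two-branch history are all hypothetical, and the model keeps only the two properties discussed above (a commit is immutable once made, and a branch is merely a name pointing at a commit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a commit never changes after it is created
class Commit:
    ident: str
    author: str
    parents: tuple = ()


def history(tip):
    """Return {ident: commit} for everything reachable from `tip`."""
    seen = {}
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit.ident not in seen:
            seen[commit.ident] = commit
            stack.extend(commit.parents)
    return seen


# Hypothetical history: alice works on a feature branch, bob merges it.
root = Commit("root", "alice")
feature_work = Commit("feature-work", "alice", (root,))
master_work = Commit("master-work", "bob", (root,))
merge = Commit("merge", "bob", (master_work, feature_work))

branches = {"feature": feature_work, "master": merge}

# Seen from master, alice's change is still alice's; only the merge is bob's.
authors_on_master = {
    ident: commit.author for ident, commit in history(branches["master"]).items()
}
print(authors_on_master["feature-work"])  # alice

# The same change is now "on" both branches at once.
containing = sorted(
    name for name, tip in branches.items() if "feature-work" in history(tip)
)
print(containing)  # ['feature', 'master']
```

The merge adds one new object of its own and leaves the original change untouched, so attribution survives while "which branch is this change on" stops having a single answer.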
The Missing Merges
But gitk is also telling you a bit less than it could. In the Area 1 display, you can see two kinds of changes: plain old changes (with commentary like “make some changes”), and merges that resulted in conflicts that had to be fixed (“conflict resolution”). But if you’d been watching over my shoulder as I did all this, you’d notice that there were often changes of a third type: conflict-free merges of the simplest kind (which Git calls “fast-forward” merges). A fast-forward merge occurs when the two branches to be merged are identical except for some number of additions on only one branch. For example,
1. Sprout branch-one off of master
2. Change branch-one (and don’t change master)
3. Merge branch-one back into master
Since no changes were made to master, this is really easy to do: Git simply moves master forward to include those step-2 changes, exactly as they were recorded on branch-one, without creating a separate merge commit. This is a fast-forward merge.
Gitk is not showing the fast-forward merges, because they leave no commit of their own in the history, but they’re hiding there, wherever the branch picture gets confusing.
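The three steps above can be played out in a toy model to see why a fast-forward leaves nothing behind for gitk to draw. This is a sketch under simplifying assumptions, not Git's implementation: the commit names and the `merge` helper are hypothetical, file contents and conflicts are not modelled at all, and a branch is again just a name pointing at a commit.

```python
# Hypothetical starting point: two commits on master.
parents = {"c1": [], "c2": ["c1"]}
branches = {"master": "c2"}


def is_ancestor(old, new, parents):
    """True if `old` can be reached from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge(target, source, parents, branches):
    """Merge `source` into `target`; report which kind of merge happened."""
    t, s = branches[target], branches[source]
    if is_ancestor(t, s, parents):
        branches[target] = s          # just slide the branch name forward
        return "fast-forward"
    if is_ancestor(s, t, parents):
        return "already up to date"   # nothing new to bring in
    new_id = f"merge-{len(parents)}"  # histories diverged: record a merge
    parents[new_id] = [t, s]
    branches[target] = new_id
    return "merge commit"


# 1. Sprout branch-one off of master.
branches["branch-one"] = branches["master"]

# 2. Change branch-one (and don't change master).
parents["c3"] = ["c2"]
parents["c4"] = ["c3"]
branches["branch-one"] = "c4"

# 3. Merge branch-one back into master.
commits_before = len(parents)
kind = merge("master", "branch-one", parents, branches)

print(kind)                             # fast-forward
print(len(parents) == commits_before)   # True: no new commit was created
print(branches["master"])               # c4
```

After step 3, master and branch-one point at the very same commit, and the history contains no record that a merge ever took place. If you would rather have that record, `git merge --no-ff` asks Git to create a merge commit even when a fast-forward is possible.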
Coming up next
All this logic-chopping and nuanced reading of surprising displays will become crucial to the process we came here to do: deconstructing a polluted branch and reconstructing the branch as it ought to be.
How ’bout you?
Do you have any experience with gitk, or other GUI tools for managing complex Git branch structures? Can you fill in any of the holes in my understanding? Let us all know, in the comments!