This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
Subversion merge reintegrate
This post covers several topics related to merging with Subversion 1.5. The goal is to explain the problem of reflective merges, how the new reintegrate option helps, and also some problems that currently exist with that option.
This is easiest to explain with an example. Suppose you are working on a feature branch copied from your trunk. During the development process you regularly merge "all" new changes from trunk to your branch so that the branch stays in "synch" with the work occurring on trunk. When you eventually merge your branch back to trunk, that is called a reflective (or cyclic) merge. This type of merge is problematic for Subversion; let’s look at why.
Revisions and Merge Tracking
Subversion is based on revisions. Every commit to the repository creates a new revision, and a merge is ultimately just a commit. Subversion 1.5’s merge tracking feature is all about recording which revisions were merged to which paths. For a reflective merge like the one in our example, this is too coarse a solution to always yield the right results. Recall that in our example we regularly merged changes from trunk to our branch. Each of those merges winds up as a commit (revision) on our branch. When it comes time to merge our branch back to trunk, all merge tracking can do is help decide whether to include or exclude those "synched" revisions. The problem is that neither answer is always right.
If we exclude the revisions that were merges from trunk, then we also exclude any work we did to resolve conflicts as part of those merges. Even worse, we might have carelessly committed unrelated changes as part of the merge (it happens), so those changes would also be excluded.
If we include those synched revisions, then we merge back changes that already exist in trunk. This yields unnecessary and confusing conflicts.
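To make the dilemma concrete, here is a toy model. It is only a sketch under loose assumptions, not how Subversion stores or merges data: a tree is a dict of path to content, each branch revision is a dict of path to (old, new) content, and the history (r60, r70, r80 and the file names) is invented for illustration. Replaying a revision onto trunk succeeds only if trunk still has the old content the change expects.

```python
# Toy model (hypothetical, not Subversion's data model):
#   tree     = {path: content}
#   revision = {path: (old_content, new_content)}; None means "absent".

def replay(tree, revisions):
    """Apply revisions in order; a change conflicts when the target's current
    content is not the old content the change expects."""
    result = dict(tree)
    conflicts = []
    for rev, changes in revisions:
        for path, (old, new) in changes.items():
            if result.get(path) != old:
                conflicts.append((rev, path))
            elif new is None:
                del result[path]
            else:
                result[path] = new
    return result, conflicts

# Trunk already contains its own r60 change to a.txt.
trunk = {"a.txt": "trunk-fix"}

# r70: the "synch" merge of r60 into the branch, carrying our conflict
#      resolution plus an unrelated change committed along with it.
r70 = {"a.txt": ("base", "trunk-fix+resolved"), "b.txt": (None, "unrelated")}
# r80: ordinary feature work on the branch.
r80 = {"feature.txt": (None, "new feature")}

# Choice 1: exclude the synch revision.
excluded, _ = replay(trunk, [(80, r80)])
# Choice 2: include it.
included, conflicts = replay(trunk, [(70, r70), (80, r80)])

print(excluded)   # a.txt lacks our resolution and b.txt is missing entirely
print(conflicts)  # r70's a.txt change no longer fits trunk
```

Excluding r70 loses both the resolution and the unrelated change; including it produces a conflict on a.txt, because trunk already has r60.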
The only way to truly solve this is to invent a new merge algorithm in Subversion that does not rely simply on revisions. In discussing the problem it was suggested that this might require designing a new repository format as well. As part of the 1.5 development cycle some preliminary work was done to explore a new algorithm within our existing design. Ultimately we decided a new algorithm was not in scope for this release and to ship without it (the release cycle was long enough as is). We hope to revisit this work for 1.6 or a future release to see if it pans out as a solution.
Reintegrate to the Rescue
Setting aside the issue of new algorithms for a moment, it turns out Subversion already has a technique that does the right thing with reflective merges. It is the one we used prior to 1.5 and is called a 2-URL merge. Going back to our original example, suppose the last time we merged trunk to our branch, we merged all the changes up to revision 100. To properly merge our branch back to trunk, we start with a trunk working copy and run this command:
svn merge url://trunk@100 url://feature-branch .
This tells Subversion to calculate the differences between trunk at r100 and the feature-branch at HEAD, and merge those differences into our working copy. Since this is essentially just producing a diff, it includes our conflict resolution work, and does not include any changes that exist in both places. In other words, we get the result we want.
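The same kind of toy representation (trees as dicts of path to content; an illustration, not Subversion's implementation) shows why the 2-URL merge behaves well here. The helper names tree_diff and apply_diff are hypothetical. The diff is taken between the left tree (trunk@100) and the right tree (branch@HEAD) and then applied to the trunk working copy; a path conflicts only if the working copy no longer matches the left side.

```python
def tree_diff(left, right):
    """{path: (left_content, right_content)} for every path that differs.
    None means the path is absent on that side."""
    paths = sorted(set(left) | set(right))
    return {p: (left.get(p), right.get(p)) for p in paths
            if left.get(p) != right.get(p)}

def apply_diff(target, diff):
    """Apply a tree diff; report paths whose content doesn't match the left side."""
    result = dict(target)
    conflicts = []
    for path, (old, new) in diff.items():
        if result.get(path) != old:
            conflicts.append(path)
        elif new is None:
            del result[path]
        else:
            result[path] = new
    return result, conflicts

# Hypothetical states: r100 is the last trunk revision merged to the branch.
trunk_r100 = {"a.txt": "trunk-fix"}
branch_head = {"a.txt": "trunk-fix+resolved",
               "b.txt": "unrelated",
               "feature.txt": "new feature"}
trunk_wc = dict(trunk_r100)   # assume no trunk commits since r100

merged, conflicts = apply_diff(trunk_wc, tree_diff(trunk_r100, branch_head))
print(merged == branch_head, conflicts)   # True []
```

The conflict resolution in a.txt and the unrelated b.txt change both arrive, and nothing trunk already had is applied a second time.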
The new reintegrate option is shorthand for the 2-URL merge. With it you can just run this command:
svn merge --reintegrate url://feature-branch .
Internally, when you use this option, it calculates the url://trunk@100 part and then executes the EXACT SAME merge API that the 2-URL merge does. This is important to remember when discussing some of the problems with this new option, because you can generally use the old 2-URL syntax to resolve the problem. In other words, reintegrate is just a new syntax plus some safety checks (more on this later). If these checks fail and cannot easily be corrected, then you can use the older syntax.
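As a rough illustration of "calculating the url://trunk@100 part", the sketch below reads a mergeinfo-style range list for /trunk, finds the youngest trunk revision N such that every trunk revision after the branch point up to N has been merged, and builds the equivalent 2-URL merge command line. This is a simplified assumption about the idea, not Subversion's actual algorithm, which also has to deal with ancestry, non-inheritable ranges, and revisions that never touched trunk; the function names are hypothetical.

```python
def parse_ranges(ranges):
    """Parse a range list like '51-90,91-100' into a set of revision numbers."""
    revs = set()
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        revs.update(range(int(start), int(end or start) + 1))
    return revs

def reintegrate_base(mergeinfo, source, branch_point):
    """Youngest revision N such that source revisions branch_point+1..N
    have all been merged into the branch (simplified)."""
    merged = parse_ranges(mergeinfo[source]) if source in mergeinfo else set()
    n = branch_point
    while n + 1 in merged:
        n += 1
    return n

def two_url_merge_args(trunk_url, base_rev, branch_url):
    """argv for the equivalent 2-URL merge, run from a trunk working copy."""
    return ["svn", "merge", f"{trunk_url}@{base_rev}", branch_url, "."]

# Hypothetical branch copied from trunk@50 and synched through r100.
mergeinfo = {"/trunk": "51-90,91-100"}
base = reintegrate_base(mergeinfo, "/trunk", branch_point=50)
print(" ".join(two_url_merge_args("url://trunk", base, "url://feature-branch")))
# svn merge url://trunk@100 url://feature-branch .
```

If a synch merge had skipped a trunk revision (say 61), the base would stop at 60, which is one reason a simple "last merged revision" is not enough.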
One big problem in general with the 2-URL merge process is that it will do whatever you tell it. You can very easily make mistakes that the command won’t catch. Suppose we left off the @100 in our example and that r101 and r102 were committed to trunk since we last synchronized with it. When the 2-URL merge runs, it calculates the differences between trunk@HEAD and branch@HEAD. To the merge process, it would look like the branch removed the changes that happened in r101 and r102, and the merge would remove them from your working copy! Depending on the changes in those revisions, you might not notice this and wind up committing the removal of those changes as part of the merge. There are many other variations on this problem (such as getting the URL wrong), but most of those mistakes are more obvious, and you can just revert the merge.
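Here is the missing @100 mistake in the same kind of toy model (dicts standing in for trees; file names and contents are invented, and only r101 is modeled). Diffing trunk@HEAD against branch@HEAD turns trunk's newer r101 change into an apparent deletion by the branch, while diffing from trunk@100 leaves it alone.

```python
def tree_diff(left, right):
    """{path: (left_content, right_content)} for paths that differ; None = absent."""
    paths = sorted(set(left) | set(right))
    return {p: (left.get(p), right.get(p)) for p in paths
            if left.get(p) != right.get(p)}

trunk_r100 = {"a.txt": "trunk-fix"}
trunk_head = {"a.txt": "trunk-fix", "c.txt": "r101 work"}   # r101 added c.txt
branch_head = {"a.txt": "trunk-fix", "feature.txt": "new feature"}

pegged = tree_diff(trunk_r100, branch_head)    # url://trunk@100
unpegged = tree_diff(trunk_head, branch_head)  # url://trunk, i.e. HEAD

print(pegged)    # only feature.txt is added
print(unpegged)  # also says c.txt goes from "r101 work" to absent: a deletion
```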
Because of this problem with the 2-URL merge process, when the reintegrate option was added we decided it should do some safety checks before it will run. Some of these are fairly tame, such as making sure the working copy is at a single revision, has no switched children, and is not a sparse checkout (i.e., the working copy depth is infinity). If reintegrate errors out because of any of these problems, you can generally just use svn update or svn switch to "fix" your working copy. Since reintegrate needs to calculate the base URL and revision for the merge, it also does an ancestry check to ensure the merge source is related to the merge target.
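The working-copy checks can be sketched as a simple function. The Entry record and its fields are hypothetical stand-ins for what Subversion tracks internally, and the messages are invented; the depth names (empty, files, immediates, infinity) and the svn update and svn switch remedies are real.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical view of one working-copy directory or file."""
    path: str
    revision: int
    switched: bool = False
    depth: str = "infinity"   # one of: empty, files, immediates, infinity

def reintegrate_precheck(entries):
    """Return a list of problems that would make reintegrate refuse to run."""
    problems = []
    revisions = sorted({e.revision for e in entries})
    if len(revisions) > 1:
        problems.append(f"mixed revisions {revisions}: run 'svn update'")
    for e in entries:
        if e.switched:
            problems.append(f"{e.path} is switched: 'svn switch' it back")
        if e.depth != "infinity":
            problems.append(f"{e.path} is sparse (depth={e.depth}): "
                            "run 'svn update --set-depth infinity'")
    return problems

wc = [Entry(".", 105), Entry("src", 105), Entry("docs", 103, depth="empty")]
for problem in reintegrate_precheck(wc):
    print(problem)
```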
The most controversial reintegrate check is that the merge source does not have any subtree mergeinfo. Mergeinfo (technically the versioned property svn:mergeinfo) stores merge tracking information. Normally mergeinfo is set only on the merge target. Subtree mergeinfo occurs when a merge target has some subtree that was previously a merge target itself (e.g., we merged from a file in trunk to a file in our branch, creating mergeinfo on that file; this type of merge is fittingly called a subtree merge). If we later try to reintegrate the branch back to trunk, the reintegrate safety check fails because the branch has subtree mergeinfo. The reason for this check is that a 2-URL merge in this situation doesn’t always give the right results.
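Detecting subtree mergeinfo amounts to asking whether any path below the merge source's root carries its own svn:mergeinfo. In this sketch the property values are held in a plain dict keyed by repository path, a hypothetical stand-in for what you would see with something like svn propget -R svn:mergeinfo; the paths and values are invented.

```python
def subtree_mergeinfo(mergeinfo_props, root):
    """Paths strictly below root that have explicit svn:mergeinfo."""
    prefix = root.rstrip("/") + "/"
    return sorted(p for p in mergeinfo_props if p.startswith(prefix))

props = {
    "/branches/feature": "/trunk:51-100",
    # left behind by a subtree merge from a single file in trunk
    "/branches/feature/src/util.c": "/trunk/src/util.c:95",
}

blockers = subtree_mergeinfo(props, "/branches/feature")
if blockers:
    print("reintegrate would refuse; subtree mergeinfo on:", blockers)
```

Matching on root + "/" rather than on root alone keeps a sibling such as /branches/feature-2 from being mistaken for a subtree.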
Problems with Reintegrate and Renamed Files
Most of the problems with reintegrate stem from this check for subtree mergeinfo. When this check fails, the error you see on the command line is something like this:
svn: Cannot reintegrate from 'url://feature-branch' yet:
Some revisions have been merged under it that have not been merged
into the reintegration target; merge them first, then retry.
The biggest problem is that, unlike with the other checks, both the cause of this failure and the way out of it are less obvious. If you really did some subtree merges, then this check saves you from making a mistake. Unfortunately, there is a very common case people are running into where it only "appears" that a subtree merge was performed: when you rename or move something, mergeinfo is created on the new path, and this blocks reintegrate from working. The reasons why copy/move create mergeinfo are beyond the scope of this post, but suffice it to say that we think in the majority of cases the mergeinfo needn’t be created. So the fix we will look at (hopefully in time for 1.5.1) is to be smarter about when copy/move creates mergeinfo. If those subcommands minimize their creation of mergeinfo, that would greatly reduce the occurrence of this specific problem when using reintegrate (at least for your future copy/move operations).
If you are running into this problem today, there are a couple of things you can do.
- You might want to just manually remove the subtree mergeinfo. You can do this by running the command:
svn propdel svn:mergeinfo FOO
You could run this command after the move, and before you commit, so that the problem never exists in the first place. Or it can be run later to fix the problem after you realize you have it.
Generally, the only time mergeinfo needs to be created is when copying/moving a path such that its nearest parent with mergeinfo is different in the path’s source and destination. This is because a path without explicit mergeinfo (when a path has the svn:mergeinfo property set on it, it is said to have explicit mergeinfo) simply inherits the mergeinfo of its nearest parent with explicit mergeinfo. In other words, the svn:mergeinfo property is inheritable, and if a copied path inherits mergeinfo from the "same" place in both its source and destination, then nothing is gained by setting explicit mergeinfo on the destination and you can safely delete that mergeinfo.
In many cases, all mergeinfo exists on the root of your branch. If you move a path within the branch, then its nearest parent with mergeinfo (the root of the branch) does not change and the path does not need new mergeinfo. If, however, you previously did a subtree merge and then move a path from outside the subtree into the subtree (or vice versa), then the path inherits different mergeinfo. In this case the destination path should keep its explicit mergeinfo.
- You can also use the old 2-URL merge syntax in this situation.
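The rule for when a moved path really needs explicit mergeinfo can be expressed directly: walk up from the source's parent and from the destination's parent to the nearest path with explicit mergeinfo, and compare the two. This is a sketch of that reasoning, assuming the moved path had no explicit mergeinfo of its own before the move; props is a hypothetical dict of path to svn:mergeinfo value, and the function names are invented.

```python
import posixpath

def nearest_explicit(path, props):
    """Nearest path at or above `path` that has explicit mergeinfo, or None."""
    while True:
        if path in props:
            return path
        if path == "/":
            return None
        path = posixpath.dirname(path)

def needs_explicit_mergeinfo(src, dst, props):
    """True if src and dst inherit mergeinfo from different places."""
    return (nearest_explicit(posixpath.dirname(src), props)
            != nearest_explicit(posixpath.dirname(dst), props))

props = {"/branches/feature": "/trunk:51-100"}
# Move within the branch: both sides inherit from the branch root,
# so 'svn propdel svn:mergeinfo' on the destination is safe.
print(needs_explicit_mergeinfo("/branches/feature/src/old.c",
                               "/branches/feature/lib/new.c", props))   # False

# After a subtree merge into lib/, a move into lib/ inherits differently.
props["/branches/feature/lib"] = "/trunk/lib:51-100,103"
print(needs_explicit_mergeinfo("/branches/feature/src/old.c",
                               "/branches/feature/lib/new.c", props))   # True
```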
Additional Reintegrate Problems
Closely related to the previous problem is when subtree mergeinfo is merged into branches or carried into branches when the branch is created (rather than being created in the branch as above). If you rename a path in trunk (which currently creates mergeinfo), every branch you create from trunk from then on also has this subtree mergeinfo. Alternatively, if you copy trunk to a branch, then copy a path within trunk (again creating mergeinfo), then synch up your branch, the branch gets subtree mergeinfo merged into it. In either case this effectively means you can never use reintegrate, at least not unless you delete the svn:mergeinfo property from the subtrees that have it.
So in addition to being smarter about creating the mergeinfo, we need to examine whether reintegrate can safely ignore subtree mergeinfo in some cases and allow the reintegrate merge to proceed. For example, suppose we create a branch from trunk and some subtree mergeinfo comes into the branch when it is created. As long as the subtree mergeinfo on the branch remains equivalent to what is in trunk, it’s probably safe to allow a reintegrate merge. Of course, these sorts of checks get complicated quickly, so we are less sure there are easy solutions here, but hopefully some will make their way into 1.6. One option worth considering is whether we ought to do this check in the first place.
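One way such a relaxed check might look, purely as a hypothetical sketch and not something Subversion 1.5 does: allow reintegrate when every piece of subtree mergeinfo on the branch is equivalent to the mergeinfo at the same relative path in trunk. Mergeinfo values are parsed into a mapping of merge source to a set of revisions, so that ranges such as 1-20,21-40 and 1-40 compare equal. The function names, paths, and values are invented, and details such as non-inheritable ranges are ignored.

```python
def parse_mergeinfo(value):
    """Parse one 'source:ranges' line per merge source into {source: set of revisions}."""
    result = {}
    for line in value.splitlines():
        source, _, ranges = line.rpartition(":")
        revs = result.setdefault(source, set())
        for part in ranges.split(","):
            start, _, end = part.partition("-")
            revs.update(range(int(start), int(end or start) + 1))
    return result

def nonequivalent_subtrees(branch_props, trunk_props):
    """Relative subtree paths (root '' excluded) whose branch mergeinfo is
    missing from trunk or differs from it."""
    return sorted(
        path for path, value in branch_props.items()
        if path and (path not in trunk_props
                     or parse_mergeinfo(value) != parse_mergeinfo(trunk_props[path])))

# Mergeinfo created by a rename in trunk, carried into the branch at copy time.
trunk_props = {"src/new.c": "/trunk/src/old.c:1-40"}
branch_props = {"": "/trunk:51-100",
                "src/new.c": "/trunk/src/old.c:1-20,21-40"}
print(nonequivalent_subtrees(branch_props, trunk_props))   # []
```

An empty result would let this hypothetical check allow the reintegrate; any path it returns would still block it.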
Branch Management and Reintegrate
An important thing to point out, and I am not sure if the documentation currently does, is that once a branch is reintegrated, it should really be deleted. If more work needs to happen, create it again with a fresh copy. There are two reasons for this:
- Remember our example with reflective merges? Once you merge your branch to trunk, if you later want to synch all the changes from trunk back to the branch, you are now in this same reflective merge scenario with your branch! You can no longer use the simple "svn merge url://trunk" syntax you were using previously, so you have to use 2-URL merges to get your trunk changes into your branch.
- If you do not merge trunk changes back to your branch because of the previous point, then when you go to reintegrate again, it will still see r100 (to use our example) as the last trunk revision merged to the branch. So when it diffs trunk@100 with your current branch, it will not "see" that in r103 you merged the branch to trunk, and it will try to merge everything again.
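The second point can also be seen in the toy tree model (dicts of path to content; names and contents are invented). After r103 reintegrated the branch, a second reintegrate that still diffs from trunk@100 tries to add feature.txt again and collides with the copy trunk already has, while a diff based on trunk@103, which is what a branch recreated from trunk would give, applies cleanly.

```python
def tree_diff(left, right):
    """{path: (left_content, right_content)} for paths that differ; None = absent."""
    paths = sorted(set(left) | set(right))
    return {p: (left.get(p), right.get(p)) for p in paths
            if left.get(p) != right.get(p)}

def conflicting_paths(target, diff):
    """Paths where the target no longer matches the diff's left side."""
    return [p for p, (old, _new) in diff.items() if target.get(p) != old]

trunk_r100 = {"a.txt": "v1"}
trunk_r103 = {"a.txt": "v1", "feature.txt": "part 1"}  # after the first reintegrate
branch_head = {"a.txt": "v1", "feature.txt": "part 1 + part 2"}

stale = tree_diff(trunk_r100, branch_head)   # base still recorded as trunk@100
fresh = tree_diff(trunk_r103, branch_head)   # base of a branch recreated from r103

print(conflicting_paths(trunk_r103, stale))  # ['feature.txt']
print(conflicting_paths(trunk_r103, fresh))  # []
```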
The bottom line is that if you recreate your branches, you will not run into this. svn delete followed by svn copy runs fast and takes up almost no repository disk space, since Subversion copies are cheap. You do not even have to do anything special to your working copy. So just do it!
This post has largely focused on some problems we have been seeing and were aware of when we released 1.5. I wanted to get this post out to help users understand the root of these problems. That said, I do not want to leave you with an impression of "doom and gloom". The fact is that overall the new 1.5 merge process works quite well and is a substantial improvement over previous versions. And while the current reintegrate solution for reflective merges can only take us so far without a new merge algorithm, if you understand the process, you can develop a workflow that largely avoids the noted limitations. There are also a few other merge bugs we didn’t cover here, but fixes for these are targeted for the upcoming 1.5.1 release (which is tentatively scheduled for release late this month).
It is also worth noting that now that merge tracking is "in the wild", the development community has real-world data and use cases to work with, making it a lot easier to evaluate new algorithms and bug fixes. This should help us get fixes out on the 1.5.x line more quickly. The lack of large, real-world repositories using merge tracking was a problem during the development of 1.5.0, as those simply did not exist (outside of Subversion’s own repository).
Finally, I just want to point out that I have turned off comments on this post. I suspect this post will raise a number of questions and I’d just prefer to handle these in a proper discussion forum. I have created a thread to post comments and questions related to this post in the Subversion forum on openCollabNet. Please post your comments and questions to that thread.