This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
Sparse Directories in Subversion 1.5
Over the last few weeks we have blogged a lot about the Merge Tracking feature in Subversion 1.5. Of course, there are several other great new features coming, so let’s look at what else is new.
The Subversion 1.5 release notes (not yet final, of course) list these new features:
- Merge Tracking
- Sparse checkouts
- WebDAV transparent write-through proxy
- Cyrus SASL support for ra_svn and svnserve
- Copy/move improvements: peg revisions, ‘svn mv file1 file2; svn mv file2 file3’, ‘svn cp *.c dir’
- Cancellation improvements
- Changelist support
- FSFS sharding
- Command-line client improvements
- JavaHL bindings improvements
- Many improved APIs
If you are new to this blog and want to find out more about Merge Tracking, check out our Merge Tracking Early Adopter Program. Over the next few weeks we’ll blog about some of the other new Subversion features. We won’t cover everything, though: other people are blogging about 1.5 as well, and we’ll link to them. For example, Malcolm Rowe has blogged about mod_dav_svn improvements, tree-structured FSFS repositories, and backing up FSFS repositories, Subversion 1.5 style.
In this post we’ll talk about sparse directories.
When you first check out a Subversion repository, or a directory within that repository, you get the whole directory with everything underneath it. In large projects that can be a problem, because all those files are copied over the network and that can take some time. This is in stark contrast to the way Subversion subsequently sends only small deltas over the network, making minimal use of network resources. Also, do you really want all these files cluttering your disk?
Subversion 1.5 introduces Sparse Directories, giving you more control over what to check out and how svn update works. You can read the specs here. But I don’t learn from reading; I need to play. So let’s play (while keeping in mind that SVN 1.5 is not feature complete yet).
I started by downloading the Subversion 1.5 pre-release binaries from the Merge Tracking Early Adopter Program and setting up a repository:
- Download the binaries (Windows in my case).
- Copy the .exe and .dll files to c:\svn and add c:\svn to %PATH%.
- Create the repository (svnadmin create repo) in c:\svn (the repository is c:\svn\repo).
- Download the repository dump file that comes with the Merge Tracking beta.
- Load the dump file (svnadmin load c:\svn\repo < c:\svn\mergetracking.dump).
- Create a directory for the working copy.
- Check out the repository (svn checkout file:///c:/svn/repo/trunk).
The top directory of the trunk contains one file and a few sub-directories.
Suppose I’m another developer (I mimicked that with working copy wc2) and I have no need for the subdirectories. With current releases of SVN, I can use the -N switch (short for non-recursive) to check out only the top directory of the trunk and not the sub-directories:
svn checkout -N file:///c:/svn/repo/trunk
I now have the topmost directory of trunk and the file in it, but not the sub-directories. For a big repository, this reduces the time the checkout takes and keeps your working copy cleaner, and subsequent updates will only pull in files added to the directory you checked out. But -N is all the control you have.
SVN 1.5 will be more flexible: it makes the -N switch redundant by replacing it with a --depth option. According to the spec, the possible values for --depth are:
- --depth=empty: Updates will not pull in any files or subdirectories not already present.
- --depth=files: Updates will pull in any files not already present, but not subdirectories.
- --depth=immediates: Updates will pull in any files or subdirectories not already present; the subdirectories will have depth=empty.
- --depth=infinity: Updates will pull in any files or subdirectories not already present; the subdirectories will have depth=infinity. Equivalent to today’s default update behavior.
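To make the four values concrete, here is a toy Python model of which paths a checkout at each depth would contain. It is a sketch under stated assumptions, not Subversion’s code: the helper names within_depth and checkout and the sample trunk layout (readme.txt, src/, doc/) are hypothetical, paths are relative to the checkout root, and directories are marked with a trailing slash.

```python
# Toy model of the Subversion 1.5 --depth values (hypothetical helpers,
# not Subversion's implementation). Paths are relative to the checkout
# root, use "/" separators, and directories end with "/".

def within_depth(path: str, depth: str) -> bool:
    """Return True if `path` would be part of a checkout at `depth`."""
    parts = path.rstrip("/").split("/")
    is_dir = path.endswith("/")
    if depth == "empty":
        return False                            # only the root directory itself
    if depth == "files":
        return len(parts) == 1 and not is_dir   # top-level files only
    if depth == "immediates":
        return len(parts) == 1                  # top-level files and subdirectories
    if depth == "infinity":
        return True                             # everything below the root
    raise ValueError(f"unknown depth: {depth!r}")


def checkout(repo_paths, depth):
    """Filter a repository listing down to what a checkout at `depth` contains."""
    return sorted(p for p in repo_paths if within_depth(p, depth))


# Hypothetical trunk layout: one file and two subdirectories with one file each.
trunk = ["readme.txt", "src/", "src/main.c", "doc/", "doc/guide.txt"]
for d in ("empty", "files", "immediates", "infinity"):
    print(d, checkout(trunk, d))
```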
Now I am developer number 3, with wc3 as my working copy. Let’s check out the repository with --depth=empty:
svn checkout file:///c:/svn/repo/trunk --depth=empty
A trunk directory was added to wc3 as well as the .svn administrative directory (in other words: a real working copy was created) but other than that, the trunk is empty.
I added a file (test1.txt) and a subdir to the repository using wc2. According to the spec, if I now run svn update, no files should be added to wc3.
Well, no surprises: no files.
Let’s create a file (test2.txt) in wc3, then add and commit it. Then, back in wc1, I run svn update, make some changes to test2.txt, add a new file test3.txt, and commit. Now update wc3.
test2.txt has the change, but test3.txt was not added to wc3 (remember: with --depth=empty, svn update only updates files or subdirectories that are already present).
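That behavior fits in a few lines of Python. This is a hypothetical sketch, not Subversion’s implementation: update_empty_root and the file contents are made up, and a working copy is reduced to a dict of file names and contents.

```python
def update_empty_root(wc: dict, head: dict) -> dict:
    """Toy model of `svn update` on a working copy checked out with --depth=empty.

    Files already in the working copy are brought up to date (or dropped if
    they were deleted in the repository); files that exist only in the
    repository are not pulled in.
    """
    return {name: head[name] for name in wc if name in head}


# wc3 only knows about test2.txt, which it committed itself.
wc3 = {"test2.txt": "original text"}
head = {
    "test1.txt": "added from wc2",
    "test2.txt": "changed from wc1",
    "test3.txt": "added from wc1",
}
wc3 = update_empty_root(wc3, head)
print(wc3)
```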
Now let’s play with --depth=files.
I checked out a fourth working copy, wc4, with --depth=files. All files from the top directory of the trunk are pulled in, but not the sub-directories. Next I added test4.txt and test5.txt to the trunk in wc1, plus a directory with a file in it, and committed.
An svn update on wc4 pulls in test4.txt and test5.txt, but not the sub-directory or the file in it.
At this point I had a question. Suppose I only want test4.txt. I’d like an option to just pull in test4.txt and not get test5.txt as well. Does Subversion 1.5 support that?
Let’s play with --depth=immediates. I guess “immediates” means “neighbors”: if you are a file in the same directory as me, you are a neighbor; if you are a subdirectory of my directory, you are a neighbor too; but if you are a file or directory inside a subdirectory, you are not an immediate neighbor. Let’s try that.
I created wc5 with svn checkout file:///c:/svn/repo/trunk --depth=immediates
This checks out the files in trunk and its subdirs, but not the files in those subdirs. Next, in wc1, I added test6.txt to the top directory of the trunk, plus a new subdir with test7.txt in it.
After an svn update on wc5, the new file in the top directory of the trunk and the new subdirectory come in, but the file in the subdirectory does not; that directory is in fact created at --depth=empty.
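What makes this work is that depth is recorded per directory: the trunk keeps immediates, each subdirectory starts at empty, and an update applies the depth of the directory that would receive a new item. The Python sketch below models that rule as described in this post and the spec. It is a simplified model, not Subversion’s own algorithm, and all names (parent_of, update, readme.txt, src/, newdir/) are hypothetical.

```python
# Toy model of per-directory ("sticky") depth in a Subversion 1.5 working copy.
# Hypothetical names; directories end with "/" and the checkout root is "".

def parent_of(path: str) -> str:
    """Return the parent directory of `path` ("" for top-level items)."""
    parts = path.rstrip("/").split("/")
    return "/".join(parts[:-1]) + "/" if len(parts) > 1 else ""


def update(wc_dirs: dict, wc_files: set, repo_paths: list) -> list:
    """Add new repository items allowed by each parent directory's depth.

    wc_dirs maps every directory in the working copy to its recorded depth
    and is updated in place, as is wc_files. Returns the added paths.
    """
    added = []
    # Shallow paths first, so a new directory exists before its contents.
    for path in sorted(repo_paths, key=lambda p: p.rstrip("/").count("/")):
        if path in wc_dirs or path in wc_files:
            continue                    # already in the working copy
        parent = parent_of(path)
        if parent not in wc_dirs:
            continue                    # parent directory is not checked out
        depth = wc_dirs[parent]
        if path.endswith("/"):
            if depth in ("immediates", "infinity"):
                # New subdirectories inherit infinity; otherwise they start empty.
                wc_dirs[path] = "infinity" if depth == "infinity" else "empty"
                added.append(path)
        elif depth in ("files", "immediates", "infinity"):
            wc_files.add(path)
            added.append(path)
    return added


# wc5: trunk checked out with --depth=immediates, so "src/" was created at empty.
wc_dirs = {"": "immediates", "src/": "empty"}
wc_files = {"readme.txt"}
repo = ["readme.txt", "src/", "src/main.c", "test6.txt", "newdir/", "newdir/test7.txt"]
print(update(wc_dirs, wc_files, repo))
print(wc_dirs["newdir/"])
```

The same function also reproduces the --depth=files experiment: with a root recorded at files, a new top-level file is added but a new subdirectory is not.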
The last value of --depth is “infinity”:
svn checkout file:///c:/svn/repo/trunk --depth=infinity
It is the same as svn checkout file:///c:/svn/repo/trunk. So why have it? For consistency between commands. The --depth option applies not only to checkout but also to update. For instance, if you have checked out a directory at --depth=empty, you can still update it with everything that is in the repository by using:
svn update --depth=infinity
Oh, so that did not work. Well, the SVN 1.5 beta from the Merge Tracking Early Adopter Program is a few weeks old, and apparently updating to deeper depths has been fixed since then (we’ll update the beta soon).
This did work though in the build I used:
The other way around also works. I created wc7 with a checkout at depth infinity, then used wc1 to commit a file and a directory with a file in it (you know the drill by now). Updating wc7 like this:
svn update --depth=immediates
pulls in the file and the subdir (the immediates), but not the file in that subdir.
There are a couple of other commands affected by sparse directories:
- svn info will list the depth of the working copy.
- svn switch can take the --depth option and switches the working copy to the new URL at the depth you specify.
- svn status will only report the status of files and directories within the specified depth. For example, if you create a subdir with a file in it, svn status --depth=immediates only reports the new directory (a neighbor), not the file in the subdir.
- svn help commit tells me that commit will also take --depth, allowing you, for example, to commit only the files in the current directory: svn commit --depth=files. It does not work yet, but that is a defect.
Let me close with a note on compatibility. Differing Subversion client and server release levels could be an issue for sparse directory support. However, I understand that by the time Subversion 1.5 is released, the client will be clever enough to figure out what to ignore. In other words, if the server is pre-1.5, it will throw the whole kitchen sink, with all the dirty dishes, at the client, but the client will determine what it needs for the checkout. Sparse directory support will work just fine.
What’s a use case for sparse directories? Checking out a single file (with --depth=empty, then running an update on the one file you want) will certainly be useful for many people. And here is a use case I will use myself. We use Subversion to manage the HTML content on openCollabNet. The pages are distributed over a number of repositories on a single server (each project has its own repository). Within a repository I typically manage the top directory and might give subdirs to others, for instance to store binaries for download, articles or other stuff. By checking out the repositories with --depth=files, I’ll get only what I need, not the files that other people manage in their subdirectories.
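Both use cases are easy to script. The Python sketch below only assembles the svn command lines described in this post (svn checkout with --depth, then svn update on a single path) and can run them with subprocess. The repository URLs, working-copy names and helper functions are hypothetical, and running the commands assumes a Subversion 1.5 client on PATH.

```python
import subprocess


def single_file_cmds(repo_url: str, wc_dir: str, filename: str) -> list:
    """Commands to get one file: an empty checkout, then update just that file."""
    return [
        ["svn", "checkout", "--depth=empty", repo_url, wc_dir],
        ["svn", "update", f"{wc_dir}/{filename}"],
    ]


def top_level_only_cmds(repos: dict) -> list:
    """One `svn checkout --depth=files` per repository (working-copy dir -> URL)."""
    return [["svn", "checkout", "--depth=files", url, wc_dir]
            for wc_dir, url in repos.items()]


def run(cmds: list) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# Hypothetical repository URLs, for illustration only.
site_repos = {
    "project-a": "https://svn.example.org/repos/project-a/www",
    "project-b": "https://svn.example.org/repos/project-b/www",
}
cmds = top_level_only_cmds(site_repos) + single_file_cmds(
    "https://svn.example.org/repos/project-c/www", "project-c", "index.html")
# run(cmds)  # uncomment to execute; needs an svn client on PATH
```

Building the argument lists separately from running them keeps the sketch easy to inspect before anything touches a real repository.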
Trivia: when were sparse directories first discussed?
Go here for the answer.