This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
Multiple Subversion repositories?
On Wednesday CM Crossroads and CollabNet hosted a webinar: Subversion in the Enterprise, presented by C. Michael Pilato and Bob Jenkins from CollabNet plus Terrence Cordes, SCM manager at Reuters. Terry gave some great insights into deploying Subversion across global teams; I’ll post a link to a recording of this webinar here soon. Because the presenters only got to a few of the questions that were asked, we will answer some of the remaining ones in this blog over the next few weeks. Here is the first one, asked by a couple of people:
Do you recommend multiple repositories for distributed teams due to WAN performance?
Subversion itself does not support multiple synchronized repositories that are all open for reads and writes at the same time; it uses one central server. It was designed with the WAN in mind, though, and keeps its network bandwidth requirements low.
Subversion’s working copy model means that the developer works on his or her code without needing to be connected to the server. You only need a connection with the server in a few cases, for instance for a commit or when updating your local working copy with changes from the repository. When data is sent back and forth, only differences are sent.
Mike Pilato actually touched on this during the webinar. If you make a small change to a large file and commit that change, only the change is sent across the network, not the entire file. This minimizes bandwidth requirements. Subversion only needs to send entire files across the wire the first time a developer checks out a working copy.
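To get a feel for why sending only differences matters, here is a small Python sketch that compares the size of a whole file with the size of a textual diff after a one-line edit. It is only an illustration of the idea: it uses Python's difflib, not Subversion's own delta format, and the file contents and the delta_size helper are made up for this example.

```python
import difflib


def delta_size(old_text: str, new_text: str) -> int:
    """Return the size in bytes of a unified diff between two versions.

    This stands in for "what has to cross the network" in a system that
    sends differences; it is not the encoding Subversion actually uses.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
        n=0,  # no context lines: keep only the changed lines
    )
    return len("".join(diff).encode("utf-8"))


# A hypothetical 2,000-line file with a single edited line.
old = "".join(f"line {i}\n" for i in range(2000))
new = old.replace("line 1000\n", "line 1000 (edited)\n")

full_size = len(new.encode("utf-8"))
diff_size = delta_size(old, new)

print(f"whole file: {full_size} bytes, diff: {diff_size} bytes")
```

The diff is a small fraction of the file size, which is the effect Mike described for commits and updates.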
The conclusion is that WAN performance is not an issue when considering Subversion, assuming your network is reliable.
Subversion does have some support for multiple repositories. Version 1.4 introduced svnsync, a utility that lets you replicate a repository into any number of read-only copies. There are several usage models for this, with backups being the most common.
But there are other uses as well. For example, at EclipseCon I met some people from the Philippines who were asking about using multiple repositories to get around network downtime (we’ve all heard about the recent big Internet outage in Asia). Their development team is in Los Angeles, but build and test happen in the Philippines. This company can set up a main repository with read-write access in the US and use svnsync to maintain a remote copy for the build and test team. Should the international network go down, the team can still make a build from the local read-only copy.
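As a rough sketch of what setting up such a mirror involves, the Python snippet below assembles the usual command sequence: create an empty repository, initialize it as a mirror, then synchronize. The paths, URLs, and the mirror_commands helper are hypothetical placeholders, and the snippet only prints the commands rather than running them. Note that the mirror repository also needs a pre-revprop-change hook that lets svnsync set revision properties, which is not shown here.

```python
import shlex
from typing import List


def mirror_commands(mirror_path: str, mirror_url: str, source_url: str) -> List[List[str]]:
    """Build the commands for a read-only svnsync mirror (illustrative only)."""
    return [
        # 1. Create an empty repository to hold the mirror.
        ["svnadmin", "create", mirror_path],
        # 2. Tell the mirror which repository it replicates.
        #    (Requires a pre-revprop-change hook on the mirror that permits this.)
        ["svnsync", "init", mirror_url, source_url],
        # 3. Copy over all revisions not yet mirrored; rerun to pick up new ones.
        ["svnsync", "sync", mirror_url],
    ]


# Hypothetical locations for the build team's mirror and the main repository.
commands = mirror_commands(
    mirror_path="/srv/svn/mirror",
    mirror_url="file:///srv/svn/mirror",
    source_url="https://svn.example.com/repos/project",
)

for cmd in commands:
    print(shlex.join(cmd))
```

Running the last command on a schedule, or from a post-commit hook on the main repository, is one way to keep the copy reasonably current.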
You can find out more about svnsync by typing “svnsync help” at the command prompt or by checking the online version of Version Control with Subversion. The authors are updating the book for release 1.4 and have a chapter on svnsync. (I cannot give you the exact URL of that chapter: because of daily builds, it changes all the time.)
If you want to use Subversion and really need multiple read-write repositories, there is a solution: svk (its primary author is Chia-liang Kao). svk is a decentralized version control system built on top of Subversion. It supports things like repository mirroring and disconnected operation. Some people will prefer this, but before you choose a distributed repository solution, make sure you really need it. It does have advantages for developers who often work disconnected from the network: as with Subversion they can keep working offline, but they can also commit to a local repository. However, it can come at the cost of higher administration overhead, higher bandwidth requirements and more server infrastructure. Subversion’s centralized model is easier to deploy and maintain and, if you don’t need a distributed model, will have a lower cost of ownership.