This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
How Subversion conserves disk space
I wanted to share something from our March openCollabNet Technical Newsletter. If you do not get our newsletter yet, sign up for openCollabNet. It only takes a minute.
To keep the size of the repository as small as possible, Subversion uses deltification, also called "deltified storage". Deltification is the encoding of a chunk of data as a collection of differences against some other data. If two files are very similar, deltification results in storage savings because only the changes are stored, not the entire file.
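To make the idea concrete, here is a minimal sketch of deltification in Python. It is not Subversion's actual delta format. The `make_delta` and `apply_delta` helpers and the copy/insert instruction tuples are names made up for this illustration, and the differences are found with Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def make_delta(base: str, target: str) -> list:
    """Encode `target` as copy/insert instructions against `base` (illustrative format)."""
    ops = []
    matcher = SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared data: store only a reference into the base.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New data: this is the only text the delta has to carry.
            ops.append(("insert", target[j1:j2]))
        # "delete": base text that is simply not copied, so nothing is emitted.
    return ops


def apply_delta(base: str, delta: list) -> str:
    """Rebuild the target text from `base` plus the delta instructions."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old = "The quick brown fox jumps over the lazy dog.\n"
new = "The quick brown fox leaps over the lazy dog.\n"
delta = make_delta(old, new)
restored = apply_delta(old, delta)
```

Because the two texts are nearly identical, the delta consists mostly of short copy instructions and carries only a few characters of new text, and `restored` is equal to `new`.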
This works differently depending on which filesystem back end you use. With BDB (Berkeley DB), fulltexts are found at the tips of each distinct line of a file’s history. When a change occurs, the new version is stored as fulltext, and the previous version is then rewritten as a delta against that new version. FSFS stores deltas in the opposite direction, so old versions never need to be rewritten: when a file is changed, the new version is stored as a delta against an older version.
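The difference in direction can be sketched with a toy model that tracks only how each version is stored, not its content. The function names and the `layout` list are hypothetical, and the FSFS-like variant always uses the immediately preceding version as its base, which is a simplification.

```python
def commit_backward(layout: list) -> list:
    """BDB-like: the newest version is fulltext; the old tip is rewritten as a delta."""
    new_rev = len(layout)
    rewritten = []
    if layout:
        old_tip = new_rev - 1
        # The previous tip is re-stored as a delta against the new version.
        layout[old_tip] = ("delta", new_rev)
        rewritten.append(old_tip)
    layout.append(("fulltext", None))
    return rewritten  # versions that had to be rewritten by this commit


def commit_forward(layout: list) -> list:
    """FSFS-like: the new version is a delta against an older one; nothing is rewritten."""
    new_rev = len(layout)
    if not layout:
        layout.append(("fulltext", None))
    else:
        # Simplification: always use the immediately preceding version as the base.
        layout.append(("delta", new_rev - 1))
    return []  # existing versions are never touched


bdb_like, fsfs_like = [], []
for _ in range(3):
    commit_backward(bdb_like)
    commit_forward(fsfs_like)

# bdb_like  == [("delta", 1), ("delta", 2), ("fulltext", None)]
# fsfs_like == [("fulltext", None), ("delta", 0), ("delta", 1)]
```

In the first layout the fulltext always sits at the tip and each commit rewrites one older entry. In the second, the fulltext stays at the start of the history and entries are only ever appended.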
Most source code files change frequently, and Subversion’s performance would degrade if it had to apply every individual delta to re-create a file that has changed many times. To avoid this, Subversion uses "skip-deltas". Skip-deltas are deltas that are calculated not against the immediately following or preceding version, but against a version that is closer, in the chain of deltas, to a fulltext representation of the file. This way a given version of a file can be re-created using fewer deltas than if a delta were needed for each individual change.
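One way to pick a skip-delta base, which as far as I know is the approach described for FSFS in Subversion's design notes, is to clear the lowest set bit of the version's position in the file's history. The sketch below assumes that version 0 is stored as fulltext. The function names are made up for this illustration, and it does not model the BDB variant.

```python
def skip_delta_base(n: int) -> int:
    """Return the version that version n is deltified against (lowest set bit cleared)."""
    return n & (n - 1)


def delta_chain(n: int) -> list:
    """List the versions visited when re-creating version n, ending at the fulltext (0)."""
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


chain = delta_chain(54)   # [54, 52, 48, 32, 0]
hops = len(chain) - 1     # 4 deltas to apply, instead of 54 with one delta per change
```

With this rule, the number of deltas needed for version n equals the number of 1 bits in n, so it grows roughly with the logarithm of the number of changes rather than with the number of changes itself.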
For repositories created with Subversion 1.4 or later, space savings increase further because the delta chunks are stored using a compression algorithm.