This post is from the CollabNet VersionOne blog and has not been updated since the original publish date.
Subversion’s Operational Logging: What It Is, and What It Ain’t
Subversion’s operational logging feature was introduced in Subversion 1.3 as a way to allow mod_dav_svn to record one-line summaries of the high-level Subversion operations that it serviced. Similar functionality was added to svnserve in Subversion 1.6. Originally, the feature was desired not because there was an absence of server-side logging, but because the logging provided by Apache was just too low-level for most folks to understand. The underlying protocols in use by mod_dav_svn (HTTP+WebDAV) are not Subversion-specific, or even Subversion-esque. The protocol is stateless, generic, chatty, and more-or-less incomprehensible to a person whose title tends more towards “Subversion Administrator” than “HTTP/WebDAV Server Administrator”. Operational logging was intended to help that person make sense of Subversion’s WebDAV traffic.
Over the past few years, I’ve witnessed no small amount of confusion about what this feature really does and does not provide. The primary cause of the confusion often boils down to some simple misunderstandings about how the feature accommodates Subversion’s WebDAV interactions, and the limitations of its design. I hope to clarify some of that in this post.
A Subversion client (just as any other WebDAV client) typically communicates with Apache/mod_dav_svn not by using some magic, single request that represents “a high-level operation”, but via a whole series of requests, each designed to accomplish some portion of that high-level operation. Those requests might all arrive on a single Apache TCP/IP connection, or they may use many connections. They may arrive in serial, or portions of them may arrive in parallel. They may arrive in rapid succession, or they may arrive occasionally over the course of an unbounded period of time. As a Subversion user, you’re content if they arrive at all and the tool does what you’ve asked it to do, but the details of these arrivals do play a role in the server-side operational logging feature.
Many of the request types (especially the read-only ones) are common to several (or all) of the typical Subversion client operations. An OPTIONS request, for example, is used in pretty much all Subversion client/server connections to negotiate features. But the request type and target URL alone generally aren’t enough information to discern which high-level operation the user has invoked. For example, PROPFINDs come in all shapes and sizes and are used all over the place, sometimes for what might be considered sub-tasks (calculating resource URLs and such), but sometimes to fulfill the principle task initiated by the user. REPORT requests also come in different flavors, some of them used for relatively simple information queries on the server, but some of them for what is arguably the bulk of a high-level operation. It is via REPORT requests (perhaps with some supplemental, smaller requests) that a Subversion client fetches the directory layout, file contents, and versioned properties needed to service a checkout, update, diff, or merge. Another type of REPORT request services a log operation by providing the entirety of the requested revision metadata.
Apache logs all of these requests via its standard logging mechanisms. The challenge for mod_dav_svn is to find a way to distill a series of requests into a concise description of an operation that a human could (and would care to) comprehend. As originally proposed, this feature would involve Subversion clients simply providing this high-level operation information to the Apache server as part of the requests sent to handle that operation. There would be APIs for clients to use to share with the underlying Repository Access (RA) layer exactly what the client was trying to do, and the RA layer would pass that information to the server, and the server would log it while servicing the requests. This idea was shot down for a couple of reasons, though. First, a modified Subversion client could be tweaked to lie about what it was and wasn’t doing, leaving administrators with unreliable logs (which they might not even know are unreliable). Secondly, because WebDAV is a stateless protocol, there would be additional work required to avoid logging the same high-level operation multiple times, once per low-level WebDAV request. So the feature was re-proposed with the design that was eventually implemented—mod_dav_svn would only attempt to log a high-level operation summary for certain request types, namely those that can be unambiguously recognized. In other words, rather than logging what the client claimed that the user was trying to do, the server would log what the server positively knew it had already done.
Now, generally speaking, there are one or two key requests in most of the Subversion client operations which can be identified meaningfully by the server and logged. A MERGE request is the final step in any operation that seeks to create a new revision in the repository, so when one of those requests succeeds, mod_dav_svn will log the fact that a commit has occurred. The REPORT request types previously mentioned are pretty meaty, and tend to be the central request used to service the most common, high-level, get-data-from-the-server types of client operations. These generate an operation log entry because the server can say with reliability, “I just transmitted a response to a request for an update report” or “I just transmitted a response to a request for a revision log report”. Maybe the client is actually using those responses to perform update or log operations, or maybe it isn’t—the server can’t know.
Unfortunately, because the protocol is often driven in small steps that aren’t in themselves unique to a particular operation and, again, because what is logged is the server’s operation and not the client’s, the logged results aren’t always as precise as you might like. A client action might result in more than one operation being logged, or none at all. Or, the server log might say (effectively, not literally), “I sent the properties for a file”, but whether that was the result of the user running ‘svn propget’, ‘svn propedit’, ‘svn info’, or something else cannot be determined. Or the client might run ‘svn export’ and the server log says, “Well, what I sent could have been used for checkout or export, but I’m not sure which.” Depending on how the server is driven by the client (details I’ll not dive into here), a checkout might log a single line (“Yup, I did a checkout or export!”) or it might log hundreds (“Yup, I did a checkout or export! And I sent a file’s contents. And another file’s contents. And another file’s…. “). Likewise, except in very specific cases, those somewhat generic PROPFINDs are not logged via the operational logging mechanism because the server can’t reliably know what high-level request they might be part of (if any at all) and because, lacking that information, logging these requests would just be redundant with what is already recorded in the primary Apache logs.
Someone reading this might be tempted to assume that the subliminal message here is that Subversion’s operational logging is without value. That’s not the case. At a minimum, it offers something that Apache’s primary logging mechanism cannot: the ability to disambiguate requests based on the parameters of the request found in its body. To Apache, a REPORT against a default VCC URL is a REPORT against a default VCC URL; as mod_dav_svn processes the body of that REPORT request, though, it can tell the difference between an update-style report request, a log request, a mergeinfo request, and so on. Apache also can’t tell you the Subversion-specific bits about what it has done: the access_log will note that a MERGE completed successfully, but only the Subversion operational log can say that this resulted in r34867 being committed in the repository.
Is there room for improvement here? Absolutely. I’d like to see the idea of a client-transmitted operation revisited in earnest, but implemented in a way that allows for easy validation against the server’s records of what it actually did. In the meantime, though, I hope that a healthy understanding of the benefits and limitations of the feature as it exists today will prove useful when swimming through logs and attempting to “go all CSI on” events of the past.