Last Updated Dec 06, 2012 — Enterprise Agile Planning expert

Can history be changed… and why should I care?

In the Git version control system it is possible to change history (e.g. to remove accidentally added files which are big, confidential or infringing), or to completely delete a branch without a trace (e.g. to remove already merged or abandoned feature branches). This can be useful, but it can also be dangerous.
What if someone does that by accident or with malicious intent? In corporate settings, surprises are usually not a good thing, especially when it comes to data loss/corruption.
Here are example commands which change history in Git:

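 git push -f                       # forced push: may replace remote history (non-fast-forward)
 git push origin :branch           # empty source before the colon: deletes the remote branch
 git push origin branch :branch2   # pushes "branch" AND deletes "branch2"; note the space before colon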
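
To make the difference between those refspecs explicit, here is a minimal Python sketch that classifies what a push refspec asks the remote to do. The helper names `classify_refspec` and `classify_push` are made up for this illustration. The rules it encodes are the general Git refspec conventions: an empty source before the colon means "delete", and a leading `+` (or the `-f` flag) means the update is allowed to be non-fast-forward.

```python
from typing import Dict, List


def classify_refspec(refspec: str, force: bool = False) -> str:
    """Return "delete", "force-update" or "update" for one push refspec.

    "force-update" only means a non-fast-forward update is *allowed*;
    whether history is really rewritten depends on the commits involved.
    """
    forced = force or refspec.startswith("+")
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, colon, _dst = spec.partition(":")
    if colon and not src:
        # ":branch" has no source side, so the remote ref is removed.
        return "delete"
    return "force-update" if forced else "update"


def classify_push(refspecs: List[str], force: bool = False) -> Dict[str, str]:
    """Classify every refspec of a single `git push` invocation."""
    return {spec: classify_refspec(spec, force) for spec in refspecs}


print(classify_push([":branch"]))             # {':branch': 'delete'}
print(classify_push(["branch", ":branch2"]))  # {'branch': 'update', ':branch2': 'delete'}
print(classify_push(["master"], force=True))  # {'master': 'force-update'}
```

The second call shows why the space before the colon matters: `branch :branch2` is two refspecs, one ordinary push and one deletion.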

There are ways to keep “history of history”, thus reducing or eliminating the mentioned risks. I will compare those ways for you.

What is Git reflog?

Git reflog is a mechanism built into git, which can be used to determine e.g. what commit was at the head of a branch “2 days ago”. That information is stored in a set of log files under .git/logs.
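
To give an idea of what such a lookup involves, here is a small Python sketch that reads reflog lines and answers "where did the branch point at a given time?". It assumes the usual on-disk line layout (old SHA, new SHA, identity, Unix timestamp, time zone offset, then a tab and a message). The sample lines, the identity and the helper names `parse_reflog_line` and `head_at` are hypothetical. Git itself answers the same question with expressions such as `master@{2.days.ago}`, so treat this purely as an illustration of the mechanism.

```python
from typing import Iterable, NamedTuple, Optional


class ReflogEntry(NamedTuple):
    old: str
    new: str
    who: str
    timestamp: int
    tz: str
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one reflog line into its fields."""
    head, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The identity may contain spaces, so peel timestamp and tz off the end.
    who, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, who, int(timestamp), tz, message)


def head_at(entries: Iterable[ReflogEntry], when: int) -> Optional[str]:
    """Return the commit the ref pointed to at Unix time `when`, if recorded."""
    current = None
    for entry in entries:  # reflog files are appended to, oldest entry first
        if entry.timestamp > when:
            break
        current = entry.new
    return current


# Hypothetical sample data: two pushes to the same branch.
A, B, ZERO = "a" * 40, "b" * 40, "0" * 40
lines = [
    f"{ZERO} {A} Jane Doe <jane@example.com> 1354300000 +0100\tpush",
    f"{A} {B} Jane Doe <jane@example.com> 1354558003 +0100\tpush",
]
entries = [parse_reflog_line(line) for line in lines]

two_days_before_last_push = 1354558003 - 2 * 86400
print(head_at(entries, two_days_before_last_push) == A)  # True
```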

What is History Protection in Git/Gerrit Integration with CollabNet TeamForge?

In this article I will shorten “Git/Gerrit integration with CollabNet TeamForge” as TeamForge-Git, for the sake of brevity.

In TeamForge-Git, a new feature was recently introduced – History Protection. It offers out-of-the-box, enterprise-ready, site-wide-enforceable protection from Bad Things(TM) happening to the all-important Git history and Git refs.

How do those two approaches compare?

I will describe the differences between the two approaches. In case you want a quick overview, there is a tabular comparison further down in this article.
At first glance they might seem similar, but a closer look shows that it is more like comparing apples to oranges; or apple to blackberry ;).

Purpose and origin

Some people describe reflog as more of a personal tool for a developer to have a fallback when something goes wrong, e.g. when a local branch is deleted accidentally.
In contrast, TeamForge-Git’s History Protection was designed from the beginning as a mechanism that must work in a multi-user enterprise environment on a blessed repository and be remotely accessible and manageable through a Web UI as well as command line clients.

Accessibility, UI, Tooling

To configure and access Git reflog, one needs access to the file system on the server where the “blessed Git repository” is hosted. That is unlikely in big organizations, and it would keep the server administrators busy.
TeamForge-Git’s History Protection has a web UI, which offers a “self-service” approach. Users with appropriate permissions in TeamForge/Gerrit can, by themselves, inspect, resurrect and permanently delete the rewritten/deleted Git refs, which eliminates the need to involve server administrators who have file system access. Additionally, History Protection creates special (backup) Git refs for ordinary refs which get deleted/rewritten. Those special refs can also be inspected/resurrected using ordinary Git clients.

Eclipse viewing backup refs of TeamForge-Git History Protection

What information is stored

Information stored by Git reflog

Git reflog records these changes in the repository:

  • Any push
  • Any merge
  • Any branch creation/deletion
  • Any tag creation/deletion

Example entries in the reflog look like this:


  kpradzinski  1354558003 +0100 push
  kpradzinski  1354558072 +0100 push

Two issues baffled me:

  • The second entry actually represents a forced (non-fast-forward) push, yet it looks exactly like the first one. I haven’t found any way to tell a fast-forward from a non-fast-forward push in these entries. This seems to make reflog ill-suited (maybe even useless) for the purposes of protecting history in an air-tight manner.
  • The timestamp is stored as seconds since the Unix epoch followed by a time zone offset. This is not human-readable and complicates things for the human user.
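
The timestamp issue at least is easy to work around. The following Python sketch converts the two sample entries above into ISO 8601 dates; the helper name `reflog_time` is made up for this example, and it assumes the fields are whitespace-separated exactly as shown.

```python
from datetime import datetime, timedelta, timezone


def reflog_time(seconds: str, offset: str) -> datetime:
    """Turn '1354558003' and '+0100' into a time-zone-aware datetime."""
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime.fromtimestamp(int(seconds), tz)


sample = [
    "kpradzinski  1354558003 +0100 push",
    "kpradzinski  1354558072 +0100 push",
]
for entry in sample:
    who, seconds, offset, action = entry.split()
    print(who, reflog_time(seconds, offset).isoformat(), action)
# kpradzinski 2012-12-03T19:06:43+01:00 push
# kpradzinski 2012-12-03T19:07:52+01:00 push
```

This only makes the entries readable. It does not help with telling forced pushes apart from ordinary ones.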

Information stored by TeamForge-Git’s History Protection

If TeamForge-Git’s History Protection is on, then whenever a Git ref gets deleted/rewritten, a descriptive entry is added to the list of deleted/rewritten refs. This is somewhat similar to the “Recycle Bin” on a computer desktop. Gerrit Administrators can permanently delete those entries or “resurrect” them.
Here is how it looks:

TeamForge-Git History Protection

Additionally, TeamForge-Git’s History Protection creates Audit Log entries whenever git refs get

  • deleted
  • re-written (non-fast-forward)
  • resurrected
  • permanently deleted (which is somewhat similar to “emptying the Recycle bin” on computer desktops).

Notification

As far as I know, git reflog offers no possibility of notifying users about history rewrites or deleted branches.
TeamForge-Git’s History Protection has a built-in notification mechanism, enabled by default. It uses Gerrit’s e-mail sending infrastructure to notify administrators about history rewrites and deleted git refs.

E-mail notification from TeamForge-Git about branch deletion

Protection against object pruning and reflog expiration

Git has a mechanism called “garbage collection”, which is used to permanently remove data that is no longer referenced. If it is triggered at an unfortunate time, commits that have become unreferenced (e.g. due to a delete or rewrite) will be lost.

Git reflog

Reflog expiration and gc pruning settings have to be configured manually by the server administrator. The only way to avoid losing commits that are no longer referenced by a branch is to set both values to “unlimited”, which consumes large amounts of disk space, slows down garbage collection and rules out permanently deleting specific commits (it is all or nothing).
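
As a rough sketch of what that manual configuration could look like, the Python snippet below builds the `git config` invocations for one repository. The configuration keys are standard Git settings, and `never` is the value Git uses for “unlimited”. The repository path, the helper names and the decision to include `core.logAllRefUpdates` (bare repositories do not write reflogs unless it is enabled) are my assumptions for this example, so adapt them to your server before running anything.

```python
import subprocess
from typing import List

# Standard Git configuration keys; "never" switches expiry/pruning off.
RETENTION_SETTINGS = {
    "core.logAllRefUpdates": "true",       # bare repositories need this to write reflogs
    "gc.reflogExpire": "never",            # keep reflog entries forever
    "gc.reflogExpireUnreachable": "never", # also for commits no longer reachable
    "gc.pruneExpire": "never",             # never prune unreferenced objects
}


def retention_commands(git_dir: str) -> List[List[str]]:
    """Build one `git config` command line per setting for the given repository."""
    return [
        ["git", "--git-dir", git_dir, "config", key, value]
        for key, value in RETENTION_SETTINGS.items()
    ]


def apply_retention(git_dir: str) -> None:
    """Run the commands; needs git and file system access to the repository."""
    for argv in retention_commands(git_dir):
        subprocess.run(argv, check=True)


# Hypothetical path; only print the commands here instead of running them.
for argv in retention_commands("/srv/git/demo-project-1.git"):
    print(" ".join(argv))
```

Note that this has to be repeated for every repository on the server, which is exactly the administrative overhead discussed here.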

TeamForge-Git History Protection

With TeamForge-Git History Protection, commits remain referenced from special Git refs in the “recycle bin”, so they will never be pruned by Git garbage collection unless one of the administrators explicitly and permanently removes them using the Gerrit Web UI.
Additionally, there is no need to keep a large reflog.

Ease of use

Git reflog

It can only be configured manually by an administrator who has file system access. It has to be configured for each and every repository. Restoring deleted/rewritten refs requires running git commands directly on the server.

TeamForge-Git’s History Protection

It is available out-of-the-box in TeamForge-Git. History Protection is enforceable per-repository or for all repositories by setting a site-wide config option. Users with appropriate permissions can restore refs using the Gerrit Web UI or a Git client. They can also use ordinary Git clients (e.g. ls-remote, Eclipse) for read-only access to rewritten/deleted refs (e.g. to inspect them); those refs are visible to them under the special ref directories refs/delete and refs/rewrite.
Here is what those refs look like in ls-remote output:


[ /tmp/demo-project-1/ ] %: git ls-remote
From ssh://dariusz@example:29418/demo-project-1
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8 HEAD
f459ce2b20ef60f404da3f383b0e2a28831e2418 refs/delete/20121130230538-this_branch_is_just_resting--dariusz
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8 refs/heads/master
329bb817c024fcb80c6f4f301b97d9a1985ee2e0 refs/rewrite/20121130230705-master-4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8-dariusz
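
If you want to process such output in a script, a sketch along the following lines could pick out the backup refs. Please treat the naming scheme as an assumption: the pattern (timestamp, original ref name, an optional SHA, user name) is inferred only from the two sample refs above, not from a documented format, and it would misparse user names that contain a dash. The names `BACKUP_REF`, `BackupRef` and `parse_ls_remote` are made up for this example.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional

# Assumed layout: refs/<delete|rewrite>/<YYYYMMDDhhmmss>-<name>-<sha or empty>-<user>
BACKUP_REF = re.compile(
    r"^refs/(?P<kind>delete|rewrite)/(?P<stamp>\d{14})-(?P<name>.+)"
    r"-(?P<sha>[0-9a-f]{40})?-(?P<user>[^-]+)$"
)


class BackupRef(NamedTuple):
    kind: str                   # "delete" or "rewrite"
    when: datetime              # time embedded in the ref name (time zone unknown)
    name: str                   # original branch/tag name
    sha_in_name: Optional[str]  # SHA embedded in the name, if any
    user: str                   # user who deleted/rewrote the ref
    target: str                 # commit the backup ref points to


def parse_ls_remote(output: str) -> List[BackupRef]:
    """Extract backup refs from `git ls-remote` output, skipping other lines."""
    found = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        target, ref = parts
        match = BACKUP_REF.match(ref.strip())
        if match:
            when = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
            found.append(BackupRef(match["kind"], when, match["name"],
                                   match["sha"], match["user"], target))
    return found


sample_output = """\
From ssh://dariusz@example:29418/demo-project-1
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8 HEAD
f459ce2b20ef60f404da3f383b0e2a28831e2418 refs/delete/20121130230538-this_branch_is_just_resting--dariusz
4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8 refs/heads/master
329bb817c024fcb80c6f4f301b97d9a1985ee2e0 refs/rewrite/20121130230705-master-4f00518c5a4a9d8f6f4a3cfb019ba518fb89e6a8-dariusz
"""
for backup in parse_ls_remote(sample_output):
    print(backup.kind, backup.name, backup.user, backup.when.isoformat(), backup.target[:8])
```

In the sample, the rewritten `master` entry points to the commit that was replaced (329bb817…), which is what you would inspect or resurrect.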

To protect or not to protect…

Leaky or nonexistent protection against losing history may, in the milder case, cause some anxiety for the administrators. In more severe cases it may cause real loss or corruption of data, leading to serious problems.

I should also point out that it is possible to use both approaches at the same time.

Comparison table

Here is the promised table that gives a more compact comparison:

| | Git reflog | TeamForge Git Integration with “History Protection” |
|---|---|---|
| Accessibility | Requires direct access to the file system on the server where the ‘blessed Git repository’ is hosted, which is very unlikely in huge organizations and will keep the server administrators busy. | ‘Self-service’ approach. Users with appropriate permissions in TeamForge can find and resurrect deleted/rewritten branches by themselves, decreasing the workload of server administrators. Gerrit Administrators can also permanently delete selected branches/tags. |
| Signal-to-noise ratio | Reflog records all changes in the repository: any push (including ordinary fast-forward), any merge, any branch creation/deletion, any tag creation/deletion. Finding out about history rewrites or deleted branches is like searching for a needle in a haystack. | History Protection has a view which neatly shows deleted branches/tags and history rewrites (non-fast-forward pushes). Separately, Audit Log entries are created whenever branches/tags get deleted, rewritten (non-fast-forward), resurrected or permanently deleted. |
| Notification | No notification. | E-mail to Gerrit Administrators. |
| Ease of use | Only manually configurable by an administrator who has file system access. Has to be configured for every repository. Restoring requires running git commands on the server. | Out-of-the-box in TeamForge-Git. Configurable per-repository or for all repositories via a site-wide config option. Users with appropriate permissions can restore history using the Gerrit Web UI or a Git client. |
| Protection against object pruning/reflog expiration | Reflog expiration and gc pruning settings have to be configured manually by the server administrator. The only way to avoid losing commits no longer referenced by a branch is to set both values to “unlimited”, which consumes large amounts of disk space, slows down garbage collection and does not allow specific commits to be permanently deleted (all or nothing). | Preserved commits will never be pruned by garbage collection, unless permanently removed using the Gerrit Web UI. No need to keep a large reflog. Garbage collection will run faster since all commits are still referenced in the repo. |

You came, you saw, you should comment

Please share your feedback – how do you like this comparison, did I miss something, what were your experiences with reflog, or other things that may have crossed your mind.
