Git and Subversion collaboration

Git is great, we all know that, but there are use cases where its completely distributed development model does not shine (see here and here). And while my old git svn mirror of the TeX Live Subversion repository was working well, git pull and git svn rebase didn’t work well together: the same changes were pulled again and again. Finally, I took the time to experiment and fix this!

Most of the material in this post has been written up before; the best sources I found are here and here. Practically everything is covered there, but once one gets down to business, some things work out a bit differently. So here we go.

Aim

The aim of the setup is to let several developers work on a git svn mirror of a central Subversion repository. “Work” here means:

  • pull from the git mirror to get the latest changes
  • normal git workflows: branch, develop new features, push new branches to the git mirror
  • commit to the subversion repository using git svn dcommit

and all that with as much redundancy removed as possible.

One solution would be for each developer to create their own git-svn mirror. While this is fine in principle, it is error-prone, costs lots of time, and everyone has to do git svn rebase etc. We want to be able to use normal git workflows as far as possible.

Layout

The basic layout of our setup is as follows:

The following entities are shown in the above graphics:

  • SvnRepo: the central subversion repository
  • FetchingRepo: the git-svn mirror which does regular fetches and pushes to the BareRepo
  • BareRepo: the central repository which is used by all developers to pull and collaborate
  • DevRepo: normal git clones of the BareRepo on the developers’ computers

The flow of data is also shown in the above diagram:

  • git svn fetch: the FetchingRepo is updated regularly (using cron) to fetch new revisions and new branches/tags from the SvnRepo
  • git push (1): the FetchingRepo pushes changes regularly (using cron) to the BareRepo
  • git pull: developers pull from the BareRepo, can check out remote branches and do normal git workflows
  • git push (2): developers push changes and newly created branches to the BareRepo
  • git svn dcommit: developers rebase-merge their changes into the main branch and commit from there to the SvnRepo

Besides the requirement to use git svn dcommit for submitting the changes to the SvnRepo, and the requirement by git svn to have linear histories, everything else can be done with normal workflows.

Procedure

In the following, let us assume that SVNREPO points to the URI of the Subversion repository, and BAREREPO points to the URI of the BareRepo. Furthermore, we refer to paths on the respective system (server, local) with variables like $BareRepo etc.

Step 1 – preparation of authors-file

To get consistent entries for committers, we need to set up an authors file that maps Subversion users to names and email addresses:

svnuser1 = AAA BBB <aaa.bbb@example.com>
svnuser2 = CCC DDD <ccc.ddd@example.com>
...

Let us assume that AUTHORSFILE environment variable points to this file.
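
If the list of Subversion users is not known up front, a skeleton of the authors file can be derived from the Subversion log. The following is only a sketch: authors_skeleton is a made-up helper name, it assumes the usual “rNNN | user | date” lines printed by svn log --quiet, and the name and example.com address it emits are placeholders that have to be edited by hand.

```shell
# authors_skeleton: hypothetical helper, reads "svn log --quiet" output on
# stdin and prints one placeholder authors-file line per distinct SVN user.
authors_skeleton() {
    # Revision lines look like: r123 | svnuser | 2018-06-01 10:00:00 ...
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' | sort -u |
        while IFS= read -r user; do
            # Name and address are placeholders, to be replaced by hand.
            printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
        done
}

# Possible use (needs network access to the SvnRepo):
#   svn log --quiet "$SVNREPO" | authors_skeleton > "$AUTHORSFILE"
```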

Step 2 – creation of fetching repository

This step creates a git-svn mirror; please read the documentation for further details. If the Subversion repository follows the standard layout (trunk, branches, tags), then the following line will work:

git svn clone --prefix="" --authors-file=$AUTHORSFILE -s $SVNREPO

The important part here is the --prefix one. The documentation of git svn says here:

Setting a prefix (with a trailing slash) is strongly encouraged in any case, as your SVN-tracking refs will then be located at “refs/remotes/$prefix/”, which is compatible with Git’s own remote-tracking ref layout (refs/remotes/$remote/). Setting a prefix is also useful if you wish to track multiple projects that share a common repository. By default, the prefix is set to origin/.

Note: Before Git v2.0, the default prefix was “” (no prefix). This meant that SVN-tracking refs were put at “refs/remotes/*”, which is incompatible with how Git’s own remote-tracking refs are organized. If you still want the old default, you can get it by passing --prefix "" on the command line.

While one might be tempted to use a prefix of “svn” or “origin”, both of which I have done, this will complicate (make impossible?) later steps, in particular the synchronization of git pull with git svn fetch.

The blog posts I mentioned in the beginning were written before the default prefix was switched to “origin/”. This was the part that puzzled me: I didn’t understand why the old descriptions didn’t work anymore.

Step 3 – cleanup of the fetching repository

By default, git svn creates and checks out a master branch. In this case, the Subversion repository’s “master” is the “trunk” branch, and we want to keep it like this. Thus, let us check out the trunk branch and remove master. After entering the FetchingRepo, do:

cd $FetchingRepo
git checkout trunk
git checkout -b trunk
git branch -d master

The two checkouts are necessary because the first one leaves you with a detached HEAD. In fact, no checkout at all would be fine, too, but git svn does not work on bare repositories, so we need to check out some branch.

Step 4 – init the bare BareRepo

This is done in the usual way, I guess you know that:

git init --bare $BareRepo

Step 5 – setup FetchingRepo to push all branches and push them

The cron job we will introduce later will fetch all new revisions, including new branches. We want to push all branches to the BareRepo. This is done by adjusting the fetch and push configuration. After changing into the FetchingRepo, do:

cd $FetchingRepo
git remote add origin $BAREREPO
git config remote.origin.fetch '+refs/remotes/*:refs/remotes/origin/*'
git config remote.origin.push 'refs/remotes/*:refs/heads/*'
git push origin

What we have done: fetch updates the remote branches, and push sends the remote branches to the BareRepo. This ensures that new Subversion branches (or tags, which are nothing else than branches) are also pushed to the BareRepo.
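
To see why the empty prefix from Step 2 matters for this push configuration, it can help to trace how a ref name travels through the push refspec. The following is a toy re-implementation of the glob mapping in plain shell, not git itself; map_push_ref is a name made up for illustration.

```shell
# Toy model of the push refspec 'refs/remotes/*:refs/heads/*':
# whatever the * matches on the left is reused on the right.
# map_push_ref is a hypothetical helper, not a git command.
map_push_ref() {
    case $1 in
        refs/remotes/*) printf 'refs/heads/%s\n' "${1#refs/remotes/}" ;;
        *) return 1 ;;   # ref is not covered by the refspec
    esac
}

map_push_ref refs/remotes/trunk          # prefix "":        refs/heads/trunk
map_push_ref refs/remotes/origin/trunk   # prefix "origin/": refs/heads/origin/trunk
```

In this model, a prefix of “origin/” would make the Subversion trunk show up in the BareRepo as a branch named origin/trunk instead of trunk.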

Step 6 – adjust the default checkout branch in the BareRepo

By default, git clones and checks out the master branch. We don’t have a master branch; “trunk” plays its role. Thus, let us adjust the default in the BareRepo:

cd $BareRepo
git symbolic-ref HEAD refs/heads/trunk

Step 7 – developers branch

Now we are ready to use the bare repo and clone it onto a developer’s machine:

git clone $BAREREPO

But before we can actually use this clone, we need to make sure that git commits sent to the Subversion repository have the same user name and email for the committer. The reason for this is that the commit hash is computed from various information including the name/email (see details here). Thus we need to make sure that the git svn dcommit at the DeveloperRepo and the git svn fetch on the FetchingRepo create the very same hash! Thus, each developer needs to set up an authors file with at least their own entry:

cd $DeveloperRepo
echo 'mysvnuser = My Name <my.name@example.com>' > .git/usermap
git config svn.authorsfile '.git/usermap'

Important: the line for mysvnuser must exactly match the one in the original authorsfile from Step 1!
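
A quick way to catch mismatches is to compare the developer’s file against the authors file from Step 1 line by line. The sketch below assumes both files are available locally; check_usermap is a made-up helper name, and the comparison is literal, so even a difference in whitespace is reported.

```shell
# check_usermap: hypothetical helper.
#   $1  the developer's usermap (e.g. .git/usermap)
#   $2  a copy of the authors file from Step 1
# Succeeds only if every non-empty line of $1 appears verbatim in $2.
check_usermap() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        # -F: fixed string, -x: match the whole line, -q: no output
        if ! grep -Fxq -- "$line" "$2"; then
            echo "not in authors file: $line" >&2
            return 1
        fi
    done < "$1"
}

# Possible use inside the DeveloperRepo:
#   check_usermap .git/usermap "$AUTHORSFILE" && echo "usermap OK"
```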

The final step is to allow the developer to commit to the SvnRepo by adding the necessary information to the git configuration:

git svn init -s $SVNREPO

Warning: Here we rely on two items: First, that the git clone initializes the default origin for the remote name, and second, that git svn init uses the default prefix “origin”, as discussed above.

If this is too shaky for you, the other option is to define the remote name during clone, and use that for the prefix:

git clone -o mirror $BAREREPO
git svn init --prefix=mirror/ -s $SVNREPO

This way the default remote will be “mirror” and all is fine.

Note: Upon your first git svn usage in the DeveloperRepo, as well as always after a pull, you will see messages like:

Rebuilding .git/svn/refs/remotes/origin/trunk/.rev_map.c570f23f-e606-0410-a88d-b1316a301751 ...
rNNNN = 1bdc669fab3d21ed7554064dc461d520222424e2
rNNNM = 2d1385fdd8b8f1eab2a95d325b0d596bd1ddb64f
...

This is a good sign, meaning that git svn does not re-fetch the whole set of revisions, but reuses the one pulled from the BareRepo and only rebuilds the mapping, which should be fast.

Updating the FetchingRepo

Updating the FetchingRepo should be done automatically using cron. The necessary steps are:

cd $FetchingRepo
git svn fetch --all
git push

This fetches all new revisions and pushes the branches configured by default, that is, all remote heads, to the BareRepo.
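
Since a fetch from a large Subversion repository can take longer than the interval between two cron runs, it may be worth wrapping these steps so that runs never overlap. The following is a minimal sketch, not the script used for the actual mirror: update_mirror and the lock directory name are made up, and the lock is a plain mkdir, which is atomic on local file systems.

```shell
# update_mirror: hypothetical wrapper around the update steps.
#   $1  path of the FetchingRepo
update_mirror() {
    repo=$1
    lock="$repo/.git/update-mirror.lock"   # assumed name, any unused path works

    # mkdir fails if the directory exists, i.e. if another run is active.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "update-mirror: previous run still active, skipping" >&2
        return 0
    fi

    # Run the update in a subshell so the caller's directory is untouched.
    (
        cd "$repo" &&
        git svn fetch --all &&
        git push origin
    ) && status=0 || status=$?

    rmdir "$lock"
    return "$status"
}

# Possible use from a script started by cron:
#   update_mirror "$FetchingRepo"
```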

Note: If a developer commits a change to the SvnRepo using git svn dcommit and then runs git pull before the FetchingRepo has updated the BareRepo (i.e., before the next cron run), they will see something like:

$ git pull
From preining.info:texlive2
 + 10cc435f163...953f9564671 trunk      -> origin/trunk  (forced update)
Already up to date.

This is due to the fact that the remote head is still behind the local head, which can easily be seen by looking at the output of git log: Before the FetchingRepo updated the BareRepo, one would see something like:

$ git log
commit 3809fcc9aa6e0a70857cbe4985576c55317539dc (HEAD -> trunk)
Author: ....

commit eb19b9e6253dbc8bdc4e1774639e18753c4cd08f (origin/trunk, origin/HEAD)
...

and afterwards all three refs would point to the same top commit. This is normal behavior and nothing to worry about. In fact, the default fetch refspec for a remote starts with a “+”, which forces updates of the remote-tracking branches.

Protecting the trunk branch

I sometimes found myself wrongly pushing to trunk instead of using git svn dcommit. This can be avoided by restricting pushes. With gitolite, simply add a rule

- refs/heads/trunk = USERID

to the repo stanza of your mirror. When using Git(Lab|Hub) there are options to protect branches.

A more advanced restriction policy would be to require that users create branches only within a certain namespace. For example, a gitolite rule

repo yoursvnmirror
    RW+      = fetching-user
    RW+ dev/ = USERID
    R        = USERID

would allow only the FetchingRepo (identified by fetching-user) to push everywhere, while I (USERID) can push, rewind, and delete only branches starting with “dev/”, but can read everything.

Workflow for developers

The recommended workflow compatible with this setup is

  • use git pull to update the local developers repository
  • use only branches that are not created/updated via git-svn
  • at commit time, (1) rebase your branch on trunk, (2) merge (fast forward) your branch into trunk, (3) commit your changes with git svn dcommit
  • rinse and repeat
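
The commit-time steps from the list above could be bundled roughly as follows. This is a sketch under the assumptions of this setup (a local trunk branch that tracks the mirror, git svn initialized as in Step 7); land_on_trunk and the branch name dev/my-feature are made-up examples.

```shell
# land_on_trunk: hypothetical helper for the commit-time steps.
#   $1  name of the feature branch to send to the SvnRepo
land_on_trunk() {
    branch=$1
    git checkout "$branch" &&
    git rebase trunk &&                 # (1) replay the branch on top of trunk
    git checkout trunk &&
    git merge --ff-only "$branch" &&    # (2) fast-forward only, history stays linear
    git svn dcommit                     # (3) send the new commits to the SvnRepo
}

# Possible use, after "git pull" on trunk:
#   git checkout -b dev/my-feature trunk
#   ... edit, git commit ...
#   land_on_trunk dev/my-feature
```

The chain stops at the first failing step, so a rebase conflict or a non-fast-forward merge never reaches git svn dcommit.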

The more detailed discussion and safety measures laid out in the git-svn documentation apply as well and are worth reading!
