TeX Live development: svn and git
TeX Live development has a long history. We used Perforce for a long time, then switched to Subversion, which is still the main development platform and the central repository. This blog post describes how I moved my local development environment to git-svn, and discusses the pros and cons of the move.
TeX Live is a huge project: we currently have about 200k files, amounting to over 6 GB of data without version-control overhead. TeX Live's development is tightly coupled to Subversion. Our packages have revision numbers based on the latest change revision of all the contained files. Since Subversion has a linear history and a central repository, this guarantees that the revision numbers are strictly increasing, in contrast to the version numbers provided by authors. This allows the TeX Live Manager to manage updates, among other things. The history is also long, at around 30k commits.
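To make the revision scheme concrete, here is a minimal Python sketch (not the actual TeX Live code): a package's revision is taken to be the highest last-changed revision among its files. The function name, file paths, and revision numbers are invented for illustration.

```python
def package_revision(last_changed, files):
    """Return the highest last-changed revision among a package's files.

    last_changed: dict mapping file path -> svn revision of its last change
    files: iterable of file paths belonging to the package
    """
    return max(last_changed[path] for path in files)


# Hypothetical data: paths and revision numbers are made up.
last_changed = {
    "texmf-dist/tex/latex/foo/foo.sty": 29800,
    "texmf-dist/doc/latex/foo/foo.pdf": 30112,
    "texmf-dist/doc/latex/foo/README": 28450,
}

print(package_revision(last_changed, last_changed.keys()))
```

Because svn revision numbers only grow, any commit that touches one of a package's files can only raise the package revision, never lower it.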
As well suited to our development model as Subversion has proven to be, it still lacks several features I miss badly, most of all easy branching. As mentioned in the previous blog post, Preparing for TeX Live 2013 release, I am working in parallel on different projects within the TeX Live infrastructure. Managing this in Subversion turned out to be a pain.
So I decided to move to git, in particular git-svn, i.e., using git as the frontend and svn as the server-side storage. There are many tutorials and explanations of this around the internet, so there is no need to repeat them. What I want to collect here are the problems I faced (hoping that people will provide suggestions!).
The first attempt – importing partial history
In a first attempt, which I actually used for the last 6 months, I worked with a git-svn checkout into which I imported only the history beginning around 2012-07, using the -r option to git svn.
This worked out quite nicely, with reasonably quick operation of the usual git commands. At least until I tried to update our TLTREE.pm module, which is responsible for representing the svn tree and computing the revision numbers of packages, so that it does the same within a git-svn checkout. Obviously I wanted to be able to update our database (tlpdb) from within the git directory without needing an extra svn checkout. It failed: to represent the tree correctly we need the last-change revision of every file, and for that I need more or less the whole history.
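One way to recover svn revisions inside a git-svn checkout is to read the git-svn-id line that git-svn appends to each imported commit message (unless that metadata has been disabled). The Python sketch below is an assumption about how this could be done, not how TLTREE.pm does it, and both helper names are hypothetical. It also illustrates the problem with a partial import: a file last changed before the import's starting revision has no commit in the git history that carries its true last-change revision.

```python
import re
import subprocess

# git-svn appends a trailer of the form:
#   git-svn-id: <url>@<revision> <repository-uuid>
GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) \S+\s*$", re.MULTILINE)


def svn_revision_from_message(message):
    """Extract the svn revision from a commit message, or None if absent."""
    matches = GIT_SVN_ID.findall(message)
    # Use the last match: the trailer sits at the end of the message.
    return int(matches[-1]) if matches else None


def last_change_revision(path, repo="."):
    """Hypothetical helper: svn revision of the newest commit touching path."""
    message = subprocess.run(
        ["git", "log", "-1", "--format=%B", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return svn_revision_from_message(message)
```

Calling git log once per file would be far too slow for 200k files; a real implementation would more likely walk the log once and record the first revision seen for each path.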
So there we go: time to try a full checkout.
Current attempt – importing the full history
So I fired up the git svn import again, on an external disk, this time importing all of TeX Live. And then I waited … and waited … and waited. It took a full 2 days (!) to get everything.
And after it had finished, I was surprised to find that there was no way I could work with it at all. All git operations took ages, be it status or changing a branch. That is nothing I could bear.
So, on to searching the internet, and it looks like the binary files are the biggest problem, i.e., computing delta diffs for binary files. As a consequence, I have for now declared several files binary in my .git/info/attributes file:

    Master/bin/*/* binary
    *.pdf binary
    *.pfb binary

This also hits several links/scripts that are in the binary directories, but that is hopefully less of a problem. After that I called git gc (no idea if that is enough), and after the garbage collection run I am now getting decent performance, comparable to svn, without the need for an internet connection for things like commits.
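For anyone who wants to script this setup, the following Python sketch appends the three patterns from above to .git/info/attributes, skipping lines that are already present. The function name is hypothetical, and this is only one possible way to do it; editing the file by hand works just as well.

```python
from pathlib import Path

# The attribute lines from the post.
PATTERNS = ["Master/bin/*/* binary", "*.pdf binary", "*.pfb binary"]


def add_attribute_lines(repo_dir, lines):
    """Append missing lines to <repo_dir>/.git/info/attributes.

    Returns the list of lines that were actually added.
    """
    attributes = Path(repo_dir) / ".git" / "info" / "attributes"
    attributes.parent.mkdir(parents=True, exist_ok=True)
    text = attributes.read_text() if attributes.exists() else ""
    missing = [line for line in lines if line not in text.splitlines()]
    if missing:
        # Start on a fresh line in case the file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        attributes.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```

Calling add_attribute_lines(".", PATTERNS) from the top of the checkout would add the lines once; a second call adds nothing. The file .git/info/attributes is local to the checkout and is never committed, which suits git-svn because nothing leaks into the svn repository. In git, binary is a built-in macro attribute equivalent to -diff -merge -text.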
As this change is rather recent, I will post updates now and then on my experiences with git-svn for TeX Live.