I found the command "git annex lock" very slow (much slower than the initial "git annex add" with SHA1) for a not-so-big directory when run in a big repo. It seems that each underlying git command is slow, so I thought it would be better to run them once with all files as arguments. I had to stop the lock command and ran "git checkout ." instead (I had not changed any file). Is this a correct alternative?

Thanks a LOT for this software, one that I have been missing for a long time (but wasn't able to write)!

Rafaël

Running git checkout by hand is fine, of course.

The underlying problem is that git's operations on the index scale as O(N) in the number of files in the repo. So a repo with a whole lot of files will have a big index, and any operation that changes the index, like the git reset this needs to do, has to read in the entire index and write out a new, modified version. It seems that git could be much smarter about its index data structures here, but I confess I don't understand the index's data structures at all. I hope someone takes it on, as git's scalability to the number of files in the repo is becoming a new pain point, now that scalability to large files is "solved". ;)
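To put rough numbers on that, here is a toy cost model in Python. It assumes only what is described above: every index-changing git command reads the whole index and writes a whole new one. The function name and the "entries touched" unit are made up for illustration; this is not a measurement of real git behaviour.

```python
def index_entries_touched(index_size: int, files_to_lock: int, batched: bool) -> int:
    """Toy model: count index entries read plus written while locking files.

    Assumption (illustrative only): each index-changing git command reads
    all index_size entries and writes all index_size entries back.
    """
    # One command in total when batched, otherwise one command per file.
    commands = 1 if batched else files_to_lock
    return commands * 2 * index_size


# A repo with 100,000 indexed files, locking 20 of them:
per_file = index_entries_touched(100_000, 20, batched=False)
one_shot = index_entries_touched(100_000, 20, batched=True)
print(per_file // one_shot)  # ratio of work, equal to the number of files
```

Under this model the work grows with (files to lock) x (index size) when git is run once per file, and with the index size alone when everything goes into one command.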

Still, it is possible to speed this up at git-annex's level. Rather than doing a git reset followed by a git checkout, it can just run git checkout HEAD -- file, and since that's one command, it can be fed into the queueing machinery in git-annex (which exists mostly to work around this git malfeasance), so only a single git command needs to be run to lock multiple files.
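The batching idea can be sketched in a few lines of Python. This is not git-annex's actual queue (which is written in Haskell); CheckoutQueue, max_files and flush are hypothetical names, and the chunk size is an assumed guard against overly long command lines. The sketch only builds the argument lists, so it can be tried without touching a repository.

```python
from typing import List


class CheckoutQueue:
    """Hypothetical batching queue; a sketch, not git-annex's implementation."""

    def __init__(self, max_files: int = 1000) -> None:
        # max_files is an assumed cap on file arguments per git invocation.
        self.max_files = max_files
        self.pending: List[str] = []

    def add(self, path: str) -> None:
        """Queue one file instead of running git for it right away."""
        self.pending.append(path)

    def flush(self) -> List[List[str]]:
        """Return the git argv lists covering every queued file, then empty the queue."""
        commands = []
        for start in range(0, len(self.pending), self.max_files):
            chunk = self.pending[start:start + self.max_files]
            commands.append(["git", "checkout", "HEAD", "--"] + chunk)
        self.pending = []
        return commands
```

Each list returned by flush() could then be handed to subprocess.run(cmd, check=True), so locking a whole directory costs one git invocation per chunk rather than one (or two) per file.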

I've just implemented the above. In my music repo, this changed a lock of a CD's worth of files from taking ctrl-c long to 1.75 seconds. Enjoy!

(Hey, this even speeds up the one file case greatly, since git reset -- file is slooooow -- it seems to scan the entire repository tree. Yipes.)

Comment by http://joey.kitenet.net/ Tue May 31 18:51:13 2011

Nice! So if I understand correctly, 'git reset -- file' was there to discard staged (but not committed) changes made to 'file' before checking out, so that the two steps together are equivalent to running 'git checkout HEAD -- file' directly? I'm curious about the "queueing machinery in git-annex": does it end up calling a single git command with multiple files as arguments? Does it correspond to the message "(Recording state in git...)"? Thanks!

@Rafaël, you're correct on all counts.
Comment by http://joey.kitenet.net/ Tue May 31 21:54:23 2011