I found the command "git annex lock" very slow (much slower than the initial "git annex add" with SHA1) for a not-so-big directory when run in a big repo. It seems that each underlying git command is slow, so I thought it would be better to run each of them once with all the files as arguments. I had to stop the lock command and ran "git checkout ." instead (I had not changed any files). Is this a correct alternative?
Thanks a LOT for this software, one that I have been missing for a long time (but wasn't able to write)!
Rafaël
Running "git checkout" by hand is fine, of course.
The underlying problem is that git has some O(N) scalability of operations on the index with regards to the number of files in the repo. So a repo with a whole lot of files will have a big index, and any operation that changes the index, like the "git reset" this needs to do, has to read in the entire index and write out a new, modified version. It seems that git could be much smarter about its index data structures here, but I confess I don't understand the index's data structures at all. I hope someone takes it on, as git's scalability to number of files in the repo is becoming a new pain point, now that scalability to large files is "solved". ;)
Still, it is possible to speed this up at git-annex's level. Rather than doing a "git reset" followed by a "git checkout", it can just "git checkout HEAD -- file", and since that's one command, it can then be fed into the queueing machinery in git-annex (that exists mostly to work around this git malfeasance), and so only a single git command will need to be run to lock multiple files.
I've just implemented the above. In my music repo, this changed a lock of a CD's worth of files from taking ctrl-c long to 1.75 seconds. Enjoy!
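The batching idea above can be sketched roughly as follows. This is only an illustration in Python, not git-annex's actual queue (git-annex itself is written in Haskell). The names GitQueue, run_git and the limit parameter are hypothetical, and the limit is an assumed way to keep a single command line from growing too long.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], None]


def run_git(argv: List[str]) -> None:
    """Default runner: actually execute the command (hypothetical helper)."""
    subprocess.run(argv, check=True)


class GitQueue:
    """Collect file arguments and run one git command for many files.

    Hypothetical sketch: files queued for the same base command are
    passed together, so git reads and rewrites its index once per batch
    instead of once per file.
    """

    def __init__(self, runner: Runner = run_git, limit: int = 1000) -> None:
        self.runner = runner
        self.limit = limit  # assumed cap on files per command line
        self.base: Optional[List[str]] = None
        self.files: List[str] = []

    def add(self, base: List[str], path: str) -> None:
        # Flush first if the base command changes or the batch is full.
        if self.base is not None and (
            base != self.base or len(self.files) >= self.limit
        ):
            self.flush()
        self.base = list(base)
        self.files.append(path)

    def flush(self) -> None:
        # Run the queued command once, with every queued file after "--".
        if self.base is not None and self.files:
            self.runner(self.base + ["--"] + self.files)
        self.base = None
        self.files = []
```

Under these assumptions, locking a directory would amount to calling add(["git", "checkout", "HEAD"], path) for each file and then flush() once at the end, so git is started once per batch rather than once per file.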
(Hey, this even speeds up the one file case greatly, since "git reset -- file" is slooooow -- it seems to scan the entire repository tree. Yipes.)
Nice! So if I understand correctly, 'git reset -- file' was there to discard staged (but not committed) changes made to 'file' before checking out, so that it is equivalent to running 'git checkout HEAD -- file' directly? I'm curious about the "queueing machinery in git-annex": does it end up calling one git command with multiple files as arguments? Does it correspond to the message "(Recording state in git...)"? Thanks!