Currently the hashed directories in .git-annex can have both upper- and lower-case names. On Linux (or any case-sensitive filesystem), directory names such as 'Gg' and 'GG' are different and unique. However, on systems like OSX (and probably Windows, if it is ever supported), the directory name 'Gg' is the same as 'GG'.
In one of my annex'd repos, this has occurred:
$ git add -i
           staged     unstaged path
  1:    unchanged        +1/-1 .git-annex/GM/GV/WORM-s183630166-m1301072171--somefile.log
  2:    unchanged        +1/-1 .git-annex/Gm/GV/WORM-s183630166-m1301072171--somefile.log
This has somewhat confused git when it tries to stage or merge files. I didn't notice it at first, but it is definitely a problem for anyone using case-insensitive filesystems, such as the default OSX HFS+ format or vfat/fat32.
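The collision in the git add -i output above can be detected mechanically. The following is a minimal Python sketch, not part of git-annex: the helper name case_collisions is hypothetical, and the input list stands in for something like the output of git ls-files. It groups paths that differ only by letter case; on HFS+ or vfat, every path in such a group refers to the same directory entry on disk.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of distinct paths that differ only by letter case.

    Hypothetical helper for illustration. On a case-insensitive filesystem
    every path in a returned group names the same file, so git may end up
    tracking it under several spellings.
    """
    groups = defaultdict(set)
    for p in paths:
        # casefold() is the Unicode-aware version of lower()
        groups[p.casefold()].add(p)
    # Only report keys that more than one distinct spelling maps to
    return [sorted(v) for v in groups.values() if len(v) > 1]


if __name__ == "__main__":
    tracked = [
        ".git-annex/GM/GV/WORM-s183630166-m1301072171--somefile.log",
        ".git-annex/Gm/GV/WORM-s183630166-m1301072171--somefile.log",
        ".git-annex/Ab/Cd/other.log",  # made-up, non-colliding entry
    ]
    for group in case_collisions(tracked):
        print("collide:", group)
```

Running it on the two paths from the report flags them as one group, while the made-up third path is left alone.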
I feel a bit stupid for not having considered case-insensitive filesystems. They are just so far from where I have lived for 20 years that it's hard to keep them in mind.
I guess that the "git-annex has issues with git when staging/commiting logs" bug is somehow a consequence (or cause?) of this, but I don't quite understand how this causes git to fail to stage files, or to stage the same file twice under different capitalizations. git-annex always runs git add on the path with the "correct" capitalization. So unless something else has added the path with the other capitalization (perhaps a manual git add .git-annex?), I don't understand how you get to this state. --Joey
I think I got myself into this situation when I copied some files over Samba from an HFS+ partition to a GPFS network share (which is fairly POSIX-compliant). It is probably related to "git-annex has issues with git when staging/commiting logs". I thought they were distinct enough to warrant two bug reports, as one is a git behavioural issue and the other is git-annex specific.
If you copied .git/ over, perhaps you got a git repo without core.ignorecase set correctly for the filesystem it landed on?

I usually git clone or create a fresh repository and pull things in. I was also unaware of this ignorecase setting.
Something like this might reproduce it:
# mkdir test; cd test; git init
# git config core.ignorecase false
# mkdir Foo
# touch Foo/bar
# git add Foo/bar
# git add foo/bar
# git add fOo/bar
# git status
# touch foo/other
# git add fOo/other
# git status
And then either git commit or git clone would probably get confused if it thought 3 distinct files had been committed. --Joey
Running the above test on an HFS+ partition yields this:
## with ignorecase=false
commit bb024c6fd7482b2d10f60ae899cb7a949aca1ad8
Author: Jimmy Tang
Date: Sun Mar 27 18:40:24 2011 +0100

    commit

diff --git a/Foo/bar b/Foo/bar
new file mode 100644
index 0000000..e69de29
diff --git a/fOo/bar b/fOo/bar
new file mode 100644
index 0000000..e69de29
diff --git a/fOo/other b/fOo/other
new file mode 100644
index 0000000..e69de29
diff --git a/foo/bar b/foo/bar
new file mode 100644
index 0000000..e69de29
and without changing ignorecase
commit 909a089158ffb98f8e91f98905e2bfdc7234666f
Author: Jimmy Tang
Date: Sun Mar 27 18:46:57 2011 +0100

    commit

diff --git a/Foo/bar b/Foo/bar
new file mode 100644
index 0000000..e69de29
diff --git a/Foo/other b/Foo/other
new file mode 100644
index 0000000..e69de29
Closing this bug, as it seems I have dealt with it adequately now. done --Joey
I think I know how I got myself into this mess... I was on my Mac workstation, and I had just pulled in a changeset from another repo on a Linux workstation after I had made a bunch of moves. Here's a bit of a log of what happened...
So, there is evidence here of a circumstance caused by the other bug, as I suspected.
I don't think that the manual "git commit -a" caused the problem. I suspect it was a subsequent "git add" that caused git to follow the wrong-case paths and add the files in the wrong place. That is, when you run "git add .git-annex", it recurses into .git-annex/Gm/ and adds files using that case, even though they were previously added from .git-annex/GM/.

For completeness, can you verify this repo's core.ignorecase setting?
I hate that you are stuck using loop filesystems to work around this bug. If my guess is correct, you don't need to, as long as you avoid manually running "git add .git-annex". I take this bug seriously. While I'm currently very involved in adding Amazon S3 support to git-annex (which will take days more of solid work), I do plan to make a loop filesystem of my own, probably vfat, so I can try to reproduce this on a case-insensitive filesystem. If you could confirm my hypothesis above, that would speed things up for me.
It's possible I will have to tweak the hash directories. Hopefully if so, I will only tweak them for new keys; if I had to do a v3 backend just to fix this stupid thing, I'd be sad -- upgrading all my offline disks from v1 to v2 took me many days.
I also forgot to mention that, when I have stray log files after what happened in comment 2, I get this left over after a commit when git is confused...
Up until now I have just been updating the status of the staged files by hand and committing them on my mac x00; this probably isn't helping. I'd rather not lose the tracking information.
Alright, I have created a case-insensitive HFS+ filesystem here on my Linux laptop.
I have not been able to trick git into staging the same file with 2 different capitalizations yet.
It might be helpful if you can send me a copy of a git repository where 'git add -i' shows the same file staged with two capitalizations. Leaving out .git/annex of course. (joey@kitenet.net; a tarball would probably work)
It seems that git add only started working properly on case-insensitive filesystems quite recently. The commit in question is 5e738ae820ec53c45895b029baa3a1f63e654b1b, "Support case folding for git add when core.ignorecase=true", which was first released in git 1.7.4 on January 30, 2011. If you don't yet have that version, that could explain the problem entirely. In about half an hour (dialup!) I will have downloaded an older git and will see if I can reproduce the problem with it.

git 1.7.4 does not make things better. With it, if I first add "X/foo" and then "x/bar", it commits "X/bar".
That will certainly cause problems when interoperating with a clone of the repo on a case-sensitive filesystem, since git-annex there will not see the location log that git committed to the wrong-case directory.
It's possible there is also some interoperability problem when pulling from Linux onto HFS+, as you did. I am not quite sure. Ah, I did find one: if I clone the repo with "X/foo" in it to a case-sensitive filesystem, add an "x/foo" there, and pull that commit back to HFS+, git says:
Aha -- that lets me reproduce your problem with the same file being staged twice with different capitalizations, too:
And modified files that git refuses to commit, which entirely explains the "git-annex has issues with git when staging/commiting logs" bug.
Frankly, I think git is buggy. It seems I will need to work around this by no longer using mixed-case hashing for location logs.
I've posted about this on the git mailing list. It's possible that these bugs, which can be shown to affect things other than just git-annex, will be fixed in git.
I will wait a while to see, but I am considering making git-annex use all-lowercase hash directories for the log files. Maybe it could first look for .git-annex/aaaa/bbbb/foo.log, but also look for, read, and merge in any info from .git-annex/Aa/Bb/foo.log, and always write to the new-style filenames. This would avoid confusing git with changes to mixed-case files, and avoid another massive transition.
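The read-both, write-new scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not git-annex's code: the helper names read_log, read_merged and write_log are hypothetical, the caller is assumed to already know both the new-style and legacy paths, and merging is a plain set union of lines. A real location log would also need to resolve conflicting entries for the same repository (for example by timestamp).

```python
from __future__ import annotations

from pathlib import Path


def read_log(path: Path) -> set[str]:
    """Return the non-empty lines of a log file, or an empty set if it is absent."""
    if not path.is_file():
        return set()
    return {line for line in path.read_text().splitlines() if line.strip()}


def read_merged(new_path: Path, legacy_path: Path) -> set[str]:
    # Assumption: the location info is the union of the new-style
    # (lower-case) log and any leftover legacy mixed-case log.
    return read_log(new_path) | read_log(legacy_path)


def write_log(new_path: Path, lines: set[str]) -> None:
    # Always write to the new-style path; the legacy file is left untouched,
    # so git never sees modifications to mixed-case, colliding directories.
    new_path.parent.mkdir(parents=True, exist_ok=True)
    new_path.write_text("".join(line + "\n" for line in sorted(lines)))
```

Because writes only ever touch the new-style path, the old mixed-case files stop changing, which is the property the proposal relies on. The sample log lines one would use to exercise it are made up for illustration.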
I have pushed out a preliminary fix. The old mixed-case directories will be left where they are, and still read from by git-annex. New data will be written to new, lower-case directories. I think that once git stops seeing changes being made to mixed-case, colliding directories, the bugs you ran into won't manifest any more.
You will need to find a way to get your git repository out of the state where it complains about uncommitted files (and won't let you commit them). I have not found a reliable way to do that; git reset --hard worked in one case but not in another. May need to clone a fresh git repository.
Let me know how it works out.
.git-annex/?? if you want to; then running "git annex fsck --fast" in each of your clones would regenerate the data using only the lower-case hash directories.

I meant to say that it wasn't reliable when I was following the instructions in "Comment 12". I did find that just doing a "git annex copy -t externalusb ." and then a "git annex drop ." from the root of my cloned and "none trusted" annexed repos was more reliable; it just means I temporarily need a lot of space to get myself out of my earlier mess.
While testing this bug fix, I found a minor behavioural issue: git annex copy -f REMOTE . doesn't work as expected.
I also ran into problems on a case-insensitive HFS+ file system, it seems. I tried following the instructions in comment 12:
However, I still see upper and lower case directories in .git-annex. Did I misunderstand that they should all be lower case now?
I think the correct steps should be, make a backup first :) then ...
I eventually migrated all of my own annex'd repos and I no longer have the old hashed directories but the new ones in the form
I did lose some tracking information but not data (as far as I can see for now), but that was quickly fixed by pushing and pulling to my bare repo which tracks most of my data.
I also found that it worked a bit more reliably for me on the copies of repos that were located on case sensitive filesystems, but I guess that was expected.
Joey, sorry, I got it wrong. I thought upgrading git didn't help and you adjusted things in git-annex instead.
Anyway, can I get around upgrading on all hosts by reformatting the drive to case-sensitive HFS+? Or will I have to upgrade git (currently version 1.7.2.5) eventually anyway?
Hi,
(I'm new to git and git annex, so please forgive any mistakes I make...)
My repo is messed up right now. The fact that I copied the repo back and forth with rsync -a between a case-insensitive filesystem and a case-sensitive one probably didn't help.
I believe the annexed files in .git/annex/objects/ are still using a mixed case directory hashing scheme. That's the problem I'm having. The symlinks point to the wrong case and are now broken. I don't think the latest versions of git-annex changed that (it only changed the hashing under .git-annex, right?).
Even if I clean up my repo, I think I'm still going to have a problem, because I have one repo on a case-insensitive OS X filesystem and my other repos on case-sensitive Linux filesystems. The directory names under .git/annex/objects could end up with a different case on each, so a symlink might differ in case from what is on my Linux filesystem. Does git-annex track changes in git by the contents of the symlink? If so, would the case difference show up as a change even though nothing has changed?
Is it possible to change the directory hashing scheme under .git/annex/objects to use lowercase names?
@seqq git-annex always uses the same case when creating and accessing the files pointed to by the symlinks, so it will not matter whether it is used on a case-insensitive system, or a case-insensitive but case-preserving system like OSX.
You need to fix up the cases of the files in .git/annex/objects to what it expects. I'm not sure what would be the best way to do that. The method described in recover data from lost+found might work well.
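Before renaming anything, it may help to plan the case fixes as a dry run. The following Python sketch is hypothetical and is not a git-annex command. It assumes you have already collected two lists: the expected object paths (for instance, the targets of the symlinks) and the paths that actually exist under .git/annex/objects. It pairs each existing path with the expected spelling it differs from only by case. Ambiguous cases, where several existing spellings fold to the same name, are skipped rather than guessed.

```python
def plan_case_fixes(expected, existing):
    """Return (existing_path, expected_path) pairs that differ only by case.

    Hypothetical dry-run helper: it does not touch the filesystem.
    Paths already spelled as expected are skipped, and so are ambiguous
    matches where several existing paths fold to the same name.
    """
    by_folded = {}
    for path in existing:
        by_folded.setdefault(path.casefold(), []).append(path)

    plan = []
    for want in expected:
        candidates = by_folded.get(want.casefold(), [])
        if want in candidates:
            continue  # already has the expected case
        if len(candidates) == 1:
            plan.append((candidates[0], want))
        # zero candidates: object missing; several: ambiguous, leave for a human
    return plan
```

Each pair it returns would still need to be renamed by hand (or by a further script), ideally after taking a backup.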