Conversation moved from "recover data from lost+found" to a proper bug. --Joey

(Unfortunately, that scrambled the comment creation times and thus their order.)

Added a message. Done. --Joey

I followed this to re-inject files which git annex fsck listed as missing.

For every one of those files, I get

git-annex-shell: key is already present in annex
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.8]

when trying to copy the files to the remote.

-- Richard

Sounds like you probably didn't commit after the fsck, or didn't push, so the other repository did not know the first had the content again -- but I'm not 100% sure.
Comment by http://joey.kitenet.net/ Thu May 12 01:01:34 2011

As my comment from work is stuck in moderation:

I ran this twice:

git pull && git annex add . && git annex copy . --to <remote> --fast --quiet && git commit -a -m "$HOST $(date +%F--%H-%M-%S-%Z)" && git push

But nothing changed.

Hmm. Old versions may have forgotten to git add a .git-annex location log file when recovering content with fsck. That could be another reason things are out of sync.

But I'm not clear on which repo is trying to copy files to which.

(NB: If the files were recovered on a bare git repo, fsck cannot update the location log there, which could also explain this.)

Comment by http://joey.kitenet.net/ Sat May 14 16:13:58 2011

Version: 0.20110503

My local non-bare repo is copying to a remote bare repo.

I have been recovering in a non-bare repo.

If there is anything I can send you to help, let me know. Would it help if I removed said files and went through http://git-annex.branchable.com/bugs/No_easy_way_to_re-inject_a_file_into_an_annex/ again?

Well, focus on a specific file that exhibits the problem. What does git annex whereis say about it? Is the content actually present in annex/objects/ on the bare repository? Does that contradict whereis?
Comment by http://joey.kitenet.net/ Sat May 14 19:23:45 2011
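One way to answer the last two questions from the shell, as a minimal sketch: search the bare repository's annex/objects/ tree for a file named after the key, so the hashed subdirectories (like gG/VW/) don't have to be worked out by hand. The function name annex_has_object is hypothetical, and the layout assumption (objects stored under <repo>/annex/objects/ in a bare repo) is taken from the strace output on this page, not from the git-annex documentation.

```shell
# Hypothetical helper, not part of git-annex: succeed if the bare
# repository in $1 has an object file named after the key in $2
# anywhere under annex/objects/.
annex_has_object() {
    # The key is used as a find -name pattern; SHA keys contain no glob
    # characters, so it matches literally. If annex/objects/ does not
    # exist, find prints nothing and the key counts as "not present".
    [ -n "$(find "$1/annex/objects" -type f -name "$2" 2>/dev/null)" ]
}
```

Run on the remote, something like `annex_has_object /base/git-annex/fun "$key" && echo present` can then be compared against what git annex whereis reports.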

It exists locally; whereis tells me it exists locally, and only locally.

The object is not in the bare repo.

The file might have gone missing before I upgraded my annex backend version to 2. Could this be a factor?

What you're describing should be impossible; the error message shown can only occur if the object is present in the annex where git-annex-shell recvkey is run. So something strange is going on.

Try reproducing it by running, on the remote system, git-annex-shell recvkey /remote/repo.git $key. If you can reproduce it, I guess the next thing to do will be to strace the command and see why it thinks the object is there.

Comment by http://joey.kitenet.net/ Sun May 15 00:09:34 2011
Just to make sure: How do I get $key? What I did was look at the path in the object store of the local repo and see if that exact same path & file existed in the remote.
The key is the basename of the symlink target.
Comment by http://joey.kitenet.net/ Sun May 15 16:47:53 2011
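To make that concrete, a small sketch of a hypothetical shell helper, annex_key, that prints the basename of an annexed file's symlink target. It relies only on readlink and basename; for the foo symlink shown below it would print the SHA512-s80781--... key.

```shell
# Hypothetical helper, not part of git-annex: print the key of an annexed
# file, which is the basename of the file's symlink target.
annex_key() {
    # readlink prints the target without resolving it, so this also works
    # when the content is missing and the symlink is dangling.
    basename "$(readlink "$1")"
}
```

The key is computed in the local checkout, e.g. `key=$(annex_key foo)`, and can then be pasted into the git-annex-shell recvkey command on the remote.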

It seems the objects are in the remote after all, but the remote is unaware of this fact. I have no idea where or why the remote lost that info, but anyway: with the SHA backends, wouldn't it make sense to simply return "OK" and update the annex logs accordingly?

Local:

% ls -l foo
lrwxrwxrwx 1 richih richih 312 Apr  3 01:18 foo -> .git/annex/objects/gG/VW/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491
% 

Remote:

% git-annex-shell recvkey <remote> SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491
git-annex-shell: key is already present in annex
% strace git-annex-shell recvkey /base/git-annex/fun SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491 2>&1 | grep SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491
stat64("/base/git-annex/fun/annex/objects/gG/VW/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491", {st_mode=S_IFREG|0444, st_size=80781, ...}) = 0
% ls -l /base/git-annex/fun/annex/objects/gG/VW/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491
-r--r--r-- 1 richih richih 80781 2011-04-01 12:44 /base/git-annex/fun/annex/objects/gG/VW/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491/SHA512-s80781--cef3966a19c7435acceb8fbfbff1feebe6decab7c81a0c197f00932cf9ef0eac330784cc3f0d211bd4acf56a6d16daaebe9b598aa4dfd5bfec73f4e6df3f0491
% 

So, it appears that you're using git annex copy --fast. As documented, that assumes the location log is correct. So it avoids directly checking if the bare repo contains the file, and tries to upload it, and the bare repo is all like "but I've already got this file!". The only way to improve that behavior might be to let rsync go ahead and retransfer the file, which, with recovery, should require sending little data etc. But I can't say I like the idea much, as the repo already has the content, so unlocking it and letting rsync mess with it is an unnecessary risk. I think it's ok for --fast to blow up if its assumptions turn out to be wrong.

If you use git annex copy without --fast in this situation, it will do the right thing.

Comment by http://joey.kitenet.net/ Sun May 15 19:40:47 2011

Yes, makes sense. I am so used to using --fast, I forgot a non-fast mode existed. I still think it would be a good idea to fall back to non-fast mode if --fast runs into an error from the remote, but as that is well beyond my abilities, how about this patch?

From 4855510c7a84eb5d28fdada429580a8a42b7112a Mon Sep 17 00:00:00 2001
From: Richard Hartmann <richih.mailinglist@gmail.com>
Date: Sun, 15 May 2011 22:20:42 +0200
Subject: [PATCH] Make error in RecvKey.hs suggest possible solution

---
 Command/RecvKey.hs |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Command/RecvKey.hs b/Command/RecvKey.hs
index 126608f..b917a1c 100644
--- a/Command/RecvKey.hs
+++ b/Command/RecvKey.hs
@@ -27,7 +27,7 @@ start :: CommandStartKey
 start key = do
    present <- inAnnex key
    when present $
-       error "key is already present in annex"
+       error "key is already present in annex. If you are running copy, try without '--fast'"

    ok <- getViaTmp key (liftIO . rsyncServerReceive)
    if ok
-- 
1.7.4.4

Or, even better: wouldn't it make sense for SHA backends to always default to --fast and, whenever a snag is hit, fall back to non-fast mode for that file?
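Until something like that exists in git-annex itself, the per-file fallback could be approximated from the outside with a wrapper. This is only a sketch under two assumptions: that git annex copy exits non-zero when a transfer fails, and that retrying one file without --fast makes git-annex check the remote directly, as described earlier in this thread. copy_with_fallback is a hypothetical name, not a git-annex command.

```shell
# Hypothetical wrapper, not part of git-annex: try each file with --fast,
# and retry only the files that fail, this time without --fast.
copy_with_fallback() {
    local remote=$1 f rc=0
    shift
    for f in "$@"; do
        # Fast path: trusts the local location log about what the remote has.
        if ! git annex copy --fast --to "$remote" "$f"; then
            # Slow path for this file only: let git-annex check the remote
            # itself instead of trusting the location log.
            git annex copy --to "$remote" "$f" || rc=1
        fi
    done
    # Non-zero only if some file failed even without --fast.
    return "$rc"
}
```

For example, `copy_with_fallback myremote foo bar`, where myremote is a placeholder remote name. Doing the retry per file keeps the cost of the slow check limited to the files whose location log turned out to be wrong.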

Though if we continue here, we should probably move this to its own page.

PS: Just to make this clear, I am using a custom alias for all my copying needs and thus didn't even see that I used --fast. :p