I ran git-annex (git version) on three machines with ghc-7.0.2 for about a month, but recently (no more than a week ago) I've started getting this error for every file on "git annex get":

git-annex-shell: internal error: evacuate(static): strange closure type 30799
    (GHC version 7.0.2 for i386_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

There were no changes to ghc or its modules, so I assume something has changed in git-annex itself.

strace shows "git annex get" (on "host1") performing the following execs:

[pid  9481] execve("/usr/bin/rsync", ["rsync", "-p", "--progress", "--inplace", "-e", "'ssh' 'user@host2' 'git-annex-shell ''sendkey'' ''/remote/path'' ''SHA1-s6654080--abd8edec20648ade69351d68ae1c64c8074a6f0b'' ''--'''", ":", "/local/path/.git/annex/tmp/SHA1-s6654080--abd8edec20648ade69351d68ae1c64c8074a6f0b"], [/* 41 vars */]) = 0
[pid  9482] execve("/usr/bin/ssh", ["ssh", "user@host2", "git-annex-shell 'sendkey' '/remote/path' 'SHA1-s6654080--abd8edec20648ade69351d68ae1c64c8074a6f0b' '--'", "", "rsync", "--server", "--sender", "-vpe.Lsf", "--inplace", ".", ""], [/* 41 vars */] <unfinished ...>

I tried running the second command directly from the shell and got the same error message from the remote GHC. Prefixing git-annex-shell with strace in the remote command yielded something like this at the end:

stat64("/local/path.git", 0xb727d610) = -1 ENOENT (No such file or directory)
stat64("/local/path.git", 0xb727d6b0) = -1 ENOENT (No such file or directory)
waitpid(7525, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7525
chdir("/home/user")                  = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
write(2, "git-annex-shell: internal error: ", 33git-annex-shell: internal error: ) = 33
...

Note that "/local/path" here is not what's specified in the rsync arguments at all. The git repo with the files to be fetched on "host2" is in "/remote/path", but "/local/path" is present in the git remotes there, since I mount it via nfs from "host1" (yes, at the same path as on "host1"):

[remote "nfs"]
  url = /local/path
  fetch = +refs/heads/*:refs/remotes/nfs/*
  push = refs/heads/*:refs/remotes/host2/*
  annex-uuid = 0a4e14ba-5236-11e0-9004-7f24452c0f05

If I comment that remote out of "/remote/path/.git/config", "git annex get" works fine. The only git command git-annex-shell seems to exec there (on "host2") is "git config --list", so it shouldn't be git trying to do something with its remotes - it's git-annex itself, right?
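
To see which configured remotes point at local paths (the kind of remote that triggers the lookup here), the "git config --list" output can be filtered. This is only a sketch: the function name local_path_remotes is made up for illustration, and it assumes remote names contain no "=" characters.

```shell
# Read `git config --list` output on stdin and print "name<TAB>url" for
# every remote whose url is an absolute local path (starts with "/").
# local_path_remotes is a hypothetical helper, not part of git or git-annex.
local_path_remotes() {
  awk -F= '/^remote\..*\.url=\// {
    name = $1                           # e.g. "remote.nfs.url"
    sub(/^remote\./, "", name)          # strip the "remote." prefix
    sub(/\.url$/, "", name)             # strip the ".url" suffix
    url = substr($0, length($1) + 2)    # everything after the first "="
    print name "\t" url
  }'
}
```

Usage on "host2" would be something like: cd /remote/path && git config --list | local_path_remotes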

Anyways, looks like a simple path-joining error, if "/local/path.git" should be "/local/path/.git" there.

I'm actually quite confused about what it's trying to do with that path. Connect from "host1" to "host2" just to connect back to "host1"? What for, when it should just fetch files from "host2"?

git-annex (and git-annex shell) always start up by learning what git remotes are locally configured, and this includes checking them to try to look up their annex.uuid setting.

Since git will, given a remote like "url = /foo", first look in "/foo.git" for a bare git repository, so too does git-annex. I do not think this is a path joining error. That seems likely to be a red herring. --Joey
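
The lookup order described above can be sketched roughly as follows. This is a simplification for illustration only: resolve_local_remote is a made-up name, it is not git-annex's actual code, and git itself may probe more locations than these two.

```shell
# Given a local remote url such as /foo, try /foo.git (a bare repository)
# first and then /foo itself. Prints the first directory that exists and
# returns 1 if neither does.
# resolve_local_remote is a hypothetical helper, not a git or git-annex command.
resolve_local_remote() {
  for dir in "$1.git" "$1"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

With "url = /local/path" and no "/local/path.git" directory present, the first probe fails with ENOENT, which matches the stat64 calls in the strace output, and the second probe finds "/local/path".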

Not sure if it's a bug or I'm doing something wrong, but if git-annex really needs to check something in git remotes' paths, the error message (the one at the top of this post) could be more descriptive, I guess. Something like "error: failed to do something with git remote X on a remote host" would've been a lot less confusing than that GHC thing.

Thanks!

I've never seen anything like this error message. I don't know if the problem is caused by building with GHC 7, or what. You didn't say what OS you're using. Searching for the error message, it seems to involve Mac OS X.

For example: http://hackage.haskell.org/trac/ghc/ticket/3771

The error "strange closure type" indicates some kind of memory corruption, which can have many different causes, from bugs in the GC to hardware failures.

You said that you'd been using git-annex built with that version of GHC successfully before. Perhaps you could use git bisect to see if you can identify a point in git-annex's history where this started happening? Since you can reproduce the problem by just running git-annex-shell at the command line with the right parameters, it should be easy to bisect it.
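
One possible way to automate this is "git bisect run", which treats exit status 0 as good, 125 as "skip this commit", any other status from 1 to 127 as bad, and aborts the bisection on anything else. A crash that kills the process with a signal is reported by the shell as a status above 128 (134 for SIGABRT, as noted further down in this thread), so it has to be mapped into that range. The sketch below shows only that mapping; the name bisect_code, the script name and the build command are assumptions.

```shell
# Map the exit status of the test command to a status `git bisect run`
# understands. bisect_code is a hypothetical helper name.
bisect_code() {
  case "$1" in
    0)   echo 0 ;;    # command succeeded: mark the commit good
    134) echo 1 ;;    # 128 + 6 (SIGABRT), the runtime crash: mark it bad
    *)   echo 125 ;;  # anything else is inconclusive: skip the commit
  esac
}

# Hypothetical bisect-test.sh, started with: git bisect run ./bisect-test.sh
#   make || exit 125                      # build step is an assumption
#   <the failing git-annex-shell command>
#   exit "$(bisect_code $?)"
```

Treating unknown failures as "skip" rather than "good" is a deliberately cautious choice here; it keeps an unrelated error from steering the bisection.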

Probably your best bet will be changing to a different version or build of GHC. --Joey


forwarded to GHC upstream; closing done --Joey

Hm, if path's ok, guess there's no way around git-bisect indeed. Wonder if there's some kind of ccache for haskell...

OS is linux, amd64 on "host1" and i386 on "host2" where git-annex-shell is crashing. I'll try to come up with a commit, thanks for clarifications.

Comment by http://fraggod.pip.verisignlabs.com.pip.verisignlabs.com/ Sun Apr 3 04:45:49 2011

Completed git-bisect twice, getting roughly the same results:

828a84ba3341d4b7a84292d8b9002a8095dd2382 is the first bad commit
commit 828a84ba3341d4b7a84292d8b9002a8095dd2382
Author: Joey Hess <joey@kitenet.net>
Date:   Sat Mar 19 14:33:24 2011 -0400

    Add version command to show git-annex version as well as repository version information.

:040000 040000 ed849b7b6e9b177d6887ecebd6a0f146357824f3 1c98699dfd3fc3a3e2ce6b55150c4ef917de96e9 M      Command
:100644 100644 b9c22bdfb403b0bdb1999411ccfd34e934f45f5c adf07e5b3e6260b296c982a01a73116b8a9a023c M      GitAnnex.hs
:100644 100644 76dd156f83f3d757e1c20c80d689d24d0c533e16 d201cc73edb31f833b6d00edcbe4cf3f48eaecb0 M      Upgrade.hs
:100644 100644 5f414e93b84589473af5b093381694090c278e50 d4a58d77a29a6a02daf13cec0df08b5aab74f65e M      Version.hs
:100644 100644 f5c2956488a7afafd20374873d79579fb09b1677 f8cd577e992d38c7ec1438ce5c141eb0eb410243 M      configure.hs
:040000 040000 f9b7295e997c0a5b1dda352f151417564458bd6e a30008475c1889f4fd8d60d4d9c982563380a692 M      debian
:040000 040000 9d87a5d8b9b9fe7b722df303252ffd5760d66f75 08834f61a10d36651b3cdcc38389f45991acdf5e M      doc

contents of final refs/bisect:

bad (828a84ba3341d4b7a84292d8b9002a8095dd2382)
good-33cb114be5135ce02671d8ce80440d40e97ca824
good-942480c47f69e13cf053b8f50c98c2ce4eaa256e
good-ca48255495e1b8ef4bda5f7f019c482d2a59b431

"roughly" because second bisect gave two commits as a result, failing to build one of them (missing .o file on link, guess it's because of -j4 and bad deps in that version's build system):

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
828a84ba3341d4b7a84292d8b9002a8095dd2382
5022a69e45a073046a2b14b6a4e798910c920ee9
We cannot bisect more!

Also noticed that the "git-annex-shell ..." command succeeds if run as the root user, while failing as an unprivileged one. There are no permission/access errors in "strace -f git-annex-shell ...", so I guess it could be some bug in GHC indeed.

JIC, logged a whole second bisect operation. Resulting log: http://fraggod.net/static/share/git-annex-bisect.log

Bisect script I've used (git-annex-shell dies with error code 134 - SIGABRT on GHC error):

res=
while true; do
  # Record the previous iteration's verdict, then restore a pristine build tree.
  if [[ -n "$res" ]]; then
    cd /var/tmp/paludis/build/dev-scm-git-annex-scm.bak/work/git-annex-scm
    echo "---=== BISECT ($res) ===---"; git bisect "$res" 2>&1; echo '---=== /BISECT ===---'
    cd
    rm -Rf /var/tmp/paludis/build/dev-scm-git-annex-scm
    cp -a --reflink=auto /var/tmp/paludis/build/dev-scm-git-annex-scm{.bak,}
    chown -R paludisbuild: /var/tmp/paludis/build/dev-scm-git-annex-scm
  fi
  res=
  # Build and install the checked-out revision; a failed build is skipped.
  cave resolve -zx1 git-annex --skip-until-phase configure || res=skip
  if [[ -z "$res" ]]; then
    cd /remote/path
    # Exit status 134 (128 + SIGABRT) from this command is the GHC runtime crash.
    sudo -u user git-annex-shell 'sendkey' '/remote/path' 'SHA1-s6654080--abd8edec20648ade69351d68ae1c64c8074a6f0b' '--' rsync --server --sender -vpe.Lsf --inplace . ''
    if [[ $? -eq 134 ]]; then res=bad; else res=good; fi
    cd
  fi
done 2>&1 | tee ~/git-annex-bisect.log
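
For reference, the 134 above follows the usual shell convention that a process killed by signal N is reported with exit status 128 + N. A small sketch of that decoding (describe_status is a made-up name, not an existing tool):

```shell
# Describe a shell exit status: values above 128 conventionally mean the
# process was terminated by signal (status - 128).
# describe_status is a hypothetical helper for illustration.
describe_status() {
  if [ "$1" -gt 128 ]; then
    echo "signal $(( $1 - 128 ))"
  else
    echo "exit $1"
  fi
}
```

Here "describe_status 134" gives "signal 6", and signal 6 is SIGABRT on Linux; "kill -l 134" should print the signal name in POSIX shells.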
Comment by http://fraggod.pip.verisignlabs.com.pip.verisignlabs.com/ Sun Apr 3 06:22:15 2011

Repeated bisect with -j1, just to be sure it's not a random error, and it gave me 828a84ba3341d4b7a84292d8b9002a8095dd2382 again. Guess I'll look through the changes there a bit later and try to revert these until it works.

Not sure if it's repeatable by anyone but me (and hence worth fixing), but here's a bit more info about the system:

Exherbo linux
Linux sacrilege 2.6.38.2-fg.roam #4 SMP PREEMPT Mon Mar 28 21:08:47 YEKST 2011 i686 GNU/Linux

dev-lang/ghc-7.0.2:7.0.2::installed
dev-haskell/HUnit-1.2.2.3:1.2.2.3::installed
dev-haskell/MissingH-1.1.0.3:1.1.0.3::installed
dev-haskell/QuickCheck-2.4.0.1:2.4.0.1::installed
dev-haskell/array-0.3.0.2:0.3.0.2::installed
dev-haskell/bytestring-0.9.1.7:0.9.1.7::installed
dev-haskell/containers-0.4.0.0:0.4.0.0::installed
dev-haskell/extensible-exceptions-0.1.1.2:0.1.1.2::installed
dev-haskell/filepath-1.2.0.0:1.2.0.0::installed
dev-haskell/hslogger-1.1.3:0::installed
dev-haskell/mtl-2.0.1.0:2.0.1.0::installed
dev-haskell/network-2.3.0.1:2.3.0.1::installed
dev-haskell/old-locale-1.0.0.2:1.0.0.2::installed
dev-haskell/parsec-3.1.0:3.1.0::installed
dev-haskell/pcre-light-0.4:0::installed
dev-haskell/regex-base-0.93.2:0.93.2::installed
dev-haskell/regex-compat-0.93.1:0.93.1::installed
dev-haskell/regex-posix-0.94.4:0.94.4::installed
dev-haskell/syb-0.3:0.3::installed
dev-haskell/transformers-0.2.2.0:0.2.2.0::installed
dev-haskell/utf8-string-0.3.6:0.3.6::installed

(some packages are listed here as ::installed but contain no files, since they detect that ghc-7.0.2 already comes with the same or a newer version of the package)

Comment by http://fraggod.pip.verisignlabs.com.pip.verisignlabs.com/ Sun Apr 3 06:57:02 2011

Nice work on the bisection. It's obviously a compiler bug. Having two test cases that differ only by a commit as trivial and innocuous as 828a84ba3341d4b7a84292d8b9002a8095dd2382 might help a GHC developer track it down.

We should probably forward this as a GHC bug. I hope you can find a different version or build of GHC to build git-annex with.

Comment by http://joey.kitenet.net/ Sun Apr 3 16:06:34 2011

Finally got around to reporting the issue to the GHC tracker.

It looks quite similar (at least to a haskell-illiterate person like me) to a highest-priority issue that's sitting right at the top of the list. There are other similar reports, but they seem to be either related to PowerPC Macs or closed as invalid or due to needinfo inactivity.

Guess any further discussion belongs there, unless the ghc developers bounce it back. Thanks a lot for your help, Joey, and for sharing the great thing that git-annex is.

Comment by http://fraggod.pip.verisignlabs.com.pip.verisignlabs.com/ Thu Apr 7 13:44:36 2011