git merge watch_

My cursor has been mentally poised here all day, but I've been reluctant to merge watch into master. It seems solid, but is it correct? I was able to think up a lot of races it'd be subject to, and deal with them, but did I find them all?

Perhaps I need to do some automated fuzz testing to reassure myself. I looked into using genbackupdata to that end. It's not quite what I need, but could be moved in that direction. Or I could write my own fuzz tester, but it seems better to use someone else's, because a) laziness and b) they're less likely to have the same blind spots I do.

My reluctance to merge isn't helped by the known bugs with files that are either already open before git annex watch starts, or are opened by two processes at once; either case can confuse it into annexing a still-open file when one process closes it.

I've been thinking about just running lsof on every file as it's being annexed to check for that, but in the end, lsof is too slow. Since its check involves trawling through all of /proc, it takes a good half a second to check a file, and adding 50 seconds to the time it takes to process 100 files is just not acceptable.

But an option that could work is to run lsof after a bunch of new files have been annexed. It can check a lot of files nearly as fast as a single one. In the rare case that an annexed file is indeed still open, it could be moved back out of the annex. Then when its remaining writer finally closes it, another inotify event would re-annex it.
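To make that concrete, the batched check could look something like this. It's only a sketch, not code from the watch branch; the stillOpen name is made up, and it leans on lsof's convention of exiting nonzero when none of the named files are open:

    import System.Process (readProcessWithExitCode)

    -- Ask lsof, in a single batch, which of these files something still
    -- has open. With -Fn the output is field-formatted and name lines
    -- are prefixed with 'n'; the exit code is ignored because lsof
    -- returns nonzero when none of the files are open.
    stillOpen :: [FilePath] -> IO [FilePath]
    stillOpen [] = return []
    stillOpen files = do
        (_, out, _) <- readProcessWithExitCode "lsof" ("-Fn" : files) ""
        let opened = [ drop 1 l | l <- lines out, take 1 l == "n" ]
        return $ filter (`elem` opened) files

Anything such a check reports would get moved back out of the annex to wait for its close event; the point is that one lsof run covers the whole batch.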

A downside of relying on lsof is that you might be painting yourself into a Linux corner: other operating systems might not have lsof or an alternative you can rely on. This might be a worry especially for Windows.
Comment by http://wiggy.net/ Fri Jun 15 07:19:23 2012

Wasn't there some filesystem functionality that could tell you the number of open file handles on a certain file? I thought this was tracked per-file too. Or maybe I'm just confusing it with the number of hard links (which stat can tell you); anyway, something to look into.

Comment by http://dieter-be.myopenid.com/ Fri Jun 15 08:21:37 2012

I would also be reluctant to use lsof, for the sake of non-Linux systems or systems that don't have lsof. I've only been playing around with the watch branch on my "other" laptop under archlinux. It looks usable; however, I would prefer support for OSX before the watch branch gets merged to master ;)

Corner case, but if the other program finishes writing while you are annexing and your check shows no open files, you are left with a bad checksum on a correct file. This "broken" file will propagate, and the next round of fsck will show that all copies are "bad".

I haven't verified whether this is viable, but could you set the file RO, and thus block future writes, before starting to annex?

@wichert All this inotify stuff is entirely Linux-specific AFAIK anyway, so it's fine for workarounds to limitations in inotify functionality to also be Linux-specific.

@dieter I think you're thinking of hard links; filesystems don't track the number of open file handles, AFAIK.
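To illustrate the difference: stat does expose a per-file hard link count, but nothing comparable for open handles. A tiny sketch (hardLinks is just a made-up name):

    import System.Posix.Files (getFileStatus, linkCount)

    -- stat(2) reports how many hard links a file has; there is no
    -- per-file count of open handles in ordinary filesystem metadata.
    hardLinks :: FilePath -> IO Int
    hardLinks f = fromIntegral . linkCount <$> getFileStatus f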

@Jimmy, I'm planning to get watch going on freebsd (and hopefully that will also cover OSX), after merging it :)

@Richard, the file is set RO while it's being annexed, so any lsof would come after that point.

Comment by http://joeyh.name/ Fri Jun 15 15:14:52 2012
But Rich is right, and I was thinking the same thing earlier this morning: delaying the lsof allows the writer to change the file and exit, and only fsck can detect the problem then. Setting file permissions doesn't help once a process already has it open for write. Which has put me off the delayed lsof idea, unfortunately. lsof could be run safely during the initial annexing.
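Roughly, the ordering that does seem safe looks like this (a sketch only, with a hypothetical safeToAnnex helper; not what the watch branch actually does):

    import System.Posix.Files (setFileMode)
    import System.Process (readProcessWithExitCode)
    import System.Exit (ExitCode(..))

    -- Drop write permission, then ask lsof whether anything still has
    -- the file open; lsof exits 0 only when it finds the file open.
    -- The chmod can't revoke a write handle a process already holds,
    -- which is why this check has to happen during the initial
    -- annexing rather than after a delay.
    safeToAnnex :: FilePath -> IO Bool
    safeToAnnex f = do
        setFileMode f 0o444
        (code, _, _) <- readProcessWithExitCode "lsof" [f] ""
        return (code /= ExitSuccess)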
Comment by http://joeyh.name/ Fri Jun 15 15:23:21 2012