Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 1.8.8
e2fsprogs 1.41.90.wc2
3
11017
Description
After a power loss, an older e2fsck (e2fsprogs 1.41.90.wc2) was run on the OSTs. It found tons of multiply-claimed blocks, including for the /O directory. Here's an example of one of the inodes:
File ... (inode #17825793, mod time Wed Aug 15 19:02:25 2012)
  has 1 multiply-claimed block(s), shared with 1 file(s):
        /O (inode #84934657, mod time Wed Aug 15 19:02:25 2012)
Clone multiply-claimed blocks? yes
Inode 17825793 doesn't have an associated directory entry, so it eventually gets put into lost+found.
So the questions are:
- how could this have happened? My slightly-informed-probably-wrong theory is that the journal got corrupted and it replayed some old inodes back into existence. I noticed there were a lot of patches dealing with journal checksums committed after 1.41.90.
- what's the best way to deal with these? Cloning takes forever when you are talking about TB-sized files. I tested the delete extended option, and it looks like it deletes both of the files sharing the blocks. It would be nice if it deleted only the unlinked one. Right now my plan is to create a debugfs script from the read-only e2fsck output, but if there is a better way, I'd like to hear it.
Thanks.
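As a rough illustration of the "debugfs script from the read-only e2fsck output" plan, here is a minimal Python sketch. It assumes the pass 1b report has the format shown in the example above. The function names `shared_inode_groups` and `ncheck_script` are hypothetical, not part of e2fsprogs. The sketch only emits a read-only `ncheck` command, so that the inodes without a pathname (the unlinked side) can be identified before anything is modified.

```python
import re

# Header of one pass 1b report: the inode that owns the multiply-claimed blocks.
HEAD_RE = re.compile(
    r"\(inode #(\d+), mod time [^)]*\)\s+has \d+ multiply-claimed block\(s\), "
    r"shared with \d+ file\(s\):"
)
# Any "(inode #N, mod time ...)" fragment, used for the files listed as sharers.
INODE_RE = re.compile(r"\(inode #(\d+), mod time [^)]*\)")


def shared_inode_groups(text):
    """Return a list of (inode, [inodes sharing blocks with it]) tuples."""
    heads = list(HEAD_RE.finditer(text))
    groups = []
    for i, head in enumerate(heads):
        # Sharers are listed between this header and the next one.
        end = heads[i + 1].start() if i + 1 < len(heads) else len(text)
        sharers = [int(m.group(1)) for m in INODE_RE.finditer(text, head.end(), end)]
        groups.append((int(head.group(1)), sharers))
    return groups


def ncheck_script(groups):
    """Build a one-line debugfs script that maps every involved inode to a path."""
    inodes = sorted({ino for owner, sharers in groups for ino in (owner, *sharers)})
    return "ncheck " + " ".join(str(ino) for ino in inodes) + "\n"
```

The resulting text could be saved to a file and passed to `debugfs -f` against the OST device. Inodes for which `ncheck` reports no pathname would be the candidates for removal, though that decision still needs manual review.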
I don't think "use other data structure to record duplicate blocks / inodes" was ever mentioned. The data structures themselves are fine. However, e2fsck pass 1 normally keeps only a bitmap of in-use blocks. Only if there are collisions in the bitmap (i.e. blocks shared by multiple users) does pass 1b/1c run to track the owning inode(s) of every block. That is done to reduce memory usage for block tracking significantly (by a factor of 32) during normal e2fsck runs. The one potential improvement that I mentioned was to record the presence of shared blocks in the superblock or similar (or allow it to be specified on the e2fsck command line), so that the block owners are tracked in pass 1 and pass 1b doesn't need to scan them again. I'm not sure if that would be a significant improvement; it was just an idea I had.