Details
-
Technical task
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0, Lustre 2.1.3, Lustre 2.1.5
-
The MDT is formatted/modified to have the "extents" feature enabled, but I'm not sure if that is relevant for this part of the problem.
-
6170
Description
It appears that there is a systematic corruption of the ".." entry in the directory, possibly only affecting htree directories when the "dirdata" feature is enabled.
e2fsck 1.42.6.wc2 (10-Dec-2012)
nbp1-MDT0000 has been mounted 121 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
[...snip...]
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00000 (40930307) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00000 (40930307) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00000 (40930307) is a link to directory /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273).
Second entry 'Header' (inode=40967687) in directory inode 40967686 should be '..' Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00033 (40967686) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00033 (40967686) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00033 (40967686) is a link to directory /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273).
Second entry 'Header' (inode=40971040) in directory inode 40971039 should be '..' Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00029 (40971039) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00029 (40971039) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00029 (40971039) is a link to directory /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273).
Second entry 'Header' (inode=44588784) in directory inode 44588782 should be '..' Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00036 (44588782) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00036 (44588782) is duplicate '..' entry. Fix? no
Entry '..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00036 (44588782) is a link to directory /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273).
Second entry 'Header' (inode=47195750) in directory inode 47195749 should be '..' Fix? no
[...snip...]
Pass 3: Checking directory connectivity
'..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00000 (40930307) is <The NULL inode> (0), should be /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273). Fix? no
'..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00033 (40967686) is <The NULL inode> (0), should be /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273). Fix? no
'..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00029 (40971039) is <The NULL inode> (0), should be /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273). Fix? no
'..' in /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles/plt00036 (44588782) is <The NULL inode> (0), should be /ROOT/arosen2/vmcluster/nodrp/T20K_alpha1_prodrun/pltfiles (40919273). Fix? no
[...snip...]
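To gauge how widespread the damage is, the pass 2 messages in a saved e2fsck log can be summarized per directory. The following is only a rough sketch, not part of e2fsprogs or Lustre: the function name `affected_directories` is made up for illustration, the pattern is based solely on the message wording shown in the output above, and it assumes directory paths contain no whitespace.

```python
import re
from collections import Counter

# Pattern derived from the pass 2 message wording in the log above:
#   Entry '..' in <path> (<inode>) is duplicate '..' entry.
# Assumption: <path> contains no whitespace.
DUP_DOTDOT = re.compile(
    r"Entry '\.\.' in (\S+) \((\d+)\) is duplicate '\.\.' entry\."
)


def affected_directories(log_text):
    """Count duplicate '..' reports per (path, inode) in e2fsck output.

    Hypothetical helper for illustration; it works whether the log has
    one message per line or has been flattened onto a single line.
    """
    return Counter(
        (path, int(inode)) for path, inode in DUP_DOTDOT.findall(log_text)
    )
```

Calling `affected_directories(open("e2fsck.log").read())` (the file name is a placeholder) gives a mapping from each flagged directory to the number of duplicate '..' reports, which can then be compared against the pass 3 messages.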
Rick - You can keep this from happening to any more directories by unmounting your MDT, turning off dirdata, and remounting. However, directories that are already damaged stay damaged; running e2fsck and recovering from lost+found is your only option for those. The software fix* will prevent further damage, but it won't help with existing damaged directories.
*About that fix: You're almost certainly seeing the closely related https://jira.hpdd.intel.com/browse/LU-5626, rather than LU-2638. LU-2638 was fixed before the release of 2.4, so unless you updated to 2.1 from 1.8 and are still running 2.1 (or 2.2 or 2.3, I guess), it's LU-5626. LU-5626 is similar but subtly different, and it did get into 2.4 and 2.5 as released. Note also that if you're hitting this situation and running without the fix, there's also the possibility of a kernel panic on your MDS, so the workaround of turning off dirdata temporarily is a very good idea (if you can't get a software update quickly).