Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Lustre 2.4.1
Environment: lustre 2.4.0-17chaos (github.com/chaos/lustre)
3
10401
Description
After upgrading our servers from Lustre 2.1 to 2.4, our MDS crashed on LU-2842, and we applied the patch for it. That patch avoided the LBUG, but it is now clear that there is a more basic problem: we can no longer look up many of the top-level subdirectories in this Lustre filesystem.
We are seeing problems like:
2013-09-11 13:01:22 LustreError: 5570:0:(mdt_open.c:1687:mdt_reint_open()) lsc-MDT0000: name purgelogs present, but fid [0x2830891e:0xd1781321:0x0] invalid
It looks to me like the directory entries are still there, but FID lookups do not work on them. We verified that the directory named "purgelogs" appears on the underlying ldiskfs filesystem at ROOT/purgelogs.
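Since the entries still exist on disk, one possible triage step is to collect every name that triggers this message and look at the shape of its FID. The Python sketch below assumes the console message format shown above. The helper names (`Fid`, `parse_fid`, `invalid_entries`) are hypothetical, and the IGIF bounds are assumed to mirror `FID_SEQ_IGIF` and `FID_SEQ_IGIF_MAX` from `lustre_idl.h`; check them against the source tree actually in use. For an IGIF FID the sequence is the ldiskfs inode number and the object ID is the inode generation, so a match would hint that the entry was created before native FIDs.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Lustre prints FIDs as [0x<seq>:0x<oid>:0x<ver>] (seq is 64-bit, oid and ver are 32-bit).
FID_RE = re.compile(r"\[0x([0-9a-fA-F]+):0x([0-9a-fA-F]+):0x([0-9a-fA-F]+)\]")
# Message text copied from the mdt_reint_open() error above (assumed stable).
INVALID_RE = re.compile(r"name (\S+) present, but fid (\[[^\]]+\]) invalid")

# Assumption: these mirror FID_SEQ_IGIF / FID_SEQ_IGIF_MAX in lustre_idl.h.
FID_SEQ_IGIF = 12
FID_SEQ_IGIF_MAX = 0xFFFFFFFF


class Fid(NamedTuple):
    seq: int
    oid: int
    ver: int

    def is_igif(self) -> bool:
        """True if seq lies in the (assumed) IGIF range, i.e. seq is an inode number."""
        return FID_SEQ_IGIF <= self.seq <= FID_SEQ_IGIF_MAX


def parse_fid(text: str) -> Optional[Fid]:
    """Parse the first [seq:oid:ver] FID found in text, or return None."""
    m = FID_RE.search(text)
    if m is None:
        return None
    return Fid(*(int(g, 16) for g in m.groups()))


def invalid_entries(lines: Iterable[str]) -> Iterator[Tuple[str, Fid]]:
    """Yield (name, Fid) for each 'name ... present, but fid ... invalid' line."""
    for line in lines:
        m = INVALID_RE.search(line)
        if m:
            fid = parse_fid(m.group(2))
            if fid is not None:
                yield m.group(1), fid


if __name__ == "__main__":
    log = [
        "2013-09-11 13:01:22 LustreError: 5570:0:(mdt_open.c:1687:mdt_reint_open()) "
        "lsc-MDT0000: name purgelogs present, but fid [0x2830891e:0xd1781321:0x0] invalid",
    ]
    for name, fid in invalid_entries(log):
        kind = "IGIF" if fid.is_igif() else "other"
        print(f"{name}: seq={fid.seq:#x} oid={fid.oid:#x} ver={fid.ver} ({kind})")
```

On the `purgelogs` line above this prints `purgelogs: seq=0x2830891e oid=0xd1781321 ver=0 (IGIF)`. If many affected names classify the same way, the sequence value could be compared with the inode number of `ROOT/purgelogs` on the ldiskfs filesystem.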
We also see error messages during recovery shortly after the most recent boot, like the following:
2013-09-11 12:58:27 sumom-mds1 login: LustreError: 4164:0:(mdt_open.c:1497:mdt_reint_open()) @@@ [0x24d18001:0x3db440f0:0x0]/XXXXXX->[0x24d98604:0x2a32454:0x0] cr_flags=0104200200001 mode=0200100000 msg_flag=0x4 not found in open replay. req@ffff8808263d1000 x1443453865661288/t0(463856618502) o101->f45d6fab-2c9c-6b39-0090-4935fbe03e32@192.168.115.87@o2ib10:0/0 lens 568/1176 e 0 to 0 dl 1378929568 ref 1 fl Interpret:/4/0 rc 0/0
(I X'ed out the user name there, but everything else is cut-and-paste.)
Any ideas on the next step to make these directories accessible again?