Details
-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
Lustre 2.5.5
-
None
-
RHEL 6.8/Toss 2.5.5, DDN 7700 storage for MDT
-
1
Description
We had a fairly significant power outage over the weekend that took down our storage. After the system came back up, one of our MDTs would not mount, failing with a "bad file descriptor" error. Mounted as ldiskfs, the target looked "normal", but a subsequent Lustre mount failed with the same error.
An fsck run with the -n option came back clean, but the MDT still would not mount. An fsck run with no options also came back clean, and the Lustre mount still failed.
Finally, an fsck -fy was run and all hell broke loose. It encountered many duplicate inodes, unattached blocks, multiply attached blocks, and other errors. The fsck restarted 3 times, and I observed passes that I've never seen before, like Pass 1b, Pass 1c, and a couple dealing with directories. Before it finally completed, it stuffed >93K files into lost+found.
I am familiar with using ll_recover_lost_found_objs to recover OST objects, but I don't know what options are available for an MDT. Looking for advice here.
Thanks.