Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Lustre 2.8.0
-
None
-
Lustre servers with an ldiskfs back end, running version 2.8.1 with a few additional patches. The OS is RHEL 6.9.
-
2
Description
One of our production file systems running Lustre 2.8.1 experienced a soft lockup very similar to LU-9488. I attempted to backport the patch, but far too many changes have happened between 2.8.1 and Lustre 2.10.0, and I am unsure whether I would get the port right. I have attached the backtrace.
"fail_loc=0x190" slows down the OI scrub scanning, which gives us time to inject other failures before the OI scrub completes.
"fail_loc=0x198" makes the OI scrub iteration repeatedly scan the same bits of the inode table. Without our earlier patches (28903/4/5/6), the OI scrub falls into a soft lockup. With those patches applied, the OI scrub detects such a dead repeat and moves forward, so no soft lockup is the expected behavior.