Details
Technical task
Resolution: Fixed
Major
Lustre 2.7.0
None
15363
Description
Lustre is a distributed file system, so the components that belong to a single file can reside on several servers. For example, a file's MDT-object and its name entry can reside on different MDTs, and a file's OST-objects reside on OSTs while its metadata is stored on the MDT.
Because of this distribution, if the LFSCK cannot verify some component during the first-stage scanning, then when it handles orphans during the second-stage scanning it is difficult to tell whether a missing component is really corrupted or was simply not verified because of the earlier LFSCK failure.
To avoid improper repairs in such ambiguous cases, the LFSCK skips some orphan handling. The safest approach is to skip all orphan handling if the LFSCK hit any failure during the first-stage scanning. But that approach is so conservative that the LFSCK might never run to completion, because it is normal for some servers (MDS or OSS) to fail during an LFSCK scan.
As an improvement, the LFSCK can record server failure events in the LFSCK tracing file during the first-stage scanning, and then skip only the orphans that are related to the failed servers during the second-stage scanning.
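The idea can be illustrated with a small sketch. This is not Lustre code: the names `TraceFile`, `Orphan`, `record_failure`, and `second_stage`, as well as the server labels, are hypothetical and only model the behavior described above, assuming each orphan can be associated with the set of servers that hold its related components.

```python
from dataclasses import dataclass, field


@dataclass
class TraceFile:
    """Hypothetical stand-in for the LFSCK tracing file."""
    failed_servers: set = field(default_factory=set)

    def record_failure(self, server: str) -> None:
        # First stage: remember which server could not be verified.
        self.failed_servers.add(server)


@dataclass(frozen=True)
class Orphan:
    name: str
    servers: frozenset  # servers holding components related to this orphan


def second_stage(orphans, trace):
    """Split orphans into those to handle and those to skip."""
    handle, skip = [], []
    for orphan in orphans:
        # Skip only orphans that touch a server recorded as failed.
        if orphan.servers & trace.failed_servers:
            skip.append(orphan.name)
        else:
            handle.append(orphan.name)
    return handle, skip


# Assumed scenario: one OST failed during the first-stage scanning.
trace = TraceFile()
trace.record_failure("OST0001")

orphans = [
    Orphan("obj-a", frozenset({"MDT0000", "OST0000"})),
    Orphan("obj-b", frozenset({"MDT0000", "OST0001"})),
]
handle, skip = second_stage(orphans, trace)
```

In this sketch only `obj-b` is skipped, because it is the only orphan related to the failed server. Under the "skip everything on any failure" policy, both orphans would have been left unhandled.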