  1. Lustre
  2. LU-5669

LFSCK 5: Distinguish dangling name entry from corrupted name entry


Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 15881

    Description

      During the first-stage scanning, if the namespace LFSCK finds a name entry that references a non-existent MDT-object, there are two possibilities:

      1) It is a dangling name entry. In that case, according to our design, the LFSCK repairs it as follows:

      1.1) By default, the LFSCK will report the inconsistency without re-creating the lost MDT-object.

      1.2) If the administrator asks for lost MDT-objects to be re-created when starting the LFSCK, then the LFSCK will re-create the lost MDT-object.

      2) It is a bad name entry that contains a corrupted FID. Call the name entry entry_A, and assume that its FID originally referenced obj_B but now references obj_C. There are two possible situations:

      2.1) Only the FID in the name entry_A is corrupted. Then, during the second-stage scanning, the LFSCK will find the obj_B that back-references the name entry_A via its linkEA.

      2.2) Both the name and the FID in the entry_A are corrupted. Then even if the LFSCK finds the obj_B in the second-stage scanning, it still cannot determine the relationship between the name entry_A and the obj_B.

      From the LFSCK's point of view, during the first-stage scanning it cannot distinguish case 1) from case 2). Currently, we assume case 1) and repair it as 1.1) or 1.2) according to the LFSCK start options. But if it is actually case 2.1) and has been handled as 1.2), then the LFSCK has created the obj_C by mistake.
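The current first-stage behaviour described above can be illustrated with a toy model. This is only a sketch, not LFSCK code: the dictionaries, the `create_lost` parameter, and the `created_by_lfsck` / `modified` fields are hypothetical names introduced for illustration.

```python
def first_stage_check(entries, objects, key, create_lost=False):
    """Handle one name entry during a simulated first-stage scan.

    entries: dict mapping (parent, name) -> FID (toy name entries)
    objects: dict mapping FID -> dict describing a toy MDT-object
    create_lost: hypothetical stand-in for the "re-create lost
                 MDT-object" start option (case 1.2)
    """
    fid = entries[key]
    if fid in objects:
        return "ok"
    # The entry references a non-existent object. At this point the scan
    # cannot tell case 1) from case 2), so it assumes case 1).
    if not create_lost:
        return "reported"          # case 1.1: report only
    # Case 1.2: re-create the object under the FID found in the entry.
    # If the FID was actually corrupted (case 2.1), this is the
    # mistakenly created obj_C.
    objects[fid] = {"linkea": [key], "created_by_lfsck": True,
                    "modified": False}
    return "recreated"
```

In this model, an entry whose FID was corrupted from obj_B to obj_C is indistinguishable from a truly dangling entry, which is why the re-created object may later turn out to be wrong.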

      To resolve this problem, in the second-stage scanning, when the LFSCK finds the original obj_B that back-references the name entry_A via its linkEA, it checks whether anyone has modified the obj_C since it was created. If not, it updates the FID in the name entry_A to reference the obj_B and destroys the obj_C; otherwise, to keep the new data in the obj_C, it creates a new name entry under lost+found that references the obj_B.
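The second-stage decision can be sketched in the same toy style. Again, this is an illustrative model under stated assumptions: the `modified` flag stands in for whatever check the LFSCK uses to detect changes to obj_C, and the lost+found entry name built here is a made-up placeholder, not the real naming scheme.

```python
def second_stage_repair(entries, objects, lost_found, key, fid_b):
    """Repair one name entry once obj_B is found to back-reference it.

    entries:    dict mapping (parent, name) -> FID
    objects:    dict mapping FID -> dict with a boolean "modified" field
    lost_found: dict mapping entry name -> FID (toy lost+found directory)
    key:        the (parent, name) of entry_A, taken from obj_B's linkEA
    fid_b:      FID of the original obj_B
    """
    fid_c = entries[key]
    obj_c = objects[fid_c]
    if not obj_c["modified"]:
        # Nobody touched obj_C since the LFSCK created it: point entry_A
        # back at obj_B and destroy obj_C.
        entries[key] = fid_b
        del objects[fid_c]
        return "relinked"
    # obj_C holds new data, so keep it under entry_A and make obj_B
    # reachable through a new entry under lost+found instead.
    lost_found[f"{fid_b}-{key[1]}"] = fid_b   # hypothetical entry name
    return "moved_to_lost_found"
```

The first branch is the one that changes what entry_A resolves to while the LFSCK is running.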

      The current solution looks reasonable, but it may confuse an application that read-accessed the name entry_A during the LFSCK: it will find that the target MDT-object has changed from obj_C to obj_B.

      To avoid such confusion, the LFSCK should not re-create the lost MDT-object until it is sure that the name entry_A is dangling rather than a bad name entry. In theory, the LFSCK can know that once the second-stage double scanning has finished successfully (all known linkEA entries have been verified), which requires it to record all related name entries in the tracing file (or another name tracing file). But even if the LFSCK does that, case 2.2) still cannot be distinguished from a dangling name entry. So some problems remain to be resolved.
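The deferred approach proposed above might look roughly like this in a toy model. It is a sketch of the idea only: `traced_entries` stands in for the name entries recorded in the tracing file, and all names are hypothetical.

```python
def classify_traced_entries(traced_entries, objects):
    """Classify suspicious name entries after the second-stage scan.

    traced_entries: list of (parent, name) keys recorded during the
                    first stage because their FID referenced no object
    objects:        dict mapping FID -> dict with a "linkea" list of
                    (parent, name) back references
    """
    # Index every verified linkEA back reference by the entry it names.
    back_refs = {}
    for fid, obj in objects.items():
        for link in obj["linkea"]:
            back_refs.setdefault(link, fid)

    result = {}
    for key in traced_entries:
        if key in back_refs:
            # Case 2.1: some object claims this entry, so only the FID
            # in the entry was corrupted.
            result[key] = ("bad_fid", back_refs[key])
        else:
            # Either truly dangling (case 1) or name and FID both
            # corrupted (case 2.2); this check cannot tell them apart.
            result[key] = ("dangling_or_fully_corrupted", None)
    return result
```

Only entries in the second group would be candidates for re-creating the lost MDT-object, and the remaining ambiguity with case 2.2) is visible in the shared label.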

      On the other hand, can the application regard the replacement of obj_C by obj_B as "unlink obj_C and create obj_B with the same name"?

          People

            yong.fan nasf (Inactive)