Details

    • Improvement
    • Resolution: Fixed
    • Critical
    • Lustre 2.12.0

    Description

      PFID EA should already have enough information to restore mirrored files in a catastrophic failure where the metadata on the MDT is lost. After FLR is introduced, the filter_fid contains the following information:

      struct ost_layout {
              __u32   ol_stripe_size;
              __u32   ol_stripe_count;
              __u64   ol_comp_start;
              __u64   ol_comp_end;
              __u32   ol_comp_id;
      } __attribute__((packed));

      struct filter_fid {
              struct lu_fid           ff_parent;
              struct ost_layout       ff_layout;
              __u32                   ff_layout_version;
              __u32                   ff_range; /* range of layout versions for
                                                 * which writes are allowed */
      } __attribute__((packed));
      

      The component ID is composed of a SEQ_ID and a MIRROR_ID as follows:

      #define SEQ_ID_MAX              0x0000FFFF                                      
      #define SEQ_ID_MASK             SEQ_ID_MAX                                      
      /* bit 30:16 of lcme_id is used to store mirror id */                           
      #define MIRROR_ID_MASK          0x7FFF0000                                      
      #define MIRROR_ID_SHIFT         16  
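
      As an illustration of the bit layout above, here is a minimal sketch of how a component ID could be split into its two parts and reassembled. The helper names (comp_mirror_id, comp_seq_id, comp_id_pack) are hypothetical and chosen for this example only; only the masks come from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the description above. */
#define SEQ_ID_MAX      0x0000FFFF
#define SEQ_ID_MASK     SEQ_ID_MAX
#define MIRROR_ID_MASK  0x7FFF0000
#define MIRROR_ID_SHIFT 16

/* Hypothetical helper: extract the mirror ID (bits 30:16). */
static inline uint16_t comp_mirror_id(uint32_t comp_id)
{
	return (uint16_t)((comp_id & MIRROR_ID_MASK) >> MIRROR_ID_SHIFT);
}

/* Hypothetical helper: extract the sequence ID (bits 15:0). */
static inline uint16_t comp_seq_id(uint32_t comp_id)
{
	return (uint16_t)(comp_id & SEQ_ID_MASK);
}

/* Hypothetical helper: build a component ID from its two parts. */
static inline uint32_t comp_id_pack(uint16_t mirror_id, uint16_t seq_id)
{
	/* The mirror ID field is only 15 bits wide. */
	assert(mirror_id <= (MIRROR_ID_MASK >> MIRROR_ID_SHIFT));
	return ((uint32_t)mirror_id << MIRROR_ID_SHIFT) | seq_id;
}
```

      For example, under these definitions the component ID 0x00020003 decodes to mirror ID 2 and SEQ ID 3. Bit 31 is not part of either field and is masked out.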
      

      With the above information, LFSCK only needs to use the parent FID to identify the OST objects that belong to the same file, use the mirror ID (bits 30:16 of the component ID) to identify the components that belong to the same mirror, and then use the SEQ ID together with ol_comp_start and ol_comp_end in ost_layout to assemble the components of each mirror.
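
      The grouping described above can be sketched as a sort over per-object records. This is only an illustration under stated assumptions, not the LFSCK implementation: struct fid and struct ost_obj_rec are simplified stand-ins for lu_fid and filter_fid, all function names are hypothetical, and the file identity is assumed to be given by the FID's sequence and object ID alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEQ_ID_MASK     0x0000FFFF
#define MIRROR_ID_MASK  0x7FFF0000
#define MIRROR_ID_SHIFT 16

/* Simplified stand-in for struct lu_fid. */
struct fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;	/* not compared here (assumption: not part of file identity) */
};

/* Hypothetical record: the filter_fid fields needed to group OST objects. */
struct ost_obj_rec {
	struct fid parent;	/* ff_parent */
	uint64_t   comp_start;	/* ol_comp_start */
	uint64_t   comp_end;	/* ol_comp_end */
	uint32_t   comp_id;	/* ol_comp_id */
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

static uint32_t rec_mirror_id(const struct ost_obj_rec *r)
{
	return (r->comp_id & MIRROR_ID_MASK) >> MIRROR_ID_SHIFT;
}

/* Order by parent FID, then mirror ID, then component start, then SEQ ID. */
static int rec_cmp(const void *pa, const void *pb)
{
	const struct ost_obj_rec *a = pa, *b = pb;
	int rc;

	if ((rc = cmp_u64(a->parent.seq, b->parent.seq)) != 0)
		return rc;
	if ((rc = cmp_u64(a->parent.oid, b->parent.oid)) != 0)
		return rc;
	if ((rc = cmp_u64(rec_mirror_id(a), rec_mirror_id(b))) != 0)
		return rc;
	if ((rc = cmp_u64(a->comp_start, b->comp_start)) != 0)
		return rc;
	return cmp_u64(a->comp_id & SEQ_ID_MASK, b->comp_id & SEQ_ID_MASK);
}

/* Sort so that each file's mirrors, and each mirror's components, are contiguous. */
static void group_ost_objects(struct ost_obj_rec *recs, size_t n)
{
	assert(recs != NULL || n == 0);
	qsort(recs, n, sizeof(*recs), rec_cmp);
}

/* Returns 1 if two records belong to the same mirror of the same file. */
static int same_mirror(const struct ost_obj_rec *a, const struct ost_obj_rec *b)
{
	return a->parent.seq == b->parent.seq &&
	       a->parent.oid == b->parent.oid &&
	       rec_mirror_id(a) == rec_mirror_id(b);
}
```

      After sorting, a single pass that compares neighbouring records with same_mirror() is enough to split the array into per-mirror runs whose components are already ordered by extent start. Deciding which of those components are stale is a separate problem that this sketch does not address.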

      The problem is how to identify and restore stale components. By checking ff_layout_version and ff_range, it should be easy to tell whether the file was being written at the time of the failure, but it seems difficult to tell whether a previous resync had completed. Therefore, we probably need more information for this purpose.


        Activity

          [LU-10288] LFSCK support for mirrored files
          pjones Peter Jones added a comment -

          Landed for 2.12


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32705/
          Subject: LU-10288 lfsck: layout LFSCK for mirrored file
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 36ba989752c62cc76b06089373fcd6cec6da9008

          gerrit Gerrit Updater added a comment -

          Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/32705
          Subject: LU-10288 lfsck: layout LFSCK for mirrored file
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 276b919a8b5f23c6e8d75c5907c58b760c87657c

          yong.fan nasf (Inactive) added a comment -

          DoM-related issues will be handled in a separate ticket, LU-11081. This ticket covers only mirrored files without a DoM component.

          yong.fan nasf (Inactive) added a comment -

          Yes, DoM is the non-zero case. I am not sure whether any customers still use 1.8-formatted files whose MDT-object size is non-zero. If so, the LOV EA of such a file is in non-PFL mode, so we can tell that it is not a DoM file. If a 1.8 file has lost its LOV EA, its OST-objects will be reported as orphans, but a 1.8 OST-object has no component ID, so the LFSCK should be able to tell that it belongs to a non-DoM file.

          adilger Andreas Dilger added a comment -

          Another issue is about how to detect whether a file is DoM case or not if its LOV EA corrupted. Currently, the LFSCK can check the MDT-object's size on the MDT, if it is zero, then it will DoM case; otherwise, it can be handled as any case.

          Wouldn't it be true that a non-zero size and non-zero blocks means it is DoM? (Note that there may be upgraded 1.8 MDTs that still store the total file size in inode->i_size, which we should probably fix.)

          But such way will be broken once SoM related feature is introduced. So we some more reliable mechanism.

          The SOM feature will store the file size/blocks in a separate "som" xattr, so it should not interfere with DoM detection.
          yong.fan nasf (Inactive) added a comment - edited

          Another issue is how to detect whether a file is a DoM file when its LOV EA is corrupted. Currently, the LFSCK can check the MDT-object's size on the MDT: if it is non-zero, the file is treated as a DoM file; otherwise, it can be handled as any other file. But this approach will break once the SoM-related feature is introduced, so we need a more reliable mechanism.

          yong.fan nasf (Inactive) added a comment -

          In the DoM case, things are more complex. The DoM data is attached directly to the MDT-object, so if the LOV EA is corrupted, we have to recalculate the DoM boundary from the range of the subsequent component. But component IDs may be discontinuous, so while rebuilding the LOV EA we have to guess the end boundary of the DoM component, and even after the LFSCK we may still not know exactly which component is the second one in the mirror. So we may have no way to restore the LOV EA to exactly its state before the corruption.

          jgmitter Joseph Gmitter (Inactive) added a comment -

          Moved out from under LU-9771 as it belongs to FLR2

          People

            yong.fan nasf (Inactive)
            jay Jinshan Xiong (Inactive)
            Votes: 0
            Watchers: 9
