LFSCK II: MDT-OST consistency check/repair

Details

    • New Feature
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • Lustre 2.6.0
    • 89
    • 3
    • 4024

    Description

      In Lustre, the layout of a striped file (one with a non-zero stripe count) is recorded as an extended attribute (XATTR_NAME_LOV) on its MDT-object on the MDT. For each OST-object, this EA contains the OST index, the OID or FID of the OST-object, and so on. On the OST side, each OST-object records the MDT-object FID, which indicates the file that the OST-object belongs to. Over the lifetime of an active filesystem, the layout information in the MDT-object's EA may become inconsistent with the information on the OST. The inconsistent cases are as follows:

      1. OST-object1 is marked as part of file1 on MDT, but OST-object1 is unassigned or uninitialized on OST.
      2. OST-object1 is marked as part of file1 on OST, but there is no record for OST-object1 on MDT.
      3. OST-object1 is marked as part of file1 on OST, but MDT records that it belongs to file2.
      4. OST-object1 is marked as part of file1 on OST, but both file1 and file2 on MDT claim to own OST-object1.
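
The four cases above can be told apart by comparing the MDT view with the OST view. The following is a minimal Python sketch over toy dictionaries, not Lustre's actual data structures or the LFSCK implementation; the function name `classify_layout`, the input shapes, and the result labels are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_layout(mdt_layouts, ost_parents):
    """Classify OST-objects by comparing the MDT view with the OST view.

    mdt_layouts: {file FID: [OST-object ids listed in the file's layout EA]}
    ost_parents: {OST-object id: parent file FID, or None if unassigned}
    Both shapes are hypothetical stand-ins for the real on-disk records.
    """
    # Invert the MDT view: which files claim each OST-object?
    claims = defaultdict(set)
    for file_fid, objects in mdt_layouts.items():
        for obj in objects:
            claims[obj].add(file_fid)

    findings = {}
    for obj in sorted(set(claims) | set(ost_parents)):
        owners = claims.get(obj, set())
        parent = ost_parents.get(obj)
        if len(owners) > 1:
            findings[obj] = "multiple-owners"      # case 4
        elif owners and parent is None:
            findings[obj] = "dangling-reference"   # case 1
        elif not owners and parent is not None:
            findings[obj] = "orphan"               # case 2
        elif owners and parent not in owners:
            findings[obj] = "mismatched-parent"    # case 3
    return findings


mdt = {"file1": ["o1", "o3", "o4"], "file2": ["o3", "o5"]}
ost = {"o1": None, "o2": "file1", "o3": "file1", "o4": "file1", "o5": "file1"}
print(classify_layout(mdt, ost))
```

Consistent objects (here `o4`) are left out of the result, so an empty dictionary means the two views agree.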

      These inconsistencies can mislead the client or MDS when accessing the affected OST-objects, waste space, lose data, or even destroy the whole system. In LFSCK phase II, we will implement an online tool to check and repair file layout inconsistencies. The tool will use the inode iterator implemented in LFSCK phase I to scan the whole system, and can be driven together with other LFSCK components.

      In addition, the owner information (UID and GID) of each OST-object will be verified against that of its MDT-object, and any inconsistent values will be fixed to ensure correct quota allocation.
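
The owner check can be sketched in the same toy style. This is an illustration only: the `Owner` tuple and `owner_repairs` function are hypothetical names, and the sketch assumes the MDT-object's UID/GID are treated as authoritative, as the paragraph above implies.

```python
from typing import NamedTuple


class Owner(NamedTuple):
    uid: int
    gid: int


def owner_repairs(mdt_owner, ost_owners):
    """Return {OST-object id: Owner to set} for every OST-object whose
    UID/GID differ from the MDT-object's owner."""
    return {
        obj: mdt_owner
        for obj, owner in ost_owners.items()
        if owner != mdt_owner
    }


repairs = owner_repairs(
    Owner(1000, 1000),
    {"o1": Owner(1000, 1000), "o2": Owner(0, 0)},
)
print(repairs)
```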

          Activity

            [LU-1267] LFSCK II: MDT-OST consistency check/repair

            jlevi Jodi Levi (Inactive) added a comment - All patches landed to Master.
            laisiyao Lai Siyao added a comment -

            I have several concerns about the design of LWP and the LFSCK repair threads:
            1. IMHO LWP should be a fully functional device, rather than only a place to store a connection. The difference between LWP and OSP should be that LWP doesn't support recovery, so using this device will be simpler. There should also be a similar device, LWD, like LOD.
            2. In the current code, master and assistant LFSCK threads cooperate on LFSCK; however, as more components are added (e.g. DNE consistency), the master thread needs to communicate with several assistant threads. This is complicated and error-prone. Instead, we should avoid inter-process communication and do it like the ptlrpcd/ldlm threads: spawn a pool of worker threads (new threads can be created at runtime if necessary), and let these worker threads pick up and finish their assignments separately.
            3. Statahead has proven to work on the client side, and I don't see why LFSCK can't do the same; if it can support bulk statahead, this should be quite efficient.
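
The worker-pool idea in point 2 can be illustrated with a small Python sketch. It is not the ptlrpcd/ldlm code or any LFSCK implementation; `run_worker_pool` and its parameters are hypothetical, and the pool here has a fixed size rather than growing at runtime.

```python
import queue
import threading


def run_worker_pool(tasks, handler, num_workers=4):
    """Run handler(task) for every task using independent worker threads.

    Workers pull assignments from a shared queue and never talk to each
    other or to a master thread; they exit when the queue is empty.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()  # protects the shared results list

    def worker():
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # no assignments left
            outcome = handler(task)
            with lock:
                results.append((task, outcome))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Completion order is not deterministic, so sort before comparing.
print(sorted(run_worker_pool(range(5), lambda n: n * n, num_workers=3)))
```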

            yong.fan nasf (Inactive) added a comment - edited

            All the LFSCK phase II patches have been landed to master as follows:
            1) http://review.whamcloud.com/#/c/7145/
            2) http://review.whamcloud.com/#/c/7053/
            3) http://review.whamcloud.com/#/c/8002/
            4) http://review.whamcloud.com/#/c/7146/
            5) http://review.whamcloud.com/#/c/6997/
            6) http://review.whamcloud.com/#/c/8302/
            7) http://review.whamcloud.com/#/c/7666/
            8) http://review.whamcloud.com/#/c/7062/
            9) http://review.whamcloud.com/#/c/8623/
            10) http://review.whamcloud.com/#/c/7087/
            11) http://review.whamcloud.com/#/c/7108/
            12) http://review.whamcloud.com/#/c/7665/
            13) http://review.whamcloud.com/#/c/7156/
            14) http://review.whamcloud.com/#/c/9186/
            15) http://review.whamcloud.com/#/c/7456/
            16) http://review.whamcloud.com/#/c/7517/
            17) http://review.whamcloud.com/#/c/7519/
            18) http://review.whamcloud.com/#/c/7524/
            19) http://review.whamcloud.com/#/c/7743/
            20) http://review.whamcloud.com/#/c/9257/
            21) http://review.whamcloud.com/#/c/8303/
            22) http://review.whamcloud.com/#/c/7810/
            23) http://review.whamcloud.com/#/c/8305/
            24) http://review.whamcloud.com/#/c/7811/
            25) http://review.whamcloud.com/#/c/8694/
            26) http://review.whamcloud.com/#/c/7667/

            yong.fan nasf (Inactive) added a comment - The task is restarted.

            yong.fan nasf (Inactive) added a comment - This task will be postponed until DNE I and LFSCK 1.5 are completed.

            People

              yong.fan nasf (Inactive)
              yong.fan nasf (Inactive)
              Votes: 0
              Watchers: 6
