Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11485

MDS allows "lfs setstripe" to mark last mirror as stale

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.2
    • None
    • 3
    • 9223372036854775807

    Description

      A trivial modification to sanity-flr test_0h shows a serious problem.

      If we write to the file before changing the layout flags, the file metadata is corrupted:

       lfs mirror create -N -E 1M -S 1M --flags=prefer -E eof -N2 tfile
       echo 1 > tfile
      #set flags to the first component
       lfs setstripe --comp-set -I 0x10001 --comp-flags=^prefer,stale tfile
      ls -la 
       ls: cannot access tfile: Input/output error
       total 16
       drwxr-xr-x 3 root root 4096 Oct 9 13:00 .
       drwxr-xr-x. 43 root root 122880 Sep 27 22:49 ..
       -?????????? ? ? ? ? ? tfile

       So this corrupts the layout.  I tried adding https://review.whamcloud.com/#/c/32847/ (LU-11158 mdt: grow lvb buffer to hold layout) and it did not resolve this issue.

       

      But additionally, I'm not sure what the correct behavior is here - should it be possible to remove the prefer flag in this situation, while the other replicas are stale?

       

      And to set stale on the write replica?  (That, at least, seems clearly incorrect.)

      The effect of this (... if it worked ...) would presumably be that all the replicas are stale, which would be unresolvable, and so should not be allowed.  (Separate from the corruption)

      There are various implications here.

      Attachments

        Issue Links

          Activity

            [LU-11485] MDS allows "lfs setstripe" to mark last mirror as stale

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36195/
            Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 79346930b340f73f27c28a1c53c73cc2138b8508

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36195/ Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 79346930b340f73f27c28a1c53c73cc2138b8508
            pjones Peter Jones added a comment -

            Landed for 2.13

            pjones Peter Jones added a comment - Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36141/
            Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b380a1f9a1daa1683593d0ee4a508da859277164

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36141/ Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale Project: fs/lustre-release Branch: master Current Patch Set: Commit: b380a1f9a1daa1683593d0ee4a508da859277164

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36195
            Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 45421f91ff1318f35ee6d433fdb89b8421c9659c

            gerrit Gerrit Updater added a comment - Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36195 Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 45421f91ff1318f35ee6d433fdb89b8421c9659c

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36141
            Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cd7fb425ed8a8282b3aa0894b4ad654f7ba88acb

            gerrit Gerrit Updater added a comment - Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36141 Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: cd7fb425ed8a8282b3aa0894b4ad654f7ba88acb
            yujian Jian Yu added a comment -

            For setting stale flag, the check needs to be done in lod_declare_layout_set(), and the lme_stale field in struct lod_mirror_entry can be used to check if a component is stale or not.
            For "lfs mirror split", the check needs to be done in mirror_split().

            yujian Jian Yu added a comment - For setting stale flag, the check needs to be done in lod_declare_layout_set(), and the lme_stale field in struct lod_mirror_entry can be used to check if a component is stale or not. For "lfs mirror split", the check needs to be done in mirror_split().

            This still seems like a fairly serious issue that has a fairly straight forward solution - have the MDS check that setting the flags does not mark the last copy of a mirror STALE.

            Similarly, we shouldn't allow "lfs mirror split" to remove the last non-STALE mirror of a file (in case this is an issue).

            adilger Andreas Dilger added a comment - This still seems like a fairly serious issue that has a fairly straight forward solution - have the MDS check that setting the flags does not mark the last copy of a mirror STALE . Similarly, we shouldn't allow " lfs mirror split " to remove the last non- STALE mirror of a file (in case this is an issue).

            Ugh, bizarre - Something is eating my mails from Whamcloud Jira.  Sorry for not replying earlier, I didn't realize you had commented.

            Corrupts is too strong - I didn't realize it was just an i/o error causing the bad ls output - but it's not recoverable either, at least not for the user who caused it.  Thanks for clarifying (here and in the other LU).

            So as you said, we can't let them stale the last non-stale copy...  There's some complexity there, but manageable.

            paf Patrick Farrell (Inactive) added a comment - Ugh, bizarre - Something is eating my mails from Whamcloud Jira.  Sorry for not replying earlier, I didn't realize you had commented. Corrupts is too strong - I didn't realize it was just an i/o error causing the bad ls output - but it's not recoverable either, at least not for the user who caused it.  Thanks for clarifying (here and in the other LU). So as you said, we can't let them stale the last non-stale copy...  There's some complexity there, but manageable.

            As for removing the prefer flag from the layout, I think that is fine. This is only a hint to the MDS for which mirror to select as primary when writing to a file, but if the prefer mirror is unavailable for some reason (e.g. OST offline) then the MDS will pick some other mirror (if one is available).

            Note that the current FLR implementation is such that it will pick a whole mirror for write, and mark all other mirrors stale, so if any preferred OST is unavailable then the whole mirror will be skipped. This has availability implications as well, since a few offline OSTs might make the whole file unwritable. If there are multiple components to a mirror (e.g. PFL), in theory it would be possible to mark only one component of a preferred mirror stale, and use the remaining preferred components for the write, but there are non-trivial complexities in this case (e.g. if the components are not aligned between mirrors), so it was left out of the initial FLR implementation.

            adilger Andreas Dilger added a comment - As for removing the prefer flag from the layout, I think that is fine. This is only a hint to the MDS for which mirror to select as primary when writing to a file, but if the prefer mirror is unavailable for some reason (e.g. OST offline) then the MDS will pick some other mirror (if one is available). Note that the current FLR implementation is such that it will pick a whole mirror for write, and mark all other mirrors stale, so if any preferred OST is unavailable then the whole mirror will be skipped. This has availability implications as well, since a few offline OSTs might make the whole file unwritable. If there are multiple components to a mirror (e.g. PFL), in theory it would be possible to mark only one component of a preferred mirror stale, and use the remaining preferred components for the write, but there are non-trivial complexities in this case (e.g. if the components are not aligned between mirrors), so it was left out of the initial FLR implementation.

            I think the root of the problem is that you shouldn't be able to mark the last copy of a mirror stale like this. I don't think I'd go so far as to say this corrupts the layout, just that the file does not have a valid component to read from and returns an IO error, which shouldn't be allowed.

            As I mentioned in my review of LU-11482 flr: Inherit flags from template, while there are some flags that can be set from userspace, it isn't enough to check them only in lfs, they also need to be verified by the MDS that they make sense. We should never be able to set "init" directly from the client, and "stale" shouldn't be allowed on the last/only mirror of a file. I think a small tweak to the 33326 should fix part of this problem (set bad flags at create time), but a bit more work is needed to disallow setting stale.

            adilger Andreas Dilger added a comment - I think the root of the problem is that you shouldn't be able to mark the last copy of a mirror stale like this. I don't think I'd go so far as to say this corrupts the layout, just that the file does not have a valid component to read from and returns an IO error, which shouldn't be allowed. As I mentioned in my review of LU-11482 flr: Inherit flags from template , while there are some flags that can be set from userspace, it isn't enough to check them only in lfs , they also need to be verified by the MDS that they make sense. We should never be able to set " init " directly from the client, and " stale " shouldn't be allowed on the last/only mirror of a file. I think a small tweak to the 33326 should fix part of this problem (set bad flags at create time), but a bit more work is needed to disallow setting stale .

            People

              yujian Jian Yu
              paf Patrick Farrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: