[LU-11485] MDS allows "lfs setstripe" to mark last mirror as stale Created: 09/Oct/18  Updated: 26/Jun/20  Resolved: 21/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.2
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Major
Reporter: Patrick Farrell (Inactive) Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11142 test-framework.sh run_e2fsck masks re... Open
is related to LU-11486 FLR allows overlapping "write preferr... Resolved
is related to LU-12649 Tracker for ongoing FLR improvements Open
is related to LU-13720 "lfs mirror delete" should resync fil... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A trivial modification to sanity-flr test_0h shows a serious problem.

If we write to the file before changing the layout flags, the file metadata is corrupted:

 lfs mirror create -N -E 1M -S 1M --flags=prefer -E eof -N2 tfile
 echo 1 > tfile
#set flags to the first component
 lfs setstripe --comp-set -I 0x10001 --comp-flags=^prefer,stale tfile
ls -la 
 ls: cannot access tfile: Input/output error
 total 16
 drwxr-xr-x 3 root root 4096 Oct 9 13:00 .
 drwxr-xr-x. 43 root root 122880 Sep 27 22:49 ..
 -?????????? ? ? ? ? ? tfile

 So this corrupts the layout.  I tried adding https://review.whamcloud.com/#/c/32847/ (LU-11158 mdt: grow lvb buffer to hold layout) and it did not resolve this issue.

 

But additionally, I'm not sure what the correct behavior is here - should it be possible to remove the prefer flag in this situation, while the other replicas are stale?

 

And to set stale on the write replica?  (That, at least, seems clearly incorrect.)

The effect of this (... if it worked ...) would presumably be that all the replicas are stale, which would be unresolvable, and so should not be allowed.  (Separate from the corruption)

There are various implications here.



 Comments   
Comment by Patrick Farrell (Inactive) [ 09/Oct/18 ]

See LU-11486 for another issue found looking at sanity-flr.sh test_0h.

Comment by Patrick Farrell (Inactive) [ 09/Oct/18 ]

FYI that I am not actively working this one (Sorry).  It's behind several other things on the priority list here.  I'll update if that changes.

Comment by Andreas Dilger [ 10/Oct/18 ]

I think the root of the problem is that you shouldn't be able to mark the last copy of a mirror stale like this. I don't think I'd go so far as to say this corrupts the layout, just that the file does not have a valid component to read from and returns an IO error, which shouldn't be allowed.

As I mentioned in my review of LU-11482 flr: Inherit flags from template, while there are some flags that can be set from userspace, it isn't enough to check them only in lfs, they also need to be verified by the MDS that they make sense. We should never be able to set "init" directly from the client, and "stale" shouldn't be allowed on the last/only mirror of a file. I think a small tweak to the 33326 should fix part of this problem (set bad flags at create time), but a bit more work is needed to disallow setting stale.

Comment by Andreas Dilger [ 10/Oct/18 ]

As for removing the prefer flag from the layout, I think that is fine. This is only a hint to the MDS for which mirror to select as primary when writing to a file, but if the prefer mirror is unavailable for some reason (e.g. OST offline) then the MDS will pick some other mirror (if one is available).

Note that the current FLR implementation is such that it will pick a whole mirror for write, and mark all other mirrors stale, so if any preferred OST is unavailable then the whole mirror will be skipped. This has availability implications as well, since a few offline OSTs might make the whole file unwritable. If there are multiple components to a mirror (e.g. PFL), in theory it would be possible to mark only one component of a preferred mirror stale, and use the remaining preferred components for the write, but there are non-trivial complexities in this case (e.g. if the components are not aligned between mirrors), so it was left out of the initial FLR implementation.

Comment by Patrick Farrell (Inactive) [ 11/Oct/18 ]

Ugh, bizarre - Something is eating my mails from Whamcloud Jira.  Sorry for not replying earlier, I didn't realize you had commented.

Corrupts is too strong - I didn't realize it was just an i/o error causing the bad ls output - but it's not recoverable either, at least not for the user who caused it.  Thanks for clarifying (here and in the other LU).

So as you said, we can't let them stale the last non-stale copy...  There's some complexity there, but manageable.

Comment by Andreas Dilger [ 08/Aug/19 ]

This still seems like a fairly serious issue that has a fairly straight forward solution - have the MDS check that setting the flags does not mark the last copy of a mirror STALE.

Similarly, we shouldn't allow "lfs mirror split" to remove the last non-STALE mirror of a file (in case this is an issue).

Comment by Jian Yu [ 08/Sep/19 ]

For setting stale flag, the check needs to be done in lod_declare_layout_set(), and the lme_stale field in struct lod_mirror_entry can be used to check if a component is stale or not.
For "lfs mirror split", the check needs to be done in mirror_split().

Comment by Gerrit Updater [ 10/Sep/19 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36141
Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cd7fb425ed8a8282b3aa0894b4ad654f7ba88acb

Comment by Gerrit Updater [ 16/Sep/19 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36195
Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 45421f91ff1318f35ee6d433fdb89b8421c9659c

Comment by Gerrit Updater [ 20/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36141/
Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b380a1f9a1daa1683593d0ee4a508da859277164

Comment by Peter Jones [ 21/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 23/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36195/
Subject: LU-11485 lod: disallow setting the last non-stale mirror as stale
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 79346930b340f73f27c28a1c53c73cc2138b8508

Generated at Sat Feb 10 02:44:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.