[LU-11485] MDS allows "lfs setstripe" to mark last mirror as stale Created: 09/Oct/18 Updated: 26/Jun/20 Resolved: 21/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.2 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
A trivial modification to sanity-flr test_0h shows a serious problem. If we write to the file before changing the layout flags, the file metadata is corrupted: lfs mirror create -N -E 1M -S 1M --flags=prefer -E eof -N2 tfile echo 1 > tfile #set flags to the first component lfs setstripe --comp-set -I 0x10001 --comp-flags=^prefer,stale tfile ls -la ls: cannot access tfile: Input/output error total 16 drwxr-xr-x 3 root root 4096 Oct 9 13:00 . drwxr-xr-x. 43 root root 122880 Sep 27 22:49 .. -?????????? ? ? ? ? ? tfile So this corrupts the layout. I tried adding https://review.whamcloud.com/#/c/32847/ (
But additionally, I'm not sure what the correct behavior is here - should it be possible to remove the prefer flag in this situation, while the other replicas are stale?
And to set stale on the write replica? (That, at least, seems clearly incorrect.) The effect of this (... if it worked ...) would presumably be that all the replicas are stale, which would be unresolvable, and so should not be allowed. (Separate from the corruption) There are various implications here. |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 09/Oct/18 ] |
|
See |
| Comment by Patrick Farrell (Inactive) [ 09/Oct/18 ] |
|
FYI that I am not actively working this one (Sorry). It's behind several other things on the priority list here. I'll update if that changes. |
| Comment by Andreas Dilger [ 10/Oct/18 ] |
|
I think the root of the problem is that you shouldn't be able to mark the last copy of a mirror stale like this. I don't think I'd go so far as to say this corrupts the layout, just that the file does not have a valid component to read from and returns an IO error, which shouldn't be allowed. As I mentioned in my review of |
| Comment by Andreas Dilger [ 10/Oct/18 ] |
|
As for removing the prefer flag from the layout, I think that is fine. This is only a hint to the MDS for which mirror to select as primary when writing to a file, but if the prefer mirror is unavailable for some reason (e.g. OST offline) then the MDS will pick some other mirror (if one is available). Note that the current FLR implementation is such that it will pick a whole mirror for write, and mark all other mirrors stale, so if any preferred OST is unavailable then the whole mirror will be skipped. This has availability implications as well, since a few offline OSTs might make the whole file unwritable. If there are multiple components to a mirror (e.g. PFL), in theory it would be possible to mark only one component of a preferred mirror stale, and use the remaining preferred components for the write, but there are non-trivial complexities in this case (e.g. if the components are not aligned between mirrors), so it was left out of the initial FLR implementation. |
| Comment by Patrick Farrell (Inactive) [ 11/Oct/18 ] |
|
Ugh, bizarre - Something is eating my mails from Whamcloud Jira. Sorry for not replying earlier, I didn't realize you had commented. Corrupts is too strong - I didn't realize it was just an i/o error causing the bad ls output - but it's not recoverable either, at least not for the user who caused it. Thanks for clarifying (here and in the other LU). So as you said, we can't let them stale the last non-stale copy... There's some complexity there, but manageable. |
| Comment by Andreas Dilger [ 08/Aug/19 ] |
|
This still seems like a fairly serious issue that has a fairly straight forward solution - have the MDS check that setting the flags does not mark the last copy of a mirror STALE. Similarly, we shouldn't allow "lfs mirror split" to remove the last non-STALE mirror of a file (in case this is an issue). |
| Comment by Jian Yu [ 08/Sep/19 ] |
|
For setting stale flag, the check needs to be done in lod_declare_layout_set(), and the lme_stale field in struct lod_mirror_entry can be used to check if a component is stale or not. |
| Comment by Gerrit Updater [ 10/Sep/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36141 |
| Comment by Gerrit Updater [ 16/Sep/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36195 |
| Comment by Gerrit Updater [ 20/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36141/ |
| Comment by Peter Jones [ 21/Sep/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36195/ |