Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.2
-
None
-
3
Description
A trivial modification to sanity-flr test_0h shows a serious problem.
If we write to the file before changing the layout flags, the file metadata is corrupted:
lfs mirror create -N -E 1M -S 1M --flags=prefer -E eof -N2 tfile
echo 1 > tfile
#set flags to the first component
lfs setstripe --comp-set -I 0x10001 --comp-flags=^prefer,stale tfile
ls -la
ls: cannot access tfile: Input/output error
total 16
drwxr-xr-x 3 root root 4096 Oct 9 13:00 .
drwxr-xr-x. 43 root root 122880 Sep 27 22:49 ..
-?????????? ? ? ? ? ? tfile
So this corrupts the layout. I tried adding https://review.whamcloud.com/#/c/32847/ (LU-11158 mdt: grow lvb buffer to hold layout) and it did not resolve this issue.
But additionally, I'm not sure what the correct behavior is here - should it be possible to remove the prefer flag in this situation, while the other replicas are stale?
And to set stale on the write replica? (That, at least, seems clearly incorrect.)
The effect of this (if it worked) would presumably be that all the replicas are stale, which would be unresolvable, so it should not be allowed. (This is separate from the corruption.)
There are various implications here.
For setting the stale flag, the check needs to be done in lod_declare_layout_set(), and the lme_stale field in struct lod_mirror_entry can be used to check whether a component is stale.
For "lfs mirror split", the check needs to be done in mirror_split().
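As a rough illustration of the rule both checks would enforce, here is a minimal standalone sketch in C. It is not the Lustre code: struct mirror_entry is a hypothetical, simplified stand-in for struct lod_mirror_entry, its stale member is only modeled on lme_stale, and check_keeps_valid_mirror() is an invented helper name. The assumption is that marking a mirror stale, or splitting it off, should be refused whenever no other non-stale mirror would remain.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct lod_mirror_entry. */
struct mirror_entry {
	unsigned int id;	/* mirror ID */
	bool stale;		/* modeled on lme_stale */
};

/* Count the non-stale mirrors, ignoring the mirror at index 'skip'. */
static size_t count_valid_except(const struct mirror_entry *m,
				 size_t count, size_t skip)
{
	size_t valid = 0;

	for (size_t i = 0; i < count; i++)
		if (i != skip && !m[i].stale)
			valid++;
	return valid;
}

/*
 * Return 0 if mirror 'target' may be marked stale or split off,
 * or -EINVAL if the index is out of range or the operation would
 * leave the file with no non-stale mirror.
 */
int check_keeps_valid_mirror(const struct mirror_entry *m,
			     size_t count, size_t target)
{
	if (target >= count)
		return -EINVAL;
	if (count_valid_except(m, count, target) == 0)
		return -EINVAL;
	return 0;
}
```

In the reproducer above, the write leaves the preferred mirror as the only non-stale one, so a check of this shape would reject setting stale on it, while still allowing the same operation on one of the already-stale mirrors.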