[LU-14512] prohibit extend file with stale mirror Created: 11/Mar/21  Updated: 13/Mar/21  Resolved: 13/Mar/21

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Zhenyu Xu Assignee: Zhenyu Xu
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14514 lod_declare_layout_del() does not che... Resolved
is related to LU-13730 Check need to mirror Extend on a WRIT... Resolved
is related to LU-13720 "lfs mirror delete" should resync fil... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Extending a file that has a stale mirror should be prohibited. Otherwise an FLR file could be created that contains multiple up-to-date mirrors whose contents do not match.



 Comments   
Comment by Alex Zhuravlev [ 11/Mar/21 ]

How can this be? My understanding is that extending by itself can't introduce any inconsistency, as it copies data from a replica known to be correct?

Comment by Gerrit Updater [ 11/Mar/21 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/42008
Subject: LU-14512 flr: prohibit extend file with stale mirror
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a5b1e28b5e5607863cd91da6c6155d99c8f4c988

Comment by Zhenyu Xu [ 11/Mar/21 ]

IIRC, mirror extend does not do a resync; it only merges the layout and brings in its data, or extends to create a new mirror with empty data.

Comment by Alex Zhuravlev [ 11/Mar/21 ]

> IIRC, mirror extend does not do a resync; it only merges the layout and brings in its data, or extends to create a new mirror with empty data.

Well, no, lfs mirror extend -N creates a mirror and (re)syncs it.

Comment by Zhenyu Xu [ 11/Mar/21 ]

You are right.

There's another issue: when the file contains an up-to-date mirror1 and a stale mirror2 (in write-pending state), and it is extended with a mirror3 whose data is migrated from mirror1, lod_declare_update_write_pending() would fail the assertion that an FLR file being written shouldn't contain multiple primary mirrors.

Comment by Alex Zhuravlev [ 11/Mar/21 ]

Probably https://review.whamcloud.com/#/c/42003/4 can help with that?

Comment by Zhenyu Xu [ 11/Mar/21 ]

Yes, that can help. Is it a good idea, in general, to prohibit extending an un-synced FLR file with a new mirror?

Comment by Alex Zhuravlev [ 11/Mar/21 ]

I'm not sure. Perhaps someone else can comment on this rather practical question?

Comment by Andreas Dilger [ 13/Mar/21 ]

> Is it a good idea, in general, to prohibit extending an un-synced FLR file with a new mirror?

No, I don't think it is a good idea to impose that limitation.

Consider a file with mirror1 (primary), mirror2 (stale). If mirror2 is on a failed OST, we may want to create a whole new mirror3 to restore redundancy, and then delete mirror2. It should always be possible to create a new mirror as long as there is a non-stale mirror.
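The rule above can be illustrated with a small toy model. This is only a sketch in Python, not Lustre code: the `Mirror` and `MirroredFile` classes and their methods are hypothetical names invented for illustration, and the refusal to delete the last non-stale mirror is an assumption added for safety rather than something stated in this ticket.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """Hypothetical stand-in for one mirror of an FLR file."""
    mirror_id: int
    data: bytes
    stale: bool = False


@dataclass
class MirroredFile:
    """Hypothetical toy model of a mirrored file (not a Lustre API)."""
    mirrors: list = field(default_factory=list)

    def in_sync_mirrors(self):
        # Mirrors that are not marked stale.
        return [m for m in self.mirrors if not m.stale]

    def can_extend(self):
        # Rule from the comment: extending is allowed as long as
        # at least one non-stale mirror exists.
        return bool(self.in_sync_mirrors())

    def extend(self):
        # Create a new mirror by copying from a non-stale mirror.
        sources = self.in_sync_mirrors()
        if not sources:
            raise ValueError("no non-stale mirror to copy from")
        new_id = max(m.mirror_id for m in self.mirrors) + 1
        new_mirror = Mirror(new_id, sources[0].data)
        self.mirrors.append(new_mirror)
        return new_mirror

    def delete(self, mirror_id):
        # Assumption: never drop the last non-stale mirror.
        remaining = [m for m in self.mirrors if m.mirror_id != mirror_id]
        if not any(not m.stale for m in remaining):
            raise ValueError("refusing to delete the last non-stale mirror")
        self.mirrors = remaining


# mirror1 is the primary, mirror2 is stale (e.g. on a failed OST).
f = MirroredFile([Mirror(1, b"current"), Mirror(2, b"old", stale=True)])
mirror3 = f.extend()   # restore redundancy first
f.delete(2)            # then drop the stale mirror
```

In this model the stale mirror never blocks the extend; only the complete absence of a non-stale source does.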

Comment by Andreas Dilger [ 13/Mar/21 ]

In the FLR case, the stale mirror may be an older version of the file (LU-12648) that is still useful, or may still have good data at the start of a file (common case of appending write) that could be manually recovered.

Note that it is better to add the new mirror to the file before removing the stale mirror, in case there are also some non-overlapping errors in the remaining (non-stale) mirror. This often happens in RAID arrays: a drive partly fails, then during resync there are also sector errors on the "good" drive, and if the bad drive is still partly available, the few bad sectors can still be recovered.
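The non-overlapping-errors case can be sketched as a block-by-block merge of two partly readable copies. This is a hypothetical Python illustration, not a Lustre tool: `merge_copies` is an invented name, unreadable blocks are modelled as `None`, and it assumes both copies have the same number of blocks and that the older copy's block is still valid wherever the newer copy is unreadable (which in practice would need to be checked manually).

```python
def merge_copies(good, old):
    """Merge two block lists, preferring blocks from `good`.

    Each list holds one entry per block: bytes if readable, None if not.
    Returns (recovered_blocks, indices_unreadable_in_both).
    """
    recovered = []
    unrecoverable = []
    for index, (good_block, old_block) in enumerate(zip(good, old)):
        if good_block is not None:
            recovered.append(good_block)      # normal case: good copy readable
        elif old_block is not None:
            recovered.append(old_block)       # fall back to the older copy
        else:
            recovered.append(None)            # lost in both copies
            unrecoverable.append(index)
    return recovered, unrecoverable


# The "good" copy has one bad block; the old copy is only readable at the start.
good_copy = [b"A", None, b"C", b"D"]
old_copy = [b"A", b"B", None, None]
blocks, lost = merge_copies(good_copy, old_copy)
```

If the old copy had been removed before the merge, block 1 in this example would have been unrecoverable, which is the reason for adding the new mirror first.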
