[LU-14470] striped directory layout mismatch after failover Created: 23/Feb/21 Updated: 28/Sep/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Andriy Skulysh | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
[15965.280047] LustreError: 23882:0:(llite_lib.c:1442:ll_update_lsm_md()) lustre: [0x200008107:0x10653:0x0] dir layout mismatch:
[15965.283219] LustreError: 23882:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.287312] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.289569] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.291807] LustreError: 23882:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.295841] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.298063] LustreError: 23882:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.310206] LustreError: 23884:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.314355] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.316652] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.318881] LustreError: 23884:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.322888] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.325121] LustreError: 23884:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.340329] LustreError: 23886:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.344411] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000dec0:0x7:0x0]
[15965.346655] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000e690:0x7:0x0]
[15965.348866] LustreError: 23886:0:(lustre_lmv.h:99:lsm_md_dump()) magic 0xcd20cd0 stripe count 2 master mdt 0 hash type 0x2 version 0 migrate offset 0 migrate hash 0x0 pool
[15965.352827] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[0] [0x20000e690:0x3:0x0]
[15965.355133] LustreError: 23886:0:(lustre_lmv.h:103:lsm_md_dump()) stripe[1] [0x24000ee60:0x1:0x0]
[15965.357439] LustreError: 23886:0:(llite_lib.c:2471:ll_prep_inode()) new_inode -fatal: rc -22
The create request is replayed, but the MDS creates the striped directory shards with new FIDs, so the client fails the layout check. To me this looks like a design flaw: the client should replay the create request with the previously allocated FIDs, and the MDS should recreate the directory shards using the client's FIDs. |
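The failing check can be pictured roughly as follows: the client compares the directory layout it already holds with the one returned after replay, and because the replayed create produced shards with different FIDs, the per-stripe comparison fails and the client gets -EINVAL (the rc -22 in the log). This is a simplified, hypothetical sketch: `struct fid`, `struct dir_layout` and `layout_check()` are illustrative stand-ins, not the real `lmv_stripe_md` structure or the `ll_update_lsm_md()` code, and only a subset of the dumped fields is compared.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for a Lustre FID: [seq:oid:ver]. */
struct fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

#define MAX_STRIPES 8

/* Hypothetical, reduced view of a striped directory layout. */
struct dir_layout {
	uint32_t magic;
	uint32_t stripe_count;
	uint32_t master_mdt;
	uint32_t hash_type;
	struct fid stripes[MAX_STRIPES];
};

static int fid_eq(const struct fid *a, const struct fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/*
 * Return 0 if both layouts describe the same striped directory,
 * -EINVAL otherwise (the "dir layout mismatch" case).
 */
static int layout_check(const struct dir_layout *held,
			const struct dir_layout *replied)
{
	uint32_t i;

	assert(held->stripe_count <= MAX_STRIPES);
	if (held->magic != replied->magic ||
	    held->stripe_count != replied->stripe_count ||
	    held->master_mdt != replied->master_mdt ||
	    held->hash_type != replied->hash_type)
		return -EINVAL;

	/* Same shape, but every shard must also carry the same FID. */
	for (i = 0; i < held->stripe_count; i++)
		if (!fid_eq(&held->stripes[i], &replied->stripes[i]))
			return -EINVAL;
	return 0;
}
```

Filled in with the two stripe FID sets printed in the log, the layouts agree on magic, stripe count, master MDT and hash type, yet `layout_check()` still returns -EINVAL because both stripes differ. Replaying with the originally allocated FIDs, as proposed above, would make the comparison succeed.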
| Comments |
| Comment by Gerrit Updater [ 23/Feb/21 ] |
|
Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/41731 |
| Comment by Lai Siyao [ 15/Mar/21 ] |
|
Dual or multiple MDT failure recovery is hard to support. For striped directory creation, the stripe FIDs are allocated by the MDTs, and these FIDs may not be usable in recovery (the meta sequence may not be allocated yet), or they may already be used by other objects (FIDs allocated by other operations during recovery). But if we don't reuse the FIDs, there are other problems: even if we update the directory layout after replay (instead of reporting an error, as is done currently), a subsequent touch under this striped directory can't be replayed because its parent has changed. |
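The two reasons a stripe FID recorded before the failure may be unusable during recovery can be written as a simple predicate. This is a hypothetical model only: `struct mdt_seq_state`, `fid_reusable()` and the flat arrays of granted sequences and in-use FIDs are invented for illustration and do not correspond to the real Lustre sequence or FID allocation interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Hypothetical per-MDT state at recovery time. */
struct mdt_seq_state {
	const uint64_t *granted_seqs;	/* sequences already granted to this MDT */
	size_t nr_granted;
	const struct fid *in_use;	/* FIDs already taken during recovery */
	size_t nr_in_use;
};

static bool seq_granted(const struct mdt_seq_state *st, uint64_t seq)
{
	for (size_t i = 0; i < st->nr_granted; i++)
		if (st->granted_seqs[i] == seq)
			return true;
	return false;
}

static bool fid_taken(const struct mdt_seq_state *st, const struct fid *f)
{
	for (size_t i = 0; i < st->nr_in_use; i++)
		if (st->in_use[i].seq == f->seq &&
		    st->in_use[i].oid == f->oid &&
		    st->in_use[i].ver == f->ver)
			return true;
	return false;
}

/*
 * A stripe FID from before the failure can be reused on replay only if
 * its sequence is already granted to this MDT and no operation replayed
 * so far has claimed the same FID.
 */
static bool fid_reusable(const struct mdt_seq_state *st, const struct fid *f)
{
	assert(st != NULL && f != NULL);
	if (!seq_granted(st, f->seq))
		return false;	/* meta sequence not allocated yet */
	if (fid_taken(st, f))
		return false;	/* FID used by another operation in recovery */
	return true;
}
```

When either check fails, the MDT has to allocate a fresh FID, which is exactly what changes the parent seen by later replayed operations under the directory.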
| Comment by Lai Siyao [ 16/Mar/21 ] |
|
Some distributed transaction replays need all the information stored in the update logs, because these transactions allocated FIDs on the MDTs, e.g. striped directory creation and directory migration (to a striped directory). Such operations can't be replayed from the client side. One way to improve this is to store update logs on more MDTs than those involved: e.g., if a striped directory is created on MDT0 and MDT1, the update logs are also stored on MDT2 (this could be configurable, with more backups if desired), so that upon recovery all the information can be obtained from MDT2 and the operation replayed successfully. |
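The placement idea could look roughly like this: every MDT participating in the transaction keeps its copy of the update log, and a configurable number of extra, non-participating MDTs hold backup copies. The function name `pick_updatelog_mdts()`, the `nr_backups` parameter and the wrap-around choice of backup MDTs are assumptions for illustration; the comment above does not say how backups would be selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill out[] with the MDT indices that should store
 * the update log of a distributed transaction. Participants come first,
 * then up to nr_backups non-participating MDTs, scanning upward from the
 * highest participant index and wrapping around. Returns the count.
 */
static size_t pick_updatelog_mdts(const unsigned int *participants,
				  size_t nr_participants,
				  unsigned int mdt_count, size_t nr_backups,
				  unsigned int *out, size_t out_size)
{
	size_t n = 0;
	unsigned int start = 0;	/* first candidate backup index */

	assert(nr_participants > 0 && nr_participants <= out_size);
	assert(mdt_count > 0);

	/* Participants always keep their own copy of the update log. */
	for (size_t i = 0; i < nr_participants; i++) {
		out[n++] = participants[i];
		if (participants[i] + 1 > start)
			start = participants[i] + 1;
	}

	/* Add up to nr_backups non-participating MDTs as backups. */
	for (unsigned int k = 0;
	     k < mdt_count && nr_backups > 0 && n < out_size; k++) {
		unsigned int mdt = (start + k) % mdt_count;
		bool involved = false;

		for (size_t i = 0; i < nr_participants; i++)
			if (participants[i] == mdt)
				involved = true;
		if (!involved) {
			out[n++] = mdt;
			nr_backups--;
		}
	}
	return n;
}
```

For the example in the comment (participants MDT0 and MDT1, four MDTs, one backup) this yields MDT0, MDT1 and MDT2. Asking for more backups than there are free MDTs simply stops at the number of MDTs available.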
| Comment by Andreas Dilger [ 26/Nov/21 ] |
|
Since the FIDs for the MDT directory stripes are allocated by the MDS, I think there are two options here:
I do not think that storing the update logs on other MDTs is a good solution to this problem for several reasons:
|
| Comment by Lai Siyao [ 15/Dec/21 ] |
|
After some tests, I found this is most likely a test script issue. The sync-on-lock-cancel mechanism has eliminated the dependency between striped directory creation and sub-file creation: In summary, if two operation replays have a dependency but may be replayed in arbitrary order (because they are replayed on different MDTs), this dependency should be eliminated (either by commit-on-sharing or by sync-on-lock-cancel). In this case, there is no need to implement client replay of striped directory creation: if no subsequent operation depends on it, it can be replayed as a fresh creation, and if a dependency exists, it will be eliminated by sync-on-lock-cancel. BTW, I'm afraid this can't be tested with replay_barrier(), which will fail step 2; that is why the test failed in the end. This should probably be tested on real machines. I will add a test to verify that sync-on-lock-cancel is triggered for sub-file creation under a striped directory. |
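The dependency-elimination argument can be illustrated with a toy model: if the transaction that created the striped directory is forced to commit when the lock protecting it is cancelled (so that the sub-file creation can proceed), the dependent operation never relies on an uncommitted transaction, and replay order across MDTs no longer matters. Everything here (`struct txn`, `on_lock_cancel()`, `replay_order_independent()`) is a hypothetical simplification, not the MDT locking or commit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy transaction: only tracks whether it reached stable storage. */
struct txn {
	const char *name;
	bool committed;
	const struct txn *depends_on;	/* NULL if independent */
};

/*
 * Toy sync-on-lock-cancel: before a lock covering changes made by an
 * uncommitted transaction is released to a conflicting user, force that
 * transaction to commit.
 */
static void on_lock_cancel(struct txn *holder, bool sync_on_cancel)
{
	if (sync_on_cancel && !holder->committed)
		holder->committed = true;	/* stands in for a journal sync */
}

/*
 * A transaction can be replayed regardless of other MDTs' replay order
 * if it has no dependency, or if what it depends on is already committed
 * and therefore never needs to be replayed.
 */
static bool replay_order_independent(const struct txn *t)
{
	assert(t != NULL);
	return t->depends_on == NULL || t->depends_on->committed;
}
```

In this model a striped directory creation with no dependents stays order-independent even while uncommitted, which matches the point that it can simply be replayed as a fresh creation; only when a sub-file creation depends on it does the forced commit matter.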
| Comment by Gerrit Updater [ 16/Dec/21 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45870 |
| Comment by Gerrit Updater [ 18/May/22 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47385 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47385/ |