[LU-11639] DNE migration failed with single stripe dir between 2.12 client and prior server Created: 07/Nov/18 Updated: 02/May/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Qian Yingjin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | interop | ||
| Environment: |
server: 2.10.5 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Tested DNE migration between a 2.12 client and a server older than 2.12 (2.10.5 in this test). The migration failed on a single-stripe directory.

client

[root@trevis-60vm7 lustre]# lfs getdirstripe -m test/
0
[root@trevis-60vm7 lustre]# lfs migrate -m 1 test
[ 7508.448202] LustreError: 11-0: lustre-MDT0000-mdc-ffff8efabb53b000: operation mds_reint to node 10.9.6.157@tcp failed: rc = -71
test migrate failed: Protocol error (-71)
[root@trevis-60vm7 lustre]# lctl dl
0 UP mgc MGC10.9.6.157@tcp af419607-68f3-4546-4807-d9170b42889b 4
1 UP lov lustre-clilov-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 3
2 UP lmv lustre-clilmv-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
3 UP mdc lustre-MDT0000-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
4 UP mdc lustre-MDT0001-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
5 UP osc lustre-OST0000-osc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
[root@trevis-60vm7 lustre]#

MDS 0

[ 1595.659896] Lustre: MGS: Connection restored to 4402468b-47f5-81e4-eb32-5614a7b679dc (at 10.9.6.158@tcp)
[ 1596.041271] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000240000400-0x0000000280000400]:1:mdt
[ 1963.481099] Lustre: MGS: Connection restored to b7583016-fa8d-e6d0-d2bf-689b0cd9f5de (at 10.9.6.159@tcp)
[ 1963.482619] Lustre: Skipped 2 previous similar messages
[ 1968.740819] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400]:0:ost
[ 1986.129197] LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.9.6.160@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1986.131088] LustreError: Skipped 20 previous similar messages
[ 1986.202837] Lustre: lustre-MDT0000: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
[ 1986.203937] Lustre: Skipped 1 previous similar message
[ 7968.025390] Lustre: MGS: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
[10537.801688] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10566.654108] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10566.655158] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10584.774386] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10584.775446] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10608.074356] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10608.075473] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
[10614.923867] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10614.924928] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
[10634.199088] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10634.200110] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10634.202903] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[10706.824226] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10706.825232] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
[10722.305838] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[10774.262163] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10774.263228] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 6 previous similar messages
[10774.265904] LustreError: 5031:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[11131.501714] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[11131.502781] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
[11147.375492] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[root@trevis-60vm4 ~]# |
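For triage, it can help to tally which MDS functions reported which errors. The following is only a sketch and not part of any Lustre tooling: the regular expression is an assumption about the console line shape, inferred from the lines in this ticket, and `summarize` and `SAMPLE` are hypothetical names. Because the kernel collapses repeats into "Skipped N previous similar messages" lines, the tally is a lower bound.

```python
import re
from collections import Counter

# Assumed line shape, inferred from this ticket's MDS console output:
# [ts] LustreError: pid:ext:(file.c:line:function()) message
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+\d+:\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)"
)

def summarize(log_text):
    """Count (function, message) pairs, ignoring rate-limit 'Skipped' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a LustreError line with a source location
        msg = match.group("msg").strip()
        if msg.startswith("Skipped"):
            continue  # rate-limit notice, not a distinct error
        counts[(match.group("func"), msg)] += 1
    return counts

# Three lines copied from the MDS log in this ticket.
SAMPLE = """\
[10634.199088] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10634.200110] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10634.202903] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
"""

for (func, msg), n in summarize(SAMPLE).items():
    print(f"{n} x {func}: {msg}")
```

On the sample, the "Unknown attr bits" message from `mdt_attr_valid_xlate()` and the "Can't unpack reint" message from `mdt_reint_internal()` each appear once, within a few milliseconds of each other.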
| Comments |
| Comment by Lai Siyao [ 17/Apr/19 ] |
|
0x60000 is "MDS_ATTR_LSIZE | MDS_ATTR_LBLOCKS", and these flags are introduced by LSOM. |
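The decomposition of 0x60000 can be checked with a short sketch. The ticket only states that the two flags OR together to 0x60000; the individual bit assignments below (0x20000 and 0x40000) are an assumption, and the table and function names are hypothetical.

```python
# Assumed flag values: the comment above only gives the combined value
# 0x60000, so which bit belongs to which flag is an assumption here.
ASSUMED_MDS_ATTR_FLAGS = {
    "MDS_ATTR_LSIZE": 0x20000,
    "MDS_ATTR_LBLOCKS": 0x40000,
}

def decode_attr_bits(bits, flags=ASSUMED_MDS_ATTR_FLAGS):
    """Return (names of table flags set in bits, bits not covered by the table)."""
    names = [name for name, value in flags.items() if bits & value]
    known_mask = 0
    for value in flags.values():
        known_mask |= value
    return names, bits & ~known_mask

names, leftover = decode_attr_bits(0x60000)
print(names, hex(leftover))
```

A zero leftover means every reported bit is accounted for by the two LSOM flags, which matches the explanation that an older server does not recognize them.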
| Comment by Peter Jones [ 17/Apr/19 ] |
|
Qian, any advice here? Peter |
| Comment by Qian Yingjin [ 17/Apr/19 ] |
|
There is a patch, https://review.whamcloud.com/#/c/34663/, which should fix the printing of the error message, but that message should not cause the -71 error code (-EPROTO).
Sarah, please collect the debug log with:
lctl set_param subsystem_debug=mds
lctl set_param debug=trace
Regards, |
| Comment by Sarah Liu [ 18/Apr/19 ] |
|
Please see the attached log. |
| Comment by Qian Yingjin [ 19/Apr/19 ] |
|
Hi Sarah, after analyzing the debug log, I don't think it was collected with "trace" enabled. Perhaps the log was collected in the wrong order? You need to enable "trace" debug via lctl first and then execute the operations. Could you please collect the debug log again with trace enabled?
Thanks, Qian |
| Comment by Sarah Liu [ 02/May/19 ] |
|
Sorry for the late response. The debug log was collected after doing what you indicated, and before I ran the test:
lctl get_param debug
debug=trace warning error emerg console
[root@trevis-60vm1 tmp]#
I have attached another file, for which I enabled trace on the MDS; maybe it is what you need. |
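Whether "trace" is actually in the debug mask can be checked mechanically before rerunning the test. This is a minimal sketch that only parses text in the `debug=...` form shown in the comment above; `parse_debug_mask` is a hypothetical helper, not an lctl feature.

```python
def parse_debug_mask(get_param_output):
    """Parse text such as 'debug=trace warning error emerg console' into a set."""
    for line in get_param_output.splitlines():
        line = line.strip()
        if line.startswith("debug="):
            return set(line[len("debug="):].split())
    return set()  # no 'debug=' line found

# Output copied from the comment above.
mask = parse_debug_mask("debug=trace warning error emerg console")
print("trace" in mask)
```

For the output quoted in this comment, the mask does contain "trace", so the setting was in effect on the node where `lctl get_param debug` was run.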