[LU-11639] DNE migration failed with single stripe dir between 2.12 client and prior server Created: 07/Nov/18  Updated: 02/May/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: interop
Environment:

server: 2.10.5
client: lustre-master 2.11.56_55_g4afee32


Attachments: HTML File mds-trace     HTML File trace    
Severity: 3

 Description   

Testing DNE migration between a 2.12 client and a server older than 2.12 (2.10.5 in this test).

Migration fails on a single-stripe directory.

client

[root@trevis-60vm7 lustre]# lfs getdirstripe -m test/
0
[root@trevis-60vm7 lustre]# lfs migrate -m 1 test
[ 7508.448202] LustreError: 11-0: lustre-MDT0000-mdc-ffff8efabb53b000: operation mds_reint to node 10.9.6.157@tcp failed: rc = -71
test migrate failed: Protocol error (-71)
[root@trevis-60vm7 lustre]# lctl dl
  0 UP mgc MGC10.9.6.157@tcp af419607-68f3-4546-4807-d9170b42889b 4
  1 UP lov lustre-clilov-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 3
  2 UP lmv lustre-clilmv-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
  3 UP mdc lustre-MDT0000-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
  4 UP mdc lustre-MDT0001-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
  5 UP osc lustre-OST0000-osc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
[root@trevis-60vm7 lustre]#

MDS 0

[ 1595.659896] Lustre: MGS: Connection restored to 4402468b-47f5-81e4-eb32-5614a7b679dc (at 10.9.6.158@tcp)
[ 1596.041271] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000240000400-0x0000000280000400]:1:mdt
[ 1963.481099] Lustre: MGS: Connection restored to b7583016-fa8d-e6d0-d2bf-689b0cd9f5de (at 10.9.6.159@tcp)
[ 1963.482619] Lustre: Skipped 2 previous similar messages
[ 1968.740819] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400]:0:ost
[ 1986.129197] LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.9.6.160@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1986.131088] LustreError: Skipped 20 previous similar messages
[ 1986.202837] Lustre: lustre-MDT0000: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
[ 1986.203937] Lustre: Skipped 1 previous similar message
[ 7968.025390] Lustre: MGS: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
[10537.801688] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10566.654108] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10566.655158] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10584.774386] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10584.775446] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10608.074356] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10608.075473] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
[10614.923867] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10614.924928] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
[10634.199088] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10634.200110] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
[10634.202903] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[10706.824226] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10706.825232] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
[10722.305838] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[10774.262163] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[10774.263228] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 6 previous similar messages
[10774.265904] LustreError: 5031:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[11131.501714] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[11131.502781] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
[11147.375492] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
[root@trevis-60vm4 ~]# 


 Comments   
Comment by Lai Siyao [ 17/Apr/19 ]

0x60000 is "MDS_ATTR_LSIZE | MDS_ATTR_LBLOCKS"; these flags were introduced by LSOM (Lazy Size on MDT).
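
For readers who want to check that decoding, here is a minimal sketch. The two bit values are an assumption based on the MDS_ATTR_* definitions in newer Lustre trees and should be verified against lustre_idl.h in the tree being used; the helper name `decode_attr_bits` is hypothetical.

```python
# Assumed bit values (verify against lustre_idl.h in your source tree).
MDS_ATTR_FLAGS = {
    0x20000: "MDS_ATTR_LSIZE",
    0x40000: "MDS_ATTR_LBLOCKS",
}


def decode_attr_bits(bits):
    """Split a bitmask into known flag names and any leftover bits."""
    names = []
    for value, name in sorted(MDS_ATTR_FLAGS.items()):
        if bits & value:
            names.append(name)
            bits &= ~value  # clear the bit once it has been named
    return names, bits


names, leftover = decode_attr_bits(0x60000)
print(" | ".join(names), hex(leftover))
```

A leftover of 0x0 means both bits in the server message are accounted for by these two flags.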

Comment by Peter Jones [ 17/Apr/19 ]

Qian

Any advice here?

Peter

Comment by Qian Yingjin [ 17/Apr/19 ]

The patch https://review.whamcloud.com/#/c/34663/ should stop the "Unknown attr bits" error message from being printed, but the unknown attr bits by themselves should not cause the -71 (-EPROTO) error.

Sarah,
Could you please get the trace on the MDS?

lctl set_param subsystem_debug=mds
lctl set_param debug=trace

Regards,
Qian

Comment by Sarah Liu [ 18/Apr/19 ]

Please see the attached log.
Thanks

Comment by Qian Yingjin [ 19/Apr/19 ]

Hi Sarah,

After analyzing the debug log, I don't think it was collected with "trace" enabled.

Maybe the log was collected in the wrong order? You need to enable "trace" debugging via lctl first and then run the operations.

Could you please collect the debug log again with trace enabled?

Thanks,

Qian
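
The ordering requested above can be sketched as a simple step list. This is only an illustration: the two `lctl set_param` commands are the ones quoted earlier in this ticket, while the `lctl clear` and `lctl dk` steps, the dump path, and the function name `trace_collection_plan` are assumptions added for the example.

```python
def trace_collection_plan(operation, dump_path):
    """Return ordered (node, command) steps for capturing an MDS trace."""
    return [
        ("mds", "lctl set_param subsystem_debug=mds"),  # limit to the mds subsystem
        ("mds", "lctl set_param debug=trace"),          # enable trace before the test
        ("mds", "lctl clear"),                          # assumed step: empty the debug buffer
        ("client", operation),                          # reproduce the failure
        ("mds", f"lctl dk {dump_path}"),                # assumed step: dump the buffer to a file
    ]


# Hypothetical dump path; the operation is the failing command from the description.
for node, command in trace_collection_plan("lfs migrate -m 1 test", "/tmp/mds-trace.log"):
    print(f"{node}: {command}")
```

The point of the sketch is the order: the debug mask is set on the MDS before the client runs the failing operation, and the buffer is dumped only afterwards.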

Comment by Sarah Liu [ 02/May/19 ]

Sorry for the late response. The debug log was collected after I did what you indicated, and I did that before running the test:

lctl get_param debug
debug=trace warning error emerg console
[root@trevis-60vm1 tmp]# 

I have attached another file, for which I triggered the trace on the MDS; maybe it is what you need.
