
DNE migration failed with single stripe dir between 2.12 client and prior server

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0
    • server: 2.10.5
      client: lustre-master 2.11.56_55_g4afee32
    • 3

    Description

      Testing DNE migration between a 2.12 client and a server older than 2.12 (2.10.5 in this test).

      The migration fails on a single-stripe directory.

      client

      [root@trevis-60vm7 lustre]# lfs getdirstripe -m test/
      0
      [root@trevis-60vm7 lustre]# lfs migrate -m 1 test
      [ 7508.448202] LustreError: 11-0: lustre-MDT0000-mdc-ffff8efabb53b000: operation mds_reint to node 10.9.6.157@tcp failed: rc = -71
      test migrate failed: Protocol error (-71)
      [root@trevis-60vm7 lustre]# lctl dl
        0 UP mgc MGC10.9.6.157@tcp af419607-68f3-4546-4807-d9170b42889b 4
        1 UP lov lustre-clilov-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 3
        2 UP lmv lustre-clilmv-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
        3 UP mdc lustre-MDT0000-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
        4 UP mdc lustre-MDT0001-mdc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
        5 UP osc lustre-OST0000-osc-ffff8efabb53b000 06a06a8d-2f82-8a4b-a394-265eb37d4778 4
      [root@trevis-60vm7 lustre]#
      

      MDS 0

      [ 1595.659896] Lustre: MGS: Connection restored to 4402468b-47f5-81e4-eb32-5614a7b679dc (at 10.9.6.158@tcp)
      [ 1596.041271] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000240000400-0x0000000280000400]:1:mdt
      [ 1963.481099] Lustre: MGS: Connection restored to b7583016-fa8d-e6d0-d2bf-689b0cd9f5de (at 10.9.6.159@tcp)
      [ 1963.482619] Lustre: Skipped 2 previous similar messages
      [ 1968.740819] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400]:0:ost
      [ 1986.129197] LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.9.6.160@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 1986.131088] LustreError: Skipped 20 previous similar messages
      [ 1986.202837] Lustre: lustre-MDT0000: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
      [ 1986.203937] Lustre: Skipped 1 previous similar message
      [ 7968.025390] Lustre: MGS: Connection restored to bf2ac545-2711-bbab-e37d-e086236dc162 (at 10.9.6.160@tcp)
      [10537.801688] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10566.654108] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10566.655158] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
      [10584.774386] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10584.775446] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
      [10608.074356] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10608.075473] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
      [10614.923867] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10614.924928] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3 previous similar messages
      [10634.199088] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10634.200110] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 1 previous similar message
      [10634.202903] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
      [10706.824226] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10706.825232] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
      [10722.305838] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
      [10774.262163] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [10774.263228] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 6 previous similar messages
      [10774.265904] LustreError: 5031:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
      [11131.501714] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
      [11131.502781] LustreError: 5033:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 2 previous similar messages
      [11147.375492] LustreError: 5095:0:(mdt_handler.c:1951:mdt_reint_internal()) Can't unpack reint, rc -71
      [root@trevis-60vm4 ~]# 
      

      Attachments

        1. mds-trace
          529 kB
        2. trace
          254 kB

          Activity

            [LU-11639] DNE migration failed with single stripe dir between 2.12 client and prior server
            sarah Sarah Liu added a comment -

            Sorry for the late response. The debug log was collected after doing what you indicated and before I ran the test.

            lctl get_param debug
            debug=trace warning error emerg console
            [root@trevis-60vm1 tmp]# 
            

            I attached another file, for which I triggered the trace on the MDS; maybe it is what you need.
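A quick way to confirm that "trace" is actually part of the debug mask before running the test is to check the `lctl get_param debug` line programmatically. The helper below is only a sketch: the function name `has_debug_flag` is hypothetical, and it assumes the mask is printed in the `debug=flag flag ...` form shown in the output above.

```shell
# Return 0 if FLAG appears as a whole word in a debug mask line such as
# "debug=trace warning error emerg console"; return 1 otherwise.
# has_debug_flag is a hypothetical helper, not part of lctl.
has_debug_flag() {
    mask_line=$1
    flag=$2
    # Strip the leading "debug=" prefix, keeping only the flag list.
    flags=${mask_line#*=}
    for f in $flags; do
        [ "$f" = "$flag" ] && return 0
    done
    return 1
}
```

On a live node this could be called as `has_debug_flag "$(lctl get_param debug)" trace && echo "trace is enabled"`.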

            qian_wc Qian Yingjin added a comment -

            Hi sarah,

            After analyzing the debug trace log, I don't think the debug log you collected had "trace" enabled.

            Maybe the log was collected incorrectly? You should enable "trace" debug first via lctl and then execute the operations.

            Could you please collect the debug log again with trace enabled?

             

            Thanks,

            Qian
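The order of operations described above (enable trace first, then run the operation, then dump the log) can be sketched as a small wrapper. This is an illustration under assumptions, not the procedure used in this ticket: the function name `collect_trace` and the `LCTL` override variable are hypothetical, and the sketch assumes the operation and the debug buffer of interest are on the same node.

```shell
# Allow the lctl binary to be overridden (hypothetical convenience variable).
LCTL=${LCTL:-lctl}

# collect_trace OUTFILE COMMAND [ARGS...]
# Hypothetical wrapper: enable trace, run the command, dump the debug log.
collect_trace() {
    outfile=$1
    shift
    $LCTL set_param debug=+trace   # add "trace" to the current debug mask
    $LCTL clear                    # drop messages already in the kernel debug buffer
    rc=0
    "$@" || rc=$?                  # run the operation under test, remember its status
    $LCTL dk "$outfile"            # dump the kernel debug buffer to OUTFILE
    return $rc
}
```

For example, `collect_trace /tmp/migrate.log lfs migrate -m 1 test` would capture the client-side log for the failing command (the output path is an arbitrary example). To capture the MDS side, the `set_param` and `clear` steps would need to run on the MDS before the client operation, and `lctl dk` on the MDS afterwards.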

            sarah Sarah Liu added a comment -

            Please see the attached file for the log.
            Thanks

            qian_wc Qian Yingjin added a comment -

            There is a patch, https://review.whamcloud.com/#/c/34663/, which should fix the problem of printing the error message, but that problem should not cause the -71 error code (-EPROTO).

            Sarah,
            Could you please get the trace on the MDS?

            lctl set_param subsystem_debug=mds
            lctl set_param debug=trace
            

            Regards,
            Qian

            pjones Peter Jones added a comment -

            Qian

            Any advice here?

            Peter

            laisiyao Lai Siyao added a comment -

            0x60000 is "MDS_ATTR_LSIZE | MDS_ATTR_LBLOCKS", and these flags are introduced by LSOM.
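To see why 0x60000 corresponds to exactly two flags, the mask can be split into its individual bits. The helper below is a minimal sketch (the name `set_bits` is hypothetical); it only does the arithmetic and does not claim which of the two resulting bits is MDS_ATTR_LSIZE and which is MDS_ATTR_LBLOCKS, since the comment above gives only the combined value.

```shell
# Print each bit that is set in MASK, one hexadecimal value per line,
# from the lowest bit to the highest. set_bits is a hypothetical helper.
set_bits() {
    mask=$(( $1 ))
    bit=1
    while [ "$bit" -le "$mask" ]; do
        if [ $(( mask & bit )) -ne 0 ]; then
            printf '0x%x\n' "$bit"
        fi
        bit=$(( bit << 1 ))
    done
}
```

Running `set_bits 0x60000` prints `0x20000` and `0x40000`, the two attribute bits that the 2.10.5 server reports as unknown in the MDS log.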


            People

              qian_wc Qian Yingjin
              sarah Sarah Liu
