[LU-10702] replay-single test_87a: checksum doesn't match

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
    • Hard Failover
      RHEL 7.4 Server/ldiskfs
      SLES 12 SP3 Client
      2.10.58 , master, build 3707
    • 3

    Description

      replay-single test_87a - New checksum d41d8cd98f00b204e9800998ecf8427e does not match original 258b70206dfda5af3d4dfe6946e0adb8
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5874e3d8-12d9-11e8-bd00-52540065bddc

      test_87a failed with the following error:

      New checksum d41d8cd98f00b204e9800998ecf8427e does not match original 258b70206dfda5af3d4dfe6946e0adb8
      

      Test_logs:

      trevis-27vm3: CMD: trevis-27vm3 lctl get_param -n at_max
      trevis-27vm1: CMD: trevis-27vm1 lctl get_param -n at_max
      trevis-27vm3: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: CMD: trevis-27vm4 lctl get_param -n at_max
      trevis-27vm4: osc.lustre-OST0000-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm3: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: osc.lustre-OST0001-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm3: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0002-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm3: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0003-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm3: osc.lustre-OST0004-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0004-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: osc.lustre-OST0004-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm3: osc.lustre-OST0005-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0005-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm4: osc.lustre-OST0005-osc-ffff*.ost_server_uuid in FULL state after 0 sec
      trevis-27vm1: osc.lustre-OST0006-osc-ffff*.ost_server_uuid in FULL state after 3 sec
      trevis-27vm3: osc.lustre-OST0006-osc-ffff*.ost_server_uuid in FULL state after 4 sec
      trevis-27vm4: osc.lustre-OST0006-osc-ffff*.ost_server_uuid in FULL state after 4 sec
      0+0 records in
      0+0 records out
      0 bytes copied, 0.0018678 s, 0.0 kB/s
       replay-single test_87a: @@@@@@ FAIL: New checksum d41d8cd98f00b204e9800998ecf8427e does not match original 258b70206dfda5af3d4dfe6946e0adb8 
        Trace dump:
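
A note on the two values in the failure message: d41d8cd98f00b204e9800998ecf8427e is the well-known MD5 digest of zero bytes of input, which is consistent with the "0 bytes copied" dd output above (the file read back empty after the failover). The sketch below checks this. It assumes the test computes its checksums with MD5; the 32-hex-digit values suggest so, but the report does not state it. The helper names are illustrative only.

```python
import hashlib

# Values copied from the failure message in this report.
NEW_CHECKSUM = "d41d8cd98f00b204e9800998ecf8427e"
ORIGINAL_CHECKSUM = "258b70206dfda5af3d4dfe6946e0adb8"


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of data (assumed checksum algorithm)."""
    return hashlib.md5(data).hexdigest()


def is_empty_input_digest(digest: str) -> bool:
    """True if digest equals the MD5 of zero bytes, i.e. an empty read."""
    return digest.lower() == md5_hex(b"")


if __name__ == "__main__":
    print("new checksum is MD5 of empty input:",
          is_empty_input_digest(NEW_CHECKSUM))
    print("original checksum is MD5 of empty input:",
          is_empty_input_digest(ORIGINAL_CHECKSUM))
```

If the first check holds, the mismatch points at missing data after recovery rather than at corrupted data.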
      

          Activity

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - +1 https://testing.whamcloud.com/test_sets/0b6cbfe1-ab6f-4ab1-adea-6d8d959d05f9

            I searched Maloo for the failed test_87a runs and found that all of them are caused by client eviction: during recovery,
            the OST has no corresponding connection record for the client.

            [40129.638078] LustreError: 167-0: lustre-OST0000-osc-ffff880076344000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
            [40170.533253] LustreError: 167-0: lustre-OST0001-osc-ffff880076344000: This client was evicted by lustre-OST0001; in progress operations using this service will fail.
            [40210.533458] LustreError: 167-0: lustre-OST0002-osc-ffff880076344000: This client was evicted by lustre-OST0002; in progress operations using this service will fail.
            [40258.273807] LustreError: 167-0: lustre-OST0003-osc-ffff880076344000: This client was evicted by lustre-OST0003; in progress operations using this service will fail.
            
            00000100:00080000:1.0:1520181481.496098:0:29559:0:(import.c:675:ptlrpc_connect_import()) ffff88007b158800 lustre-OST0000_UUID: changing import state from DISCONN to CONNECTING
            00000100:00080000:1.0:1520181481.496100:0:29559:0:(import.c:519:import_select_connection()) lustre-OST0000-osc-ffff880076344000: connect to NID 10.9.6.83@tcp last attempt 0
            00000100:00080000:1.0:1520181481.496103:0:29559:0:(import.c:589:import_select_connection()) lustre-OST0000-osc-ffff880076344000: Connection changing to lustre-OST0000 (at 10.9.6.83@tcp)
            00000100:00080000:1.0:1520181481.496105:0:29559:0:(import.c:597:import_select_connection()) lustre-OST0000-osc-ffff880076344000: import ffff88007b158800 using connection 10.9.6.83@tcp/10.9.6.83@tcp
            00000008:00100000:1.0:1520181481.496108:0:29559:0:(osc_request.c:2652:osc_reconnect()) ocd_connect_flags: 0x20445af0e3640478 ocd_version: 34210560 ocd_grant: 8388608, lost: 0.
            00000100:00100000:1.0:1520181481.496114:0:29559:0:(import.c:763:ptlrpc_connect_import()) @@@ (re)connect request (timeout 5)  req@ffff880061791cc0 x1594011648012240/t0(0) o8->lustre-OST0000-osc-ffff880076344000@10.9.6.83@tcp:28/4 lens 520/544 e 0 to 0 dl 0 ref 1 fl New:N/0/ffffffff rc 0/-1
            00000020:01000000:1.0:1520181481.496126:0:29559:0:(obd_config.c:1370:class_process_proc_param()) lustre-OST0000-osc-ffff880076344000: set parameter 'import=connection=10.9.6.83@tcp::60'
            10000000:01000000:1.0:1520181481.496137:0:29559:0:(mgc_request.c:2151:mgc_process_log()) MGC10.9.6.85@tcp: configuration from log 'lustre-cliir' succeeded (0).
            00000100:00100000:0.0:1520181481.496137:0:31324:0:(client.c:1620:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_rcv:51f0d3b2-a534-84b0-eaaa-ade7c191b8ad:31324:1594011648012240:10.9.6.83@tcp:8
            00010000:00010000:1.0:1520181481.496140:0:29559:0:(ldlm_lock.c:797:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: ?? lock: ffff88006109c000/0xdfe6b47beb46e4d3 lrc: 3/1,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x11000000000000 nid: local remote: 0xfafbc8b63ef58e4d expref: -99 pid: 29559 timeout: 0 lvb_type: 0
            00010000:00010000:1.0:1520181481.496143:0:29559:0:(ldlm_lock.c:887:ldlm_lock_decref_internal()) ### do not add lock into lru list ns: ?? lock: ffff88006109c000/0xdfe6b47beb46e4d3 lrc: 2/0,0 mode: CR/CR res: ?? rrc=?? type: ??? flags: 0x11000000000000 nid: local remote: 0xfafbc8b63ef58e4d expref: -99 pid: 29559 timeout: 0 lvb_type: 0
            00000100:00080000:0.0:1520181481.496831:0:31324:0:(import.c:1041:ptlrpc_connect_interpret()) lustre-OST0000-osc-ffff880076344000: connect to target with instance 60
            00000080:00000004:0.0:1520181481.496841:0:31324:0:(lcommon_misc.c:97:cl_ocd_update()) Changing connect_flags: 0x20405af0e3440478 -> 0x20405af0e3440478
            00000080:00080000:0.0:1520181481.496845:0:31324:0:(lcommon_misc.c:71:cl_init_ea_size()) updating def/max_easize: 72/216
            00000100:00080000:0.0:1520181481.496850:0:31324:0:(import.c:1239:ptlrpc_connect_interpret()) @@@ lustre-OST0000-osc-ffff880076344000: evicting (reconnect/recover flags not set: 4)  req@ffff880061791cc0 x1594011648012240/t0(0) o8->lustre-OST0000-osc-ffff880076344000@10.9.6.83@tcp:28/4 lens 520/384 e 0 to 0 dl 1520181506 ref 1 fl Interpret:RN/0/0 rc 0/0
            00000100:00080000:0.0:1520181481.496854:0:31324:0:(import.c:1242:ptlrpc_connect_interpret()) ffff88007b158800 lustre-OST0000_UUID: changing import state from CONNECTING to EVICTED
            00000100:02020000:0.0:1520181481.496857:0:31324:0:(import.c:1471:ptlrpc_import_recovery_state_machine()) 167-0: lustre-OST0000-osc-ffff880076344000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
            00000100:00080000:0.0:1520181481.496873:0:31324:0:(import.c:1475:ptlrpc_import_recovery_state_machine()) evicted from lustre-OST0000_UUID@10.9.6.83@tcp; invalidating
            00000100:00100000:0.0:1520181481.501384:0:31324:0:(client.c:2045:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_rcv:51f0d3b2-a534-84b0-eaaa-ade7c191b8ad:31324:1594011648012240:10.9.6.83@tcp:8
            00000100:00080000:1.0:1520181481.501394:0:30975:0:(import.c:1417:ptlrpc_invalidate_import_thread()) thread invalidate import lustre-OST0000-osc-ffff880076344000 to lustre-OST0000_UUID@10.9.6.83@tcp
            00000100:00080000:1.0:1520181481.501397:0:30975:0:(import.c:214:ptlrpc_deactivate_and_unlock_import()) setting import lustre-OST0000_UUID INVALID
            00000100:00100000:1.0:1520181481.501398:0:30975:0:(client.c:2713:ptlrpc_free_committed()) lustre-OST0000-osc-ffff880076344000: committing for last_committed 0 gen 2
            00000100:00100000:1.0:1520181481.501402:0:30975:0:(client.c:2733:ptlrpc_free_committed()) @@@ free request with old gen  req@ffff880078cd53c0 x1594011648010144/t253403070465(253403070465) o10->lustre-OST0000-osc-ffff880076344000@10.9.6.83@tcp:6/4 lens 560/400 e 0 to 0 dl 1520181403 ref 1 fl Complete:R/4/0 rc 0/0
            00000100:00100000:1.0:1520181481.501409:0:30975:0:(client.c:2733:ptlrpc_free_committed()) @@@ free request with old gen  req@ffff880017b143c0 x1594011648010176/t253403070466(253403070466) o4->lustre-OST0000-osc-ffff880076344000@10.9.6.83@tcp:6/4 lens 608/416 e 0 to 0 dl 1520181398 ref 1 fl Complete:R/4/0 rc 0/0
            00000100:00100000:1.0:1520181481.501723:0:30975:0:(client.c:2733:ptlrpc_free_committed()) @@@ free request with old gen  req@ffff880017b146c0 x1594011648010208/t253403070467(253403070467) o4->lustre-OST0000-osc-ffff880076344000@10.9.6.83@tcp:6/4 lens 608/416 e 0 to 0 dl 1520181398 ref 1 fl Complete:R/4/0 rc 0/0
            00020000:01000000:1.0:1520181481.501988:0:30975:0:(lov_obd.c:425:lov_set_osc_active()) Marking OSC lustre-OST0000_UUID inactive
            00000080:00000004:1.0:1520181481.501990:0:30975:0:(lcommon_misc.c:97:cl_ocd_update()) Changing connect_flags: 0x20405af0e3440478 -> 0x20405af0e3440478
            00000080:00080000:1.0:1520181481.501992:0:30975:0:(lcommon_misc.c:71:cl_init_ea_size()) updating def/max_easize: 72/192
            00000100:00100000:1.0:1520181481.501994:0:30975:0:(import.c:318:ptlrpc_invalidate_import()) Sleeping 20 sec for inflight to error out
            00000100:00080000:1.0:1520181481.502288:0:30975:0:(import.c:1426:ptlrpc_invalidate_import_thread()) ffff88007b158800 lustre-OST0000_UUID: changing import state from EVICTED to RECOVER
            
            00000100:00100000:1.0:1520181481.497603:0:2470:0:(events.c:350:request_in_callback()) peer: 12345-10.9.6.78@tcp (source: 12345-10.9.6.78@tcp)
            00000100:00100000:0.0:1520181481.497642:0:2506:0:(service.c:1939:ptlrpc_server_handle_req_in()) got req x1594011648012240
            00000100:00100000:0.0:1520181481.497663:0:2506:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-10.9.6.78@tcp, seq: 54
            00000100:00100000:0.0:1520181481.497669:0:2506:0:(service.c:2089:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost00_001:0+-99:31324:x1594011648012240:12345-10.9.6.78@tcp:8
            00010000:00080000:0.0:1520181481.497685:0:2506:0:(ldlm_lib.c:1229:target_handle_connect()) lustre-OST0000: connection from 51f0d3b2-a534-84b0-eaaa-ade7c191b8ad@10.9.6.78@tcp t253403070465 exp           (null) cur 1520181481 last 0
            00000020:00000080:0.0:1520181481.497698:0:2506:0:(genops.c:1379:class_connect()) connect: client 51f0d3b2-a534-84b0-eaaa-ade7c191b8ad, cookie 0xe0f23292040a3e10
            00002000:00100000:0.0:1520181481.497704:0:2506:0:(ofd_obd.c:157:ofd_parse_connect_data()) lustre-OST0000: cli 51f0d3b2-a534-84b0-eaaa-ade7c191b8ad/ffff88006a23d400 ocd_connect_flags: 0x20445af0e3640478 ocd_version: 20a0300 ocd_grant: 8388608 ocd_index: 0 ocd_group 0
            00002000:00100000:0.0:1520181481.497712:0:2506:0:(ofd_obd.c:264:ofd_parse_connect_data()) lustre-OST0000: cli (no nid) supports cksum type 7, return 7
            00000020:01000000:0.0:1520181481.497737:0:2506:0:(lprocfs_status_server.c:342:lprocfs_exp_setup()) using hash ffff88007a296c40
            00002000:00080000:0.0:1520181481.497753:0:2506:0:(ofd_obd.c:397:ofd_obd_connect()) lustre-OST0000: get connection from MDS 0
            00000100:00080000:0.0:1520181481.497762:0:2506:0:(import.c:104:ptlrpc_import_enter_resend()) ffff88007c65d800 : changing import state from NEW to RECOVER
            00000100:00080000:0.0:1520181481.497766:0:2506:0:(import.c:1536:ptlrpc_import_recovery_state_machine()) ffff88007c65d800 : changing import state from RECOVER to FULL
            00000100:02000000:0.0:1520181481.497768:0:2506:0:(import.c:1542:ptlrpc_import_recovery_state_machine()) lustre-OST0000: Connection restored to  (at 10.9.6.78@tcp)
            00000100:00100000:0.0:1520181481.497802:0:2506:0:(service.c:2135:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost00_001:51f0d3b2-a534-84b0-eaaa-ade7c191b8ad+6:31324:x1594011648012240:12345-10.9.6.78@tcp:8 Request procesed in 133us (206us total) trans 0 rc 0/0
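
The client log prints ocd_version as decimal 34210560, while the OST log prints it as hex 20a0300. A small sketch for comparing the two is given below. It assumes the usual Lustre packing of one byte each for major, minor, patch and fix (major in the top byte); the function name is hypothetical.

```python
def decode_ocd_version(value: int) -> str:
    """Unpack a connect-data version word, assuming one byte per component."""
    major = (value >> 24) & 0xFF
    minor = (value >> 16) & 0xFF
    patch = (value >> 8) & 0xFF
    fix = value & 0xFF
    return f"{major}.{minor}.{patch}.{fix}"


# Decimal value from the client-side osc_reconnect() line.
CLIENT_OCD_VERSION = 34210560
# Hex value from the server-side ofd_parse_connect_data() line.
SERVER_SEEN_OCD_VERSION = int("20a0300", 16)

if __name__ == "__main__":
    print(decode_ocd_version(CLIENT_OCD_VERSION))
    print(CLIENT_OCD_VERSION == SERVER_SEEN_OCD_VERSION)
```

Under that assumption both sides report the same client version, so the two log excerpts describe the same connect request.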
            

            Could it be that the client connection was not committed before the failover?
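
To follow the eviction in logs like the ones above, it can help to pull out only the "changing import state" lines. The sketch below is one way to do that. It assumes the colon-separated debug-log prefix seen in this ticket (two 8-digit hex fields, a cpu field, a sec.usec timestamp, three numeric fields, then "(file:line:function())"). The regular expressions and names are illustrative and not part of any Lustre tool.

```python
import re
from typing import Iterable, List, NamedTuple


class Transition(NamedTuple):
    timestamp: float
    pid: int
    func: str
    old: str
    new: str


# Prefix layout as it appears in the log excerpts of this ticket (assumed stable).
LINE_RE = re.compile(
    r"^\s*[0-9a-f]{8}:[0-9a-f]{8}:\d+\.\d+:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\([\w.]+:\d+:(?P<func>\w+)\(\)\)\s+(?P<msg>.*)$"
)
STATE_RE = re.compile(r"changing import state from (?P<old>\w+) to (?P<new>\w+)")


def import_transitions(lines: Iterable[str]) -> List[Transition]:
    """Return import state transitions found in lines, ordered by timestamp."""
    found = []
    for line in lines:
        prefix = LINE_RE.match(line)
        if prefix is None:
            continue  # not a debug-log line in the expected layout
        state = STATE_RE.search(prefix.group("msg"))
        if state is None:
            continue  # a debug-log line, but not a state change
        found.append(Transition(float(prefix.group("ts")),
                                int(prefix.group("pid")),
                                prefix.group("func"),
                                state.group("old"),
                                state.group("new")))
    return sorted(found, key=lambda t: t.timestamp)


# Three client-side lines copied from the log above, deliberately out of order.
SAMPLE = [
    "00000100:00080000:1.0:1520181481.502288:0:30975:0:(import.c:1426:ptlrpc_invalidate_import_thread()) ffff88007b158800 lustre-OST0000_UUID: changing import state from EVICTED to RECOVER",
    "00000100:00080000:1.0:1520181481.496098:0:29559:0:(import.c:675:ptlrpc_connect_import()) ffff88007b158800 lustre-OST0000_UUID: changing import state from DISCONN to CONNECTING",
    "00000100:00080000:0.0:1520181481.496854:0:31324:0:(import.c:1242:ptlrpc_connect_interpret()) ffff88007b158800 lustre-OST0000_UUID: changing import state from CONNECTING to EVICTED",
]

if __name__ == "__main__":
    for t in import_transitions(SAMPLE):
        print(f"{t.timestamp:.6f} pid {t.pid} {t.func}: {t.old} -> {t.new}")
```

Run over the client excerpt, this reduces the log to the sequence DISCONN, CONNECTING, EVICTED, RECOVER, which is the path the comment describes.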

            The comment above was added by hongchao.zhang Hongchao Zhang.
            pjones Peter Jones added a comment -

            Hongchao

            Any idea why this is failing for SLES?

            Peter

            jamesanunez James Nunez (Inactive) added a comment - - edited

            This test started failing with checksum errors between 2017-09-13 and 2017-09-26, with master tag 2.10.53, during SLES client and SLES client/server testing. See the following logs for some early failures:
            https://testing.hpdd.intel.com/test_sets/f83daaaa-a2c0-11e7-bb19-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/cfc31ff2-a6a2-11e7-bb19-5254006e85c2

            It is also seen in b2_10, starting with 2.10.1 on 2017-10-05:
            https://testing.hpdd.intel.com/test_sets/d5e27b1a-aa41-11e7-b78a-5254006e85c2


            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo