replay-single test_45: Can't lstat /mnt/lustre/f45: Cannot send after transport endpoint shutdown

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • Lustre 2.1.0
    • Lustre 2.0.0
    • None
    • 3
    • 22,981
    • 5988

    Description

      replay-single test 45 failed as follows:

      == replay-single test 45: Handle failed close == 07:35:08 (1311086108)
      multiop /mnt/lustre/f45 vO_c
      TMPPIPE=/tmp/multiop_open_wait_pipe.17755
      Can't lstat /mnt/lustre/f45: Cannot send after transport endpoint shutdown
       replay-single test_45: @@@@@@ FAIL: test_45 failed with 2 
      Dumping lctl log to /home/yujian/test_logs/2011-07-19/054342/replay-single.test_45.*.1311086110.log
      

      Dmesg on the client node fat-amd-3-ib:

      Lustre: DEBUG MARKER: == replay-single test 45: Handle failed close == 07:35:08 (1311086108)
      Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      LustreError: 18821:0:(file.c:155:ll_close_inode_openhandle()) inode 144115440136749057 mdc close failed: rc = -108
      LustreError: 18826:0:(client.c:1057:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8804079b3400 x1374766420142719/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.2@o2ib:12/10 lens 544/880 e 0 to 0 dl 0 ref 2 fl Rpc:/ffffffff/ffffffff rc 0/-1
      LustreError: 18826:0:(mdc_locks.c:722:mdc_enqueue()) ldlm_cli_enqueue: -108
      LustreError: 18826:0:(file.c:2165:ll_inode_revalidate_fini()) failure -108 inode 29
      Lustre: DEBUG MARKER: replay-single test_45: @@@@@@ FAIL: test_45 failed with 2
      

      Maloo report: https://maloo.whamcloud.com/test_sets/7f371672-b281-11e0-b33f-52540025f9af

      This is a known issue on the Lustre master branch: bug 22981. Some instances were also reported in bug 20997.
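
The `rc = -108` and `-110` values in these logs are negated Linux errno codes; -108 corresponds to the "Cannot send after transport endpoint shutdown" message printed by the test. A minimal sketch for decoding them, assuming Linux errno numbering. The table is hard-coded because errno values differ across operating systems, and `decode_rc` is a hypothetical helper name, not part of Lustre:

```python
# Linux errno values seen in this ticket's logs.
# Hard-coded on purpose: errno numbering is platform-specific.
LINUX_ERRNO = {
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_rc(rc):
    """Map a negative kernel-style return code to (name, message)."""
    return LINUX_ERRNO.get(-rc, ("UNKNOWN", "unknown error"))


for rc in (-108, -110):
    name, message = decode_rc(rc)
    print(f"rc = {rc}: {name} ({message})")
```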

        Activity

          [LU-518] replay-single test_45: Can't lstat /mnt/lustre/f45: Cannot send after transport endpoint shutdown

          simmonsja James A Simmons added a comment - Old ticket for unsupported version
          bobijam Zhenyu Xu added a comment -

          For an unknown reason, the recovery reconnect request failed, which left the MDC import to the MDS invalid from then on.

          client2 (fat-amd-3-ib) debug log

          1311086109.335453:0:18824:0:(recover.c:276:ptlrpc_set_import_active()) setting import lustre-MDT0000_UUID VALID
          1311086109.335467:0:18824:0:(import.c:167:ptlrpc_set_import_discon()) lustre-MDT0000-mdc-ffff880117b9d400: Connection to service lustre-MDT0000 via nid 192.168.4.4@o2ib was lost; in progress operations using this service will wait for recovery to complete.
          1311086109.335474:0:18824:0:(import.c:177:ptlrpc_set_import_discon()) ffff880218423000 lustre-MDT0000_UUID: changing import state from FULL to DISCONN
          1311086109.335482:0:18824:0:(import.c:621:ptlrpc_connect_import()) ffff880218423000 lustre-MDT0000_UUID: changing import state from DISCONN to CONNECTING
          1311086109.335489:0:18824:0:(import.c:478:import_select_connection()) lustre-MDT0000-mdc-ffff880117b9d400: connect to NID 192.168.4.4@o2ib last attempt 4305440055
          1311086109.335495:0:18824:0:(import.c:478:import_select_connection()) lustre-MDT0000-mdc-ffff880117b9d400: connect to NID 192.168.4.2@o2ib last attempt 4305416002
          1311086109.335505:0:18824:0:(import.c:550:import_select_connection()) Changing connection for lustre-MDT0000-mdc-ffff880117b9d400 to 192.168.4.2@o2ib/192.168.4.2@o2ib
          1311086109.335509:0:18824:0:(import.c:556:import_select_connection()) lustre-MDT0000-mdc-ffff880117b9d400: import ffff880218423000 using connection 192.168.4.2@o2ib/192.168.4.2@o2ib
          1311086109.335543:0:18824:0:(import.c:720:ptlrpc_connect_import()) @@@ (re)connect request (timeout 5) req@ffff880319ce9400 x1374766420142717/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.2@o2ib:12/10 lens 368/392 e 0 to 0 dl 0 ref 1 fl New:N/ffffffff/ffffffff rc 0/-1
          1311086109.335575:0:18824:0:(recover.c:344:ptlrpc_recover_import_no_retry()) lustre-MDT0000_UUID: recovery started, waiting
          1311086109.335587:0:15748:0:(client.c:1392:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd-rcv:93a2f5a2-2e73-de1c-53c5-2e51c11c95b2:15748:1374766420142717:192.168.4.2@o2ib:38
          1311086109.336967:0:15748:0:(client.c:1775:ptlrpc_expire_one_request()) @@@ Request x1374766420142717 sent from lustre-MDT0000-mdc-ffff880117b9d400 to NID 192.168.4.2@o2ib has failed due to network error: [sent 1311086109] [real_sent 1311086109] [current 1311086109] [deadline 26s] [delay -26s] req@ffff880319ce9400 x1374766420142717/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.2@o2ib:12/10 lens 368/392 e 0 to 1 dl 1311086135 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
          1311086109.336994:0:15748:0:(client.c:1807:ptlrpc_expire_one_request()) @@@ err 110, sent_state=CONNECTING (now=CONNECTING) req@ffff880319ce9400 x1374766420142717/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.2@o2ib:12/10 lens 368/392 e 0 to 1 dl 1311086135 ref 1 fl Rpc:XN/ffffffff/ffffffff rc 0/-1
          1311086109.337009:0:15748:0:(import.c:1120:ptlrpc_connect_interpret()) ffff880218423000 lustre-MDT0000_UUID: changing import state from CONNECTING to DISCONN
          1311086109.337015:0:15748:0:(import.c:1166:ptlrpc_connect_interpret()) recovery of lustre-MDT0000_UUID on 192.168.4.2@o2ib failed (-110)
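
The import state changes in the debug log above can be pulled out mechanically, which makes the FULL, DISCONN, CONNECTING, DISCONN sequence easier to see. A rough sketch, assuming only the "changing import state from X to Y" wording shown in this log; `import_transitions` is a hypothetical helper, not a Lustre tool, and the sample lines are trimmed copies of the entries above:

```python
import re

# Matches the wording used by ptlrpc in the log above, e.g.
# "changing import state from FULL to DISCONN".
STATE_RE = re.compile(r"changing import state from (\w+) to (\w+)")


def import_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return STATE_RE.findall(log_text)


sample = """\
1311086109.335474:0:18824:0:(import.c:177:ptlrpc_set_import_discon()) ffff880218423000 lustre-MDT0000_UUID: changing import state from FULL to DISCONN
1311086109.335482:0:18824:0:(import.c:621:ptlrpc_connect_import()) ffff880218423000 lustre-MDT0000_UUID: changing import state from DISCONN to CONNECTING
1311086109.337009:0:15748:0:(import.c:1120:ptlrpc_connect_interpret()) ffff880218423000 lustre-MDT0000_UUID: changing import state from CONNECTING to DISCONN
"""

for old, new in import_transitions(sample):
    print(f"{old} -> {new}")
```

On a full debug log the same function can be applied to the file contents read as a single string.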


          People

            wc-triage WC Triage
            yujian Jian Yu