Lustre / LU-3728

mdt_handler.c:3176:mdt_tgt_connect()) ASSERTION( mti != ((void *)0) ) failed

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.0
    • 3
    • 9615

    Description

      I hit this on an idle system with Lustre mounted by llmount.sh, using 2 MDTs and 3 client mount points. I had used it for some HSM testing and ran racer, but then I left it idle. Racer finished at:

      Lustre: DEBUG MARKER: == racer test complete, duration 309 sec == 11:33:46 (1375979626)
      
      About three hours later MDT1 was spontaneously evicted from MDT0 and I saw the following:
      

      Lustre: 24786:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1375990283/real 1375990283] req@ffff88017464a800 x1442809754504504/t0(0) o400->lustre-MDT0000-osp-MDT0001@0@lo:24/10 lens 224/224 e 0 to 1 dl 1375990290 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: lustre-MDT0000: Client lustre-MDT0001-mdtlov_UUID (at 0@lo) reconnecting
      Lustre: lustre-MDT0000-osp-MDT0001: Connection restored to lustre-MDT0000 (at 0@lo)
      Lustre: 24787:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1375990288/real 1375990288] req@ffff880171622800 x1442809754504596/t0(0) o400->lustre-MDT0000-osp-MDT0001@0@lo:24/10 lens 224/224 e 0 to 1 dl 1375990295 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 24788:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1375990295/real 1375990295] req@ffff880169776000 x1442809754504692/t0(0) o400->lustre-MDT0000-osp-MDT0001@0@lo:24/10 lens 224/224 e 0 to 1 dl 1375990302 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: lustre-MDT0000: Client lustre-MDT0001-mdtlov_UUID (at 0@lo) reconnecting
      LustreError: 26413:0:(mdt_handler.c:3176:mdt_tgt_connect()) ASSERTION( mti != ((void *)0) ) failed:
      LustreError: 26413:0:(mdt_handler.c:3176:mdt_tgt_connect()) LBUG
      Pid: 26413, comm: ll_ost_out01_00

      Call Trace:
      [<ffffffffa0ca1895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa0ca1e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0ad5955>] mdt_tgt_connect+0x515/0x550 [mdt]
      [<ffffffffa06379fd>] tgt_request_handle+0x57d/0xe30 [ptlrpc]
      [<ffffffffa05f4638>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa0ca254e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [<ffffffffa0cb3a6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      [<ffffffffa05eba49>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
      [<ffffffffa05f59bd>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffffa05f4f00>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffff810968a0>] ? kthread+0x0/0xa0
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
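
The "about three hours" gap can be checked directly from the epoch timestamps in the log. The following is a minimal sketch, assuming only the racer marker line and the first timed-out request line quoted above; the helper names and regular expressions are illustrative and not part of any Lustre tool.

```python
import re

# Lines copied verbatim from the log above.
MARKER = ("Lustre: DEBUG MARKER: == racer test complete, "
          "duration 309 sec == 11:33:46 (1375979626)")
TIMEOUT = ("Lustre: 24786:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ "
           "Request sent has timed out for slow reply: "
           "[sent 1375990283/real 1375990283] req@ffff88017464a800 "
           "x1442809754504504/t0(0) o400->lustre-MDT0000-osp-MDT0001@0@lo:24/10 "
           "lens 224/224 e 0 to 1 dl 1375990290 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1")


def marker_epoch(line):
    """Return the epoch seconds printed in parentheses at the end of a marker line."""
    return int(re.search(r"\((\d+)\)\s*$", line).group(1))


def sent_and_deadline(line):
    """Return the (sent, dl) epoch seconds from a ptlrpc_expire_one_request line."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))
    return sent, deadline


finished = marker_epoch(MARKER)
sent, deadline = sent_and_deadline(TIMEOUT)

# Time between the end of racer and the first request that timed out.
idle_seconds = sent - finished
hours, rem = divmod(idle_seconds, 3600)
minutes, seconds = divmod(rem, 60)

print(f"idle for {hours}h {minutes}m {seconds}s before the first timeout")
print(f"request deadline was {deadline - sent}s after it was sent")
```

For the two lines above this gives an idle period of 2h 57m 37s and a 7-second deadline on the timed-out request, which matches the description of the system sitting idle for roughly three hours.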

      
      

      My tree was ahead of master by two xattr patches and two small HSM patches, but I do not think they are related to this failure.

            People

              wc-triage WC Triage
              jhammond John Hammond