  1. Lustre
  2. LU-6189

LustreError: (mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -116

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.5.3
    • None
    • 2
    • 17312

    Description

      This morning we hit this LBUG more than once, within a few hours of each other, and each occurrence crashed the MDS. After the first crash and reboot, we had to abort recovery to bring Lustre back. We have a crash dump from the MDS.
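The assertion reports rc = -116. Kernel code returns negated errno values, and on Linux errno 116 is ESTALE (stale file handle). As a minimal sketch for reading such codes, the hypothetical helper below (`decode_rc` is not part of Lustre) looks the value up in Python's `errno` table. The lookup reflects the platform Python runs on, so it should be run on a Linux host to match the MDS.

```python
import errno
import os


def decode_rc(rc):
    """Return (symbolic name, message) for a negated-errno return code.

    Hypothetical helper for illustration; the name table comes from the
    host platform, so run it on Linux to match values in a Linux kernel log.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # rc value taken from the assertion message in this ticket.
    print(decode_rc(-116))
```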

      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.805235] LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 375s: evicting client at 4966@gni100 ns: mdt-atlas1-MDT0000_UUID lock: ffff881ec6e16c80/0xfc6e8aed747d1af2 lrc: 4/0,0 mode: CR/CR res: [0x2001a597a:0x85:0x0].0 bits 0x2 rrc: 4 type: IBT flags: 0x60200000000020 nid: 4966@gni100 remote: 0x20ee476ee499c158 expref: 132 pid: 16827 timeout: 4301930544 lvb_type: 0
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.858358] LustreError: 16827:0:(mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -116
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.874660] LustreError: 16827:0:(mdt_handler.c:4078:mdt_intent_reint()) LBUG
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.882757] Pid: 16827, comm: mdt00_224
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.887151]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.887152] Call Trace:
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.891770] [<ffffffffa0407895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.899670] [<ffffffffa0407e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.906710] [<ffffffffa0d4379a>] mdt_intent_reint+0x51a/0x520 [mdt]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.913933] [<ffffffffa0d40c4e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.921281] [<ffffffffa06de2e5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.928910] [<ffffffffa0707d0b>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.936903] [<ffffffff81069f75>] ? enqueue_entity+0x125/0x450
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.943544] [<ffffffffa0d41116>] mdt_enqueue+0x46/0xe0 [mdt]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.950094] [<ffffffffa0d4602a>] mdt_handle_common+0x52a/0x1470 [mdt]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.957515] [<ffffffffa0d833e5>] mds_regular_handle+0x15/0x20 [mdt]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.964770] [<ffffffffa0737fe5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.973547] [<ffffffffa04084ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.980677] [<ffffffffa04193cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.988407] [<ffffffffa072f6c9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.996116] [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.002774] [<ffffffffa073934d>] ptlrpc_main+0xaed/0x1760 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.009920] [<ffffffffa0738860>] ? ptlrpc_main+0x0/0x1760 [ptlrpc]
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.017040] [<ffffffff8109ab56>] kthread+0x96/0xa0
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.022607] [<ffffffff8100c20a>] child_rip+0xa/0x20
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.028267] [<ffffffff8109aac0>] ? kthread+0x0/0xa0
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.033930] [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7272.039782]
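To compare this trace against other tickets, it can help to reduce it to a list of symbols and modules. The following is a rough sketch, not a Lustre tool: the regular expression and the `parse_frames` name are assumptions made for illustration, based only on the frame format visible in the log above. Frames prefixed with `?` are flagged as unreliable, and frames without a bracketed module name are labeled `kernel`.

```python
import re

# Matches a frame such as "[<ffffffffa0d4379a>] mdt_intent_reint+0x51a/0x520 [mdt]".
# An optional "?" before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Extract call-trace frames from syslog-formatted kernel output."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match.group("symbol"),
                # No bracketed module means the symbol is in the core kernel.
                "module": match.group("module") or "kernel",
                "reliable": match.group("unreliable") is None,
            })
    return frames


# Two lines copied from the trace in this ticket.
SAMPLE = """\
Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.906710] [<ffffffffa0d4379a>] mdt_intent_reint+0x51a/0x520 [mdt]
Feb 1 10:03:15 atlas-mds1.ccs.ornl.gov kernel: [ 7271.936903] [<ffffffff81069f75>] ? enqueue_entity+0x125/0x450
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame)
```

Filtering the result to reliable frames gives the path from `ptlrpc_main` down to `mdt_intent_reint`, which is the part worth matching when searching for the duplicate issue.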

            People

              pjones Peter Jones
              curtispb Philip B Curtis