Lustre / LU-5686

(mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6, Lustre 2.4.3
    • Environment:
      Clients:
       - RHEL6 w/ patched kernel 2.6.32-431.11.2.el6
       - Lustre 2.4.3 + bullpatches
      Servers:
       - RHEL6 w/ patched kernel 2.6.32-220.23.1
       - Lustre 2.1.6 + bullpatches
    • 3
    • 15925

    Description

We hit the following LBUG twice on one of our MDTs:

      [78073.117731] Lustre: 31681:0:(ldlm_lib.c:952:target_handle_connect()) work2-MDT0000: connection from 38d12a48-aabd-9279-dc69-b78c4e00321c@10.100.62.72@o2ib2 t189645377601 exp ffff880b95bb1c00 cur 1410508503 last 1410508503
      [78079.176124] Lustre: 31681:0:(mdt_handler.c:1005:mdt_getattr_name_lock()) Although resent, but still not get child lockparent:[0x22f2b0783:0x34b:0x0] child:[0x22d854b6e:0x85d5:0x0]
      [78079.192443] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:
      [78079.205971] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) LBUG
      [78079.215326] Pid: 31681, comm: mdt_104
      [78079.220352]
      [78079.220353] Call Trace:
      [78079.227394]  [<ffffffffa051a7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [78079.236100]  [<ffffffffa051ae07>] lbug_with_loc+0x47/0xb0 [libcfs]
      [78079.243815]  [<ffffffffa0d9671b>] mdt_intent_lock_replace+0x3bb/0x440 [mdt]
      [78079.252140]  [<ffffffffa0daad26>] mdt_intent_getattr+0x3a6/0x4a0 [mdt]
      [78079.260391]  [<ffffffffa0da6c09>] mdt_intent_policy+0x379/0x690 [mdt]
      [78079.268641]  [<ffffffffa07423c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [78079.276846]  [<ffffffffa07683cd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [78079.285614]  [<ffffffffa0da7586>] mdt_enqueue+0x46/0x130 [mdt]
      [78079.292950]  [<ffffffffa0d9c762>] mdt_handle_common+0x932/0x1750 [mdt]
      [78079.300987]  [<ffffffffa0d9d655>] mdt_regular_handle+0x15/0x20 [mdt]
      [78079.309024]  [<ffffffffa07974f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
      [78079.316979]  [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
      [78079.324222]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.331896]  [<ffffffff8100412a>] child_rip+0xa/0x20
      [78079.338522]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.346599]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.354520]  [<ffffffff81004120>] ? child_rip+0x0/0x20
      [78079.361136]
      [78079.364683] Kernel panic - not syncing: LBUG
      

The support engineer was able to identify the client node from the crash dump. Both times, the client was a login node running Lustre 2.4.3.

This looks like LU-5314. The backported patch proposed there failed testing on Maloo ( http://review.whamcloud.com/#/c/10902/ ).

Attachments

Issue Links

Activity

People

    Assignee: bfaccini Bruno Faccini (Inactive)
    Reporter: bruno.travouillon Bruno Travouillon (Inactive)
    Votes: 0
    Watchers: 4
