[LU-5686] (mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.1.6, Lustre 2.4.3
    • Environment:
      Clients:
       - RHEL6 w/ patched kernel 2.6.32-431.11.2.el6
       - Lustre 2.4.3 + bullpatches
      Servers:
       - RHEL6 w/ patched kernel 2.6.32-220.23.1
       - Lustre 2.1.6 + bullpatches
    • 3
    • 15925

    Description

      We hit the following LBUG twice on one of our MDTs:

      [78073.117731] Lustre: 31681:0:(ldlm_lib.c:952:target_handle_connect()) work2-MDT0000: connection from 38d12a48-aabd-9279-dc69-b78c4e00321c@10.100.62.72@o2ib2 t189645377601 exp ffff880b95bb1c00 cur 1410508503 last 1410508503
      [78079.176124] Lustre: 31681:0:(mdt_handler.c:1005:mdt_getattr_name_lock()) Although resent, but still not get child lockparent:[0x22f2b0783:0x34b:0x0] child:[0x22d854b6e:0x85d5:0x0]
      [78079.192443] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) ASSERTION( lustre_msg_get_flags(req->rq_reqmsg) & 0x0002 ) failed:
      [78079.205971] LustreError: 31681:0:(mdt_handler.c:3203:mdt_intent_lock_replace()) LBUG
      [78079.215326] Pid: 31681, comm: mdt_104
      [78079.220352]
      [78079.220353] Call Trace:
      [78079.227394]  [<ffffffffa051a7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [78079.236100]  [<ffffffffa051ae07>] lbug_with_loc+0x47/0xb0 [libcfs]
      [78079.243815]  [<ffffffffa0d9671b>] mdt_intent_lock_replace+0x3bb/0x440 [mdt]
      [78079.252140]  [<ffffffffa0daad26>] mdt_intent_getattr+0x3a6/0x4a0 [mdt]
      [78079.260391]  [<ffffffffa0da6c09>] mdt_intent_policy+0x379/0x690 [mdt]
      [78079.268641]  [<ffffffffa07423c1>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [78079.276846]  [<ffffffffa07683cd>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [78079.285614]  [<ffffffffa0da7586>] mdt_enqueue+0x46/0x130 [mdt]
      [78079.292950]  [<ffffffffa0d9c762>] mdt_handle_common+0x932/0x1750 [mdt]
      [78079.300987]  [<ffffffffa0d9d655>] mdt_regular_handle+0x15/0x20 [mdt]
      [78079.309024]  [<ffffffffa07974f6>] ptlrpc_main+0xd16/0x1a80 [ptlrpc]
      [78079.316979]  [<ffffffff810017cc>] ? __switch_to+0x1ac/0x320
      [78079.324222]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.331896]  [<ffffffff8100412a>] child_rip+0xa/0x20
      [78079.338522]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.346599]  [<ffffffffa07967e0>] ? ptlrpc_main+0x0/0x1a80 [ptlrpc]
      [78079.354520]  [<ffffffff81004120>] ? child_rip+0x0/0x20
      [78079.361136]
      [78079.364683] Kernel panic - not syncing: LBUG
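
The failed assertion tests a single bit of the request message flags. The sketch below mirrors that check in Python. The name MSG_RESENT for bit 0x0002 is an assumption (it is not stated in this ticket), although it is consistent with the "Although resent" message logged just before the LBUG. The helper name assertion_holds is purely illustrative.

```python
# Assumed name for the bit tested by the assertion; only the value 0x0002
# comes from the log above.
MSG_RESENT = 0x0002


def assertion_holds(flags: int) -> bool:
    """Return True when the bit checked by the failed assertion is set."""
    return bool(flags & MSG_RESENT)
```

Under that assumption, the LBUG means mdt_intent_lock_replace() was reached for a request whose flags did not have this bit set.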
      

      The support engineer was able to identify the client node from the crash dump. Both times, the client was a login node running Lustre 2.4.3.
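
A possible cross-check against the crash dump is to pull the client UUID and NID out of the target_handle_connect() message that precedes the LBUG in the console log. The following is a minimal Python sketch, assuming the line layout seen in the log excerpt of this ticket; the regex and the helper name parse_connect_line are illustrative and not part of Lustre. Whether that connection belongs to the offending client is itself an assumption that the crash dump has to confirm.

```python
import re

# Matches "<target>: connection from <uuid>@<nid> " as it appears in the
# console log of this ticket. The layout is assumed from that single line.
CONNECT_RE = re.compile(
    r"(?P<target>\S+): connection from (?P<uuid>[0-9a-f-]+)@(?P<nid>\S+) "
)

SAMPLE = (
    "[78073.117731] Lustre: 31681:0:(ldlm_lib.c:952:target_handle_connect()) "
    "work2-MDT0000: connection from "
    "38d12a48-aabd-9279-dc69-b78c4e00321c@10.100.62.72@o2ib2 "
    "t189645377601 exp ffff880b95bb1c00 cur 1410508503 last 1410508503"
)


def parse_connect_line(line):
    """Return the target, client UUID and client NID, or None if no match."""
    match = CONNECT_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(parse_connect_line(SAMPLE))
```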

      This looks like LU-5314. The proposed backport of the patch failed on Maloo (http://review.whamcloud.com/#/c/10902/).

          Activity

            adilger Andreas Dilger made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-5530 [ LU-5530 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-4584 [ LU-4584 ]
            pjones Peter Jones made changes -
            End date New: 12/Feb/15
            Start date New: 30/Sep/14

            bfaccini Bruno Faccini (Inactive) added a comment:

            Bruno,
            You are right, this must be made clear here as well. To be complete: the full list of tickets/patches that follow up on LU-2827 has been documented in detail by Oleg in his two LU-4584 comments dated 11/Sep/14.

            bruno.travouillon Bruno Travouillon (Inactive) added a comment:

            Hello Bruno,

            Yes, one of our issues is very close to LU-5530. I see that there are a couple of patches to apply on top of LU-2827. Thanks for the tip.

            bfaccini Bruno Faccini (Inactive) added a comment:

            Hello Bruno,
            I wonder if your new "ldlm-related" issues could be like those reported in LU-5530?

            bruno.travouillon Bruno Travouillon (Inactive) added a comment:

            Hi,

            We are now running Lustre 2.5.3 + b2_5 patch http://review.whamcloud.com/#/c/10492/. Since the upgrade, we have been hitting several ldlm-related issues on the MDS/OSS. Are you aware of any complementary fix that we should apply along with this one?

            In the meantime, we are still investigating those issues on site and will report them as soon as possible in new JIRA tickets.

            bfaccini Bruno Faccini (Inactive) added a comment:

            Hello Bruno,

            In fact, there are regression issues with the b2_4 back-port (http://review.whamcloud.com/#/c/10902/) of the LU-2827 changes. I checked that the b2_5 version (http://review.whamcloud.com/#/c/10492/) is OK, and it should land soon.
            pjones Peter Jones made changes -
            Labels New: p4b

            People

              bfaccini Bruno Faccini (Inactive)
              bruno.travouillon Bruno Travouillon (Inactive)
              Votes: 0
              Watchers: 4
