
LBUG ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:

Details


    Description

      We have the crash dumps, but they require analysis by US personnel only.

      LustreError: 6425:0:(mdt_open.c:1690:mdt_reint_open()) ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
      LustreError: 6425:0:(mdt_open.c:1690:mdt_reint_open()) LBUG
      <4>Pid: 6425, comm: mdt01_002
      <4>
      <4>Call Trace:
      <4> [<ffffffffa041f895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa041fe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa0ca7553>] mdt_reint_open+0x1973/0x20c0 [mdt]
      <4> [<ffffffffa0ca832c>] mdt_reconstruct_open+0x68c/0xc30 [mdt]
      <4> [<ffffffffa072d6a6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      <4> [<ffffffffa07061ae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      <4> [<ffffffffa0c9b195>] mdt_reconstruct+0x45/0x120 [mdt]
      <4> [<ffffffffa0c76cfb>] mdt_reint_internal+0x6bb/0x780 [mdt]
      <4> [<ffffffffa0c7708d>] mdt_intent_reint+0x1ed/0x520 [mdt]
      <4> [<ffffffffa0c74f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
      <4> [<ffffffffa06bd7e1>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      <4> [<ffffffffa06e424f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      <4> [<ffffffffa0c753c6>] mdt_enqueue+0x46/0xe0 [mdt]
      <4> [<ffffffffa0c7bab7>] mdt_handle_common+0x647/0x16d0 [mdt]
      <4> [<ffffffffa0706c0c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      <4> [<ffffffffa0cb5295>] mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0716428>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa04205de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0431dbf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa070d789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff810557f3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa07177be>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 6425, comm: mdt01_002 Tainted: G --------------- T 2.6.32-358.6.2.el6.20130607.x86_64.lustre240 #1
      <4>Call Trace:
      <4> [<ffffffff8153e8da>] ? panic+0xa7/0x190
      <4> [<ffffffffa041feeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa0ca7553>] ? mdt_reint_open+0x1973/0x20c0 [mdt]
      <4> [<ffffffffa0ca832c>] ? mdt_reconstruct_open+0x68c/0xc30 [mdt]
      <4> [<ffffffffa072d6a6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      <4> [<ffffffffa07061ae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      <4> [<ffffffffa0c9b195>] ? mdt_reconstruct+0x45/0x120 [mdt]
      <4> [<ffffffffa0c76cfb>] ? mdt_reint_internal+0x6bb/0x780 [mdt]
      <4> [<ffffffffa0c7708d>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
      <4> [<ffffffffa0c74f3e>] ? mdt_intent_policy+0x39e/0x720 [mdt]
      <4> [<ffffffffa06bd7e1>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      <4> [<ffffffffa06e424f>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      <4> [<ffffffffa0c753c6>] ? mdt_enqueue+0x46/0xe0 [mdt]
      <4> [<ffffffffa0c7bab7>] ? mdt_handle_common+0x647/0x16d0 [mdt]
      <4> [<ffffffffa0706c0c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      <4> [<ffffffffa0cb5295>] ? mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0716428>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa04205de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0431dbf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa070d789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff810557f3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa07177be>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      Attachments

        Issue Links

          Activity


            di.wang Di Wang (Inactive) added a comment -

            Yes, they are the same; it was just a tagging issue. If you pull the patch out and build it yourself, you can ignore it.
            jaylan Jay Lan (Inactive) added a comment - - edited

            Hmm, I had no problem patching and building based on
            http://review.whamcloud.com/8145 in my b2_4 build environment.

            I compared the one in 8145 with the one in 8173; the patches look identical to me.

            di.wang Di Wang (Inactive) added a comment -

            http://review.whamcloud.com/8145 has the wrong tag ("based on master"); that patch is actually based on b2_4, so I resubmitted a new one, http://review.whamcloud.com/8173. Please track that one.

            di.wang Di Wang (Inactive) added a comment -

            IMHO the patch itself is fine, but we cannot call it "safe" until it has gone through all of our validation processes, i.e. review, passing internal tests, and landing, as Peter mentioned.

            mhanafi Mahmoud Hanafi added a comment -

            What is the risk with this patch? We can't fully test it in our test environment because we haven't isolated the trigger; we would need to deploy it on production to fully test it.
            pjones Peter Jones added a comment -

            Jay

            Just to be clear - please only use the patch in a test environment until we have completed our validation

            Thanks

            Peter


            di.wang Di Wang (Inactive) added a comment -

            http://review.whamcloud.com/8145 is the one for b2_4, and the patch should apply to 2.4.0 as well; please try it.

            jaylan Jay Lan (Inactive) added a comment -

            Hi Di,

            The new patch cannot be applied to 2.4.0.

            Your new patch contains this line:
            GOTO(out_child_unlock, result = rc);
            but my code does not have an out_child_unlock label.
            There is an "out_child" label in the routine, but I cannot tell what other changes need to be made.

            Could you write a patch that applies on top of 2.4.0?

            Thanks!

            di.wang Di Wang (Inactive) added a comment -

            I just updated the patch, http://review.whamcloud.com/#/c/8142/ ; please try again. Thanks.
            di.wang Di Wang (Inactive) added a comment - - edited

            Hmm, the patch posted on http://review.whamcloud.com/8142 is missing a "}":

            diff --git a/lustre/mdt/mdt_open.c b/lustre/mdt/mdt_open.c
            index b4057a0..570d07c 100644
            --- a/lustre/mdt/mdt_open.c
            +++ b/lustre/mdt/mdt_open.c
            @@ -1841,17 +1841,18 @@ int mdt_reint_open(struct mdt_thread_info *info, struct mdt_lock_handle *lhc)
                            }
                     }
             
            -        LASSERT(!lustre_handle_is_used(&lhc->mlh_reg_lh));
            -
            -       /* get openlock if this is not replay and if a client requested it */
            -       if (!req_is_replay(req)) {
            -               rc = mdt_object_open_lock(info, child, lhc, &ibits);
            -               if (rc != 0)
            -                       GOTO(out_child_unlock, result = rc);
            -               else if (create_flags & MDS_OPEN_LOCK)
            -                       mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
            +       if (lustre_handle_is_used(&lhc->mlh_reg_lh)) {
            +               LASSERT((lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT));
            +       } else {
            +               /* get openlock if this is not replay and if a client requested it */
            +               if (!req_is_replay(req)) {
            +                       rc = mdt_object_open_lock(info, child, lhc, &ibits);
            +                       if (rc != 0)
            +                               GOTO(out_child_unlock, result = rc);
            +                       else if (create_flags & MDS_OPEN_LOCK)
            +                               mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
            +               }
                    }
            -
                    /* Try to open it now. */
                    rc = mdt_finish_open(info, parent, child, create_flags,
                                         created, ldlm_rep);
            

            But please hold off on trying it for a bit, since I need to revisit this code. Thanks!


            keith Keith Mannthey (Inactive) added a comment -

            Sorry Jay, let me work on this patch some more. We see this with the master build as well.

            People

              di.wang Di Wang (Inactive)
              mhanafi Mahmoud Hanafi