
LBUG ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:


    Description

      We have the crash dumps, but they may be analyzed by US personnel only.

      LustreError: 6425:0:(mdt_open.c:1690:mdt_reint_open()) ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
      LustreError: 6425:0:(mdt_open.c:1690:mdt_reint_open()) LBUG
      <4>Pid: 6425, comm: mdt01_002
      <4>
      <4>Call Trace:
      <4> [<ffffffffa041f895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa041fe97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa0ca7553>] mdt_reint_open+0x1973/0x20c0 [mdt]
      <4> [<ffffffffa0ca832c>] mdt_reconstruct_open+0x68c/0xc30 [mdt]
      <4> [<ffffffffa072d6a6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      <4> [<ffffffffa07061ae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      <4> [<ffffffffa0c9b195>] mdt_reconstruct+0x45/0x120 [mdt]
      <4> [<ffffffffa0c76cfb>] mdt_reint_internal+0x6bb/0x780 [mdt]
      <4> [<ffffffffa0c7708d>] mdt_intent_reint+0x1ed/0x520 [mdt]
      <4> [<ffffffffa0c74f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
      <4> [<ffffffffa06bd7e1>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      <4> [<ffffffffa06e424f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      <4> [<ffffffffa0c753c6>] mdt_enqueue+0x46/0xe0 [mdt]
      <4> [<ffffffffa0c7bab7>] mdt_handle_common+0x647/0x16d0 [mdt]
      <4> [<ffffffffa0706c0c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      <4> [<ffffffffa0cb5295>] mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0716428>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa04205de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0431dbf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa070d789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff810557f3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa07177be>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 6425, comm: mdt01_002 Tainted: G --------------- T 2.6.32-358.6.2.el6.20130607.x86_64.lustre240 #1
      <4>Call Trace:
      <4> [<ffffffff8153e8da>] ? panic+0xa7/0x190
      <4> [<ffffffffa041feeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa0ca7553>] ? mdt_reint_open+0x1973/0x20c0 [mdt]
      <4> [<ffffffffa0ca832c>] ? mdt_reconstruct_open+0x68c/0xc30 [mdt]
      <4> [<ffffffffa072d6a6>] ? __req_capsule_get+0x166/0x700 [ptlrpc]
      <4> [<ffffffffa07061ae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      <4> [<ffffffffa0c9b195>] ? mdt_reconstruct+0x45/0x120 [mdt]
      <4> [<ffffffffa0c76cfb>] ? mdt_reint_internal+0x6bb/0x780 [mdt]
      <4> [<ffffffffa0c7708d>] ? mdt_intent_reint+0x1ed/0x520 [mdt]
      <4> [<ffffffffa0c74f3e>] ? mdt_intent_policy+0x39e/0x720 [mdt]
      <4> [<ffffffffa06bd7e1>] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      <4> [<ffffffffa06e424f>] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      <4> [<ffffffffa0c753c6>] ? mdt_enqueue+0x46/0xe0 [mdt]
      <4> [<ffffffffa0c7bab7>] ? mdt_handle_common+0x647/0x16d0 [mdt]
      <4> [<ffffffffa0706c0c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
      <4> [<ffffffffa0cb5295>] ? mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0716428>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa04205de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0431dbf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa070d789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff810557f3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa07177be>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0716cf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
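
Each frame in the traces above has the form `[<address>] symbol+0xoffset/0xsize [module]`, where the offset is the position of the return address inside the function and the size is the function's total length. A leading `?` marks an address that was found on the stack but could not be verified as part of the call chain. As a reading aid only, here is a minimal C sketch that splits one frame line into those fields. `struct trace_frame` and `parse_trace_line` are hypothetical names invented for this illustration; they are not part of Lustre or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for this sketch; not a Lustre or kernel structure. */
struct trace_frame {
    unsigned long long addr;   /* return address printed inside [<...>] */
    char symbol[64];           /* function name */
    unsigned long long offset; /* offset of the address into the function */
    unsigned long long size;   /* total size of the function in bytes */
    char module[32];           /* module name, empty for core kernel symbols */
    int reliable;              /* 0 when the frame is prefixed with '?' */
};

/*
 * Parse one line such as
 *   "<4> [<ffffffffa0ca7553>] mdt_reint_open+0x1973/0x20c0 [mdt]"
 * Returns 0 on success, -1 if the line is not a stack frame.
 */
int parse_trace_line(const char *line, struct trace_frame *f)
{
    const char *p;
    int n = 0;

    assert(line != NULL && f != NULL);
    memset(f, 0, sizeof(*f));

    p = strstr(line, "[<");
    if (p == NULL)
        return -1;
    /* %n is only stored if the closing ">]" matched as well. */
    if (sscanf(p, "[<%llx>]%n", &f->addr, &n) != 1 || n == 0)
        return -1;
    p += n;

    while (*p == ' ')
        p++;
    f->reliable = 1;
    if (*p == '?') {           /* unverified frame */
        f->reliable = 0;
        p++;
    }

    /* The module part is optional, so three converted fields are enough. */
    if (sscanf(p, " %63[^+]+0x%llx/0x%llx [%31[^]]]",
               f->symbol, &f->offset, &f->size, f->module) < 3)
        return -1;
    return 0;
}
```

For the asserting frame `mdt_reint_open+0x1973/0x20c0 [mdt]`, this yields the symbol `mdt_reint_open`, offset 0x1973 into a function of 0x20c0 bytes, in module `mdt`.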


          Activity

            [LU-4179] LBUG ASSERTION( !lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
            pjones Peter Jones added a comment -

            Jay

            Just to be clear: please use the patch only in a test environment until we have completed our validation.

            Thanks

            Peter


            di.wang Di Wang (Inactive) added a comment -

            The patch for b2_4 is at http://review.whamcloud.com/8145. It should also apply to 2.4.0; please try it.

            jaylan Jay Lan (Inactive) added a comment -

            Hi Di,

            The new patch cannot be applied to 2.4.0.

            Your new patch contains this line:
            GOTO(out_child_unlock, result = rc);
            but my code does not have an out_child_unlock label.
            There is an "out_child" label in the routine, but I cannot tell what other changes need to be made.

            Could you write a patch that applies on top of 2.4.0?

            Thanks!
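
The problem Jay describes comes from the goto-based cleanup style used in this routine: a hunk that jumps to `out_child_unlock` can only build if the function defines that label, and labels are usually layered so that a later one skips the cleanup that is not yet needed. The following C sketch shows the general pattern only. All names, steps, and return codes are hypothetical, and the actual cleanup order in `mdt_reint_open` may differ.

```c
#include <assert.h>
#include <string.h>

/* Records which steps ran, in order, so the control flow can be inspected. */
struct cleanup_log {
    char steps[64];
};

static void log_step(struct cleanup_log *log, const char *step)
{
    strncat(log->steps, step, sizeof(log->steps) - strlen(log->steps) - 1);
}

/*
 * fail_at: 0 = no failure, 1 = fail before the lock is taken,
 *          2 = fail after the lock is taken.
 */
int open_with_cleanup(int fail_at, struct cleanup_log *log)
{
    int rc = 0;

    assert(log != NULL);
    log->steps[0] = '\0';

    if (fail_at == 1) {
        rc = -1;
        goto out_child;         /* nothing locked yet: skip the unlock */
    }

    log_step(log, "lock;");
    if (fail_at == 2) {
        rc = -2;
        goto out_child_unlock;  /* lock is held: release it first */
    }

    log_step(log, "open;");

out_child_unlock:
    log_step(log, "unlock;");   /* runs only on paths that took the lock */
out_child:
    log_step(log, "put;");      /* runs on every path */
    return rc;
}
```

In this sketch, renaming the jump target to `out_child` would make a backported hunk compile, but it would also skip the unlock step, which is why the label cannot simply be substituted without checking what else the patch changed.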

            di.wang Di Wang (Inactive) added a comment -

            I just updated the patch at http://review.whamcloud.com/#/c/8142/; please try again. Thanks.
            di.wang Di Wang (Inactive) added a comment - - edited

            Hmm, the patch posted at http://review.whamcloud.com/8142 is missing a "}":

            diff --git a/lustre/mdt/mdt_open.c b/lustre/mdt/mdt_open.c
            index b4057a0..570d07c 100644
            --- a/lustre/mdt/mdt_open.c
            +++ b/lustre/mdt/mdt_open.c
            @@ -1841,17 +1841,18 @@ int mdt_reint_open(struct mdt_thread_info *info, struct mdt_lock_handle *lhc)
                            }
                     }
             
            -        LASSERT(!lustre_handle_is_used(&lhc->mlh_reg_lh));
            -
            -       /* get openlock if this is not replay and if a client requested it */
            -       if (!req_is_replay(req)) {
            -               rc = mdt_object_open_lock(info, child, lhc, &ibits);
            -               if (rc != 0)
            -                       GOTO(out_child_unlock, result = rc);
            -               else if (create_flags & MDS_OPEN_LOCK)
            -                       mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
            +       if (lustre_handle_is_used(&lhc->mlh_reg_lh)) {
            +               LASSERT((lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT));
            +       } else {
            +               /* get openlock if this is not replay and if a client requested it */
            +               if (!req_is_replay(req)) {
            +                       rc = mdt_object_open_lock(info, child, lhc, &ibits);
            +                       if (rc != 0)
            +                               GOTO(out_child_unlock, result = rc);
            +                       else if (create_flags & MDS_OPEN_LOCK)
            +                               mdt_set_disposition(info, ldlm_rep, DISP_OPEN_LOCK);
            +               }
                    }
            -
                    /* Try to open it now. */
                    rc = mdt_finish_open(info, parent, child, create_flags,
                                         created, ldlm_rep);
            

            But please hold off on trying it for now, since I need to revisit this code. Thanks!
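
To make the intent of the hunk above easier to follow, here is a small stand-alone C model of the decision it appears to encode: a lock handle that is already in use is tolerated only when the request is a resend, and otherwise the open lock is taken unless the request is a replay. The types and `decide_open_lock` are hypothetical stand-ins, not Lustre code, and treating a zero cookie as "unused" is an assumption of the sketch. Where the patched code would LASSERT, the model returns `OPEN_BUG` so each case can be checked without crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Lustre structures; only what the sketch needs. */
struct lock_handle {
    unsigned long long cookie;  /* assumption: 0 means the handle is unused */
};

struct request {
    bool resent;                /* models the MSG_RESENT flag on the request */
    bool replay;                /* models req_is_replay(req) */
};

enum open_action {
    OPEN_REUSE_LOCK,  /* handle already holds a lock and the request is a resend */
    OPEN_TAKE_LOCK,   /* normal request: acquire the open lock */
    OPEN_SKIP_LOCK,   /* replay: no open lock is taken */
    OPEN_BUG          /* handle in use on a non-resent request: would LASSERT */
};

static bool handle_is_used(const struct lock_handle *lh)
{
    return lh->cookie != 0;
}

enum open_action decide_open_lock(const struct lock_handle *lh,
                                  const struct request *req)
{
    assert(lh != NULL && req != NULL);

    if (handle_is_used(lh))
        return req->resent ? OPEN_REUSE_LOCK : OPEN_BUG;

    return req->replay ? OPEN_SKIP_LOCK : OPEN_TAKE_LOCK;
}
```

The trace in the description reaches `mdt_reint_open` through `mdt_reconstruct_open`, which is consistent with this reading of the patch, although the thread itself does not confirm the root cause.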


            keith Keith Mannthey (Inactive) added a comment -

            Sorry Jay, let me work on this patch some more. We see this with the master build as well.

            jaylan Jay Lan (Inactive) added a comment -

            I applied the patch against 2.4.0 and built it, but hit build errors:

            /usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt/mdt_open.c: In function 'mdt_reint_open':
            /usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt/mdt_open.c:1741: error: invalid storage class for function 'mdt_mfd_closed'
            cc1: warnings being treated as errors
            /usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt/mdt_open.c:1740: error: ISO C90 forbids mixed declarations and code
            /usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt/mdt_open.c:2004: error: expected declaration or statement at end of input
            make[8]: *** [/usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt/mdt_open.o] Error 1
            make[7]: *** [/usr/src/redhat/BUILD/lustre-2.4.0/lustre/mdt] Error 2
            make[7]: *** Waiting for unfinished jobs....
            make[6]: *** [/usr/src/redhat/BUILD/lustre-2.4.0/lustre] Error 2
            make[5]: *** [_module_/usr/src/redhat/BUILD/lustre-2.4.0] Error 2

            keith Keith Mannthey (Inactive) added a comment -

            A patch against master can be tracked here: http://review.whamcloud.com/8142

            keith Keith Mannthey (Inactive) added a comment -

            I have attached a possible fix from Di Wang. Please test it and report back.

            If you get an LASSERT with this patch applied, please send a fresh crash dump.

            mhanafi Mahmoud Hanafi added a comment -

            We are not using DNE.

            We are hitting this bug at least once a day, so fixing it is number one on our priority list.

            keith Keith Mannthey (Inactive) added a comment -

            I was able to extract a good Lustre debug log from the crash. The system was busy in service.c:1079:ptlrpc_update_export_timer() at the time of the error. I don't yet know whether this is significant.

            People

              di.wang Di Wang (Inactive)
              mhanafi Mahmoud Hanafi