MDS crashed by "mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed"

    Description

      After I enabled message delay on the routers, the MDS crashed soon afterwards:

      <0>LustreError: 13914:0:(mdt_handler.c:2333:mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed: Invalid lock handle 0x80235817e79ffcd
      <0>LustreError: 13914:0:(mdt_handler.c:2333:mdt_check_resent_lock()) LBUG
      <4>Pid: 13914, comm: mdt00_009
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0706895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0706e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa1107111>] mdt_check_resent_lock+0x1b1/0x1f0 [mdt]
      <4> [<ffffffffa111c22d>] mdt_getattr_name_lock+0x51d/0x1a50 [mdt]
      <4> [<ffffffffa111dc82>] mdt_intent_getattr+0x292/0x470 [mdt]
      <4> [<ffffffffa110b879>] mdt_intent_policy+0x499/0xca0 [mdt]
      <4> [<ffffffffa0a64549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
      <4> [<ffffffffa0a9048b>] ldlm_handle_enqueue0+0x51b/0x13a0 [ptlrpc]
      <4> [<ffffffffa07074ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0b11d12>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      <4> [<ffffffffa0b1259e>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
      <4> [<ffffffffa0ac15c4>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
      <4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      <4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
      <4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
      <4> [<ffffffffa0ac0760>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      <4> [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 13914, comm: mdt00_009 Tainted: P           ---------------    2.6.32-431.23.3.el6_lustre.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff81528dbc>] ? panic+0xa7/0x16f
      <4> [<ffffffffa0706eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa1107111>] ? mdt_check_resent_lock+0x1b1/0x1f0 [mdt]
      <4> [<ffffffffa111c22d>] ? mdt_getattr_name_lock+0x51d/0x1a50 [mdt]
      <4> [<ffffffffa111dc82>] ? mdt_intent_getattr+0x292/0x470 [mdt]
      <4> [<ffffffffa110b879>] ? mdt_intent_policy+0x499/0xca0 [mdt]
      <4> [<ffffffffa0a64549>] ? ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
      <4> [<ffffffffa0a9048b>] ? ldlm_handle_enqueue0+0x51b/0x13a0 [ptlrpc]
      <4> [<ffffffffa07074ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa0b11d12>] ? tgt_enqueue+0x62/0x1d0 [ptlrpc]
      <4> [<ffffffffa0b1259e>] ? tgt_request_handle+0x71e/0xb10 [ptlrpc]
      <4> [<ffffffffa0ac15c4>] ? ptlrpc_main+0xe64/0x1990 [ptlrpc]
      <4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      <4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
      <4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
      <4> [<ffffffffa0ac0760>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
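
The assertion message above reports that the lock handle carried by the request no longer resolved to a lock, and a patch subject on this ticket mentions "resend enqueue vs lock destroy". The toy model below illustrates that kind of race in isolation. It is only a sketch under stated assumptions: every name in it (HandleTable, handle_resent_strict, handle_resent_tolerant) is hypothetical, it is not Lustre code, and it does not describe the fix that landed, which this ticket does not detail.

```python
# Toy model only: none of these names are Lustre APIs.
import itertools


class StaleHandleError(Exception):
    """Raised when a handle no longer maps to a live lock."""


class HandleTable:
    """Maps opaque integer handles (cookies) to live lock objects."""

    def __init__(self):
        self._locks = {}
        self._cookies = itertools.count(0x1000)

    def create(self, resource):
        cookie = next(self._cookies)
        self._locks[cookie] = {"resource": resource}
        return cookie

    def destroy(self, cookie):
        # The lock goes away while a resend of the request is still in flight.
        self._locks.pop(cookie, None)

    def lookup(self, cookie):
        # Returns None for a stale handle.
        return self._locks.get(cookie)


def handle_resent_strict(table, cookie):
    """Treats a stale handle as fatal, like an assertion on the lookup result."""
    lock = table.lookup(cookie)
    if lock is None:
        raise StaleHandleError(f"Invalid lock handle {cookie:#x}")
    return lock


def handle_resent_tolerant(table, cookie, resource):
    """Hypothetical alternative: fall back to a fresh enqueue on a stale handle."""
    lock = table.lookup(cookie)
    if lock is None:
        cookie = table.create(resource)
        lock = table.lookup(cookie)
    return cookie, lock


def demo():
    table = HandleTable()
    cookie = table.create("dir/name")  # original enqueue is granted
    table.destroy(cookie)              # lock is destroyed before the resend arrives
    try:
        handle_resent_strict(table, cookie)
        strict_outcome = "ok"
    except StaleHandleError as exc:
        strict_outcome = str(exc)
    new_cookie, lock = handle_resent_tolerant(table, cookie, "dir/name")
    return strict_outcome, new_cookie != cookie, lock["resource"]
```

Calling demo() runs the destroy-then-resend sequence once against each handler. The strict handler fails on the stale handle, while the tolerant one allocates a new handle for the same resource.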
      
      

          Activity

            [LU-5579] MDS crashed by "mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed"

            jgmitter Joseph Gmitter (Inactive) added a comment - Reopening/resolving again in order to update the FixVersion to correctly reflect that there was a patch landed under this ticket for 2.8.0: http://review.whamcloud.com/#/c/12210/

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12210/
            Subject: LU-5579 tests: Add test for resend enqueue vs lock destroy
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cfbd77e4cd349f9414552e80e9f78f427ab13b53

            adilger Andreas Dilger added a comment - Close this bug where the fix landed, use LU-5604 for landing the test script patch.
            jlevi Jodi Levi (Inactive) added a comment - http://review.whamcloud.com/#/c/12232/ is patch to track for this.

            gerrit Gerrit Updater added a comment -

            Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/12780
            Subject: LU-5579 test: fixes for OBD_FAIL_LDLM_REPLY etc
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 08450b44d6a59c83698a2a223e702daa37f5e135
            pjones Peter Jones added a comment - Thanks for the tipoff Vitaly!

            vitaly_fertman Vitaly Fertman added a comment - Actually, the original patch had both a fix and a test, but due to LU-5709 the test did not work, so the patch was split into two parts and only the fix landed. The test was moved here: http://review.whamcloud.com/#/c/12210/
            pjones Peter Jones added a comment - Landed for 2.5.4 and 2.7
            vitaly_fertman Vitaly Fertman added a comment - CODE: http://review.whamcloud.com/11839

            People

              liang Liang Zhen (Inactive)
              liang Liang Zhen (Inactive)