[LU-5579] MDS crashed by "mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed" Created: 04/Sep/14 Updated: 01/Feb/22 Resolved: 16/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Liang Zhen (Inactive) | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, llnl, patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15569 |
| Description |
|
After I enabled message delay on the routers, the MDS crashed quite soon:

<0>LustreError: 13914:0:(mdt_handler.c:2333:mdt_check_resent_lock()) ASSERTION( lock != NULL ) failed: Invalid lock handle 0x80235817e79ffcd
<0>LustreError: 13914:0:(mdt_handler.c:2333:mdt_check_resent_lock()) LBUG
<4>Pid: 13914, comm: mdt00_009
<4>
<4>Call Trace:
<4> [<ffffffffa0706895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0706e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa1107111>] mdt_check_resent_lock+0x1b1/0x1f0 [mdt]
<4> [<ffffffffa111c22d>] mdt_getattr_name_lock+0x51d/0x1a50 [mdt]
<4> [<ffffffffa111dc82>] mdt_intent_getattr+0x292/0x470 [mdt]
<4> [<ffffffffa110b879>] mdt_intent_policy+0x499/0xca0 [mdt]
<4> [<ffffffffa0a64549>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
<4> [<ffffffffa0a9048b>] ldlm_handle_enqueue0+0x51b/0x13a0 [ptlrpc]
<4> [<ffffffffa07074ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa0b11d12>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
<4> [<ffffffffa0b1259e>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
<4> [<ffffffffa0ac15c4>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
<4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
<4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
<4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
<4> [<ffffffffa0ac0760>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 13914, comm: mdt00_009 Tainted: P --------------- 2.6.32-431.23.3.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81528dbc>] ? panic+0xa7/0x16f
<4> [<ffffffffa0706eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa1107111>] ? mdt_check_resent_lock+0x1b1/0x1f0 [mdt]
<4> [<ffffffffa111c22d>] ? mdt_getattr_name_lock+0x51d/0x1a50 [mdt]
<4> [<ffffffffa111dc82>] ? mdt_intent_getattr+0x292/0x470 [mdt]
<4> [<ffffffffa110b879>] ? mdt_intent_policy+0x499/0xca0 [mdt]
<4> [<ffffffffa0a64549>] ? ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
<4> [<ffffffffa0a9048b>] ? ldlm_handle_enqueue0+0x51b/0x13a0 [ptlrpc]
<4> [<ffffffffa07074ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa0b11d12>] ? tgt_enqueue+0x62/0x1d0 [ptlrpc]
<4> [<ffffffffa0b1259e>] ? tgt_request_handle+0x71e/0xb10 [ptlrpc]
<4> [<ffffffffa0ac15c4>] ? ptlrpc_main+0xe64/0x1990 [ptlrpc]
<4> [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
<4> [<ffffffff810623a9>] ? find_busiest_queue+0x69/0x150
<4> [<ffffffff815294ce>] ? thread_return+0x4e/0x760
<4> [<ffffffffa0ac0760>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
<4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Vitaly Fertman [ 09/Sep/14 ] |
| Comment by Peter Jones [ 11/Oct/14 ] |
|
Landed for 2.5.4 and 2.7 |
| Comment by Vitaly Fertman [ 17/Oct/14 ] |
|
Actually, the original patch had a fix and a test, but due to |
| Comment by Peter Jones [ 17/Oct/14 ] |
|
Thanks for the tipoff Vitaly! |
| Comment by Gerrit Updater [ 19/Nov/14 ] |
|
Liang Zhen (liang.zhen@intel.com) uploaded a new patch: http://review.whamcloud.com/12780 |
| Comment by Jodi Levi (Inactive) [ 16/Dec/14 ] |
|
http://review.whamcloud.com/#/c/12232/ is the patch to track for this. |
| Comment by Andreas Dilger [ 14/Jan/15 ] |
|
Close this bug where the fix landed, use |
| Comment by Gerrit Updater [ 25/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12210/ |
| Comment by Joseph Gmitter (Inactive) [ 16/Mar/16 ] |
|
Reopening and resolving again in order to update the Fix Version field to correctly reflect that a patch landed under this ticket for 2.8.0: