[LU-7825] ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_writers > 0 Created: 27/Feb/16 Updated: 16/Mar/16 Resolved: 16/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Error happens during soak testing of build '20160224' (b2_8 RC2) (see: Sequence of events:
The error reads as:
<0>LustreError: 5003:0:(ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_writers > 0 ) failed:
<0>LustreError: 5003:0:(ldlm_lock.c:810:ldlm_lock_decref_internal_nolock()) LBUG
<4>Pid: 5003, comm: mdt02_007
<4>
<4>Call Trace:
<4> [<ffffffffa0748875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0748e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0a2ef0f>] ldlm_lock_decref_internal_nolock+0x17f/0x180 [ptlrpc]
<4> [<ffffffffa0a3102d>] ldlm_lock_decref_internal+0x4d/0xa80 [ptlrpc]
<4> [<ffffffffa083f935>] ? class_handle2object+0x95/0x190 [obdclass]
<4> [<ffffffffa0a325a0>] ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
<4> [<ffffffffa1164c67>] mdt_object_unlock+0xa7/0x2e0 [mdt]
<4> [<ffffffffa11867ca>] mdt_reint_rename_or_migrate+0xf3a/0x2600 [mdt]
<4> [<ffffffffa0ab7bdd>] ? null_alloc_rs+0xcd/0x320 [ptlrpc]
<4> [<ffffffffa0876cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
<4> [<ffffffffa087bbf0>] ? lu_ucred+0x20/0x30 [obdclass]
<4> [<ffffffffa0a7d100>] ? lustre_pack_reply_v2+0x180/0x280 [ptlrpc]
<4> [<ffffffffa117d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
<4> [<ffffffffa1187ec3>] mdt_reint_rename+0x13/0x20 [mdt]
<4> [<ffffffffa118118d>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa116cddb>] mdt_reint_internal+0x62b/0x9f0 [mdt]
<4> [<ffffffffa116d63b>] mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0ae0c2c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [<ffffffffa0a8dc61>] ptlrpc_main+0xd21/0x1800 [ptlrpc]
<4> [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
<4> [<ffffffffa0a8cf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 5003, comm: mdt02_007 Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0748ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0a2ef0f>] ? ldlm_lock_decref_internal_nolock+0x17f/0x180 [ptlrpc]
<4> [<ffffffffa0a3102d>] ? ldlm_lock_decref_internal+0x4d/0xa80 [ptlrpc]
<4> [<ffffffffa083f935>] ? class_handle2object+0x95/0x190 [obdclass]
<4> [<ffffffffa0a325a0>] ? ldlm_lock_decref_and_cancel+0x80/0x150 [ptlrpc]
<4> [<ffffffffa1164c67>] ? mdt_object_unlock+0xa7/0x2e0 [mdt]
<4> [<ffffffffa11867ca>] ? mdt_reint_rename_or_migrate+0xf3a/0x2600 [mdt]
<4> [<ffffffffa0ab7bdd>] ? null_alloc_rs+0xcd/0x320 [ptlrpc]
<4> [<ffffffffa0876cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
<4> [<ffffffffa087bbf0>] ? lu_ucred+0x20/0x30 [obdclass]
<4> [<ffffffffa0a7d100>] ? lustre_pack_reply_v2+0x180/0x280 [ptlrpc]
<4> [<ffffffffa117d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
<4> [<ffffffffa1187ec3>] ? mdt_reint_rename+0x13/0x20 [mdt]
<4> [<ffffffffa118118d>] ? mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa116cddb>] ? mdt_reint_internal+0x62b/0x9f0 [mdt]
<4> [<ffffffffa116d63b>] ? mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0ae0c2c>] ? tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [<ffffffffa0a8dc61>] ? ptlrpc_main+0xd21/0x1800 [ptlrpc]
<4> [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
<4> [<ffffffffa0a8cf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20

Attached message, console logs of MDS nodes lola-9, lola-10 and also vmcore-dmesg.txt. |
| Comments |
| Comment by Frank Heckes (Inactive) [ 27/Feb/16 ] |
|
The crash file has been saved at lhn.hpdd.intel.com:/scratch/crashdumps/lu-7825/lola-9/127.0.0.1-2016-02-27-02\:12\:58/. |
| Comment by Di Wang [ 27/Feb/16 ] |
|
Hmm, it looks like the lock is not released correctly in the error-handling path of mdt_reint_rename_internal(). Will cook a patch. |
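
The failure mode described in the comment above can be illustrated with a toy model. This is only a sketch and not Lustre code: `toy_lock`, `toy_handle`, `toy_lock_addref`, `toy_unlock` and `toy_rename` are hypothetical names invented for the illustration, and the content of the actual patch (18707) is not reproduced here. The sketch assumes the general pattern in which an error path releases a lock early and a shared cleanup label releases it again; if the second release is not guarded, the writer count would drop below its valid range and a check such as `writers > 0` would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a lock with a writer reference count
 * (hypothetical; not struct ldlm_lock). */
struct toy_lock {
    int writers;
};

/* Toy stand-in for a per-request lock handle; NULL means "not held". */
struct toy_handle {
    struct toy_lock *lock;
};

/* Take one writer reference and remember it in the handle. */
static void toy_lock_addref(struct toy_handle *h, struct toy_lock *lk)
{
    lk->writers++;
    h->lock = lk;
}

/* Drop the reference exactly once and clear the handle.
 * Returns false (and does nothing) if the handle is not held,
 * so a second call from a shared cleanup path is harmless. */
static bool toy_unlock(struct toy_handle *h)
{
    if (h->lock == NULL)
        return false;
    assert(h->lock->writers > 0); /* same kind of invariant as in the report */
    h->lock->writers--;
    h->lock = NULL;
    return true;
}

/* Hypothetical operation with an error path that releases early
 * and then falls through to the common cleanup label. */
static int toy_rename(struct toy_lock *lk, bool fail_midway)
{
    struct toy_handle h = { NULL };
    int rc = 0;

    toy_lock_addref(&h, lk);
    if (fail_midway) {
        toy_unlock(&h); /* early release on the error path */
        rc = -1;
        goto out;
    }
    /* ... normal work would happen here ... */
out:
    toy_unlock(&h); /* no-op if the error path already released */
    return rc;
}
```

Calling `toy_rename(&lk, true)` on a zero-initialised `struct toy_lock lk` returns -1 and leaves `lk.writers` at 0. Without the NULL check and the handle reset in `toy_unlock`, the same call would decrement the count twice and trip the assertion.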
| Comment by Gerrit Updater [ 27/Feb/16 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/18707 |
| Comment by Gerrit Updater [ 01/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18707/ |
| Comment by Joseph Gmitter (Inactive) [ 16/Mar/16 ] |
|
Landed to master and b2_8. Is present in the 2.8.0 release. |