Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.1.1
- Labels: None
- Environment: server: lustre-2.1.1, el62, ofed-1.5.3.1
  git repo at https://github.com/jlan/lustre-nas/commits/nas-2.1.1
- Severity: 3
- Rank: 6382
Description
This is another case of an OSS crashing with an LBUG since we upgraded the servers to 2.1.1.
LustreError: 21890:0:(ost_handler.c:1675:ost_prolong_lock_one()) ASSERTION(lock->l_req_mode == lock->l_granted_mode) failed
LustreError: 21890:0:(ost_handler.c:1675:ost_prolong_lock_one()) LBUG
Pid: 21890, comm: ll_ost_io_304
Call Trace:
[<ffffffffa0578855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0578e95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa0583da6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa0a4b3f9>] ost_prolong_lock_one+0xd9/0x110 [ost]
[<ffffffffa0a4b4b7>] ost_prolong_locks+0x87/0x280 [ost]
[<ffffffffa075b960>] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
[<ffffffffa0a4bf55>] ost_rw_hpreq_check+0x195/0x440 [ost]
The line where it crashed is
LASSERT(lock->l_req_mode == lock->l_granted_mode);
in ost_prolong_lock_one():
static void ost_prolong_lock_one(struct ost_prolong_data *opd,
                                 struct ldlm_lock *lock)
{
        LASSERT(lock->l_req_mode == lock->l_granted_mode);   <===== THIS LINE
        LASSERT(lock->l_export == opd->opd_exp);

        /* XXX: never try to grab resource lock here because we're inside
         * exp_bl_list_lock; in ldlm_lockd.c to handle waiting list we take
         * res lock and then exp_bl_list_lock. */

        if (!(lock->l_flags & LDLM_FL_AST_SENT))
                /* ignore locks not being cancelled */
                return;

        LDLM_DEBUG(lock,
                   "refreshed for req x"LPU64" ext("LPU64"->"LPU64") to %ds.\n",
                   opd->opd_req->rq_xid, opd->opd_extent.start,
                   opd->opd_extent.end, opd->opd_timeout);

        /* OK. this is a possible lock the user holds doing I/O
         * let's refresh eviction timer for it */
        ldlm_refresh_waiting_lock(lock, opd->opd_timeout);
        ++opd->opd_locks;
}
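
To illustrate the condition the failing assertion checks, here is a minimal standalone sketch. It is not Lustre code and not the fix applied upstream: the struct, the enum values, and the model_* functions are hypothetical stand-ins, and it assumes that a lock still waiting to be granted has a granted mode that differs from its requested mode. The sketch skips such a lock instead of asserting, which is one possible defensive behaviour, shown only to make the invariant concrete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the LDLM lock modes; not the real values. */
enum model_mode { MODEL_NONE = 0, MODEL_PR = 1, MODEL_PW = 2 };

/* Hypothetical, heavily reduced model of struct ldlm_lock. */
struct model_lock {
        enum model_mode req_mode;     /* mode the client asked for */
        enum model_mode granted_mode; /* mode granted so far (assumed MODEL_NONE while waiting) */
        bool ast_sent;                /* models the LDLM_FL_AST_SENT flag */
};

/* The invariant the LASSERT enforces: the lock has been fully granted. */
static bool model_lock_is_granted(const struct model_lock *lock)
{
        return lock->req_mode == lock->granted_mode;
}

/*
 * Returns true if the eviction timer would be refreshed for this lock.
 * Where the code in this report asserts (and LBUGs), this sketch simply
 * skips a lock that has not been granted yet.
 */
static bool model_prolong_lock_one(const struct model_lock *lock)
{
        if (!model_lock_is_granted(lock))
                return false; /* the reported code trips the assertion here */

        if (!lock->ast_sent)
                return false; /* ignore locks not being cancelled */

        return true;
}
```

With this model, a lock requested and granted as MODEL_PW with ast_sent set is refreshed, while a lock requested as MODEL_PW but still at MODEL_NONE is skipped rather than crashing the thread.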
Issue Links
- duplicates LU-1467 ASSERTION(lock->l_req_mode == lock->l_granted_mode) (Resolved)