[LU-1523] ASSERTION(lock->l_req_mode == lock->l_granted_mode) failed Created: 14/Jun/12  Updated: 29/Oct/12  Resolved: 14/Jun/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jay Lan (Inactive) Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server: lustre-2.1.1, el62, ofed-1.5.3.1

git repo at https://github.com/jlan/lustre-nas/commits/nas-2.1.1


Issue Links:
Duplicate
duplicates LU-1467 ASSERTION(lock->l_req_mode == lock->l... Resolved
Severity: 3
Rank (Obsolete): 6382

 Description   

This is another OSS crash caused by an LBUG since we upgraded the servers to 2.1.1.

LustreError: 21890:0:(ost_handler.c:1675:ost_prolong_lock_one()) ASSERTION(lock->l_req_mode == lock->l_granted_mode) failed
LustreError: 21890:0:(ost_handler.c:1675:ost_prolong_lock_one()) LBUG
Pid: 21890, comm: ll_ost_io_304

Call Trace:
[<ffffffffa0578855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0578e95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa0583da6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa0a4b3f9>] ost_prolong_lock_one+0xd9/0x110 [ost]
[<ffffffffa0a4b4b7>] ost_prolong_locks+0x87/0x280 [ost]
[<ffffffffa075b960>] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
[<ffffffffa0a4bf55>] ost_rw_hpreq_check+0x195/0x440 [ost]

The crash occurs at the line
LASSERT(lock->l_req_mode == lock->l_granted_mode);
in ost_prolong_lock_one():

static void ost_prolong_lock_one(struct ost_prolong_data *opd,
                                 struct ldlm_lock *lock)
{
        LASSERT(lock->l_req_mode == lock->l_granted_mode); <===== THIS LINE
        LASSERT(lock->l_export == opd->opd_exp);

        /* XXX: never try to grab resource lock here because we're inside
         * exp_bl_list_lock; in ldlm_lockd.c to handle waiting list we take
         * res lock and then exp_bl_list_lock. */

        if (!(lock->l_flags & LDLM_FL_AST_SENT))
                /* ignore locks not being cancelled */
                return;

        LDLM_DEBUG(lock,
                   "refreshed for req x"LPU64" ext("LPU64"->"LPU64") to %ds.\n",
                   opd->opd_req->rq_xid, opd->opd_extent.start,
                   opd->opd_extent.end, opd->opd_timeout);

        /* OK. this is a possible lock the user holds doing I/O
         * let's refresh eviction timer for it */
        ldlm_refresh_waiting_lock(lock, opd->opd_timeout);
        ++opd->opd_locks;
}
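
To make the failing condition concrete, here is a minimal standalone sketch of the check. It is not Lustre code: the mock_* types, fields and functions are hypothetical stand-ins that model only the two mode fields and the AST flag. It also assumes, purely for illustration, that the lock reaching ost_prolong_lock_one() has been requested but not yet granted, which this report does not confirm. The skip-instead-of-assert variant is one possible way to tolerate such a lock and is not necessarily how LU-1467 was resolved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the Lustre lock types.
 * Only the fields relevant to the failed assertion are modelled. */
enum mock_lock_mode {
        MOCK_LCK_MINMODE = 0,   /* assumed: no mode granted yet */
        MOCK_LCK_PR      = 1,
        MOCK_LCK_PW      = 2
};

struct mock_lock {
        enum mock_lock_mode l_req_mode;     /* mode the client asked for */
        enum mock_lock_mode l_granted_mode; /* mode granted so far */
        bool                ast_sent;       /* stands in for LDLM_FL_AST_SENT */
};

/* True if the lock would pass the LASSERT quoted in this report. */
static bool mock_lock_is_granted(const struct mock_lock *lock)
{
        return lock->l_req_mode == lock->l_granted_mode;
}

/* Non-fatal variant of the prolong step.
 * Returns 1 if the lock's timer would be refreshed, 0 if it is skipped. */
static int mock_prolong_lock_one(const struct mock_lock *lock)
{
        if (!mock_lock_is_granted(lock))
                return 0;       /* the code in this report LBUGs here */

        if (!lock->ast_sent)
                return 0;       /* ignore locks not being cancelled */

        return 1;
}
```

With these mocks, a lock whose l_req_mode is MOCK_LCK_PW and whose l_granted_mode is still MOCK_LCK_MINMODE fails mock_lock_is_granted() and is skipped, where the assertion in the report would bring down the OSS.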


 Comments   
Comment by Peter Jones [ 14/Jun/12 ]

Yangsheng

Is this a duplicate of LU-1467?

Thanks

Peter

Comment by Yang Sheng [ 14/Jun/12 ]

Yes, it duplicates LU-1467.
