[LU-4776] suite sanity-scrub: ASSERTION( info->oti_r_locks == 0 ) Created: 18/Mar/14  Updated: 14/May/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5264 ASSERTION( info->oti_r_locks == 0 ) a... Resolved
Severity: 3
Rank (Obsolete): 13141

 Description   

This issue was created by maloo for Dmitry Eremin <dmitry.eremin@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/6cce6894-ae66-11e3-9c2b-52540035b04c.

17:25:05:LustreError: 18368:0:(osd_handler.c:5496:osd_key_exit()) ASSERTION( info->oti_r_locks == 0 ) failed:
17:25:05:LustreError: 18368:0:(osd_handler.c:5496:osd_key_exit()) LBUG
17:25:05:Pid: 18368, comm: mdt00_000
17:25:05:
17:25:05:Call Trace:
17:25:05: [<ffffffffa048e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
17:25:05: [<ffffffffa048ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
17:25:05: [<ffffffffa0d276cb>] osd_key_exit+0x5b/0xc0 [osd_ldiskfs]
17:25:05: [<ffffffffa05f7798>] lu_context_exit+0x58/0xa0 [obdclass]
17:25:05: [<ffffffffa0842584>] ptlrpc_main+0x904/0x1980 [ptlrpc]
17:25:05: [<ffffffffa0841c80>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
17:25:05: [<ffffffff8109aee6>] kthread+0x96/0xa0
17:25:05: [<ffffffff8100c20a>] child_rip+0xa/0x20
17:25:05: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
17:25:05: [<ffffffff8100c200>] ? child_rip+0x0/0x20
17:25:05:
17:25:05:Kernel panic - not syncing: LBUG
17:25:05:Pid: 18368, comm: mdt00_000 Not tainted 2.6.32-431.5.1.el6_lustre.g1131719.x86_64 #1
17:25:05:Call Trace:
17:25:05: [<ffffffff81527983>] ? panic+0xa7/0x16f
17:25:05: [<ffffffffa048eeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
17:25:05: [<ffffffffa0d276cb>] ? osd_key_exit+0x5b/0xc0 [osd_ldiskfs]
17:25:05: [<ffffffffa05f7798>] ? lu_context_exit+0x58/0xa0 [obdclass]
17:25:05: [<ffffffffa0842584>] ? ptlrpc_main+0x904/0x1980 [ptlrpc]
17:25:05: [<ffffffffa0841c80>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
17:25:05: [<ffffffff8109aee6>] ? kthread+0x96/0xa0
17:25:05: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
17:25:05: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
17:25:05: [<ffffffff8100c200>] ? child_rip+0x0/0x20



 Comments   
Comment by Jodi Levi (Inactive) [ 18/Mar/14 ]

Fan Yong and Di,
Could you have a look and comment on this one?
Thank you!

Comment by nasf (Inactive) [ 19/Mar/14 ]

First, the ASSERT() indicates that someone called dt_read_lock() but did not call the matching dt_read_unlock().

Second, the ASSERT() was hit inside the ptlrpcd thread stack. In general, a ptlrpcd thread should not call dt_{read,write}_lock(), so that it does not block.

With this log alone, it is not easy to locate the issue. We either need more logs, or need to read the related code and check each dt_read_lock() call one by one.
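
To illustrate the first point, here is a minimal user-space sketch of how an unbalanced read lock leaves a per-thread counter non-zero when the context exits. It is not the Lustre code: the struct, the helper names, and the early-return bug are all hypothetical, and only the idea of a counter checked for zero at exit is taken from the assertion text above.

```c
/* Hypothetical stand-in for the per-thread info; only a read-lock
 * counter is modeled (compare info->oti_r_locks in the assertion). */
struct thread_info_sketch {
    int r_locks;
};

/* Hypothetical lock helpers: they only track the counter. */
static void read_lock_sketch(struct thread_info_sketch *info)
{
    info->r_locks++;
}

static void read_unlock_sketch(struct thread_info_sketch *info)
{
    info->r_locks--;
}

/* Mirrors the check made at context exit: returns 1 when every read
 * lock was released, 0 when one leaked. The real code asserts here. */
static int context_exit_ok(const struct thread_info_sketch *info)
{
    return info->r_locks == 0;
}

/* Balanced handler: the error path releases the lock before returning. */
static int handler_balanced(struct thread_info_sketch *info, int fail)
{
    read_lock_sketch(info);
    if (fail) {
        read_unlock_sketch(info);
        return -1;
    }
    read_unlock_sketch(info);
    return 0;
}

/* Leaky handler: the early return on the error path skips the unlock,
 * so the counter stays at 1 and the exit check fails. */
static int handler_leaky(struct thread_info_sketch *info, int fail)
{
    read_lock_sketch(info);
    if (fail)
        return -1; /* BUG (deliberate): lock is never released */
    read_unlock_sketch(info);
    return 0;
}
```

In this sketch the leak only shows up on the error path, which is one reason such a bug can pass most test runs and fail intermittently. Whether the real failure follows this pattern is not known from the log.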

Comment by Nathaniel Clark [ 02/Jun/14 ]

Another instance, in review-dne-part-1 conf-sanity/22 on master:
https://maloo.whamcloud.com/test_sets/96808c42-e838-11e3-9bed-52540035b04c

Comment by Jian Yu [ 20/Oct/14 ]

The failure occurred while testing patch http://review.whamcloud.com/11213 on the master branch with a DNE configuration:
https://testing.hpdd.intel.com/test_sets/fded45b8-5882-11e4-b081-5254006e85c2

Comment by Yang Sheng [ 14/May/20 ]

Another instance:
https://testing.whamcloud.com/test_sessions/a775bc96-19e5-4cfa-a841-4acd8d9d6b6d

Generated at Sat Feb 10 01:45:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.