[LU-7448] sleeping under spinlock somewhere in nrs/tbf code Created: 18/Nov/15  Updated: 05/Jan/16  Resolved: 05/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Emoly Liu
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5717 Dead lock of nrs_tbf_timer_cb Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I got this telltale complaint in sanityn test 77e:

.781807] BUG: spinlock wrong CPU on CPU#1, lctl/29799 (Not tainted)
.782638]  lock: ffff880055277d08, .magic: dead4ead, .owner: lctl/29799, .owner_c
.784053] Pid: 29799, comm: lctl Not tainted 2.6.32-rhe6.7-debug #1
.784828] Call Trace:
.785635]  [<ffffffff812a06fa>] ? spin_bug+0xaa/0x100
.786402]  [<ffffffff812a07c6>] ? _raw_spin_unlock+0x76/0xa0
.787184]  [<ffffffff81530afe>] ? _spin_unlock+0xe/0x10
.787989]  [<ffffffffa153faa4>] ? nrs_policy_ctl+0xd4/0x2e0 [ptlrpc]
.788835]  [<ffffffffa15414f2>] ? ptlrpc_nrs_policy_control+0xe2/0x2a0 [ptlrpc]
.790282]  [<ffffffffa1522876>] ? ptlrpc_lprocfs_nrs_seq_write+0x3e6/0x600 [ptlrp
.791762]  [<ffffffffa1522490>] ? ptlrpc_lprocfs_nrs_seq_write+0x0/0x600 [ptlrpc]
.794101]  [<ffffffff811ff945>] ? proc_reg_write+0x85/0xc0
.794872]  [<ffffffff81192f48>] ? vfs_write+0xb8/0x1a0
.795616]  [<ffffffff811943f6>] ? fget_light_pos+0x16/0x50
.796356]  [<ffffffff81193881>] ? sys_write+0x51/0xb0
.797114]  [<ffffffff815312ee>] ? do_device_not_available+0xe/0x10
.797878]  [<ffffffff8100b112>] ? system_call_fastpath+0x16/0x1b

This means that something in nrs_policy_ctl() slept while holding nrs->nrs_lock, and by the time it woke up it had been rescheduled on a different CPU, so the lock was released on a CPU other than the one that acquired it.
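
The check that fired can be illustrated with a small userspace model. This is a hedged sketch, not the kernel's implementation: it assumes, as the "spinlock wrong CPU" message implies, that the debug spinlock records the owning CPU when the lock is taken and compares it with the current CPU at unlock time. All names here (debug_spinlock, may_sleep, simulate_critical_section) are hypothetical, and CPU migration is simulated by changing a global variable rather than by a real scheduler.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of debug-spinlock owner tracking.
 * NOT kernel code: current_cpu is a plain global that the demo
 * changes by hand to mimic the scheduler migrating a task. */
static int current_cpu = 0;

struct debug_spinlock {
	int locked;
	int owner_cpu;	/* CPU that took the lock, -1 when free */
};

static void debug_spin_lock(struct debug_spinlock *l)
{
	assert(!l->locked);	/* single-threaded model: never contended */
	l->locked = 1;
	l->owner_cpu = current_cpu;
}

/* Returns 0 for a clean unlock, -1 if released on a different CPU. */
static int debug_spin_unlock(struct debug_spinlock *l)
{
	int wrong_cpu = (l->owner_cpu != current_cpu);

	if (wrong_cpu)
		printf("BUG: spinlock wrong CPU on CPU#%d (taken on CPU#%d)\n",
		       current_cpu, l->owner_cpu);
	l->locked = 0;
	l->owner_cpu = -1;
	return wrong_cpu ? -1 : 0;
}

/* Stand-in for any call that may sleep: on wakeup the task may resume
 * on another CPU. Doing this while holding a spinlock is the bug. */
static void may_sleep(int resume_cpu)
{
	current_cpu = resume_cpu;
}

/* Take the lock on CPU 0, "sleep", resume on resume_cpu, then unlock. */
static int simulate_critical_section(int resume_cpu)
{
	struct debug_spinlock lock = { 0, -1 };

	current_cpu = 0;
	debug_spin_lock(&lock);
	may_sleep(resume_cpu);
	return debug_spin_unlock(&lock);
}
```

In this model, simulate_critical_section(0) unlocks cleanly and returns 0, while simulate_critical_section(1) prints a "wrong CPU" message and returns -1. Note that the check only fires when the sleeping task happens to migrate, which is why this kind of bug can go unreported on many runs.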

I am only running RHEL6 at the moment, so this is the best I have, and I do not see an obvious culprit right away.
This probably needs a rerun on RHEL7, which catches such offenders much better.



 Comments   
Comment by Li Xi (Inactive) [ 05/Jan/16 ]

This should be the same problem as LU-5717.

Comment by Emoly Liu [ 05/Jan/16 ]

It's a dup of LU-5717.

Generated at Sat Feb 10 02:08:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.