[LU-8307] Add cond_resched between work items in ldlm_bl_thread_main
Created: 20/Jun/16  Updated: 09/May/17  Resolved: 09/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: Patrick Farrell (Inactive)
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When clearing all of the ldlm LRUs (as Cray does at the end of
a job), an ldlm_bl_work_item is generated for each namespace,
and the work items are placed on a list for the ldlm_bl
threads to iterate over.

If the number of namespaces greatly exceeds the number of
ldlm_bl threads, a given thread will iterate over many
namespaces looking for work without ever sleeping. This can
go on for an extremely long time and result in an RCU stall.

This patch adds a cond_resched() between completing one
work item and looking for the next. This is a fairly cheap
operation, as it only reschedules when a reschedule is
already pending, and it will not be called too often:
even the largest file systems currently have < 100
namespaces per ldlm_bl_thread.
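
The loop shape described above can be sketched in userspace as follows.
This is a minimal illustration, not the actual patch: struct work_item,
struct drain_stats, handle_item() and drain_work_list() are hypothetical
names, and sched_yield() stands in for the kernel's cond_resched(). Unlike
cond_resched(), sched_yield() always offers the CPU, so the sketch only
shows where the reschedule point sits, not its cost.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield(): userspace stand-in for cond_resched() */
#include <stddef.h>

/* Hypothetical stand-in for one queued work item (one per namespace). */
struct work_item {
    int ns_id;                 /* namespace this item belongs to */
    struct work_item *next;    /* next item on the shared list */
};

/* Counters so the behavior of the loop can be observed. */
struct drain_stats {
    int handled;   /* work items completed */
    int yields;    /* reschedule points offered */
};

/* Hypothetical per-item handler; the real thread does lock work here. */
static void handle_item(struct work_item *item)
{
    (void)item;
}

/*
 * Drain the list, offering the scheduler a chance to run something else
 * after each completed item, before looking for the next one.
 */
static struct drain_stats drain_work_list(struct work_item *head)
{
    struct drain_stats stats = { 0, 0 };
    struct work_item *item;

    for (item = head; item != NULL; item = item->next) {
        handle_item(item);
        stats.handled++;

        sched_yield();   /* kernel code would call cond_resched() here */
        stats.yields++;
    }

    /* One reschedule point per completed item, never more. */
    assert(stats.handled == stats.yields);
    return stats;
}
```

With fewer than 100 items per thread, the reschedule point is reached at
most that many times per pass, which is why the added call stays cheap.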



 Comments   
Comment by Gerrit Updater [ 20/Jun/16 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: http://review.whamcloud.com/20888
Subject: LU-8307 ldlm: cond_resched in ldlm_bl_thread_main
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e0acb78abd0a9fd29f4a46e9071e05f44aba823

Comment by Gerrit Updater [ 09/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/20888/
Subject: LU-8307 ldlm: cond_resched in ldlm_bl_thread_main
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c156613b29be6fcee13d0df7008f0cd7847a5263

Comment by Peter Jones [ 09/May/17 ]

Landed for 2.10
