Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
None
-
3
-
9223372036854775807
Description
When clearing all of the ldlm LRUs (as Cray does at the end of
a job), a ldlm_bl_work_item is generated for each namespace
and then they are placed on a list for the ldlm_bl threads to
iterate over.
If the number of namespaces greatly exceeds the number of
ldlm_bl threads, a given thread will iterate over many
namespaces without sleeping looking for work. This can go
on for an extremely long time and result in an RCU stall.
This patch adds a cond_resched() between completing one
work item and looking for the next. This is a fairly cheap
operation, as it will only schedule if there is an
interrupt waiting, and it will not be called too much -
Even the largest file systems have < 100 namespaces per
ldlm_bl_thread currently.