Details
-
Improvement
-
Resolution: Fixed
-
Minor
Description
Various logs fairly often show pacemaker complaining that its thread was not scheduled for tens of seconds, which is far too long.
The MDS is indeed pretty CPU-hungry, but we need to make sure we insert enough scheduling points so that other processes get a shot at the CPU too.
Some band-aids have also been discussed, such as using NUMA settings to cordon off one CPU from use by Lustre, but those are just that: band-aids.
We can probably play with the various debug settings that warn about this and lower their timeouts to try to catch more of the offenders. There are likely a bunch in the flock code with its double loops.
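As a rough illustration of what "inserting a scheduling point" into a double loop means, here is a user-space sketch. It is not Lustre code: `resched_point()`, `count_overlaps()`, `struct range`, and `RESCHED_BATCH` are hypothetical names made up for this example, and `sched_yield()` merely stands in for the kernel's `cond_resched()`, which is what the real fix would presumably use.

```c
#include <sched.h>
#include <stddef.h>

/* Assumed batch size: how many inner iterations run between yields. */
#define RESCHED_BATCH 64

/* Counts how often we offered the CPU to other tasks (for inspection). */
static unsigned long yield_count;

/* Hypothetical user-space stand-in for the kernel's cond_resched(). */
static void resched_point(void)
{
	yield_count++;
	sched_yield();
}

/* Illustrative byte range, loosely resembling a lock extent. */
struct range {
	long start;
	long end;
};

/*
 * O(n^2) pairwise overlap scan. Without the periodic resched_point()
 * call, a large n would keep the CPU for the whole scan.
 */
static size_t count_overlaps(const struct range *r, size_t n)
{
	size_t i, j, hits = 0, steps = 0;

	for (i = 0; i < n; i++) {
		for (j = i + 1; j < n; j++) {
			if (r[i].start <= r[j].end && r[j].start <= r[i].end)
				hits++;
			/* Scheduling point once every RESCHED_BATCH pairs. */
			if (++steps % RESCHED_BATCH == 0)
				resched_point();
		}
	}
	return hits;
}
```

Yielding once per batch rather than on every iteration keeps the overhead small. In the kernel, such a point can only be placed where sleeping is allowed, so a loop running under a spinlock would need the lock dropped first.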
Issue Links
- is related to LU-4423 Tracking of patches from upstream kernel to Lustre client (Resolved)