Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
Description
We are seeing a problem with our Lustre clients on our BG/Q I/O Nodes (ION). It appears that all of the ptlrpcd threads are stuck running on "CPU" number zero. This is especially problematic because all of the hardware interrupts for the networking card are wired only to CPU 0. The result is a client that is unresponsive under write loads (and possibly others) from the BG/Q compute nodes.
Some current details:
- PPC64
- 17 cores with 4-way threading, so Linux sees 68 CPUs.
- Hardware interrupts routed to CPU 0 only
- CONFIG_NUMA is not set
- Lustre 2.4.0-28chaos, which contains the "LU-4509 ptlrpc: re-enqueue ptlrpcd worker" patch (see github.com/chaos/lustre)
When I watch CPU usage on a node using top, I see 100% CPU usage on CPU 0 and very little happening on the rest of the CPUs. Sometimes the flush-lustre thread uses a fair bit of CPU time on CPU 0 as well.
100+ sysiod threads (the user-space processes that handle forwarded I/O from the compute nodes) are all running, but they use very little CPU time in aggregate, at least as far as I can tell. I have a feeling that I'm not seeing everything in top for some reason, but maybe I'm wrong.
I hacked the ptlrpcd() function to set the allowed CPUs according to my own mask, which is cpu_active_mask but with CPU 0 removed using cpumask_clear_cpu(). Sure enough, 100% of the time is now spent on CPU 1 instead of CPU 0. I understand why CPU 0 is avoided, but not why CPU 1 is the only CPU used out of all those available.
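Roughly, the hack looks like this. It is only a sketch of what I described, not the exact patch; apart from cpu_active_mask and cpumask_clear_cpu(), everything here (placement at the top of ptlrpcd(), the helper calls used) is illustrative, using standard kernel cpumask helpers:

    #include <linux/cpumask.h>
    #include <linux/sched.h>

    static int ptlrpcd(void *arg)
    {
            cpumask_var_t mask;

            /* Restrict this ptlrpcd thread to every active CPU except
             * CPU 0, which is where all of the NIC interrupts land. */
            if (zalloc_cpumask_var(&mask, GFP_KERNEL)) {
                    cpumask_copy(mask, cpu_active_mask);
                    cpumask_clear_cpu(0, mask);
                    if (!cpumask_empty(mask))
                            set_cpus_allowed_ptr(current, mask);
                    free_cpumask_var(mask);
            }

            /* ... unchanged ptlrpcd main loop ... */
            return 0;
    }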
The CPU mask handling code in Lustre is none too easy to follow... Is there some setting that Lustre uses that would bind ptlrpcd to a single CPU? I can't see it yet...
Thanks, Chris.