[LU-5043] ptlrpcd threads only run on one CPU Created: 10/May/14  Updated: 12/May/14  Resolved: 12/May/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 13936

 Description   

We are seeing a problem with our Lustre clients on our BG/Q I/O Nodes (ION). It appears that all of the ptlrpcd threads are stuck running on "CPU" number zero. This is especially problematic because all of the hardware interrupts for the networking card are wired to CPU 0 alone. The result is a client that is unresponsive under write loads (and possibly other loads) from the BG/Q compute nodes.

Some current details:

  • PPC64
  • 17 cores with 4-way threading, so Linux sees 68 CPUs.
  • Hardware interrupts routed to CPU0 only
  • CONFIG_NUMA is not set
  • Lustre 2.4.0-28chaos, which contains the "LU-4509 ptlrpc: re-enqueue ptlrpcd worker" patch (see github.com/chaos/lustre)

When I watch CPU usage on a node using top, I see 100% CPU usage on CPU 0 and very little happening on the rest of the CPUs. Sometimes the flush-lustre thread uses a fair bit of CPU time on CPU 0 as well.

100+ sysiod threads (the user-space processes that handle forwarded I/O from the compute nodes) are all running, but using very little CPU time in aggregate. At least as far as I can tell. I have a feeling that I'm not seeing everything in top for some reason. Maybe I'm wrong.

I hacked the ptlrpcd() function to set the allowed CPUs according to my own mask: cpu_active_mask, but with CPU 0 removed using cpumask_clear_cpu(). Sure enough, now CPU 1 is where the 100% CPU time is spent instead of CPU 0. I understand why CPU 0 is now avoided, but not why CPU 1 is the only CPU used out of all that are available.

The cpu mask handling code in Lustre is none too easy to follow... Is there some setting that Lustre is using that would bind it to a single CPU? I can't see it yet...



 Comments   
Comment by Christopher Morrone [ 10/May/14 ]

I am not sure that I understand the logic here in ptlrpcd_select_pc():

        case PDL_POLICY_SAME:
                idx = cfs_smp_processor_id() % ptlrpcds->pd_nthreads;
                break;

What makes us think that this will result in a ptlrpcd thread that runs on the same CPU as the caller?

Comment by Christopher Morrone [ 10/May/14 ]

OK, you guys (and Lustre in general) are off the hook for this issue, I think. This was either self- or vendor-inflicted months ago.

It looks like someone got the bright idea to set "isolcpus=0,1,2,3" on the kernel command line. This "feature" isolates those CPUs from normal scheduler load balancing. It does not, apparently, stop new kernel threads from starting up on those isolated CPUs. So many of the Lustre kernel threads got started on CPU 0, and because that CPU is isolated, they never get moved anywhere else.

There is a possibility that Lustre is supposed to do something to play nice with that option...but if I find that out I'll just open a new ticket.

Closing ticket.

Comment by Peter Jones [ 10/May/14 ]

thanks Chris.
