Lustre / LU-5043

ptlrpcd threads only run on one CPU

Details

    • Bug
    • Resolution: Fixed
    • Critical

    Description

      We are seeing a problem with the Lustre clients on our BG/Q I/O Nodes (IONs). It appears that all of the ptlrpcd threads are stuck running on CPU 0. This is especially problematic because all of the hardware interrupts for the networking card are wired only to CPU 0. The result is a client that becomes unresponsive under write loads (and maybe others) from the BG/Q compute nodes.

      Some current details:

      • PPC64
      • 17 cores with 4-way threading, so Linux sees 68 CPUs.
      • Hardware interrupts routed to CPU 0 only
      • CONFIG_NUMA is not set
      • Lustre 2.4.0-28chaos, which contains the "LU-4509 ptlrpc: re-enqueue ptlrpcd worker" patch (see github.com/chaos/lustre)

      When I watch CPU usage on a node using top, I see 100% CPU usage on CPU 0, and very little happening on the rest of the CPUs. Sometimes the flush-lustre thread uses a fair bit of CPU time on CPU 0 as well.

      100+ sysiod threads (the user-space process that handles forwarded IO from the compute nodes) are all running, but using very little CPU time in aggregate. At least as far as I can tell. I have a feeling that I'm not seeing everything in top for some reason. Maybe I'm wrong.

      I hacked the ptlrpcd() function to set the allowed CPUs according to my own mask, which is cpu_active_mask but with CPU 0 removed using cpumask_clear_cpu(). Sure enough, CPU 1 is now where 100% of the time is spent instead of CPU 0. I understand why CPU 0 is avoided, but not why CPU 1 is the only CPU used out of all that are available.
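
      The mask manipulation described above happens inside the kernel, and the actual patch is not shown here. As a rough user-space analogue only, the following Python sketch copies an affinity set, drops one CPU from it, and optionally applies the result to the calling process. The helper names mask_without_cpu and restrict_current_process are hypothetical and exist only for this illustration; os.sched_getaffinity and os.sched_setaffinity are the standard Linux-only calls in Python's os module.

```python
import os


def mask_without_cpu(allowed, cpu=0):
    """Return a copy of `allowed` with `cpu` removed.

    Loosely mirrors copying cpu_active_mask and then calling
    cpumask_clear_cpu() on the copy; it is NOT the kernel code itself.
    """
    mask = set(allowed)
    mask.discard(cpu)  # no error if `cpu` was not in the set
    return mask


def restrict_current_process(cpu_to_avoid=0):
    """Apply the reduced mask to the calling process (hypothetical helper).

    Returns the affinity set reported by the kernel afterwards.
    """
    mask = mask_without_cpu(os.sched_getaffinity(0), cpu_to_avoid)
    if not mask:
        # An empty mask would be rejected by the kernel, so stop early.
        raise ValueError("refusing to set an empty affinity mask")
    os.sched_setaffinity(0, mask)
    return os.sched_getaffinity(0)
```

      Note that a mask like this only says where a thread may run. It does not spread threads out, which is consistent with the observation that the load simply moved to CPU 1.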

      The cpu mask handling code in Lustre is none too easy to follow... Is there some setting that Lustre is using that would bind it to a single CPU? I can't see it yet...
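
      One way to check from user space whether the threads are actually bound to a single CPU is to read their allowed-CPU list and last-run CPU from /proc. The sketch below is a diagnostic aid, not part of Lustre. It assumes the thread names start with "ptlrpcd", that /proc/<pid>/status exposes a Cpus_allowed_list line on the running kernel, and that field 39 of /proc/<pid>/stat is the last CPU the task ran on, as documented in proc(5). All function names are hypothetical.

```python
import os


def parse_comm(stat_text):
    """Extract the task name, which sits between the first '(' and last ')'."""
    return stat_text[stat_text.index("(") + 1:stat_text.rindex(")")]


def parse_last_cpu(stat_text):
    """Return field 39 ('processor') of a /proc/<pid>/stat line."""
    # Split after the closing paren so spaces in the name cannot shift fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    return int(fields[36])  # fields[0] is field 3, so field 39 is index 36


def parse_cpus_allowed_list(status_text):
    """Return the Cpus_allowed_list value from /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1].strip()
    return None


def ptlrpcd_threads(proc="/proc", prefix="ptlrpcd"):
    """List (pid, name, allowed CPUs, last CPU) for matching tasks."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat_text = f.read()
            name = parse_comm(stat_text)
            if not name.startswith(prefix):
                continue
            with open(os.path.join(proc, entry, "status")) as f:
                allowed = parse_cpus_allowed_list(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or had an unexpected format; skip it
        rows.append((int(entry), name, allowed, parse_last_cpu(stat_text)))
    return sorted(rows)


if __name__ == "__main__":
    for pid, name, allowed, cpu in ptlrpcd_threads():
        print(f"{pid:>7} {name:<16} allowed={allowed} last_cpu={cpu}")
```

      If every thread reports a wide allowed list but the same last CPU, the binding is more likely a scheduling or wakeup effect than an explicit affinity setting. If the allowed list itself contains a single CPU, something is narrowing the mask.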

          People

            wc-triage WC Triage
            morrone Christopher Morrone (Inactive)