specify libcfs CPT cpu_pattern to exclude NUMA cores

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.17.0

    Description

      Some (non-Lustre) filesystems consume 100% of the CPU cycles on one or more cores by busy-waiting (polling for event completion) in threads scheduled at a high priority (not sure if "real time" or not). They also apparently configure the CPU scheduler to deny scheduling of other processes on some CPU cores.

      The ptlrpcd threads handle RPC sending and receiving. They are normally distributed evenly across cores and bound to each NUMA domain, which minimizes cross-CPU memory traffic when a well-distributed application workload (e.g. a multi-threaded computational workload) allocates and dirties data pages evenly on all of the NUMA domains. In some cases, where the number of cores is larger than the number of active application threads, it is advantageous for ptlrpcd threads on other CPU cores to take over the RPC processing in order to offload CPU-intensive tasks like checksums, compression, and encryption to cores that are otherwise underutilized.

      At no time do ptlrpcd (or other Lustre service) threads exclusively utilize or busy wait on CPU cores or prevent application threads from using them when they are not actively processing requests on behalf of the application.

      However, if ptlrpcd threads are started on a core that has been taken over in this way and then try to process RPCs, they can stall because threads on that core cannot be scheduled for lengthy periods of time. This causes intermittently laggy RPC handling when those threads are processing a time-sensitive RPC.

      To work around this issue, we used lscpu to determine the NUMA configuration of the CPUs installed and then created a CPT configuration that avoided scheduling the ptlrpcd threads on cores that had been taken over by the other filesystem:

      # lscpu | grep NUMA
      NUMA:
       NUMA node(s):     2
       NUMA node0 CPU(s):   0-63,128-191
       NUMA node1 CPU(s):   64-127,192-255
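
      The NUMA layout printed by lscpu can also be read programmatically, which helps when the layout differs between nodes in a cluster. The following is a minimal sketch, not part of Lustre: the helper names parse_cpulist and numa_nodes_from_lscpu are hypothetical, and the sample text is the lscpu output shown above.

```python
import re


def parse_cpulist(text):
    """Expand a CPU list such as '0-63,128-191' into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes_from_lscpu(output):
    """Map NUMA node number -> CPU list, using the 'NUMA nodeN CPU(s):' lines."""
    nodes = {}
    for line in output.splitlines():
        m = re.match(r"\s*NUMA node(\d+) CPU\(s\):\s*(\S+)", line)
        if m:
            nodes[int(m.group(1))] = parse_cpulist(m.group(2))
    return nodes


# Sample input copied from the lscpu output in this ticket.
sample = """\
NUMA:
 NUMA node(s):     2
 NUMA node0 CPU(s):   0-63,128-191
 NUMA node1 CPU(s):   64-127,192-255
"""

nodes = numa_nodes_from_lscpu(sample)
for node, cpus in sorted(nodes.items()):
    print(f"node{node}: {len(cpus)} CPUs, first={cpus[0]}, last={cpus[-1]}")
```

      On a live system the sample string would be replaced by the captured output of lscpu; that step is left out here so the sketch runs anywhere.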
      

      The following lines were added to /etc/modprobe.d/lustre.conf to restrict the Lustre CPU Partition Table to the last 8 cores (of 64) in each of 4 partitions (two 64-CPU ranges on each of the 2 NUMA nodes shown above), avoiding the other filesystem, which was heating up the first two cores on each of the NUMA nodes:

      options libcfs cpu_npartitions=4
      options libcfs cpu_pattern="0[56-63] 1[120-127] 2[184-191] 3[248-255]"
      options ptlrpcd max_ptlrpcds=64
      

      That allows those threads to run on 32 different cores, with a maximum of 16 threads running across the 8 cores in each partition.
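
      The pattern and the thread arithmetic above can be reproduced with a short script. This is only a sketch under stated assumptions: tail_pattern is a hypothetical helper, and the four (first, last) ranges are the 64-CPU ranges taken from the lscpu output in this ticket.

```python
def tail_pattern(ranges, keep):
    """Build a cpu_pattern string that keeps the last `keep` CPUs of each range.

    `ranges` is a list of (first, last) CPU numbers, one entry per partition.
    """
    parts = []
    for cpt, (first, last) in enumerate(ranges):
        if last - first + 1 < keep:
            raise ValueError(f"range {first}-{last} has fewer than {keep} CPUs")
        parts.append(f"{cpt}[{last - keep + 1}-{last}]")
    return " ".join(parts)


# The four 64-CPU ranges from the lscpu output (two per NUMA node).
ranges = [(0, 63), (64, 127), (128, 191), (192, 255)]
keep = 8
max_ptlrpcds = 64

pattern = tail_pattern(ranges, keep)
print(f"options libcfs cpu_npartitions={len(ranges)}")
print(f'options libcfs cpu_pattern="{pattern}"')
print(f"options ptlrpcd max_ptlrpcds={max_ptlrpcds}")
print("cores available to ptlrpcd:", len(ranges) * keep)
print("threads per partition:", max_ptlrpcds // len(ranges))
```

      Generating the string per node makes the CPU-specific nature of the workaround visible: every distinct CPU layout in the cluster needs its own set of ranges.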

      However, this is only a workaround, as specifying cpu_pattern and cpu_npartitions is relatively complex and CPU-specific, and the values likely need to differ between systems within the same cluster. It would be better to have more flexible mechanisms to avoid this issue.

      One option is to add an exclude pattern option to libcfs which avoids the specified cores when configuring the CPT map, something like the following to exclude the specified 2 cores in each of two NUMA nodes:

      options libcfs cpu_pattern="X[0-1] X[64-65]"
      

      That allows a relatively simple (and mostly universal) option to avoid e.g. core0 and core1 on all machines, without having to know the full NUMA configuration details of each one. To exclude entire NUMA nodes, a syntax like the following could be used:

      options libcfs cpu_pattern="N X[0-1]"
      

      which would mean "exclude all of the cores in NUMA node0 and node1", to be aligned with the "N 0[0-1]" definition, which means "include all of the cores in NUMA node0 and node1 into CPT0".

      To exclude specific cores in each NUMA node, an option like the following could be used:

      options libcfs cpu_pattern="N C[0-1]"
      

      to exclude the first two cores on each NUMA domain. The meaning of "X" and "C" would be identical if "N" is not specified. It may also make sense to allow "N C[-2,-1]" to exclude the last two cores on each NUMA node, in case that is needed at some point.
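
      The intended effect of the per-node "C" exclusion can be illustrated with a small model. This is a sketch of the semantics as described in this ticket, not the libcfs parser: exclude_per_node is a hypothetical helper, and it assumes that "core N of a node" means position N in that node's sorted CPU list, which the description does not spell out (hyperthread siblings may be counted differently in practice).

```python
def exclude_per_node(nodes, indices):
    """Drop the CPUs at the given positions from every NUMA node.

    `nodes` maps node number -> sorted CPU list. `indices` are positions
    within each list; negative values count from the end, as in "C[-2,-1]".
    Positions that do not exist on a node are ignored.
    """
    remaining = {}
    for node, cpus in nodes.items():
        drop = {cpus[i] for i in indices if -len(cpus) <= i < len(cpus)}
        remaining[node] = [c for c in cpus if c not in drop]
    return remaining


# NUMA layout from the lscpu output in this ticket.
nodes = {
    0: list(range(0, 64)) + list(range(128, 192)),
    1: list(range(64, 128)) + list(range(192, 256)),
}

first_two = exclude_per_node(nodes, [0, 1])    # models "N C[0-1]"
last_two = exclude_per_node(nodes, [-2, -1])   # models "N C[-2,-1]"

for node in sorted(nodes):
    print(f"node{node}: C[0-1] leaves {len(first_two[node])} CPUs "
          f"starting at {first_two[node][0]}; "
          f"C[-2,-1] leaves CPUs up to {last_two[node][-1]}")
```

      With this layout, the "N C[0-1]" model removes CPUs 0, 1, 64 and 65, the same set as the explicit "X[0-1] X[64-65]" pattern, without the administrator needing to know where node1 starts.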

      Having an exclude list for cores would also be an easy way to reserve CPU cores for userspace threads running on server nodes (e.g. HA (Corosync/Pacemaker), monitoring, logging, sshd, etc.).

      A further improvement would be to dynamically detect when the CPU scheduler has been configured to avoid scheduling processes on a particular core, and/or dynamically detect when ptlrpcd is unable to be scheduled on a core and avoid using it entirely (probably with a console message to that effect), similar to CPU hot-unplug. Dynamic exclusion/load detection is more complex to implement, but would avoid the need to statically configure nodes at all, and work around the breakage that is introduced by other filesystems.
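
      Until such dynamic detection exists, a node-local script could approximate it at module load time. The sketch below assumes that the other filesystem reserves its cores through kernel CPU isolation (e.g. isolcpus=), which this ticket does not confirm; in that case the kernel reports them in /sys/devices/system/cpu/isolated. The helper names are hypothetical, and the output uses the "X[...]" exclusion form from the description.

```python
from pathlib import Path

# Linux sysfs file listing isolated CPUs; empty when none are isolated.
ISOLATED_PATH = Path("/sys/devices/system/cpu/isolated")


def exclude_pattern(cpulist_text):
    """Turn a kernel CPU list such as '0-1,64-65' into 'X[0-1] X[64-65]'."""
    ranges = [r.strip() for r in cpulist_text.split(",") if r.strip()]
    return " ".join(f"X[{r}]" for r in ranges)


def read_isolated(path=ISOLATED_PATH):
    """Return the isolated CPU list text, or '' if the file is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


pattern = exclude_pattern(read_isolated())
if pattern:
    print(f'options libcfs cpu_pattern="{pattern}"')
else:
    print("no isolated CPUs reported; no exclusion generated")
```

      This only covers cores isolated through the kernel; cores that are merely kept busy by high-priority polling threads would not appear in that file and would still need the in-kernel detection described above.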

          Activity

            [LU-17501] specify libcfs CPT cpu_pattern to exclude NUMA cores

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57659/
            Subject: LU-17501 libcfs: fix ncpt check for cpt patterns
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5504873b36ca11a8c4f8c7e3a0e9128c68db866d

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57659
            Subject: LU-17501 libcfs: fix ncpt check for cpt patterns
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 117dbc0f3b5bc86d8d14f2ecedede538e2a85d24

            pjones Peter Jones added a comment -

            Merged for 2.17

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56617/
            Subject: LU-17501 libcfs: adding X for cpu pattern
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c3c36178914e4cfa96fd9b15051653a1ecec8845

            adilger Andreas Dilger added a comment -

            bzzz I think the test script is fixed in the next patch https://review.whamcloud.com/56617 ("LU-17501 libcfs: adding X for cpu pattern").

            bzzz Alex Zhuravlev added a comment -

            On a local setup I hit this:

            == conf-sanity test 200c: set CPU pattern using NUMA node layout ========================================================== 14:10:00 (1733753400)
            Stopping clients: tmp.rg6dEN1avH /mnt/lustre (opts:)
            Stopping clients: tmp.rg6dEN1avH /mnt/lustre2 (opts:)
            LNET unconfigure error 22: Invalid argument
            Loading modules from /mnt/build/lustre/tests/..
            detected 2 online CPUs by sysfs
            MODOPTS_LIBCFS=cpu_pattern="N"
            libcfs will create CPU partition based on online CPUs
            ../libcfs/libcfs/libcfs options: 'cpu_pattern="N"'
            ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            /sys/module/libcfs/parameters/cpu_npartitions:0
            /sys/module/libcfs/parameters/cpu_pattern:N
            0	: 0 1
             conf-sanity test_200c: @@@@@@ FAIL: CPU partitions not , found: 1 
              Trace dump:
              = ./../tests/test-framework.sh:7229:error()
              = conf-sanity.sh:12130:test_200c()
              = ./../tests/test-framework.sh:7602:run_one()
              = ./../tests/test-framework.sh:7665:run_one_logged()
              = ./../tests/test-framework.sh:7483:run_test()
              = conf-sanity.sh:12152:main()

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56347/
            Subject: LU-17501 libcfs: fix CPT NUMA core exclusion handling
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 660787c66fa80182a55c0f89895880a884fe0ba5

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55544/
            Subject: LU-17501 libcfs: allow CPT exclude list for cores
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 111f5836ecbfef0e43c0d739199f5b6ddfb2464c

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56617
            Subject: LU-17501 libcfs: adding X for cpu pattern
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7c7f2218c1030837ab7435b728362ebb40588f5e

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56347
            Subject: LU-17501 libcfs: fix 'C' functionality
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4af0af20e8af4e8578d70c62305052f9ade1e40f

            People

              fdilger Fred Dilger
              adilger Andreas Dilger
              Votes: 0
              Watchers: 13
