LU-11454: Allow switching off CPT binding for PTLRPC threads

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      It is not always advantageous to bind the various server service threads to specific CPTs.  If all or most traffic is coming in from one node (likely a router), then activity on the server ends up limited to the CPT associated with that router.  This is advantageous if NUMA latencies are relatively high, but can be disadvantageous if CPTs are small and latencies are low.

      Specifically, the work is limited to the CPUs in the CPT, which means that some workloads can end up needing more CPU, but are unable to get it.

      In essence, the default behavior of strict binding is fine but is not always preferable.  So, add an option to disable this strict binding.
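
      As a rough illustration (not part of the original description): the patch discussed below exposes the switch as per-service module parameters, and the OSS-side names oss_cpu_bind and oss_create_cpu_bind appear in the test results later in this ticket. A minimal sketch of disabling strict binding on an OSS, e.g. in /etc/modprobe.d/lustre.conf, might look like the following; parameter names for other services are not shown here and should be checked against the landed patch.

      # Sketch only: turn off strict CPT binding for the OSS service threads.
      # Binding remains enabled by default; setting these to 0 keeps the
      # threads CPT-aware but lets them run on any CPU.
      options ost oss_cpu_bind=0 oss_create_cpu_bind=0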

      Attachments

        Issue Links

          Activity

            [LU-11454] Allow switching off CPT binding for PTLRPC threads
            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33262/
            Subject: LU-11454 ptlrpc: Make CPU binding switchable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3eb7a1dfc3e7401ebcc45ccb116ed673607fd27f

            sihara Shuichi Ihara added a comment -

            I saw a huge IOPS drop when oss_cpu_bind and oss_create_cpu_bind are disabled in conjunction with oss_max_threads, oss_num_threads, and oss_num_create_threads. Something like this:

            options libcfs cpu_npartitions=16 cpu_pattern=""
            options ost oss_max_threads=128 oss_num_threads=128 oss_num_create_threads=128 oss_cpu_bind=0 oss_create_cpu_bind=0

            Here are the test results on my test box.

            Params                                                                                                      | IOPS (4K random read)
            oss_max_threads=128, oss_num_threads=128, oss_num_create_threads=128                                        | 833K
            oss_max_threads=128, oss_num_threads=128, oss_num_create_threads=128, oss_cpu_bind=0, oss_create_cpu_bind=0 | 621K

            The parameters are still tunable and disabled by default, so there is no impact at all. However, is there any particular reason for this performance drop, or was it expected? By the way, the OSS has a single NUMA domain.
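
            As a hedged aside (not part of the comment above): to confirm how the 16 partitions are laid out on a single-NUMA OSS, the CPT-to-CPU mapping can be read from libcfs; the exact path or parameter name may vary between Lustre versions.

            # Sketch: show the CPT layout produced by cpu_npartitions/cpu_pattern
            lctl get_param cpu_partition_table
            # or, on versions where the table lives in debugfs:
            cat /sys/kernel/debug/lnet/cpu_partition_table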

            adilger Andreas Dilger added a comment -

            There are definitely reasons to have CPT bindings on clients as well - avoiding Lustre contention/jitter with application threads comes to mind.

            I've added Ihara to the ticket, as he is better positioned to report whether this change improves performance (due to more efficient CPU usage) or hurts it (due to cross-core contention).

            dougo Doug Oucharek (Inactive) added a comment -

            Patrick, how about this approach:

            If the configuration is binding an NI to one or more CPTs, we continue to keep the worker threads bound to those CPTs. This should continue to support the SGI big iron. If there is no binding of the NIs, then we are free to use all cores and can have a single pool of worker threads which don't have any CPT bindings.

            Locks and other "limited" resources continue to be CPT based to alleviate contention. If we properly round-robin over the worker threads, we should get a good distribution of work over the CPT resources.

            Thoughts?

            paf Patrick Farrell (Inactive) added a comment -

            By the way, I'd like to sound people out - Olaf and Andreas in particular - on what we think about changing the default to "CPT aware but no binding", at least for the MDT. I think it is going to be better for almost all real-world server configs. (Server, not client.)

            The "bind to just this CPT" behavior seems like something designed for high NUMA distances/latencies, like those seen on the SGI/HPE big iron - but my understanding is that those machines are not used as servers, just clients. Servers are usually much smaller systems.

            We've seen significant performance improvements in routed configs (where there is effectively a router-CPT binding) by disabling the binding of worker threads, and no performance loss in other configs (even a slight gain). We still want the CPT awareness so there are multiple queues for sleeping and getting work for the ptlrpc threads; we just don't want to limit the CPUs they can use.

            The patch today does not change the default behavior - but I think we should consider it. Currently we've only got solid numbers for the MDT - we don't see any major performance problems on our OSTs related to this, so I'm not (yet) suggesting changing the default there.

            If there's interest in changing the default, I can work to get those numbers.

            gerrit Gerrit Updater added a comment -

            Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/33262
            Subject: LU-11454 ptlrpc: Make binding switchable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fea65e1c476f634196cf87b3597f2accda7f69a6

            People

              Assignee: paf Patrick Farrell (Inactive)
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 13
