LU-14976: Changing tbf policy induces high CPU load

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.16.0
    • Environment: CentOS 7 VMs on Lustre 2.14

    Description

      Reproducer:

      1. Activate "tbf gid" policy:
        lctl set_param mds.MDS.mdt.nrs_policies="tbf gid"
      2. Register a rule for a group (with a small rate value):
        lctl set_param mds.MDS.mdt.nrs_tbf_rule="start eaujames gid={1000} rate=10"
      3. Start doing metadata operations with the limited gid on the MDT (multithreaded file creations/deletions)
      4. While a request is queued inside the policy, change the policy to "tbf":
        lctl set_param mds.MDS.mdt.nrs_policies="tbf"
      5. Stop the metadata operations. Lustre consumes 100% CPU on the partition where the request is queued:
        On our production filesystem, all CPTs were impacted on MDT0001 (>100 RPCs in queue, load ~300) and one CPT was impacted on MDT0000 (1 RPC in queue, load ~90).
      mds.MDS.mdt.nrs_policies=
      regular_requests:
        - name: fifo
          state: started
          fallback: yes
          queued: 0
          active: 0  
        
        - name: crrn
          state: stopped
          fallback: no
          queued: 0
          active: 0
        
        - name: tbf
          state: started
          fallback: no
          queued: 1
          active: 0
        
        - name: delay
          state: stopped
          fallback: no
          queued: 0
          active: 0
      

      When we try to change the policy back to fifo, the tbf policy is blocked in the "stopping" state:

      mds.MDS.mdt.nrs_policies=
      regular_requests:
        - name: fifo
          state: started
          fallback: yes 
          queued: 0    
          active: 0   
      
        - name: crrn
          state: stopped
          fallback: no
          queued: 0    
          active: 0  
      
        - name: tbf 
          state: stopping
          fallback: no
          queued: 1    
          active: 0   
        
        - name: delay
          state: stopped
          fallback: no
          queued: 0
          active: 0
      

      Analysis:

      It seems that when we change the tbf policy ("tbf gid" -> "tbf"), the RPCs already queued inside "tbf gid" become inaccessible to the ptlrpc threads.

      ptlrpc_wait_event() wakes up when an RPC is available to be handled. But in this case the ptlrpc thread is unable to retrieve the request, so it wakes up all the time (causing the CPU load).

      00000100:00000001:1.0:1630509978.890060:0:4749:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1630509978.890060:0:5580:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
      00000100:00000001:2.0:1630509978.890061:0:5653:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:2.0:1630509978.890061:0:5653:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:1.0:1630509978.890061:0:4749:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1630509978.890061:0:5580:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1630509978.890061:0:5580:0:(service.c:2248:ptlrpc_server_handle_request()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:1.0:1630509978.890062:0:4749:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
      00000100:00000001:1.0:1630509978.890062:0:4749:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
      00000100:00000001:2.0:1630509978.890063:0:5653:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
      00000100:00000001:2.0:1630509978.890063:0:5653:0:(service.c:2008:ptlrpc_server_request_get()) Process entered
      00000100:00000001:1.0:1630509978.890063:0:4749:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:0.0:1630509978.890063:0:5580:0:(service.c:2244:ptlrpc_server_handle_request()) Process entered
      00000100:00000001:2.0:1630509978.890064:0:5653:0:(service.c:2029:ptlrpc_server_request_get()) Process leaving (rc=0 : 0 : 0)
      

      On my VM, ptlrpc_server_handle_request() is called for one MDT thread at a ~300 kHz frequency (doing nothing).

      Activity

            gerrit Gerrit Updater added a comment -

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51119
            Subject: LU-14976 nrs: change nrs policies at run time
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: bdb237e26e903d2eb9d7fb1697965c7234a431f5

            gerrit Gerrit Updater added a comment -

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51118
            Subject: LU-14976 nrs: change nrs policies at run time
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 8292d1a744b996a43acb8d1f34210d8f9b6c7581
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48523/
            Subject: LU-14976 nrs: change nrs policies at run time
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c098c09564a125dd44ffe0c135cd1cb6359229e7

            gerrit Gerrit Updater added a comment -

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48523
            Subject: LU-14976 nrs: change nrs policies at run time
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 37530fe5fc53a80519e4334a3a295e690f03afbc

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44817/
            Subject: LU-14976 ptlrpc: align function names with param names
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7fe49f1e7cf0586da0f389188325014a8a13b849

            eaujames Etienne Aujames added a comment -

            This issue occurred on a filesystem in production.

            Here is the context:
            A user was filling the changelogs at 18k opens/s (changelog usage jumped from 30% to 70% in one night), so the admin wanted to limit this user to avoid an MDT crash.
            The activated NRS policy was "tbf gid"; the admin changed the tbf policy to "tbf" to limit the user by uid.

            adilger Andreas Dilger added a comment -

            My above patch does not make any attempt to fix this problem; it just cleans up the code for the "nrs_policies" parameter, and other parameters in this file, so that this code is easier to find.

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44817
            Subject: LU-14976 ptlrpc: align function names with param names
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d2504288bf6b798666f3c44a1bb685e455ea5fa0

            adilger Andreas Dilger added a comment -

            My guess is that the RPCs are only connected to the old NRS type, and "fetching" RPCs to process with the new NRS type then returns nothing.  What needs to happen in the very rare case that the NRS type is changed at runtime is either:

            1. check the old NRS type to fetch any previous RPCs before fetching RPCs from the new NRS type
            2. move all RPCs from the old NRS type and add them to the new NRS type

            My preference would be #2, because this only adds overhead in the rare case when the NRS type is changed, rather than adding overhead to fetching every RPC from the queue.  However, looking at ptlrpc_server_request_get()->ptlrpc_nrs_req_get_nolock0(), it would appear that #1 is supposed to handle this case properly:

                    /**
                     * Always try to drain requests from all NRS polices even if they are
                     * inactive, because the user can change policy status at runtime.
                     */
                    list_for_each_entry(policy, &nrs->nrs_policy_queued, pol_list_queued) {
                           nrq = nrs_request_get(policy, peek, force);

            but that doesn't seem to be working properly (only nrs_tbf_req_get() appears in the flame graph). It may be that the "tbf gid" queue internal to the TBF policy itself is not making those RPCs available?

            eaujames, to step back a minute, what is the reason to change the NRS policy type while the system is in use? Is this just something you hit during benchmarking? The NRS policy type should basically never change during the lifetime of a system.

            People

              Assignee: Etienne Aujames
              Reporter: Etienne Aujames
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: