Details

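    • Type: Technical task
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Environment: a server (1 x IB-EDR) and a client (2 x IB-HDR100), with Multi-Rail (MR) enabled
    • Rank (Obsolete): 9223372036854775807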

    Description

      If a server has more than one CPT (CPU partition), peer connections should be distributed across different CPTs for load balancing.
      The CPT is chosen by a hash function applied to the peer NID, but in some cases the hash returns the same value for different NIDs, so both peers end up on the same CPT.
      This causes a serious performance problem: each CPT owns a fixed set of CPU cores, so if both peers are handled by a single CPT on the server (which has two CPTs in the example below), half of the CPUs are always busy while the other half sit idle.

      Here is an example.

      server# cat /sys/kernel/debug/lnet/cpu_partition_table
      0	: 0 1 2 3 4 5 6 7 8 9
      1	: 10 11 12 13 14 15 16 17 18 19
      
      server# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: o2ib10
            local NI(s):
              - nid: 10.0.11.224@o2ib10
                status: up
                interfaces:
                    0: ib0
      
      client # cat /sys/kernel/debug/lnet/cpu_partition_table
      0	: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
      1	: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
      2	: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
      3	: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
      4	: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
      5	: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
      6	: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
      7	: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
      
      client # lnetctl net show -v
          - net type: o2ib10
            local NI(s):
              - nid: 10.0.11.81@o2ib10
                status: up
                interfaces:
                    0: ib0
       - snip -
                lnd tunables:
                dev cpt: 0
                tcp bonding: 0
                CPT: "[0,1,2,3]"
      
              - nid: 10.4.11.71@o2ib10
                status: up
                interfaces:
                    0: ib4
      - snip -
                lnd tunables:
                dev cpt: 4
                tcp bonding: 0
                CPT: "[4,5,6,7]"
      

      On the client:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
       20263 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_01                                  
       20264 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_02                                  
       20265 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_03                                  
       20262 root      20   0       0      0      0 R  98.0   0.0   0:29.84 kiblnd_sd_06_00                                  
       20247 root      20   0       0      0      0 R  89.1   0.0   1:19.11 kiblnd_sd_02_01                                  
       20248 root      20   0       0      0      0 R  88.7   0.0   1:19.20 kiblnd_sd_02_02                                  
       20249 root      20   0       0      0      0 R  88.7   0.0   1:19.15 kiblnd_sd_02_03                                  
       20246 root      20   0       0      0      0 R  87.7   0.0   1:19.24 kiblnd_sd_02_00    
      

      Two CPTs (2 and 6) are busy because of the two interfaces.

      On the server:

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                            
      27651 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_00                                    
      27652 root      20   0       0      0      0 R  86.0  0.0   2:22.30 kiblnd_sd_00_01                                    
      27653 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_02                                    
      27654 root      20   0       0      0      0 R  85.4  0.0   2:22.28 kiblnd_sd_00_03  
      

      Only one CPT is busy even though two peers are connected to the server.

      Amir added a debug patch and confirmed that both peers went to the first CPT.

      00000800:00000200:18.0:1591055201.186835:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.0.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
      00000800:00000200:18.0:1591055201.189343:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.4.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
      

      The problem is that the hash function returns the same value even though the client IP address changes as shown below, so both peers end up on the same CPT on the server when the server has only a single interface.

      1407418001001297 nid1 of client, 64-bit representation
      1407418001263431 nid2 of client, 64-bit representation
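
The two 64-bit values above can be reproduced from the client NIDs shown earlier. The following is a minimal sketch, assuming the usual LNet packing (network in the upper 32 bits as LND type << 16 | network number, IPv4 address in the lower 32 bits) and an LND type of 5 for o2ib. The helper name `nid_to_u64` is hypothetical and is not part of any Lustre tool.

```python
import ipaddress

# Assumption: LND type number used by LNet for o2ib networks.
O2IBLND = 5


def nid_to_u64(ip, lnd_type, net_num):
    """Pack an IPv4 NID such as 10.0.11.81@o2ib10 into a 64-bit integer.

    Assumed layout: upper 32 bits = (lnd_type << 16) | net_num,
    lower 32 bits = the IPv4 address as an integer.
    """
    net = (lnd_type << 16) | net_num
    addr = int(ipaddress.IPv4Address(ip))
    return (net << 32) | addr


nid1 = nid_to_u64("10.0.11.81", O2IBLND, 10)  # 10.0.11.81@o2ib10
nid2 = nid_to_u64("10.4.11.71", O2IBLND, 10)  # 10.4.11.71@o2ib10

print(nid1, hex(nid1))
print(nid2, hex(nid2))
# Bits in which the two NIDs differ; only the address half is affected.
print(hex(nid1 ^ nid2))
```

Under these assumptions the two NIDs differ only in a few bits of the address half (second and fourth octets), and the report above shows that these inputs still hash to the same CPT.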
      

          Activity

            [LU-13621] LNET peer doesn't distribute well to different CPT

            gerrit Gerrit Updater added a comment -

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50381
            Subject: LU-13621 lnet: utility to print cpt number
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 82a00420f7d45e68dcf57ae7979d17c1a5085b66

            pjones Peter Jones added a comment -

            Seems to be landed for 2.15


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/39113/
            Subject: LU-13621 lnet: utility to print cpt number
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: df6f17ee97ac47c949c1963ff8d57fb2d4becd06

            adilger Andreas Dilger added a comment -

            Shuichi, is it true that the CPT hash function is imbalanced even if there are multiple CPTs and multiple clients connecting (e.g. 32 clients connecting to a server with 4 CPTs)? There are always going to be cases where two clients will map to a single CPT (in this case 10.4.11.71 and 10.4.11.81) no matter which mapping function is used. However, it is a much bigger problem if, say, 32 clients with sequential NIDs are not uniformly distributed across the CPTs on the server, or within 1 of an even split.
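
The "within 1 of an even split" criterion in the question above can be checked mechanically for any candidate mapping. The following is only a sketch: `cpt_counts`, `is_balanced`, and `toy_modulo_hash` are hypothetical names, and the modulo mapping is a stand-in used to exercise the check, not LNet's actual hash function.

```python
import ipaddress
from collections import Counter


def cpt_counts(nids, ncpts, cpt_of_nid):
    """Count how many NIDs a mapping function assigns to each CPT."""
    counts = Counter(cpt_of_nid(nid, ncpts) for nid in nids)
    return [counts.get(cpt, 0) for cpt in range(ncpts)]


def is_balanced(counts):
    """True if the per-CPT counts are within 1 of an even split."""
    return max(counts) - min(counts) <= 1


def toy_modulo_hash(nid, ncpts):
    # Stand-in mapping for illustration only; NOT the LNet hash.
    return nid % ncpts


# 32 clients with sequential IPv4 addresses starting at 10.0.11.1.
base = int(ipaddress.IPv4Address("10.0.11.1"))
sequential = [base + i for i in range(32)]

counts = cpt_counts(sequential, 4, toy_modulo_hash)
print(counts, is_balanced(counts))

# A degenerate mapping that sends every peer to CPT 0, as observed above.
skewed = cpt_counts(sequential, 4, lambda nid, ncpts: 0)
print(skewed, is_balanced(skewed))
```

Replacing `toy_modulo_hash` with a function that mirrors the real mapping would allow a site's own NID list to be tested against the same criterion.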

            ashehata Amir Shehata (Inactive) added a comment -

            I added a command to print the CPT number (or the index of the CPT if the NI is bound to a set of CPTs). I think it would be useful to be able to pull this information out without having to dive into the kernel.

            This utility shows that varying the first 2 octets of the IP address and the net name/number does not change the CPT value the NID is hashed to. This is something to be aware of on existing installations. Depending on the addressing scheme the site uses, we could end up with a situation where all the NIDs are hashed into the same CPT. This will create a problem with CPT locking and will create a problem at the LND, since we'll be picking a scheduler thread from the same CPT pool.

            gerrit Gerrit Updater added a comment -

            Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39113
            Subject: LU-13621 lnet: utility to print cpt number
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c0ead63ee1edec2680656d2e1593cf56a637c222

            People

              ssmirnov Serguei Smirnov
              sihara Shuichi Ihara