Details

    • Technical task
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • Lustre 2.14.0
    • None
    • a server (1 x IB-EDR) and a client (2 x IB-HDR100) and MR enabled

    Description

      If a server has more than one CPT, peer connections should be distributed across the CPTs for load balancing.
      The CPT is chosen by a hash function over the peer NID's address, but in some cases the hash returns the same value for different peers, and both peers end up on the same CPT.
      This causes a critical performance problem because each CPT owns a fixed set of CPU cores: on a server with two CPTs, if both peers are handled by a single CPT, half of the CPU cores are always busy while the other half are idle.

      Here is an example.

      server# cat /sys/kernel/debug/lnet/cpu_partition_table
      0	: 0 1 2 3 4 5 6 7 8 9
      1	: 10 11 12 13 14 15 16 17 18 19
      
      server# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: o2ib10
            local NI(s):
              - nid: 10.0.11.224@o2ib10
                status: up
                interfaces:
                    0: ib0
      
      client # cat /sys/kernel/debug/lnet/cpu_partition_table
      0	: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
      1	: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
      2	: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
      3	: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
      4	: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
      5	: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
      6	: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
      7	: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
      
      client # lnetctl net show -v
          - net type: o2ib10
            local NI(s):
              - nid: 10.0.11.81@o2ib10
                status: up
                interfaces:
                    0: ib0
       - snip -
                lnd tunables:
                dev cpt: 0
                tcp bonding: 0
                CPT: "[0,1,2,3]"
      
              - nid: 10.4.11.71@o2ib10
                status: up
                interfaces:
                    0: ib4
      - snip -
                lnd tunables:
                dev cpt: 4
                tcp bonding: 0
                CPT: "[4,5,6,7]"
      

      On the client:

         PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                          
       20263 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_01                                  
       20264 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_02                                  
       20265 root      20   0       0      0      0 R  98.3   0.0   0:29.85 kiblnd_sd_06_03                                  
       20262 root      20   0       0      0      0 R  98.0   0.0   0:29.84 kiblnd_sd_06_00                                  
       20247 root      20   0       0      0      0 R  89.1   0.0   1:19.11 kiblnd_sd_02_01                                  
       20248 root      20   0       0      0      0 R  88.7   0.0   1:19.20 kiblnd_sd_02_02                                  
       20249 root      20   0       0      0      0 R  88.7   0.0   1:19.15 kiblnd_sd_02_03                                  
       20246 root      20   0       0      0      0 R  87.7   0.0   1:19.24 kiblnd_sd_02_00    
      

      Two CPTs are busy because the client has two interfaces.

      On the server:

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                            
      27651 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_00                                    
      27652 root      20   0       0      0      0 R  86.0  0.0   2:22.30 kiblnd_sd_00_01                                    
      27653 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_02                                    
      27654 root      20   0       0      0      0 R  85.4  0.0   2:22.28 kiblnd_sd_00_03  
      

      Only one CPT is busy even though two peers are connected to the server.
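
      The imbalance can be quantified by summing %CPU per CPT from the top output. The following is a minimal sketch, not part of any Lustre tooling. It assumes the scheduler threads are named kiblnd_sd_<cpt>_<thread index>, which is consistent with the output above; the helper name cpu_by_cpt is hypothetical.

```python
import re
from collections import defaultdict

# Scheduler thread lines copied from the server's top output above.
SERVER_TOP = """\
27651 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_00
27652 root      20   0       0      0      0 R  86.0  0.0   2:22.30 kiblnd_sd_00_01
27653 root      20   0       0      0      0 R  86.0  0.0   2:22.27 kiblnd_sd_00_02
27654 root      20   0       0      0      0 R  85.4  0.0   2:22.28 kiblnd_sd_00_03
"""

# Assumption: thread names follow kiblnd_sd_<cpt>_<thread index>.
THREAD_RE = re.compile(r"kiblnd_sd_(\d+)_(\d+)")


def cpu_by_cpt(top_output):
    """Sum the %CPU column of kiblnd scheduler threads, grouped by CPT."""
    totals = defaultdict(float)
    for line in top_output.splitlines():
        fields = line.split()
        # top prints 12 columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12:
            continue
        match = THREAD_RE.fullmatch(fields[11])
        if match is None:
            continue
        totals[int(match.group(1))] += float(fields[8])
    return dict(totals)


if __name__ == "__main__":
    for cpt, cpu in sorted(cpu_by_cpt(SERVER_TOP).items()):
        print(f"CPT {cpt}: {cpu:.1f}% CPU")
```

      For the server output above, this reports load on CPT 0 only; CPT 1 has no busy scheduler threads.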

      Amir added a debug patch and confirmed that both peers went to the first CPT.

      00000800:00000200:18.0:1591055201.186835:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.0.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
      00000800:00000200:18.0:1591055201.189343:0:20660:0:(o2iblnd.c:795:kiblnd_create_conn()) peer_ni = 10.4.11.81@o2ib10, ni = 10.0.11.224@o2ib10, cpt = 0
      

      The problem is that the hash function returns the same value even though the client IP addresses differ, as shown below, so both peers eventually go to the same CPT on the server when the server has only a single interface.

      1407418001001297 nid1 of client, 64-bit representation
      1407418001263431 nid2 of client, 64-bit representation
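
      To see what the hash actually receives, the two 64-bit values can be unpacked back into their components. This is a minimal sketch, assuming the usual LNet layout: network in the upper 32 bits (LND type in its upper 16 bits, network number in its lower 16 bits) and the IPv4 address in the lower 32 bits. The function name decode_nid is hypothetical, and the hash itself is not reproduced here.

```python
import ipaddress

NID1 = 1407418001001297  # nid1 of client, from the report above
NID2 = 1407418001263431  # nid2 of client, from the report above


def decode_nid(nid):
    """Split a 64-bit NID into (LND type, network number, IPv4 address).

    Assumed layout: bits 63..48 LND type, bits 47..32 network number,
    bits 31..0 IPv4 address.
    """
    net = nid >> 32
    addr = nid & 0xFFFFFFFF
    return net >> 16, net & 0xFFFF, str(ipaddress.IPv4Address(addr))


if __name__ == "__main__":
    for nid in (NID1, NID2):
        lnd_type, net_num, ip = decode_nid(nid)
        print(f"{nid:#018x}: lnd type {lnd_type}, net {net_num}, addr {ip}")
    # Bits in which the two NIDs differ; only address bits are set.
    print(f"xor: {NID1 ^ NID2:#018x}")
```

      The two values decode to 10.0.11.81 and 10.4.11.71 on network number 10, and they differ in only four bits of the address field (XOR 0x40016), so the hash input for the two peers is nearly identical.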
      

            People

              ssmirnov Serguei Smirnov
              sihara Shuichi Ihara