  1. Lustre
  2. LU-14064 socklnd needs improved interface selection and configuration
  3. LU-14676

Better hash distribution across CPTs when an LNet router is present

Details

    • Technical task
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • Lustre 2.15.0
    • None

    Description

      When a server receives messages from clients, each message is dispatched to a CPT (CPU partition) and then passed to the upper layers.
      The CPT is chosen by hashing the client's NID.
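
      To illustrate the idea of NID-based CPT selection, here is a minimal sketch. It is not LNet's actual code: CRC32 stands in for the real hash function, and `nid_to_cpt` is a hypothetical helper name. The client NIDs are the ones from the test setup described in this ticket.

```python
import zlib

NCPTS = 20  # one CPT per CPU core, as in the configuration in this ticket


def nid_to_cpt(nid: str, ncpts: int = NCPTS) -> int:
    """Map a NID string to a CPT index.

    Assumption: CRC32 is only a stand-in; the real LNet hash differs.
    """
    return zlib.crc32(nid.encode("ascii")) % ncpts


# The ten client NIDs seen by the server when no router is involved.
clients = [f"10.0.0.{i}@o2ib12" for i in range(31, 41)]

# Each distinct NID gets its own hash value, so different clients
# can land on different CPTs.
placement = {nid: nid_to_cpt(nid) for nid in clients}
for nid, cpt in placement.items():
    print(f"{nid:20s} -> CPT {cpt}")
```

      The mapping is deterministic per NID, so the number of CPTs in use is bounded by the number of distinct NIDs the server hashes.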

      However, if there are LNet routers between the clients and servers, the hash is computed from the router's NID, not the clients' NIDs.
      Assume the following configuration:
      1 x server (20 CPU cores, CPT=20, i.e. one CPU core per CPT)
      1 x LNet router
      10 x clients

      Without an LNet router
      All client NIDs are active.

      nid                      refs state  last   max   rtr   min    tx   min queue
      0@lo                        1    NA    -1     0     0     0     0     0 0
      10.0.0.34@o2ib12            7    NA    -1     8     8     8     2   -20 3616
      10.0.11.226@o2ib12          1    NA    -1     8     8     8     8    -8 0  
      10.0.0.39@o2ib12            5    NA    -1     8     8     8     4   -18 2560
      10.0.0.31@o2ib12            5    NA    -1     8     8     8     4   -20 1984
      10.0.0.35@o2ib12            5    NA    -1     8     8     8     4   -18 1752
      10.0.0.36@o2ib12            6    NA    -1     8     8     8     3   -19 2544
      10.0.0.32@o2ib12            1    NA    -1     8     8     8     8   -18 0   
      10.0.0.33@o2ib12            6    NA    -1     8     8     8     3   -17 2312
      10.0.11.225@o2ib12          1    NA    -1     8     8     8     8    -8 0   
      10.0.0.40@o2ib12            6    NA    -1     8     8     8     3   -19 3056
      10.0.0.38@o2ib12            1    NA    -1     8     8     8     8   -21 0   
      10.0.11.227@o2ib12          1    NA    -1     8     8     8     8    -8 0  
      10.0.0.37@o2ib12            6    NA    -1     8     8     8     3   -18 3248
      

      These messages are handled by LNet threads in different CPTs because the hash is computed from each client's NID.

      top - 01:05:44 up 1 day, 16:09,  2 users,  load average: 39.70, 18.43, 7.46
      Tasks: 1442 total,  75 running, 1367 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.0 us, 50.6 sy,  0.0 ni, 48.3 id,  1.0 wa,  0.0 hi,  0.2 si,  0.0 st
      KiB Mem : 15369398+total, 13057227+free, 18096748 used,  5024956 buff/cache
      KiB Swap: 11075580 total, 11075580 free,        0 used. 13475987+avail Mem 
      
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
      17303 root      20   0       0      0      0 S  17.8  0.0   0:03.16 kworker/u40:1                                    
      17601 root      20   0       0      0      0 S   9.2  0.0   0:00.47 ll_ost19_004                                     
      17642 root      20   0       0      0      0 S   7.9  0.0   0:00.33 ll_ost19_007                                     
      16187 root      20   0       0      0      0 R   7.3  0.0   0:07.56 kiblnd_sd_03_00                                  
      16192 root      20   0       0      0      0 R   7.3  0.0   0:07.57 kiblnd_sd_08_00                                  
      16198 root      20   0       0      0      0 R   7.3  0.0   0:11.14 kiblnd_sd_14_00                                  
      16201 root      20   0       0      0      0 R   7.3  0.0   0:07.70 kiblnd_sd_17_00                                  
      16632 root      20   0       0      0      0 R   7.3  0.0   0:07.10 mdt03_000                                        
      16634 root      20   0       0      0      0 R   7.3  0.0   0:06.95 mdt03_002                                        
      16647 root      20   0       0      0      0 R   7.3  0.0   0:07.24 mdt08_000                                        
      16649 root      20   0       0      0      0 R   7.3  0.0   0:07.06 mdt08_002   
      

      With an LNet router

      This is the same test from the 10 clients, but the messages go through the LNet router.
      The server sees only a single active NID, which is the router node.

      nid                      refs state  last   max   rtr   min    tx   min queue
      0@lo                        1    NA    -1     0     0     0     0     0 0
      192.168.11.35@o2ib10        2    NA    -1     0     0     0     0     0 0
      10.0.11.226@o2ib12          1    NA    -1     8     8     8     8    -8 0   
      192.168.11.36@o2ib10        2    NA    -1     0     0     0     0     0 0
      10.12.11.135@o2ib12        13    up    -1     8     8     8     3   -94 3248
      192.168.11.40@o2ib10        2    NA    -1     0     0     0     0     0 0
      192.168.11.32@o2ib10        2    NA    -1     0     0     0     0     0 0
      192.168.11.33@o2ib10        2    NA    -1     0     0     0     0     0 0
      192.168.11.37@o2ib10        2    NA    -1     0     0     0     0     0 0
      192.168.11.34@o2ib10        2    NA    -1     0     0     0     0     0 0
      10.0.11.225@o2ib12          1    NA    -1     8     8     8     8    -7 0  
      192.168.11.39@o2ib10        2    NA    -1     0     0     0     0     0 0
      10.0.11.227@o2ib12          1    NA    -1     8     8     8     8    -8 0  
      192.168.11.38@o2ib10        2    NA    -1     0     0     0     0     0 0
      192.168.11.31@o2ib10        2    NA    -1     0     0     0     0     0 0
      

      All messages then go to cpt=2, and the other 19 CPTs (19 CPU cores) are idle.

      Tasks: 1067 total,   3 running, 1064 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.0 us,  5.0 sy,  0.0 ni, 94.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem : 15369398+total, 14010728+free, 12679048 used,   907648 buff/cache
      KiB Swap: 11075580 total, 11075580 free,        0 used. 14017849+avail Mem 
      
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
      13044 root      20   0       0      0      0 R   7.0  0.0   0:01.81 kiblnd_sd_02_00                                  
      14146 root      20   0       0      0      0 S   6.6  0.0   0:00.93 mdt02_005                                        
      13489 root      20   0       0      0      0 R   6.3  0.0   0:01.37 mdt02_002   
      

      It would be nice to have better hashing that distributes messages across different CPTs on the server, to improve metadata performance and IOPS when an LNet router is present.
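
      The effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the fix that was adopted for this ticket: CRC32 stands in for the real hash, the message dictionaries and the `cpt_of` and `spread` helpers are hypothetical, and keying on the original source NID is just one conceivable way to restore the spread.

```python
import zlib
from collections import Counter

NCPTS = 20
ROUTER_NID = "10.12.11.135@o2ib12"  # the single active NID seen by the server
CLIENT_NIDS = [f"192.168.11.{i}@o2ib10" for i in range(31, 41)]


def cpt_of(nid: str, ncpts: int = NCPTS) -> int:
    """Stand-in hash (CRC32); the real LNet hash function differs."""
    return zlib.crc32(nid.encode("ascii")) % ncpts


def spread(messages, key):
    """Count messages per CPT, hashing whichever NID `key` selects."""
    return Counter(cpt_of(key(m)) for m in messages)


# 100 messages from each client, all relayed by the same router.
messages = [{"src": c, "from": ROUTER_NID}
            for c in CLIENT_NIDS for _ in range(100)]

by_router = spread(messages, key=lambda m: m["from"])  # hash the router NID
by_source = spread(messages, key=lambda m: m["src"])   # hash the client NID

print("CPTs used when hashing the router NID:", len(by_router))
print("CPTs used when hashing the source NID:", len(by_source))
```

      Hashing the router NID always yields exactly one busy CPT, whatever the hash function, because every message carries the same key. Hashing a per-client value gives the hash something to distribute.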

            People

              ssmirnov Serguei Smirnov
              sihara Shuichi Ihara