  1. Lustre
  2. LU-17752

Advanced hash function for better CPT allocation

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      A number of patches have improved the hash for better CPT distribution (LU-14676, LU-16797), but they are still not enough, and recent use cases still see unbalanced CPT RR.

      For instance, the following three examples show 64 clients under different IP address assignment policies.

                  sequential order-1  sequential order-2   random order(by DHCP)
      src01-c0-n0    10.0.46.1             10.0.46.1          10.0.46.107
      src01-c0-n1    10.0.46.2             10.0.46.33         10.0.45.41
      src01-c1-n0    10.0.46.3             10.0.46.2          10.0.45.140
      src01-c1-n1    10.0.46.4             10.0.46.34         10.0.47.74
      src02-c0-n0    10.0.46.5             10.0.46.3          10.0.45.40
      src02-c0-n1    10.0.46.6             10.0.46.35         10.0.46.219
      src02-c1-n0    10.0.46.7             10.0.46.4          10.0.47.205
      src02-c1-n1    10.0.46.8             10.0.46.36         10.0.47.96
      ...            ...                   ...                ...
      src16-c0-n0    10.0.46.61            10.0.46.31         10.0.47.178
      src16-c0-n1    10.0.46.62            10.0.46.63         10.0.47.34
      src16-c1-n0    10.0.46.63            10.0.46.32         10.0.46.83
      src16-c1-n1    10.0.46.64            10.0.46.64         10.0.45.226
      

      If the NIDs of all 64 clients src[01-16]-c[0-1]-n[0-1] are distributed across 8 CPTs:

      [root@src01-c0-n0 ~]# for i in `seq 1 64`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort  | uniq -c
            8     value: 0
            8     value: 1
            8     value: 2
            8     value: 3
            8     value: 4
            8     value: 5
            8     value: 6
            8     value: 7
      

      this is well balanced. However, if we use only half of the clients, src[01-16]-c[0-1]-n0 (10.0.46.1, 10.0.46.3, 10.0.46.5, ...):

      [root@src01-c0-n0 ~]# for i in `seq 1 2 64`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort  | uniq -c
            8     value: 0
            8     value: 2
            8     value: 4
            8     value: 6
      

      the distribution is still even, but only 4 of the 8 CPTs are used. That means half of the CPU cores are idle.

      With another set of 32 clients, src[01-08]-c[0-1]-n[0-1], in sequential IP address order (10.0.46.1 - 10.0.46.32), it works well:

      [root@src01-c0-n0 ~]# for i in `seq 1 32`; do lnetctl cpt-of-nid --ncpt 8 --nid 10.0.46.$i@o2ib12; done | grep value | sort  | uniq -c
            4     value: 0
            4     value: 1
            4     value: 2
            4     value: 3
            4     value: 4
            4     value: 5
            4     value: 6
            4     value: 7
      

      CPT distribution for the 64 random IP addresses assigned by DHCP:

            7 value: 0
            7 value: 1
            6 value: 2
            8 value: 3
            3 value: 4
           12 value: 5
           12 value: 6
            9 value: 7
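
The DHCP result above is listed without the command that produced it, since the addresses are no longer a simple `seq` range. A minimal sketch of how such a tally could be gathered for an arbitrary address list is given here. It reuses only the `lnetctl cpt-of-nid --ncpt ... --nid ...` invocation shown earlier; the function name `cpt_histogram` and the input file `client_ips.txt` are hypothetical names introduced for illustration.

```shell
# cpt_histogram NCPT NET (hypothetical helper, not part of Lustre)
# Reads one IP address per line on stdin, asks lnetctl which CPT each
# resulting NID maps to, and prints the number of NIDs per CPT.
cpt_histogram() {
    ncpt=$1
    net=$2
    while read -r ip; do
        # skip empty lines in the input list
        [ -n "$ip" ] || continue
        lnetctl cpt-of-nid --ncpt "$ncpt" --nid "$ip@$net"
    done | grep value | sort | uniq -c
}
```

Assuming the 64 DHCP-assigned addresses are stored one per line in `client_ips.txt`, it would be run as `cpt_histogram 8 o2ib12 < client_ips.txt`.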
      

      this produces an unbalanced CPT distribution, with some CPTs busy and others much less so. This causes a number of problems:

      • performance is not consistent across different groups of NIDs
      • ONLY a good sequential IP address order can reach the maximum (hero) performance
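
To put a number on "busy and less busy", the per-CPT counts printed above can be reduced to a few figures. The sketch below is only an illustration: `cpt_imbalance` is a hypothetical helper, and the max/mean ratio is one possible imbalance measure chosen here, not something Lustre reports. It takes the total number of CPTs as an argument because CPTs that received no NID do not appear in the `uniq -c` output at all.

```shell
# cpt_imbalance NCPT (hypothetical helper, not part of Lustre)
# Reads `uniq -c` style lines ("<count> value: <cpt>") on stdin and prints
# how many CPTs are used, plus min, max and max/mean NIDs per CPT.
cpt_imbalance() {
    awk -v ncpt="$1" '
        $2 == "value:" { count[$3] = $1; total += $1 }
        END {
            if (total == 0) { print "no NIDs"; exit }
            used = 0; max = 0; min = total
            for (c = 0; c < ncpt; c++) {
                n = count[c] + 0          # CPTs missing from the input count as 0
                if (n > 0) used++
                if (n > max) max = n
                if (n < min) min = n
            }
            # mean is total/ncpt, so max/mean = max * ncpt / total
            printf "used=%d/%d nids=%d min=%d max=%d max/mean=%.2f\n",
                   used, ncpt, total, min, max, max * ncpt / total
        }'
}
```

Fed with the DHCP counts above (7, 7, 6, 8, 3, 12, 12, 9) and `8` as the argument, this prints `used=8/8 nids=64 min=3 max=12 max/mean=1.50`: the busiest CPT serves 50% more NIDs than the average and four times as many as the least busy one. For the odd-address case it prints `used=4/8 nids=32 min=0 max=8 max/mean=2.00`.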

      The hash needs some rework, or some other mechanism is needed, to produce a better CPT distribution.

      Sequential IP order may work in general, but there are still many cases where it does not work properly, e.g.:

      • For HPC, a few clients are picked from each rack for performance (balanced network distribution).
      • For cloud use cases, client IP addresses are assigned by DHCP and managed by dynamic DNS.

            People

              wc-triage WC Triage
              sihara Shuichi Ihara