  1. Lustre
  2. LU-12289

Route with a faulty remote device is selected on separate IB subnets

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.1
    • None
    • 3

    Description

      LNet Multi-Rail selects a route by choosing the local device first and the remote device second.
      The local device is determined in round-robin fashion,
      so even if the health value of remote device (A) is lower than that of remote device (B),
      local device (A), which is the peer of remote device (A), may still be selected.
      If device (A) and device (B) are on different subnets, a failing route is selected.

                Subnet1     Subnet2
      
                 (DOWN)  |
          REMOTE  IB(A)  |   IB(B)
                  ↑      |
                  x      |
                  |      |
           LOCAL  IB(A)  |   IB(B)
      
      The local device is selected in round-robin fashion regardless of the status of the remote-side device.
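
To make the failure mode concrete, here is a deliberately simplified model of the behaviour described above. It is not LNet code: the `Link` type, the interface labels, and the health numbers are all hypothetical, and real LNet selection weighs more criteria than a plain rotation. The sketch only shows that rotating over local devices without consulting remote health keeps picking the route whose remote side is down.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Link:
    local: str          # local interface label (hypothetical)
    remote: str         # peer interface on the same subnet (hypothetical)
    remote_health: int  # illustrative value; 0 stands for "remote side is down"


# Two isolated subnets, as in the diagram: A can only reach A, B can only reach B.
links = [
    Link("local IB(A)", "remote IB(A)", 0),     # remote IB(A) is down
    Link("local IB(B)", "remote IB(B)", 1000),  # remote IB(B) is healthy
]


def round_robin_routes(links, count):
    """Pick `count` routes by rotating over local devices only."""
    rotation = cycle(links)
    return [next(rotation) for _ in range(count)]


picked = round_robin_routes(links, 4)
failed = [link for link in picked if link.remote_health == 0]
print(f"{len(failed)} of {len(picked)} sends use a route whose remote side is down")
```

In this toy setup half of the sends go to the dead remote interface, because the other subnet offers no alternative path once local IB(A) has been chosen.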
      

      We modify the algorithm that finds the best local device as follows. For each candidate local device:

      1. Get the maximum health value of its remote devices.
      2. If that value is smaller than the best health value so far, do not use this device.
      3. If that value is larger than the best health value, make this device the best device.
      4. If that value is equal to the best health value, update the best device in the conventional way.
      5. Update the best health value.
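
The five steps can be sketched as a single pass over the candidate local devices. This is a minimal illustration under stated assumptions, not the actual patch: `LocalDevice`, `remote_healths`, the `seq` counter, and `conventional_better` are hypothetical names, and the tie-break shown (prefer the device that has been selected fewer times) merely stands in for whatever the existing selection criteria are.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocalDevice:
    name: str
    remote_healths: List[int]  # health of each remote device reachable from here
    seq: int = 0               # hypothetical counter: times this device was selected


def conventional_better(candidate: LocalDevice, best: LocalDevice) -> bool:
    """Stand-in for the existing tie-break: prefer the less-used device."""
    return candidate.seq < best.seq


def select_best_local(devices: List[LocalDevice]) -> Optional[LocalDevice]:
    best: Optional[LocalDevice] = None
    best_health = 0
    for dev in devices:
        if not dev.remote_healths:
            continue                            # no reachable remote device
        health = max(dev.remote_healths)        # step 1
        if best is not None and health < best_health:
            continue                            # step 2: worse remote side, skip
        if best is None or health > best_health:
            best = dev                          # step 3: strictly healthier remote side
        elif conventional_better(dev, best):
            best = dev                          # step 4: tie, use the conventional rule
        best_health = health                    # step 5
    if best is not None:
        best.seq += 1                           # record the selection for later ties
    return best


# Remote IB(A) is down (health 0), remote IB(B) is healthy.
a = LocalDevice("IB(A)", [0])
b = LocalDevice("IB(B)", [1000])
print([select_best_local([a, b]).name for _ in range(4)])
```

With these inputs every selection lands on IB(B). When both remote sides report the same health, step 4 takes over and the sketch alternates between the two local devices, so the round-robin behaviour is preserved for healthy configurations.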

            People

              takamura Tatsushi Takamura