Lustre / LU-16709

LNet: locking multiple NIDs of the same MR peer as primary results in incorrect representation

    Description

      Some system configurations cause the Lustre layer to refer to the same Multi-Rail (MR) peer by more than one NID, which exposes a problem in the primary NID locking logic. When the "primary nid locking" feature is enabled, LNet creates a separate peer record for each such NID, with that NID marked as the "locked primary". After discovery completes in the background, these records are not merged, so the peer is represented incorrectly: a single MR peer shows up as several single-NID peers. Here's an example:

      server:

      # lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: tcp
            local NI(s):
              - nid: 192.168.122.50@tcp
                status: up
                interfaces:
                    0: eth0
              - nid: 192.168.122.134@tcp
                status: up
                interfaces:
                    0: ens12
      

      client:

      # mount -t lustre 192.168.122.134@tcp:192.168.122.50@tcp:/lustrewt /mnt/lustrefs
      # lnetctl peer show 
      peer:
          - primary nid: 192.168.122.134@tcp
            Multi-Rail: True
            peer ni:
              - nid: 192.168.122.134@tcp
                state: NA
          - primary nid: 192.168.122.50@tcp
            Multi-Rail: True
            peer ni:
              - nid: 192.168.122.50@tcp
                state: NA
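
A minimal sketch of how the symptom above could be checked from the client side, assuming the `lnetctl peer show` output has the layout shown in this ticket. The helper names (`parse_peer_show`, `records_for_node`) and the hard-coded NID list are hypothetical and not part of LNet or lnetctl; the script only scans the text output and reports when NIDs known to belong to one server are spread over more than one peer record.

```python
import re

def parse_peer_show(text):
    """Map each primary NID to the list of peer NI NIDs listed under it."""
    peers = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*-\s*primary nid:\s*(\S+)", line)
        if m:
            current = m.group(1)
            peers[current] = []
            continue
        m = re.match(r"\s*-\s*nid:\s*(\S+)", line)
        if m and current is not None:
            peers[current].append(m.group(1))
    return peers

def records_for_node(peers, node_nids):
    """Return the primary NIDs of all records that hold any NID of the node."""
    wanted = set(node_nids)
    return sorted(p for p, nis in peers.items() if wanted & ({p} | set(nis)))

# Output copied from the client in this ticket.
PEER_SHOW = """\
peer:
    - primary nid: 192.168.122.134@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.134@tcp
          state: NA
    - primary nid: 192.168.122.50@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.122.50@tcp
          state: NA
"""

# Assumption: both NIDs belong to the same server (see "lnetctl net show" above).
SERVER_NIDS = ["192.168.122.50@tcp", "192.168.122.134@tcp"]

peers = parse_peer_show(PEER_SHOW)
records = records_for_node(peers, SERVER_NIDS)
if len(records) > 1:
    print("MR peer split across records:", ", ".join(records))
```

With the output from this ticket, the two server NIDs land in two separate records, which is the incorrect representation described here; a correctly merged peer would yield a single record containing both NIDs.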
      

          People

            ssmirnov Serguei Smirnov