Lustre / LU-15852

Don't add "temp" peer NIs after discovery completes

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      On kjlmo13 we saw incorrect peer entries for two servers after a client mount:

      [root@c-lmo1049 ~]# lnetctl debug recovery -p
      peer NI recovery:
          nid-0: 10.230.77.11@o2ib1
          nid-1: 10.230.77.9@o2ib1
      [root@c-lmo1049 ~]# lnetctl debug recovery -l
      [root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.11@o2ib1
      peer:
          - primary nid: 10.230.77.10@o2ib1
            Multi-Rail: True
            peer ni:
              - nid: 10.230.77.10@o2ib1
                state: NA
              - nid: 10.230.77.11@o2ib1
                state: NA
      [root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.9@o2ib1
      peer:
          - primary nid: 10.230.77.8@o2ib1
            Multi-Rail: True
            peer ni:
              - nid: 10.230.77.8@o2ib1
                state: NA
              - nid: 10.230.77.9@o2ib1
                state: NA
      [root@c-lmo1049 ~]#
      

      Those servers' actual NIDs were:

      ----------------
      kjlmo1304
      ----------------
      10.230.77.8@o2ib1
      ----------------
      kjlmo1305
      ----------------
      10.230.77.10@o2ib1
      ----------------
      

      The issue is in config log processing with LUS-9293/LU-14661. The config log says these servers have two NIDs each. Discovery correctly deletes the missing NIDs, but later config log processing adds them back. At that point the peer is already considered "up to date", so discovery is not performed again and the stale NIDs stay in the peer entry.

      We should either mark this peer as out of date or just skip adding temporary peer NIs to a peer that is considered up to date. Probably the latter is best because then we do not require an additional discovery handshake.
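
      The sequence above and the second proposed option can be illustrated with a toy model. This is only a sketch in Python, not LNet code: the names Peer, up_to_date, discovery_complete, add_temp_ni and replay are hypothetical and chosen for illustration. It replays the config log before and after discovery, with and without the "skip if up to date" check.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy stand-in for a peer entry (hypothetical, not the LNet structure)."""
    primary_nid: str
    nids: set = field(default_factory=set)
    up_to_date: bool = False  # set once discovery has completed

    def discovery_complete(self, reported_nids):
        # Discovery replaces the NI list with what the peer actually reported.
        self.nids = set(reported_nids)
        self.up_to_date = True

    def add_temp_ni(self, nid, skip_if_up_to_date):
        # Add a NID taken from the config log; return True if it was added.
        if skip_if_up_to_date and self.up_to_date:
            return False
        self.nids.add(nid)
        return True


def replay(skip_if_up_to_date):
    """Config log pass, discovery, then a second config log pass."""
    peer = Peer("10.230.77.10@o2ib1")
    config_log_nids = ["10.230.77.10@o2ib1", "10.230.77.11@o2ib1"]

    for nid in config_log_nids:  # first pass, before discovery
        peer.add_temp_ni(nid, skip_if_up_to_date)

    # The server really has a single NID, so discovery drops the other one.
    peer.discovery_complete(["10.230.77.10@o2ib1"])

    for nid in config_log_nids:  # later config log processing
        peer.add_temp_ni(nid, skip_if_up_to_date)

    return sorted(peer.nids)
```

      In this model, replay(False) ends with the stale 10.230.77.11@o2ib1 back in the NI list, matching the lnetctl output above, while replay(True) keeps only the NID that discovery reported and needs no extra discovery round.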

          People

            hornc Chris Horn