[LU-15852] Don't add "temp" peer NIs after discovery completes - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.16.0
Affects Version/s: Lustre 2.15.0
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

On kjlmo13 we saw incorrect peer entries for two servers after a client mount:

[root@c-lmo1049 ~]# lnetctl debug recovery -p
peer NI recovery:
    nid-0: 10.230.77.11@o2ib1
    nid-1: 10.230.77.9@o2ib1
[root@c-lmo1049 ~]# lnetctl debug recovery -l
[root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.11@o2ib1
peer:
    - primary nid: 10.230.77.10@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 10.230.77.10@o2ib1
          state: NA
        - nid: 10.230.77.11@o2ib1
          state: NA
[root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.9@o2ib1
peer:
    - primary nid: 10.230.77.8@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 10.230.77.8@o2ib1
          state: NA
        - nid: 10.230.77.9@o2ib1
          state: NA
[root@c-lmo1049 ~]#

Those servers' actual NIDs were:

----------------
kjlmo1304
----------------
10.230.77.8@o2ib1
----------------
kjlmo1305
----------------
10.230.77.10@o2ib1
----------------

The problem lies in config log processing with LUS-9293/LU-14661. The config log says these servers have two NIDs each. Discovery correctly deletes the NIDs the servers do not actually have, but later config log processing adds them back as temporary peer NIs. At that point the peer is already "up to date", so discovery is not performed again.
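
The sequence above can be illustrated with a toy model. This is a minimal sketch, not LNet code: the `Peer` class, `config_log_add()` and `discover()` are hypothetical names, and the only assumption carried over from the ticket is that discovery is skipped for a peer that is already up to date. The NIDs are the ones from kjlmo1304 above.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical toy model of an LNet peer (not the real kernel structure)."""
    primary_nid: str
    nids: set = field(default_factory=set)
    up_to_date: bool = False  # set once discovery has completed


def config_log_add(peer, nids):
    """Buggy behaviour: config log processing adds temp peer NIs unconditionally."""
    peer.nids.update(nids)


def discover(peer, actual_nids):
    """Discovery replaces the NI list with what the peer itself reports.

    Assumption from the ticket: an up-to-date peer is not discovered again.
    """
    if peer.up_to_date:
        return
    peer.nids = set(actual_nids)
    peer.up_to_date = True


actual = {"10.230.77.8@o2ib1"}                              # what kjlmo1304 really has
from_config_log = {"10.230.77.8@o2ib1", "10.230.77.9@o2ib1"}  # what the config log claims

peer = Peer("10.230.77.8@o2ib1")
config_log_add(peer, from_config_log)  # initial config log processing
discover(peer, actual)                 # discovery prunes the bogus .9 NID
config_log_add(peer, from_config_log)  # later config log processing re-adds it
discover(peer, actual)                 # no-op: the peer is already "up to date"
print(sorted(peer.nids))               # ['10.230.77.8@o2ib1', '10.230.77.9@o2ib1']
```

The stale `10.230.77.9@o2ib1` entry left at the end matches what `lnetctl peer show` reported on the client.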

We should either mark such a peer as out of date, or simply skip adding temporary peer NIs to a peer that is already considered up to date. The latter is probably best, because it avoids an additional discovery handshake.
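
The two options can be compared in the same kind of toy model. This is an illustrative sketch under stated assumptions, not the actual LNet fix: all names (`config_log_add_skip`, `config_log_add_mark_stale`, `discovery_rounds`, `run`) are hypothetical, and one "round" of `discover()` stands in for one discovery handshake.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical toy model of an LNet peer (not the real kernel structure)."""
    primary_nid: str
    nids: set = field(default_factory=set)
    up_to_date: bool = False
    discovery_rounds: int = 0  # counts simulated discovery handshakes


def discover(peer, actual_nids):
    """Replace the NI list with the peer's real NIDs unless already up to date."""
    if peer.up_to_date:
        return
    peer.nids = set(actual_nids)
    peer.up_to_date = True
    peer.discovery_rounds += 1


def config_log_add_skip(peer, nids):
    """Preferred option: ignore temp peer NIs once the peer is up to date."""
    if peer.up_to_date:
        return
    peer.nids.update(nids)


def config_log_add_mark_stale(peer, nids):
    """Alternative: add the temp NIs but force the peer back through discovery."""
    peer.nids.update(nids)
    peer.up_to_date = False


def run(add_fn):
    """Replay the config log / discovery sequence using the given add strategy."""
    actual = {"10.230.77.8@o2ib1"}
    from_config_log = {"10.230.77.8@o2ib1", "10.230.77.9@o2ib1"}
    peer = Peer("10.230.77.8@o2ib1")
    add_fn(peer, from_config_log)  # initial config log processing
    discover(peer, actual)         # discovery prunes the bogus NID
    add_fn(peer, from_config_log)  # later config log processing
    discover(peer, actual)         # only runs again if the peer was marked stale
    return sorted(peer.nids), peer.discovery_rounds


print(run(config_log_add_skip))        # (['10.230.77.8@o2ib1'], 1)
print(run(config_log_add_mark_stale))  # (['10.230.77.8@o2ib1'], 2)
```

Both strategies end with the correct NI list, but marking the peer stale costs a second discovery round, which is why skipping the temporary NIs is the cheaper choice.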

People

Assignee: Chris Horn

Reporter: Chris Horn

Votes: 0

Watchers: 3

Dates

Created: 12/May/22 8:48 PM

Updated: 06/Dec/22 1:36 PM

Resolved: 02/Nov/22 1:28 PM