[LU-15852] Don't add "temp" peer NIs after discovery completes Created: 12/May/22 Updated: 06/Dec/22 Resolved: 02/Nov/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
On kjlmo13 we saw incorrect peer entries for two servers after client mount:
[root@c-lmo1049 ~]# lnetctl debug recovery -p
peer NI recovery:
    nid-0: 10.230.77.11@o2ib1
    nid-1: 10.230.77.9@o2ib1
[root@c-lmo1049 ~]# lnetctl debug recovery -l
[root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.11@o2ib1
peer:
    - primary nid: 10.230.77.10@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 10.230.77.10@o2ib1
          state: NA
        - nid: 10.230.77.11@o2ib1
          state: NA
[root@c-lmo1049 ~]# lnetctl peer show --nid 10.230.77.9@o2ib1
peer:
    - primary nid: 10.230.77.8@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 10.230.77.8@o2ib1
          state: NA
        - nid: 10.230.77.9@o2ib1
          state: NA
[root@c-lmo1049 ~]#
Those servers' actual NIDs were:
----------------
kjlmo1304
----------------
10.230.77.8@o2ib1
----------------
kjlmo1305
----------------
10.230.77.10@o2ib1
----------------
The issue is config log processing with LUS-9293. We should either mark this peer as out of date or just skip adding temporary peer NIs to a peer that is considered up to date. The latter is probably best because then we do not require an additional discovery handshake. |
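The second option (skip temporary peer NIs once a peer is up to date) can be sketched roughly as below. This is a simplified, standalone model and not the LNet code or the landed patch: `struct sketch_peer`, `sketch_add_peer_ni()`, the `up_to_date` flag, and the fixed-size NID table are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PEER_NIS 8
#define NID_STR_LEN  64

/* Hypothetical, simplified stand-in for a peer entry; not the LNet struct. */
struct sketch_peer {
	bool   up_to_date;   /* assumed flag: discovery completed for this peer */
	size_t nni;          /* number of NIDs currently recorded */
	char   nids[MAX_PEER_NIS][NID_STR_LEN];
};

/*
 * Add a NID to a peer. "temp" marks a NID learned from a source other than
 * discovery (for example, config log processing).
 * Returns 1 if the NID was added, 0 if it was skipped, -1 on error.
 */
static int sketch_add_peer_ni(struct sketch_peer *peer, const char *nid,
			      bool temp)
{
	size_t i;

	assert(peer != NULL && nid != NULL);

	/* Proposed rule: discovery results take precedence over temp NIDs. */
	if (temp && peer->up_to_date)
		return 0;

	/* Skip NIDs that are already recorded. */
	for (i = 0; i < peer->nni; i++)
		if (strcmp(peer->nids[i], nid) == 0)
			return 0;

	/* Reject the NID if the table is full or the string is too long. */
	if (peer->nni == MAX_PEER_NIS || strlen(nid) >= NID_STR_LEN)
		return -1;

	strcpy(peer->nids[peer->nni++], nid);
	return 1;
}
```

With this rule, a peer whose discovery has completed with only 10.230.77.10@o2ib1 would not gain 10.230.77.11@o2ib1 from a later temporary add, and no extra discovery handshake is needed to correct the entry.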
| Comments |
| Comment by Gerrit Updater [ 12/May/22 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47322 |
| Comment by Gerrit Updater [ 02/Nov/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47322/ |
| Comment by Peter Jones [ 02/Nov/22 ] |
|
Landed for 2.16 |