[LU-13478] LNet: peer update adjustment on discovery toggle Created: 23/Apr/20  Updated: 27/May/20  Resolved: 27/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Do not delete a non-MR peer when it first connects. We should delete a peer from our local database only if we have already discovered it and its state is changing from discovery on to discovery off.
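
The deletion rule above can be sketched as a single predicate. This is only an illustration: struct peer_view, its fields, and should_delete_peer() are hypothetical names chosen for this sketch and are not the actual LNet structures or functions touched by the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of what the local node knows about a remote peer.
 * These fields are illustrative only, not the real LNet peer structure.
 */
struct peer_view {
	bool discovered;        /* discovery of this peer completed earlier */
	bool discovery_enabled; /* last known discovery state of the peer */
};

/*
 * Return true only when a previously discovered peer reports that its
 * discovery setting went from on to off. A peer seen for the first time
 * (discovered == false) is never deleted, whatever state it reports.
 */
static bool should_delete_peer(const struct peer_view *p,
			       bool new_discovery_enabled)
{
	return p->discovered && p->discovery_enabled &&
	       !new_discovery_enabled;
}
```

Under these assumptions, a non-MR peer that connects for the first time with discovery off is kept, because it was never discovered locally.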

I think it is better to push to the peers only when discovery is toggled from on to off. The push lets the peers know to clear their representation of the node. When a peer attempts to connect to the node afterwards, it gets the correct list of NIDs.

However, in the reverse case, where discovery goes from off to on, the PUSH can be sent on an interface which is not recorded on the far side, and the push is dropped. By not pushing when we go from off to on, we avoid this scenario and allow the peer to rediscover the node when it commences communication.
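
The push policy described above reduces to a check on the old and new discovery settings. The helper below is a minimal sketch with a hypothetical name, should_push_on_toggle(); it is not taken from the LNet source and only restates the rule from this description.

```c
#include <stdbool.h>

/*
 * Decide whether a local change of the discovery setting should trigger
 * a push to known peers (hypothetical helper, for illustration only).
 *
 *   on  -> off : push, so peers clear their representation of this node
 *   off -> on  : no push, since it may go out on an interface the far
 *                side has not recorded and be dropped; peers rediscover
 *                the node when communication starts
 *   unchanged  : no push
 */
static bool should_push_on_toggle(bool old_enabled, bool new_enabled)
{
	return old_enabled && !new_enabled;
}
```

Only the on-to-off transition returns true; every other combination, including an unchanged setting, results in no push.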



 Comments   
Comment by Gerrit Updater [ 23/Apr/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38321
Subject: LU-13478 lnet: handle discovery off properly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0aba7078644af2f8379a9dee58f4c8827263683c

Comment by Gerrit Updater [ 27/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38321/
Subject: LU-13478 lnet: handle discovery off properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: adae4295b62b1074f5c3c45543c586282394b1be

Comment by Peter Jones [ 27/May/20 ]

Landed for 2.14

Generated at Sat Feb 10 03:01:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.