[LU-13028] LNet Discovery: toggling discovery on/off is not handled properly Created: 27/Nov/19  Updated: 09/Apr/21  Resolved: 31/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12312 sanity-sec: test_31: 'network' mount ... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
  • When a peer goes from discovery on to discovery off, its NIs may change, so that the list of NIs held locally on the node is no longer valid.
    • Before turning off discovery, send a broadcast to all peers with the new status
    • When a node receives a peer's state with discovery off, it should just delete the existing peer
      • If the peer is being used as a gateway, then allocate a new peer for the route
    • When sending to a peer fails, it should be flagged for rediscovery
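
The handling proposed above can be sketched as a toy model. This is only an illustration of the described behaviour, not the LNet implementation: the `Node` and `Peer` classes, their fields, and the method names (`set_discovery`, `on_peer_status`, `send_failed`) are all hypothetical, and the "network" is just a dictionary mapping a NID to a `Node` object.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical record a node keeps about a remote peer."""
    primary_nid: str
    nids: list
    discovery_on: bool = True
    needs_rediscovery: bool = False


@dataclass
class Node:
    """Toy node: a peer table plus routes keyed by remote network name."""
    nid: str
    discovery_on: bool = True
    peers: dict = field(default_factory=dict)   # primary NID -> Peer
    routes: dict = field(default_factory=dict)  # remote net -> gateway NID

    def set_discovery(self, enabled, network):
        """Toggle discovery; announce the new status before turning it off."""
        if self.discovery_on and not enabled:
            for peer_nid in list(self.peers):
                other = network.get(peer_nid)
                if other is not None:
                    other.on_peer_status(self.nid, discovery_on=False)
        self.discovery_on = enabled

    def on_peer_status(self, peer_nid, discovery_on):
        """Handle a status update pushed by a peer."""
        if discovery_on:
            return
        # The cached NI list may be stale, so drop the existing peer entry.
        self.peers.pop(peer_nid, None)
        # A gateway still needs a peer entry, so allocate a fresh one
        # that only knows the NID the route points at.
        if peer_nid in self.routes.values():
            self.peers[peer_nid] = Peer(peer_nid, [peer_nid],
                                        discovery_on=False)

    def send_failed(self, peer_nid):
        """Flag a peer for rediscovery after a failed send."""
        peer = self.peers.get(peer_nid)
        if peer is not None:
            peer.needs_rediscovery = True
```

In this sketch, a node that has the toggling peer as an ordinary peer simply forgets it, while a node that routes through it ends up with a freshly allocated single-NID entry. A later `send_failed()` call only marks the entry; when and how rediscovery actually runs is left out.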


 Comments   
Comment by Gerrit Updater [ 04/Dec/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36919
Subject: LU-13028 lnet: advertise discovery off
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 51da7585f1fb1b2c74f93300c23d8d4e36b28093

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36919/
Subject: LU-13028 lnet: advertise discovery when toggled
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4577410165641e3756406aca7f9a21c73d1fd630

Comment by Peter Jones [ 31/Mar/20 ]

Landed for 2.14

Comment by Andreas Dilger [ 02/Apr/20 ]

The patch https://review.whamcloud.com/36919 "LU-13028 lnet: advertise discovery when toggled" changed lnet, but was submitted with "trivial" and looks like it is causing sanity-sec test_31 to fail 100% of the time with:

LNet Dynamic Peer Discovery is enabled on this node. 
'network' mount option cannot be taken into account.

I've pushed a revert patch to see if this change is the source of the problem.

Generated at Sat Feb 10 02:57:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.