[LU-14955] LNet: change use of fatal error flag for ni selection to be a part of health feature Created: 20/Aug/21  Updated: 22/Nov/23  Resolved: 01/Sep/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Serguei Smirnov Assignee: Serguei Smirnov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related

 Description   

Making the use of the fatal error flag for NI selection part of the health feature lets the user control it by turning the health feature on or off. Some users may decide to turn off fatal link state detection if LNet is configured with a single NI.



 Comments   
Comment by Andreas Dilger [ 21/Aug/21 ]

Serguei, is there any real benefit for the only link/interface on a node to be marked unavailable? I don't think that makes sense. In general, I guess enabling LNet Health doesn't make sense for a system with only a single link/interface, since there isn't any choice but to continue using that one interface. In most such cases, the error will be transient, so retrying will fix the problem, and if the only interface on a client is permanently broken, then there isn't anything that can be done anyway.

Comment by Serguei Smirnov [ 23/Aug/21 ]

Andreas, yes, after some debating, the patch that I'm about to push is going to allow the only NI to be picked regardless of its fatal state. As an extrapolation, if there are many NIs and they are all in fatal state, LNet is also going to be able to select one.
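
The selection behaviour described in this comment and in the issue description can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `struct ni_sketch`, its fields, and `pick_ni_sketch()` are hypothetical names, and the "highest health value wins" tie-breaker is an assumption standing in for LNet's real selection criteria.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a network interface (NI). */
struct ni_sketch {
    int  id;      /* illustrative identifier */
    bool fatal;   /* fatal error flag set on this NI */
    int  health;  /* assumed metric: higher is healthier */
};

/*
 * Pick an NI from nis[0..n-1].
 *
 * When health_enabled is true, NIs with the fatal flag set are used
 * only as a fallback, i.e. when no non-fatal NI exists (this covers
 * both the single-NI case and the "all NIs fatal" case).
 * When health_enabled is false, the fatal flag is ignored entirely.
 *
 * Returns NULL only when n == 0.
 */
const struct ni_sketch *
pick_ni_sketch(const struct ni_sketch *nis, size_t n, bool health_enabled)
{
    const struct ni_sketch *best = NULL;       /* best usable NI */
    const struct ni_sketch *best_fatal = NULL; /* best fallback NI */
    size_t i;

    assert(nis != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct ni_sketch *ni = &nis[i];
        /* The flag only matters while the health feature is on. */
        bool treat_as_fatal = health_enabled && ni->fatal;
        const struct ni_sketch **slot =
            treat_as_fatal ? &best_fatal : &best;

        /* Strict comparison: the first NI wins a tie. */
        if (*slot == NULL || ni->health > (*slot)->health)
            *slot = ni;
    }

    /* Fall back to a fatal NI rather than returning nothing. */
    return best != NULL ? best : best_fatal;
}
```

With this shape, a node with one NI always gets that NI back regardless of its fatal state, and a node whose NIs are all in fatal state still gets one of them, which matches the behaviour the comment describes.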

Comment by Gerrit Updater [ 24/Aug/21 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44746
Subject: LU-14955 lnet: make fatal ni handling part of health feature
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2f9ae83236e0c4cda5b5a1ae04ab1f71a3cf6036

Comment by Gerrit Updater [ 01/Sep/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44746/
Subject: LU-14955 lnet: Use fatal NI if none other available
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ff3322fd0c77a8042558711d9f410326d2aa6375

Comment by Peter Jones [ 01/Sep/22 ]

Landed for 2.16
