[LU-14955] LNet: change use of fatal error flag for ni selection to be a part of health feature Created: 20/Aug/21 Updated: 22/Nov/23 Resolved: 01/Sep/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Serguei Smirnov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Making the use of the fatal error flag for NI selection part of the health feature allows the user to control it by turning the health feature on or off. Some users may decide to turn off fatal link state detection if LNet is configured with a single NI. |
| Comments |
| Comment by Andreas Dilger [ 21/Aug/21 ] |
|
Serguei, is there any real benefit for the only link/interface on a node to be marked unavailable? I don't think that makes sense. In general, I guess enabling LNet Health doesn't make sense for a system with only a single link/interface, since there isn't any choice but to continue using that one interface. In most such cases, the error will be transient, so retrying will fix the problem, and if the only interface on a client is permanently broken, then there isn't anything that can be done anyway. |
| Comment by Serguei Smirnov [ 23/Aug/21 ] |
|
Andreas, yes, after some debating, the patch that I'm about to push is going to allow the only NI to be picked regardless of its fatal state. As an extrapolation, if there are many NIs and they are all in fatal state, LNet is also going to be able to select one. |
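The selection behaviour described in this comment can be sketched roughly as follows. This is an illustration only, not the code from the patch: `struct sketch_ni`, `sketch_select_ni`, the `health` field and the `health_enabled` parameter are hypothetical names, and breaking ties on a numeric health value is an assumption. The sketch shows the fatal flag acting as a preference rather than a hard filter, so a lone NI, or a set of NIs that are all in fatal state, still yields a selection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an LNet network interface. */
struct sketch_ni {
	const char *name;
	bool fatal;	/* fatal error flag is set for this NI */
	int health;	/* assumed metric: higher means healthier */
};

/*
 * Pick one NI from the array, or return NULL if the array is empty.
 *
 * With health enabled, an NI without the fatal flag is always preferred
 * over one with it, but a flagged NI is still returned when no unflagged
 * candidate exists. With health disabled, the fatal flag is ignored.
 */
static const struct sketch_ni *
sketch_select_ni(const struct sketch_ni *nis, size_t count,
		 bool health_enabled)
{
	const struct sketch_ni *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		const struct sketch_ni *ni = &nis[i];
		bool ni_fatal = health_enabled && ni->fatal;
		bool best_fatal;

		if (best == NULL) {
			/* The first candidate is accepted unconditionally. */
			best = ni;
			continue;
		}

		best_fatal = health_enabled && best->fatal;
		if (best_fatal != ni_fatal) {
			/* Exactly one of the two is fatal: keep the other. */
			if (best_fatal)
				best = ni;
			continue;
		}

		/* Same fatal state: fall back to the assumed health value. */
		if (ni->health > best->health)
			best = ni;
	}

	return best;
}
```

Because the first candidate is always accepted, a node with a single NI gets that NI back whatever its fatal state, which matches the behaviour agreed on in the discussion above.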
| Comment by Gerrit Updater [ 24/Aug/21 ] |
|
"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44746 |
| Comment by Gerrit Updater [ 01/Sep/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44746/ |
| Comment by Peter Jones [ 01/Sep/22 ] |
|
Landed for 2.16 |