Align LNet routing with Multi-Rail and LNet health
(LU-11297)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In the following scenario:
1. Lustre calls LNetPrimaryNID() with 0@lo
2. Discovery is initiated on 0@lo
3. The peer is created with 0@lo and <addr>@<net>
4. The interface health of the peer's <addr>@<net> is decremented
5. LNetPut() to self
6. The selection algorithm selects 0@lo to send to
This exposes an issue where we try to go through the peer credit management algorithm, but because there are no credits associated with 0@lo, the message ends up queued indefinitely. ptlrpc then gets stuck waiting for send completion on the message. This was exposed by conf-sanity test 32.
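
The failure mode described above can be illustrated with a small toy model. This is only a sketch and is not LNet code or the patch under review: the `PeerNI` class, the `send()` function, the `bypass_credits_for_loopback` flag, the health scale, and the address `10.0.0.1@tcp` (standing in for `<addr>@<net>`) are all assumptions made for illustration. It shows a message being queued when the selected peer interface has no credits, and one possible way such a stall could be avoided by treating 0@lo as a special case.

```python
from dataclasses import dataclass, field

LOOPBACK_NID = "0@lo"


@dataclass
class PeerNI:
    """Hypothetical stand-in for a peer interface (not an LNet structure)."""
    nid: str
    credits: int            # peer credits available on this interface
    health: int = 1000      # illustrative scale: higher means healthier
    queued: list = field(default_factory=list)


def select_peer_ni(peer_nis):
    # Pick the healthiest interface; on a tie, max() keeps the first one listed.
    return max(peer_nis, key=lambda ni: ni.health)


def send(peer_nis, msg, bypass_credits_for_loopback):
    """Return ("delivered" | "queued", nid) for the selected interface."""
    ni = select_peer_ni(peer_nis)

    # Assumed workaround for illustration: a send to 0@lo skips credit accounting.
    if bypass_credits_for_loopback and ni.nid == LOOPBACK_NID:
        return ("delivered", ni.nid)

    # Generic credit path: with no credits the message waits on the queue.
    # In this model nothing ever returns a credit to 0@lo, so it waits forever.
    if ni.credits <= 0:
        ni.queued.append(msg)
        return ("queued", ni.nid)

    ni.credits -= 1
    return ("delivered", ni.nid)


def make_peer():
    # 0@lo has no credits; the real interface has had its health decremented.
    return [
        PeerNI(LOOPBACK_NID, credits=0),
        PeerNI("10.0.0.1@tcp", credits=8, health=900),
    ]
```

With `bypass_credits_for_loopback=False`, `send(make_peer(), "put", False)` selects 0@lo because the other interface's health was decremented, finds no credits, and leaves the message queued. With the flag set to `True`, the same call is delivered without touching the credit path.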
|
| Comments |
| Comment by Gerrit Updater [ 25/May/19 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34957 |
| Comment by Gerrit Updater [ 07/Jun/19 ] |
|
Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/34957/ |
| Comment by Joseph Gmitter (Inactive) [ 10/Jun/19 ] |
|
Work has landed as part of the MR Routing merge commit: https://review.whamcloud.com/#/c/34983/ |
| Comment by Gerrit Updater [ 03/Sep/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36040 |
| Comment by Gerrit Updater [ 04/Oct/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36040/ |