[LU-12291] Wrong NI selection on asymmetric Multi-rail environment Created: 13/May/19 Updated: 16/Oct/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Tatsushi Takamura | Assignee: | Tatsushi Takamura |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Epic/Theme: | lnet |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
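If the sending node is MultiRail and the receiving node is non-MultiRail, the sender always uses the same local NI (IB0 here) instead of rotating across its NIs, even when that NI has failed:

REMOTE    IB0 (non-MultiRail node)
           ↑
           x
           |
LOCAL     IB0    IB1   <- always uses IB0 (not in round-robin fashion)
        failure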
If the receiving node is non-MultiRail, we check whether its device is working normally or is out of service, and reset the device if it has failed. |
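The check-and-reset step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not LNet code: it assumes "the device" means the local NI the sender is pinned to, and `ni_for_non_mr_peer`, `is_healthy`, and `reset` are hypothetical names, not Lustre or LNet APIs.

```python
from typing import Callable


def ni_for_non_mr_peer(pinned_ni: str,
                       is_healthy: Callable[[str], bool],
                       reset: Callable[[str], None]) -> str:
    """Return the NI pinned to a non-MR peer, resetting it first if it is down.

    All names here are hypothetical; the health probe and the reset action
    are supplied by the caller because the ticket does not say how either
    is implemented.
    """
    if not is_healthy(pinned_ni):
        # The peer only knows us by this NI, so try to recover it
        # rather than switching to another interface.
        reset(pinned_ni)
        if not is_healthy(pinned_ni):
            raise RuntimeError(f"{pinned_ni} is still out of service after reset")
    return pinned_ni
```

The sketch deliberately never returns a different NI, which keeps the sender's NID stable from the non-MultiRail peer's point of view.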
| Comments |
| Comment by Amir Shehata (Inactive) [ 16/May/19 ] |
|
The reason we always stick with the same device is that doing otherwise would confuse the non-MR peer. If the non-MR peer initiated the connection on a specific NID, it always expects communication from that same NID. If the MR node uses another NID, the peer will treat it as communication from a different node. The reset of the device on failure sounds interesting. How do you do that? |
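The selection policy described in this comment can be illustrated with a minimal sketch. It is not the LNet implementation: `Peer`, `LocalNode`, and `select_ni` are hypothetical names, and the choice of the first NI as the pinned one is an assumption made only for the example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional


@dataclass
class Peer:
    """Hypothetical peer record; not an LNet structure."""
    name: str
    multi_rail: bool
    pinned_ni: Optional[str] = None  # local NI a non-MR peer already knows us by


@dataclass
class LocalNode:
    """Hypothetical Multi-Rail sender with several local NIs."""
    nis: List[str]
    _next: count = field(default_factory=count)

    def select_ni(self, peer: Peer) -> str:
        if peer.multi_rail:
            # An MR peer knows all of our NIDs belong to one node,
            # so the sender is free to rotate across its NIs.
            return self.nis[next(self._next) % len(self.nis)]
        # A non-MR peer identifies us by a single NID: pick once, then reuse it
        # (assumption: the first NI is the one the peer connected through).
        if peer.pinned_ni is None:
            peer.pinned_ni = self.nis[0]
        return peer.pinned_ni
```

For example, with `LocalNode(nis=["ib0", "ib1"])`, repeated calls for a non-MR peer keep returning `ib0`, while calls for an MR peer alternate between `ib0` and `ib1`.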