[LU-12291] Wrong NI selection on asymmetric Multi-rail environment - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
None

Epic/Theme:
- lnet
Severity:
3

Description

If the sending node is MultiRail and the receiving node is non-MultiRail,
the sending node always uses the same NI (even if that NI is broken, the broken NI is still used).
This may be how MultiRail is specified, but a broken device should not be used.

    REMOTE  IB0              (non-MultiRail node)
            ↑
            x
            |
     LOCAL  IB0      IB1     <- always uses IB0 (not in round-robin fashion)
            failure

If the receiving node is non-MultiRail, we check whether its device is normal or out of service and reset the device in case of failure.


Activity


Amir Shehata (Inactive) added a comment - 16/May/19 2:39 PM

We always stick with the same device because doing otherwise would confuse the non-MR peer. If the non-MR peer initiated the connection on a specific NID, it always expects communication from that same NID. If the MR node uses another NID, the non-MR peer will treat it as communication from a different node.

The reset of the device on failure sounds interesting. How do you do that?
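
To make the behavior described above concrete, here is a minimal sketch of the selection policy in Python. It is not LNet code: the `Peer` and `NISelector` names, the interface labels, and the choice of the first local NI as the pinned one are all assumptions made for illustration. It only shows the idea that a non-MR peer is pinned to one local NI while an MR peer can be served round-robin.

```python
from typing import Dict, List


class Peer:
    """Hypothetical peer record: a NID plus a flag for Multi-Rail capability."""

    def __init__(self, nid: str, multi_rail: bool) -> None:
        self.nid = nid
        self.multi_rail = multi_rail


class NISelector:
    """Illustrative local-NI selection policy (assumption, not the LNet algorithm)."""

    def __init__(self, local_nis: List[str]) -> None:
        if not local_nis:
            raise ValueError("at least one local NI is required")
        self.local_nis = local_nis
        self._pinned: Dict[str, str] = {}  # non-MR peer NID -> pinned local NI
        self._next = 0                     # round-robin cursor for MR peers

    def select(self, peer: Peer) -> str:
        if not peer.multi_rail:
            # A non-MR peer identifies the sender by its NID, so keep using
            # the same local NI for every message to that peer.
            # Assumption: the first local NI is the one that gets pinned.
            if peer.nid not in self._pinned:
                self._pinned[peer.nid] = self.local_nis[0]
            return self._pinned[peer.nid]
        # An MR peer knows all of the sender's NIDs, so rotating is safe.
        ni = self.local_nis[self._next % len(self.local_nis)]
        self._next += 1
        return ni


selector = NISelector(["ib0", "ib1"])
non_mr_peer = Peer("10.0.0.2@o2ib", multi_rail=False)
mr_peer = Peer("10.0.0.3@o2ib", multi_rail=True)

non_mr_choices = [selector.select(non_mr_peer) for _ in range(3)]
mr_choices = [selector.select(mr_peer) for _ in range(4)]
```

In this sketch `non_mr_choices` stays on `ib0` for every message, which mirrors the situation reported in this ticket: the pinned NI keeps being chosen because the policy never consults its health.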


People

Assignee: Tatsushi Takamura

Reporter: Tatsushi Takamura

Votes: 0

Watchers: 3

Dates

Created: 13/May/19 9:25 AM

Updated: 16/Oct/20 6:21 AM