Align LNet routing with Multi-Rail and LNet health (LU-11297)

[LU-12339] LNet Health: selecting loopback interface for sending
Created: 25/May/19  Updated: 04/Oct/19  Resolved: 10/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Technical task Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

Consider the following scenario:

1. Lustre calls LNetPrimaryNID() with 0@lo.
2. Discovery is initiated on 0@lo.
3. The peer is created with 0@lo and <addr>@<net>.
4. The health of the peer's <addr>@<net> interface is decremented.
5. LNetPut() is called to send to self.
6. The selection algorithm selects 0@lo as the destination.
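The selection step in this scenario can be illustrated with a simplified Python model. It assumes, for illustration only, that the peer NI with the highest health value wins and that a healthy interface starts at a hypothetical MAX_HEALTH of 1000. PeerNI and select_peer_ni are hypothetical names, and the real LNet selection weighs further criteria that are omitted here. In this model, a single decrement on <addr>@<net> is enough for 0@lo to win.

```python
from dataclasses import dataclass

MAX_HEALTH = 1000  # hypothetical full-health value used only in this model


@dataclass
class PeerNI:
    nid: str
    health: int = MAX_HEALTH


def select_peer_ni(peer_nis):
    """Pick the healthiest peer NI; on a tie, the first one in list order wins.

    Simplified model: other selection criteria used by real LNet are omitted.
    """
    return max(peer_nis, key=lambda ni: ni.health)


# Discovery on 0@lo yields a peer with both NIDs (placeholders from the scenario).
peer = [PeerNI("<addr>@<net>"), PeerNI("0@lo")]
print(select_peer_ni(peer).nid)  # prints "<addr>@<net>" (tie, first wins)

# Step 4: the health of the <addr>@<net> interface is decremented.
peer[0].health -= 1
print(select_peer_ni(peer).nid)  # prints "0@lo"
```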

This exposes an issue: the message is passed through the peer credit management algorithm, but because no credits are associated with 0@lo, it ends up queued indefinitely. ptlrpc then gets stuck waiting for send completion on that message.
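The hang can be modeled with a toy credit queue. In this sketch, which uses hypothetical names rather than LNet's actual structures, a send consumes a transmit credit if one is available and otherwise parks the message until a completion returns a credit. Because 0@lo starts with no credits and has nothing in flight, no completion ever arrives to release the queued PUT. The bypass_loopback flag shows one assumed way out, skipping credit accounting for loopback sends. It is not necessarily how the LU-12339 patch resolves the issue.

```python
from collections import deque

LOOPBACK_NID = "0@lo"


class PeerNI:
    """Hypothetical model of per-peer-NI transmit credit accounting."""

    def __init__(self, nid, tx_credits):
        self.nid = nid
        self.tx_credits = tx_credits
        self.sent = []        # messages handed to the network driver
        self.queue = deque()  # messages waiting for a credit

    def send(self, msg, bypass_loopback=False):
        # Assumed fix (hypothetical): loopback sends skip credit accounting.
        if bypass_loopback and self.nid == LOOPBACK_NID:
            self.sent.append(msg)
            return
        if self.tx_credits > 0:
            self.tx_credits -= 1
            self.sent.append(msg)
        else:
            self.queue.append(msg)  # parked until return_credit() runs

    def return_credit(self):
        """Called when an earlier send completes; releases one queued message."""
        self.tx_credits += 1
        if self.queue:
            self.send(self.queue.popleft())


# 0@lo has no credits and nothing in flight, so no completion will ever
# call return_credit(): the PUT stays queued forever.
stuck = PeerNI(LOOPBACK_NID, tx_credits=0)
stuck.send("PUT to self")
print(stuck.sent, len(stuck.queue))  # prints "[] 1"

# With the hypothetical bypass, the same PUT is sent immediately.
fixed = PeerNI(LOOPBACK_NID, tx_credits=0)
fixed.send("PUT to self", bypass_loopback=True)
print(fixed.sent)  # prints "['PUT to self']"
```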

This was exposed by conf-sanity test 32.


 Comments   
Comment by Gerrit Updater [ 25/May/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34957
Subject: LU-12339 lnet: select LO interface for sending
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: d4fb79b10c186dbec3263885773e4b48f0e7cd6e

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/34957/
Subject: LU-12339 lnet: select LO interface for sending
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 69d1535ebdac139c6b19db2bca5f65663fe88467

Comment by Joseph Gmitter (Inactive) [ 10/Jun/19 ]

Work has landed as part of the MR Routing merge commit: https://review.whamcloud.com/#/c/34983/

Comment by Gerrit Updater [ 03/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36040
Subject: LU-12339 lnet: select LO interface for sending
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 3d0d0149b832804c36d432370e23ed91b8ec43c8

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36040/
Subject: LU-12339 lnet: select LO interface for sending
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: bab7da820e36d3c00e888704fc2c8d6022786c42

Generated at Sat Feb 10 02:51:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.