[LU-11272] LNet Health: handle routing special case Created: 21/Aug/18  Updated: 04/Sep/18  Resolved: 04/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Blocker
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9120 LNet Network Health Feature Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There are two issues:

  1. A router checker ping can time out, which invalidates the ping's MD handle (mdh). We need to recreate the mdh in that case.
  2. When retransmitting a message, we should retransmit it even if the peer is marked as down, so that the message can use up its retry quota.
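The two fixes can be sketched in C under heavy simplification. Everything in the block is hypothetical and models only the behaviour described above: `md_handle`, `router_ping_state`, `msg`, `should_transmit`, and the convention that a cookie of 0 means "invalid". It is not the code from the patch at https://review.whamcloud.com/33043. One further assumption, labelled in the code, is that an initial send skips a peer marked as down.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-ins for LNet state. These are NOT the
 * real LNet structures or API; they only model the two behaviours
 * described in this ticket.
 */
struct md_handle {
	unsigned long cookie;		/* 0 means "invalid" in this sketch */
};

struct router_ping_state {
	struct md_handle mdh;		/* MD that receives router checker ping replies */
	unsigned long next_cookie;	/* source of fresh handle values */
};

static bool mdh_is_invalid(const struct md_handle *h)
{
	return h->cookie == 0;
}

/* Issue 1: a timed-out router checker ping leaves the MD handle invalid. */
static void ping_timed_out(struct router_ping_state *rp)
{
	rp->mdh.cookie = 0;
}

/* Before the next ping, recreate the MD handle if it was invalidated. */
static void ensure_ping_mdh(struct router_ping_state *rp)
{
	if (mdh_is_invalid(&rp->mdh)) {
		if (rp->next_cookie == 0)	/* never hand out the "invalid" value */
			rp->next_cookie = 1;
		rp->mdh.cookie = rp->next_cookie++;
	}
	assert(!mdh_is_invalid(&rp->mdh));
}

/* Issue 2: retransmissions honour the retry quota, not the peer's health. */
struct msg {
	int retries_done;	/* retransmissions already performed */
	int retry_quota;	/* maximum number of retransmissions allowed */
};

static bool should_transmit(const struct msg *m, bool peer_down,
			    bool is_retransmit)
{
	if (!is_retransmit)
		return !peer_down;	/* assumption: initial send skips down peers */

	/* Retransmit even if the peer is marked down, until the quota is used. */
	return m->retries_done < m->retry_quota;
}
```

For example, after `ping_timed_out()` a call to `ensure_ping_mdh()` installs a fresh handle before the next ping goes out. A message with `retries_done = 0` and `retry_quota = 2` is still retransmitted to a peer marked as down, and retransmission stops only once `retries_done` reaches the quota.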


 Comments   
Comment by Gerrit Updater [ 21/Aug/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33043
Subject: LU-11272 lnet: router handling
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9381aeb15f68789d1e65cc3c5b6201362f4423dd

Comment by Gerrit Updater [ 21/Aug/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33046
Subject: LU-11272 lnet: router handling
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: 0917bef280bf0abe7821c255d8d5f74f359bc9e2

Comment by Gerrit Updater [ 04/Sep/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33043/
Subject: LU-11272 lnet: router handling
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 05becd69bc0c79fde00f0fddf4935ed8d8e3beb3

Comment by Peter Jones [ 04/Sep/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:42:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.