[LU-14650] o2iblnd: unexpectedly long tx timeouts reported
Created: 28/Apr/21  Updated: 28/Apr/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Serguei Smirnov
Assignee: Serguei Smirnov
Resolution: Unresolved
Votes: 0
Labels: lnet, o2iblnd

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

Messages similar to the following:

(o2iblnd_cb.c:3516:kiblnd_check_conns()) Timed out tx for 172.26.13.33@o2ib: 665162 seconds

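indicate that the tx timeout is sometimes calculated incorrectly when the LND queues a message for transmission. The reported 665162 seconds is roughly 7.7 days, which is far too long to be a genuine transmit timeout.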

Quick inspection of the code appears to confirm that kiblnd_launch_tx() can bypass proper tx_deadline initialisation under certain conditions. This ticket is submitted to investigate how to fix this properly and to better understand the effects.
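A minimal sketch of one way such a number could arise, offered as an assumption rather than a confirmed diagnosis: if a deadline field is left at zero and later compared against a monotonic clock, the reported overdue time equals the clock value itself instead of a few seconds. The struct and helper names below (sketch_tx, sketch_tx_arm, sketch_tx_overdue) are hypothetical and are not the o2iblnd data structures or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued transmit descriptor.
 * This is NOT the real o2iblnd tx structure. */
struct sketch_tx {
	int64_t deadline_s;	/* monotonic seconds; 0 if never initialised */
};

/* Arm the deadline at queue time: now plus the configured timeout. */
static void sketch_tx_arm(struct sketch_tx *tx, int64_t now_s,
			  int64_t timeout_s)
{
	tx->deadline_s = now_s + timeout_s;
}

/* Seconds by which the tx is overdue at now_s, or 0 if not yet due. */
static int64_t sketch_tx_overdue(const struct sketch_tx *tx, int64_t now_s)
{
	assert(now_s >= 0);
	return now_s > tx->deadline_s ? now_s - tx->deadline_s : 0;
}
```

With the clock at 665162 seconds, a descriptor armed at 665100 with a 50 second timeout is overdue by 12 seconds, whereas one whose deadline was never armed is reported as overdue by the full 665162 seconds. Whether the real code path behaves this way is exactly what this ticket proposes to investigate.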

