[LU-14650] o2iblnd: unexpectedly long tx timeouts reported
Created: 28/Apr/21 Updated: 28/Apr/21
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | lnet, o2iblnd | ||
| Issue Links: |
| Severity: | 3 | ||||
| Description |
|
Messages similar to the following:
(o2iblnd_cb.c:3516:kiblnd_check_conns()) Timed out tx for 172.26.13.33@o2ib: 665162 seconds
indicate that the tx timeout is sometimes calculated incorrectly when the LND queues a message for transmission. A quick inspection of the code appears to confirm that kiblnd_launch_tx() can bypass proper tx_deadline initialisation under certain conditions. This ticket is to investigate how to fix this properly and to better understand the effects. |
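
To illustrate the suspected failure mode, here is a minimal user-space sketch, not the o2iblnd code. All names (`sketch_tx`, `sketch_tx_set_deadline`, `sketch_tx_overdue_sec`, `sketch_tx_deadline_is_set`) are hypothetical, and the sketch assumes that the deadline is kept in seconds on a monotonic clock and that an uninitialised deadline reads as zero. Under those assumptions, a tx whose deadline was never set looks overdue by the full clock value, which would be consistent with a report such as 665162 seconds (roughly 7.7 days) rather than a value near the configured timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued transmit descriptor.
 * This is NOT the real o2iblnd tx structure. */
struct sketch_tx {
	int64_t deadline_sec;	/* monotonic seconds; 0 is assumed to mean "never set" */
};

/* What a queuing path is expected to do: deadline = now + timeout. */
static void sketch_tx_set_deadline(struct sketch_tx *tx, int64_t now_sec,
				   int64_t timeout_sec)
{
	assert(tx != NULL);
	tx->deadline_sec = now_sec + timeout_sec;
}

/* Seconds by which the tx is overdue; zero or negative means not expired. */
static int64_t sketch_tx_overdue_sec(const struct sketch_tx *tx,
				     int64_t now_sec)
{
	assert(tx != NULL);
	return now_sec - tx->deadline_sec;
}

/* Guard a checker could apply before trusting the overdue value. */
static int sketch_tx_deadline_is_set(const struct sketch_tx *tx)
{
	assert(tx != NULL);
	return tx->deadline_sec != 0;
}
```

With a deadline set at time 665162 and a 50 second timeout, checking 55 seconds later reports 5 seconds overdue. With the deadline left at zero, the same check at time 665162 reports 665162 seconds overdue. Whether the real checker should skip, warn about, or repair such a tx is part of what this ticket needs to determine.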