[LU-5534] LNET_EVENT_SEND delayed until after RPC had timed out Created: 22/Aug/14  Updated: 02/Jun/15

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Li Wei (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 15406

 Description   

On Lola, some OBD_PING RPCs had "rq_real_sent == 0" when they timed out. This indicates the LNET_EVENT_SEND events hadn't occur by then. The RPCs were supposed to go through two routers, on which we had 0.1% message drop rules applied either way. One occasion looks like this:

Aug 18 07:07:34 lola-24 kernel: Lustre: 3738:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se
nt 1408370847/real 0]  req@ffff880ff9115000 x1476487418549148/t0(0) o400->soaked-MDT0000-mdc-ffff8810329a9800@192.168.1.108@o2ib:12/10 lens 2
24/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 18 07:07:34 lola-24 kernel: Lustre: 3741:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se
nt 1408370847/real 0]  req@ffff880fd1896c00 x1476487418549160/t0(0) o400->soaked-OST0001-osc-ffff8810329a9800@192.168.1.103@o2ib:28/4 lens 22
4/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0001-osc-ffff8810329a9800: Connection to soaked-OST0001 (at 192.168.1.103@o2ib) was lost; i
n progress operations using this service will wait for recovery to complete
Aug 18 07:07:34 lola-24 kernel: Lustre: 3743:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se
nt 1408370847/real 0]  req@ffff880ffecaec00 x1476487418549168/t0(0) o400->soaked-OST0003-osc-ffff8810329a9800@192.168.1.105@o2ib:28/4 lens 22
4/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0003-osc-ffff8810329a9800: Connection to soaked-OST0003 (at 192.168.1.105@o2ib) was lost; i
n progress operations using this service will wait for recovery to complete
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0001-osc-ffff8810329a9800: Connection restored to soaked-OST0001 (at 192.168.1.103@o2ib)
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0003-osc-ffff8810329a9800: Connection restored to soaked-OST0003 (at 192.168.1.105@o2ib)
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-MDT0000-mdc-ffff8810329a9800: Connection to soaked-MDT0000 (at 192.168.1.108@o2ib) was lost; i
n progress operations using this service will wait for recovery to complete
Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-MDT0000-mdc-ffff8810329a9800: Connection restored to soaked-MDT0000 (at 192.168.1.108@o2ib)

Generated at Sat Feb 10 01:52:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.