[LU-5534] LNET_EVENT_SEND delayed until after RPC had timed out Created: 22/Aug/14 Updated: 02/Jun/15 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Li Wei (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 15406 |
| Description |
|
On Lola, some OBD_PING RPCs had "rq_real_sent == 0" when they timed out. This indicates the LNET_EVENT_SEND events hadn't occur by then. The RPCs were supposed to go through two routers, on which we had 0.1% message drop rules applied either way. One occasion looks like this: Aug 18 07:07:34 lola-24 kernel: Lustre: 3738:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se nt 1408370847/real 0] req@ffff880ff9115000 x1476487418549148/t0(0) o400->soaked-MDT0000-mdc-ffff8810329a9800@192.168.1.108@o2ib:12/10 lens 2 24/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 Aug 18 07:07:34 lola-24 kernel: Lustre: 3741:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se nt 1408370847/real 0] req@ffff880fd1896c00 x1476487418549160/t0(0) o400->soaked-OST0001-osc-ffff8810329a9800@192.168.1.103@o2ib:28/4 lens 22 4/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0001-osc-ffff8810329a9800: Connection to soaked-OST0001 (at 192.168.1.103@o2ib) was lost; i n progress operations using this service will wait for recovery to complete Aug 18 07:07:34 lola-24 kernel: Lustre: 3743:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [se nt 1408370847/real 0] req@ffff880ffecaec00 x1476487418549168/t0(0) o400->soaked-OST0003-osc-ffff8810329a9800@192.168.1.105@o2ib:28/4 lens 22 4/224 e 0 to 1 dl 1408370854 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0003-osc-ffff8810329a9800: Connection to soaked-OST0003 (at 192.168.1.105@o2ib) was lost; i n progress operations using this service will wait for recovery to complete Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0001-osc-ffff8810329a9800: Connection restored to soaked-OST0001 (at 192.168.1.103@o2ib) Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-OST0003-osc-ffff8810329a9800: Connection restored to soaked-OST0003 (at 192.168.1.105@o2ib) Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-MDT0000-mdc-ffff8810329a9800: Connection to soaked-MDT0000 (at 192.168.1.108@o2ib) was lost; i n progress operations using this service will wait for recovery to complete Aug 18 07:07:34 lola-24 kernel: Lustre: soaked-MDT0000-mdc-ffff8810329a9800: Connection restored to soaked-MDT0000 (at 192.168.1.108@o2ib) |