Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
None
-
Lustre 2.4.0
-
Hyperion/RHEL6
-
3
-
6132
Description
Running mdtest, file-per-process. A single client times out a request, then the MDS enter the 'waiting on 1 RPC' state, all clients eventually get -EBUSY. This bug is to show the sequence as I'm currently seeing it on Hyperion.
The first client error:
2013-01-15 13:11:26 Lustre: 13481:0:(client.c:1836:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1358283526/real 1358283526] req@ffff88030ca02400 x1424249167482022/t0(0) o101->lustre-MDT0000-mdc-ffff88033c3dac00@192.168.127.6@o2ib1:12/10 lens 592/1136 e 3 to 1 dl 1358284286 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1 2013-01-15 13:11:26 Lustre: 13481:0:(client.c:1836:ptlrpc_expire_one_request()) Skipped 1 previous similar message 2013-01-15 13:11:26 Lustre: lustre-MDT0000-mdc-ffff88033c3dac00: Connection to lustre-MDT0000 (at 192.168.127.6@o2ib1) was lost; in progress operations using this service will wait for recovery to complete 2013-01-15 13:11:26 LustreError: 11-0: an error occurred while communicating with 192.168.127.6@o2ib1. The mds_connect operation failed with -16
The MDS log
Jan 15 13:09:09 hyperion-rst6 kernel: Lustre: 10570:0:(service.c:1290:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-23), not sending early reply Jan 15 13:09:09 hyperion-rst6 kernel: req@ffff88012a324850 x1424249167482022/t0(0) o101->ad6c708f-3715-63c3-9874-6577bf49a8f7@192.168.117.84@o2ib1:0/0 lens 592/1152 e 3 to 0 dl 1358284154 ref 2 fl Interpret:/0/0 rc 0/0 Jan 15 13:09:09 hyperion-rst6 kernel: Lustre: 10570:0:(service.c:1290:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-23), not sending early reply Jan 15 13:09:09 hyperion-rst6 kernel: req@ffff880161327850 x1424249159092332/t0(0) o101->7ae936ba-6abb-4279-4d8d-6075df2b44ca@192.168.116.112@o2ib1:0/0 lens 592/1152 e 3 to 0 dl 1358284154 ref 2 fl Interpret:/0/0 rc 0/0 Jan 15 13:09:10 hyperion-rst6 kernel: Lustre: 7251:0:(service.c:1290:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-23), not sending early reply Jan 15 13:09:10 hyperion-rst6 kernel: req@ffff88015b7d1050 x1424249191598931/t0(0) o35->67291746-09fb-9e08-cd08-b1a1abb10ad0@192.168.119.118@o2ib1:0/0 lens 392/2024 e 3 to 0 dl 1358284155 ref 2 fl Interpret:/0/0 rc 0/0 Jan 15 13:11:26 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client ad6c708f-3715-63c3-9874-6577bf49a8f7 (at 192.168.117.84@o2ib1) reconnecting Jan 15 13:11:26 hyperion-rst6 kernel: Lustre: lustre-MDT0000: Client 7ae936ba-6abb-4279-4d8d-6075df2b44ca (at 192.168.116.112@o2ib1) refused reconnection, still busy with 1 active RPCs
Requires restart of MDS to clear.
Attachments
Issue Links
- is related to
-
LU-793 Reconnections should not be refused when there is a request in progress from this client.
- Resolved