Details
Type: Question/Request
Resolution: Unresolved
Priority: Minor
Affects Version/s: Lustre 2.10.0
Description
Hi,
When I remount the MDTs, they try to recover all clients, but the recovery never completes, even after several hours. It does not end even after the recovery timer reaches the hard limit (900). The only way I can stop it is to abort it with "lctl --device MDTxxx abort_recovery".
1. Can I abort recovery with "mount.lustre -o abort_recov xx"? When I try, the command does not complete successfully (it hangs).
2. Can I evict all stalled clients to end the recovery?
3. Are there any side effects of aborting recovery with "lctl abort_recovery"?
Thanks.
[ 570.019086] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.5.202@o2ib6: 7 seconds
[ 622.166399] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973445] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973448] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
[ 682.170260] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[ 682.170264] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
[ 684.027392] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.7.202@o2ib8: 7 seconds
[ 684.027397] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Skipped 3 previous similar messages
[ 693.713019] Lustre: 14236:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1505204590/real 0] req@ffff887c5585ad00 x1578320959384656/t0(0) o38->hpcfs-MDT0000-osp-MDT0001@192.168.8.202@o2ib9:24/4 lens 520/544 e 0 to 1 dl 1505204596 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 693.713024] Lustre: 14236:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[ 737.721925] LustreError: 11-0: hpcfs-OST0000-osc-MDT0001: operation ost_connect to node 192.168.17.212@o2ib18 failed: rc = -19
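
To see at a glance which targets keep hitting the recovery hard limit, the log above can be summarized with a small script. This is only a sketch: the function name `hard_limit_hits` and the regular expression are illustrative assumptions built from the message format in this excerpt, not part of any Lustre tool, and other Lustre versions may word the message differently.

```python
import re
from collections import Counter

# Assumption: matches the extend_recovery_timer() message exactly as it
# appears in the excerpt above; other releases may format it differently.
HARD_LIMIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre:.*extend_recovery_timer\(\)\)\s+"
    r"(?P<target>\S+): extended recovery timer reaching hard limit: (?P<limit>\d+)"
)


def hard_limit_hits(log_text):
    """Return {target: (message_count, hard_limit)} for hard-limit messages.

    The count is a lower bound: the kernel collapses repeats into
    "Skipped N previous similar messages" lines, which are not counted here.
    """
    counts = Counter()
    limits = {}
    for line in log_text.splitlines():
        m = HARD_LIMIT_RE.search(line)
        if m:
            counts[m["target"]] += 1
            limits[m["target"]] = int(m["limit"])
    return {target: (counts[target], limits[target]) for target in counts}


# Lines copied from the log excerpt above.
SAMPLE = """\
[ 622.166399] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973445] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973448] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
[ 682.170260] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
"""

if __name__ == "__main__":
    for target, (count, limit) in sorted(hard_limit_hits(SAMPLE).items()):
        print(f"{target}: {count} hard-limit message(s), limit {limit}")
```

In practice the input would be the saved output of dmesg or the system log rather than the embedded sample.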