  1. Lustre
  2. LU-9973

MDT recovery status never completed

Details

    • Question/Request
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.0
    • None

    Description

      Hi,

      When I remount the MDTs, each MDT tries to recover all clients, but the recovery never completes (it has been running for several hours). The recovery does not end even after the timer reaches the hard limit (900). The only way I can stop it is to abort recovery with "lctl --device MDTxxx abort_recovery".
      1. Can I abort recovery with "mount.lustre -o abort_recov xx"? I cannot run that command successfully (it hangs).
      2. Can I evict all stalled clients to end the recovery?
      3. Are there any side effects of aborting recovery with "lctl abort_recovery"?

      Thanks.

      [ 570.019086] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.5.202@o2ib6: 7 seconds
      [ 622.166399] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
      [ 622.973445] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
      [ 622.973448] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
      [ 682.170260] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
      [ 682.170264] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
      [ 684.027392] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.7.202@o2ib8: 7 seconds
      [ 684.027397] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Skipped 3 previous similar messages
      [ 693.713019] Lustre: 14236:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1505204590/real 0] req@ffff887c5585ad00 x1578320959384656/t0(0) o38->hpcfs-MDT0000-osp-MDT0001@192.168.8.202@o2ib9:24/4 lens 520/544 e 0 to 1 dl 1505204596 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 693.713024] Lustre: 14236:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
      [ 737.721925] LustreError: 11-0: hpcfs-OST0000-osc-MDT0001: operation ost_connect to node 192.168.17.212@o2ib18 failed: rc = -19
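      To make the excerpt above easier to read, here is a minimal Python sketch that counts how often each target reports the recovery timer reaching its hard limit and which peer NIDs show LNet tx timeouts. The regular expressions are assumptions based only on the message shapes visible in this log, the function name summarize is hypothetical, and "Skipped N previous similar messages" lines are not expanded, so the counts are lower bounds.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the message shapes in the log above;
# other Lustre versions may word these messages differently.
HARD_LIMIT = re.compile(
    r"extend_recovery_timer\(\)\) (\S+): "
    r"extended recovery timer reaching hard limit: (\d+)"
)
LNET_TIMEOUT = re.compile(
    r"kiblnd_check_conns\(\)\) Timed out tx for (\S+): (\d+) seconds"
)


def summarize(log_text):
    """Return (targets, peers): per-target hard-limit message counts and
    per-NID tx timeout counts. "Skipped ..." lines are ignored."""
    targets = Counter()
    peers = Counter()
    for line in log_text.splitlines():
        match = HARD_LIMIT.search(line)
        if match:
            targets[match.group(1)] += 1
            continue
        match = LNET_TIMEOUT.search(line)
        if match:
            peers[match.group(1)] += 1
    return targets, peers


# Lines copied from the log excerpt in this ticket.
SAMPLE = """\
[ 570.019086] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.5.202@o2ib6: 7 seconds
[ 622.166399] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973445] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
[ 622.973448] Lustre: 26593:0:(ldlm_lib.c:1784:extend_recovery_timer()) Skipped 2 previous similar messages
[ 682.170260] Lustre: 18740:0:(ldlm_lib.c:1784:extend_recovery_timer()) hpcfs-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[ 684.027392] LNet: 14185:0:(o2iblnd_cb.c:3209:kiblnd_check_conns()) Timed out tx for 192.168.7.202@o2ib8: 7 seconds
"""

if __name__ == "__main__":
    targets, peers = summarize(SAMPLE)
    for name, count in sorted(targets.items()):
        print(f"{name}: hard limit reported {count} time(s)")
    for nid, count in sorted(peers.items()):
        print(f"{nid}: {count} tx timeout(s)")
```

      In this excerpt both MDTs keep reporting the hard limit while tx timeouts appear for peers on different o2ib networks, which may be worth checking alongside the recovery state.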

          People

            laisiyao Lai Siyao
            sebg-crd-pm sebg-crd-pm (Inactive)