
Add the option to limit the overall recovery time

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      Currently, recovery can be extended to last several hours if the recovery
      between MDTs gets stuck for some reason, which can leave the cluster
      unavailable for a long time unless someone notices and aborts the recovery
      with "lctl". An option is therefore needed to limit the overall recovery time.
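      As an illustration of the requested option, here is a minimal standalone sketch (hypothetical names, not the actual patch) of capping the total recovery window with a hard limit, so that each timer extension can never push recovery past an overall bound:

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <time.h>

      /* Hypothetical sketch, not the actual patch: all names here are
       * illustrative.  The idea is a hard cap on the overall recovery
       * window, on top of the per-extension (soft) timeout. */
      struct recovery_state {
              time_t start;        /* when recovery began */
              time_t soft_timeout; /* seconds added per extension */
              time_t hard_limit;   /* overall cap on recovery time, seconds */
      };

      /* Seconds the timer may still be extended by; 0 means the hard limit
       * is reached and recovery should be aborted instead of extended. */
      static time_t allowed_extension(const struct recovery_state *rs, time_t now)
      {
              time_t elapsed = now - rs->start;

              if (elapsed >= rs->hard_limit)
                      return 0;
              if (elapsed + rs->soft_timeout > rs->hard_limit)
                      return rs->hard_limit - elapsed; /* clamp last extension */
              return rs->soft_timeout;
      }

      int main(void)
      {
              struct recovery_state rs = { .start = 0, .soft_timeout = 300,
                                           .hard_limit = 900 };

              assert(allowed_extension(&rs, 100) == 300); /* budget left */
              assert(allowed_extension(&rs, 700) == 200); /* clamped */
              assert(allowed_extension(&rs, 900) == 0);   /* limit hit: abort */
              printf("ok\n");
              return 0;
      }
      ```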


          Activity


            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41424
            Subject: LU-14318 ldlm: don't wait other MDT forever
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4c2b2b26bd29b1a8f522fbf705a3d145cb06eb9f

            gerrit Gerrit Updater added a comment

            Hi,
            this patch is meant to solve the recovery hang caused by an unresponsive MDT during recovery; the patch in LU-13608 is meant to fix
            the hang of "lctl abort_recovery" when it is used to abort the recovery.

            Okay, I will create another patch to allow recovery to continue if some MDT other than MDT0000 is unavailable during recovery.
            Thanks!

            hongchao.zhang Hongchao Zhang added a comment

            Hongchao, is this patch only needed for the case when the remote MDT is not available at all, or is there some other problem like LU-13608 that is causing recovery to be stuck for a long time even when all of the MDTs are available?

            Rather than timing out recovery for a remote MDT completely, it would probably be better to keep the recovery for that MDT pending until the MDT is available again, and then do the remote recovery when the MDTs reconnect. That might only be a small (or no) difference in the normal case when all of the MDTs are available at mount, but I think this may give a very important improvement when some MDTs are unavailable.

            The big improvement would be if MDT0000 and other MDTs are available at restart time, it would complete recovery with all those MDTs quickly, and not block access to files/directories that are on available MDTs. It would allow most client access to work, and only remote/striped directories would be blocked and/or time out (allow CTRL-C for client processes). This would be better for users, if they are mostly using remote directories for subtrees of the filesystem, since only the subtrees on the missing MDTs would be inaccessible.

            Eventually, having mirrored entries for ROOT/ on several/all MDTs could allow the filesystem to be accessible even if MDT0000 is unavailable, but that is definitely a separate project.

            adilger Andreas Dilger added a comment

            Currently, if some other MDT cannot be connected during MDT recovery, the recovery process can be extended to
            last several hours (perhaps forever, if not aborted via "lctl") in "check_for_recovery_ready":

            static int check_for_recovery_ready(struct lu_target *lut)
            {
                    struct obd_device *obd = lut->lut_obd;
                    unsigned int clnts = atomic_read(&obd->obd_connected_clients);
            
                    CDEBUG(D_HA,
                           "connected %d stale %d max_recoverable_clients %d abort %d expired %d\n",
                           clnts, obd->obd_stale_clients,
                           atomic_read(&obd->obd_max_recoverable_clients),
                           obd->obd_abort_recovery, obd->obd_recovery_expired);
            
                    if (!obd->obd_abort_recovery && !obd->obd_recovery_expired) {
                            LASSERT(clnts <=
                                    atomic_read(&obd->obd_max_recoverable_clients));
                            if (clnts + obd->obd_stale_clients <
                                atomic_read(&obd->obd_max_recoverable_clients))
                                    return 0;
                    }
            
                    if (!obd->obd_abort_recov_mdt && lut->lut_tdtd != NULL) {
                            if (!lut->lut_tdtd->tdtd_replay_ready &&
                                !obd->obd_abort_recovery && !obd->obd_stopping) {
                                    /*
                                     * Let's extend recovery timer, in case the recovery
                                     * timer expired, and some clients got evicted
                                     */
                    extend_recovery_timer(obd, obd->obd_recovery_timeout,
                                          true); /* <-- extended even if the timer had already expired */
                                    CDEBUG(D_HA,
                                           "%s update recovery is not ready, extend recovery %d\n",
                                           obd->obd_name, obd->obd_recovery_timeout);
                                    return 0;
                            }
                    }
            
                    return 1;
            }
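
            A standalone simulation (illustrative only, not Lustre code) of the failure mode this check produces: while the remote MDT never becomes replay-ready, every timer expiry re-arms the timer with the same obd_recovery_timeout, so the total recovery time grows without bound.

            ```c
            #include <assert.h>
            #include <stdbool.h>
            #include <stdio.h>

            /* Simulate the re-arm loop: each expiry adds another full
             * recovery_timeout as long as the remote MDT is not ready. */
            static int simulated_recovery_seconds(int recovery_timeout,
                                                  int max_expiries)
            {
                    bool replay_ready = false; /* a stuck MDT never flips this */
                    int elapsed = 0;
                    int expiry;

                    /* The demo stops after max_expiries; the real loop
                     * has no such cap. */
                    for (expiry = 0; expiry < max_expiries && !replay_ready;
                         expiry++)
                            elapsed += recovery_timeout; /* re-armed again */

                    return elapsed;
            }

            int main(void)
            {
                    /* after only 10 expiries of a 300s timer: 3000s and
                     * recovery is still not finished */
                    assert(simulated_recovery_seconds(300, 10) == 3000);
                    printf("ok\n");
                    return 0;
            }
            ```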
            
            hongchao.zhang Hongchao Zhang added a comment

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41171
            Subject: LU-14318 ldlm: add recovery time limit
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 2
            Commit: 3f8c53ec33e518658d91d7ff3d07e13f5ad43327

            hongchao.zhang Hongchao Zhang added a comment

            People

              hongchao.zhang Hongchao Zhang
              hongchao.zhang Hongchao Zhang
              Votes: 0
              Watchers: 8