[LU-14318] Add the option to limit the overall recovery time Created: 11/Jan/21  Updated: 14/Sep/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13608 MDT stuck in WAITING, abort_recov stu... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Currently, recovery can be extended to several hours if recovery between MDTs is
stuck, or stalls for some other reason. This can leave the cluster unavailable for
a long time if nobody notices and uses "lctl" to abort the recovery, so an option
is needed to limit the overall recovery time.
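A minimal sketch of what such a limit could look like, assuming recovery tracks a start time and a soft deadline that gets extended repeatedly. The struct recovery_window, its fields and both functions are hypothetical and are not Lustre code: every extension is clamped to start + hard_limit, so the overall recovery time stays bounded no matter how often the timer is extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical recovery bookkeeping, for illustration only
 * (not the fields of Lustre's struct obd_device). */
struct recovery_window {
	time_t start;      /* when recovery began */
	time_t deadline;   /* current soft expiry, may be extended */
	time_t hard_limit; /* maximum overall recovery time, in seconds */
};

/*
 * Push the soft deadline to now + extra, but never past
 * start + hard_limit. Returns true if the deadline moved later,
 * false if the cap (or a short extension) left it unchanged.
 */
static bool recovery_extend(struct recovery_window *rw, time_t now,
			    time_t extra)
{
	time_t cap = rw->start + rw->hard_limit;
	time_t want = now + extra;

	assert(rw->deadline <= cap);   /* invariant kept by this function */
	if (want > cap)
		want = cap;
	if (want <= rw->deadline)
		return false;
	rw->deadline = want;
	return true;
}

/* True once the overall limit has passed, regardless of extensions. */
static bool recovery_hard_expired(const struct recovery_window *rw,
				  time_t now)
{
	return now >= rw->start + rw->hard_limit;
}
```

With a 900-second cap, an extension requested at t=800 for another 300 seconds only moves the deadline to 900, and later extension requests are refused.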



 Comments   
Comment by Hongchao Zhang [ 14/Jan/21 ]

Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41171
Subject: LU-14318 ldlm: add recovery time limit
Project: fs/lustre-release
Branch: master
Current Patch Set: 2
Commit: 3f8c53ec33e518658d91d7ff3d07e13f5ad43327

Comment by Hongchao Zhang [ 18/Jan/21 ]

Currently, if another MDT can't be connected during MDT recovery, the recovery can be extended to
last several hours (maybe forever, if it is not aborted with lctl) in "check_for_recovery_ready":

static int check_for_recovery_ready(struct lu_target *lut)
{
        struct obd_device *obd = lut->lut_obd;
        unsigned int clnts = atomic_read(&obd->obd_connected_clients);

        CDEBUG(D_HA,
               "connected %d stale %d max_recoverable_clients %d abort %d expired %d\n",
               clnts, obd->obd_stale_clients,
               atomic_read(&obd->obd_max_recoverable_clients),
               obd->obd_abort_recovery, obd->obd_recovery_expired);

        if (!obd->obd_abort_recovery && !obd->obd_recovery_expired) {
                LASSERT(clnts <=
                        atomic_read(&obd->obd_max_recoverable_clients));
                if (clnts + obd->obd_stale_clients <
                    atomic_read(&obd->obd_max_recoverable_clients))
                        return 0;
        }

        if (!obd->obd_abort_recov_mdt && lut->lut_tdtd != NULL) {
                if (!lut->lut_tdtd->tdtd_replay_ready &&
                    !obd->obd_abort_recovery && !obd->obd_stopping) {
                        /*
                         * Let's extend recovery timer, in case the recovery
                         * timer expired, and some clients got evicted
                         */
                        /* NB: the recovery is extended even if the timer expired */
                        extend_recovery_timer(obd, obd->obd_recovery_timeout,
                                              true);
                        CDEBUG(D_HA,
                               "%s update recovery is not ready, extend recovery %d\n",
                               obd->obd_name, obd->obd_recovery_timeout);
                        return 0;
                }
        }

        return 1;
}
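To illustrate where a limit would have to act, here is a simplified, standalone model of the decision made by check_for_recovery_ready() above. It is a sketch under assumptions, not the code of patch 41171: struct recov_state mirrors only the fields the function reads, hard_limit_reached is a hypothetical flag, and the extensions counter stands in for calls to extend_recovery_timer(). Once the cap is reached, the model stops extending and lets recovery proceed without the update replay, which is what the original code does when obd_abort_recov_mdt is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state check_for_recovery_ready() reads. */
struct recov_state {
	unsigned int connected;       /* obd_connected_clients */
	unsigned int stale;           /* obd_stale_clients */
	unsigned int max_recoverable; /* obd_max_recoverable_clients */
	bool abort_recovery;          /* obd_abort_recovery */
	bool expired;                 /* obd_recovery_expired */
	bool has_update_recovery;     /* lut_tdtd != NULL && !obd_abort_recov_mdt */
	bool update_replay_ready;     /* tdtd_replay_ready */
	bool stopping;                /* obd_stopping */
	bool hard_limit_reached;      /* hypothetical: overall time cap passed */
	unsigned int extensions;      /* counts simulated timer extensions */
};

/* Returns 1 when recovery may proceed, 0 to keep waiting. */
static int recovery_ready_sketch(struct recov_state *s)
{
	/* Wait for clients unless recovery was aborted or timed out. */
	if (!s->abort_recovery && !s->expired) {
		assert(s->connected <= s->max_recoverable);
		if (s->connected + s->stale < s->max_recoverable)
			return 0;
	}

	if (s->has_update_recovery && !s->update_replay_ready &&
	    !s->abort_recovery && !s->stopping) {
		/* Without a cap this branch extends and waits again on
		 * every call, even after 'expired' is set. */
		if (s->hard_limit_reached)
			return 1;
		s->extensions++;
		return 0;
	}
	return 1;
}
```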
Comment by Andreas Dilger [ 29/Jan/21 ]

Hongchao, is this patch only needed for the case when the remote MDT is not available at all, or is there some other problem like LU-13608 that is causing recovery to be stuck for a long time even when all of the MDTs are available?

Rather than timing out recovery for a remote MDT completely, it would probably be better to keep the recovery for that MDT pending until the MDT is available again, and then do the remote recovery when the MDTs reconnect. That might only be a small (or no) difference in the normal case when all of the MDTs are available at mount, but I think this may give a very important improvement when some MDTs are unavailable.

The big improvement would be if MDT0000 and other MDTs are available at restart time, it would complete recovery with all those MDTs quickly, and not block access to files/directories that are on available MDTs. It would allow most client access to work, and only remote/striped directories would be blocked and/or time out (allow CTRL-C for client processes). This would be better for users, if they are mostly using remote directories for subtrees of the filesystem, since only the subtrees on the missing MDTs would be inaccessible.
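The per-MDT approach described above could be modeled roughly as follows. This is a hypothetical sketch (MAX_MDTS, struct mdt_recov_table, mdt_reconnected() and mdt_op_allowed() are made up for illustration): recovery with the reachable MDTs completes, updates involving an unreachable MDT stay queued until it reconnects, and only operations that touch a still-pending MDT are refused with -EAGAIN, so the client can retry or be interrupted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MDTS 8   /* hypothetical fixed size for this sketch */

enum mdt_recov_state {
	MDT_RECOV_DONE = 0,  /* updates involving this MDT were replayed */
	MDT_RECOV_PENDING,   /* MDT unreachable, replay deferred */
};

struct mdt_recov_table {
	size_t count;                          /* number of MDTs in use */
	enum mdt_recov_state state[MAX_MDTS];
	unsigned int deferred[MAX_MDTS];       /* queued updates per MDT */
};

/* A remote MDT reconnected: replay its deferred updates.
 * Returns the number of updates that were queued for it. */
static unsigned int mdt_reconnected(struct mdt_recov_table *t, size_t idx)
{
	unsigned int n;

	assert(idx < t->count);
	n = t->deferred[idx];
	/* a real implementation would replay the queued updates here */
	t->deferred[idx] = 0;
	t->state[idx] = MDT_RECOV_DONE;
	return n;
}

/* An operation touching the MDTs in 'mask' (bit i = MDT i) may run
 * only if none of them is still pending recovery. */
static int mdt_op_allowed(const struct mdt_recov_table *t, unsigned int mask)
{
	for (size_t i = 0; i < t->count; i++)
		if ((mask & (1u << i)) && t->state[i] == MDT_RECOV_PENDING)
			return -EAGAIN;
	return 0;
}
```

In this model, access to directories on MDT0000 and MDT0001 keeps working while MDT0002 is missing; only striped or remote directories that include MDT0002 return -EAGAIN until it reconnects.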

Eventually, having mirrored entries for ROOT/ on several/all MDTs could allow the filesystem to be accessible even if MDT0000 is unavailable, but that is definitely a separate project.

Comment by Hongchao Zhang [ 01/Feb/21 ]

Hi,
this patch is meant to solve the recovery hang caused by an unresponsive MDT during recovery; the patch in LU-13608 is meant to fix
the "lctl abort_recovery" hang when it is used to abort the recovery.

Okay, I will create another patch to allow the recovery to continue if some MDTs other than MDT0000 are unavailable during recovery.
Thanks!
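The readiness rule proposed in this reply (MDT0000 is required, other MDTs may be missing) might reduce to something like the following. update_recovery_ready() and its bitmask output are hypothetical names used only for this sketch; MDTs that are not ready are reported back so their recovery can be deferred instead of blocking the whole target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical policy: recovery may complete once MDT0000 is ready.
 * Other MDTs that are not ready are returned in *deferred_mask
 * (bit i = MDT i) so their recovery can run when they reconnect.
 * Returns true when recovery may complete.
 */
static bool update_recovery_ready(const bool *mdt_ready, size_t count,
				  unsigned int *deferred_mask)
{
	assert(count > 0 && count <= 32);   /* mask is 32 bits wide */
	*deferred_mask = 0;
	if (!mdt_ready[0])
		return false;   /* MDT0000 is mandatory: keep waiting */
	for (size_t i = 1; i < count; i++)
		if (!mdt_ready[i])
			*deferred_mask |= 1u << i;
	return true;
}
```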

Comment by Gerrit Updater [ 05/Feb/21 ]

Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41424
Subject: LU-14318 ldlm: don't wait other MDT forever
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4c2b2b26bd29b1a8f522fbf705a3d145cb06eb9f

Generated at Sat Feb 10 03:08:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.