[LU-10902] mdd_changelog_user_purge() must check current number of users safer and earlier to disable ChangeLogs recording if none Created: 11/Apr/18  Updated: 06/May/18  Resolved: 06/May/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Bruno Faccini (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During my analysis of a situation where a ChangeLogs user's purge was taking a very long time (due to single user never being unregistered and idle since ages), I have found that the current code (still true in master) in mdd_changelog_user_purge() only disable ChangeLogs recording when the last user's purge has been synchronously done, meaning that more records will unnecessary continue to be recorded (and compete adding overhead) until the full purge completes :

int mdd_changelog_user_purge(const struct lu_env *env,
                             struct mdd_device *mdd, __u32 id)
{
        struct mdd_changelog_user_purge mcup = {
                .mcup_id = id,
                .mcup_found = false,
                .mcup_usercount = 0,
                .mcup_minrec = ULLONG_MAX,
        };
        struct llog_ctxt *ctxt;
        int rc;

        ENTRY;

        CDEBUG(D_IOCTL, "%s: Purge request: id=%u\n",
               mdd2obd_dev(mdd)->obd_name, id);

        ctxt = llog_get_context(mdd2obd_dev(mdd),
                                LLOG_CHANGELOG_USER_ORIG_CTXT);
        if (ctxt == NULL ||
            (ctxt->loc_handle->lgh_hdr->llh_flags & LLOG_F_IS_CAT) == 0)
                GOTO(out, rc = -ENXIO);

        rc = llog_cat_process(env, ctxt->loc_handle,
                              mdd_changelog_user_purge_cb, &mcup,
                              0, 0);

        if ((rc == 0) && mcup.mcup_found) {
                CDEBUG(D_IOCTL, "%s: Purging changelog entries for user %d "
                       "record=%llu\n",
                       mdd2obd_dev(mdd)->obd_name, id, mcup.mcup_minrec);
                /* Cancelling record 0 destroys the entire changelog, make sure
                   we don't do that unless we mean it. */
                if (mcup.mcup_minrec != 0 || mcup.mcup_usercount == 0) {
                        rc = mdd_changelog_llog_cancel(env, mdd, <<<<<< sync, and can take a long 
                                                       mcup.mcup_minrec); <<<<<<<<< time if huge backlog
                }
        } else {
                CWARN("%s: No changelog for user %u; rc=%d\n",
                      mdd2obd_dev(mdd)->obd_name, id, rc);
                GOTO(out, rc = -ENOENT);
        }

        if ((rc == 0) && (mcup.mcup_usercount == 0)) {
                /* No more users; turn changelogs off */
                CDEBUG(D_IOCTL, "turning off changelogs\n");
                rc = mdd_changelog_off(env, mdd); <<<<<<<<< ChangeLog recording being stopped
        }

        EXIT;
out:
        if (ctxt != NULL)
                llog_ctxt_put(ctxt);

        return rc;
}

Also, ChangeLog recording is being stopped without any regard nor protection against the fact a new user may have registered in the interval, leading to a situation where a user can be registered but not ChangeLog recording will happen.

Will push a patch soon to fix both problems.



 Comments   
Comment by Gerrit Updater [ 16/Apr/18 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/32007
Subject: LU-10902 mdd: force disable changelogs early and safely
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ccf8c1cbafcfdc27ad1285b6f6f260bfc08b05e

Comment by Bruno Faccini (Inactive) [ 19/Apr/18 ]

A use case leading to such scenario could occur if an old and unused changelog user is being unregistered and a new one is being registerd (like for a RobinHood operations restart from scratch including full-scan and start of Changelogs gather in parallel) but previous purge has still not completed due to some huge records back-log.

Comment by Gerrit Updater [ 06/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32007/
Subject: LU-10902 mdd: force disable changelogs early and safely
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 99cc32b013d411335a694c5f94a83cef839c68c4

Comment by Peter Jones [ 06/May/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:39:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.