Details
- Bug
- Resolution: Fixed
- Minor
Description
The changelog garbage collection enabled by LU-12871 is too lazy. It will only purge an idle changelog user and its records if the changelog itself is nearly full:
if (likely(mdd->mdd_changelog_gc &&
           mdd->mdd_cl.mc_gc_task == MDD_CHLG_GC_NONE &&
           ktime_get_real_seconds() - mdd->mdd_cl.mc_gc_time >
           mdd->mdd_changelog_min_gc_interval)) {
        if (unlikely(llog_cat_free_space(ctxt->loc_handle) <=
                     mdd->mdd_changelog_min_free_cat_entries ||
                     OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD))) {
                CWARN("%s:%s low on changelog_catalog free entries, "
                      "starting ChangeLog garbage collection thread\n",
                      obd->obd_name,
                      OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD) ?
                      " simulate" : "");
The defaults are mdd_changelog_min_free_cat_entries=2 and mdd_changelog_min_gc_interval=3600, so the check runs at most once an hour and only starts garbage collection when the changelog is within 2 x 65000 = 130000 entries of overflowing (out of ~4B entries). A changelog user can therefore stay registered even after being idle for weeks. In the example below the idle limits were reduced, just to verify that the idle user is not evicted:
# lctl get_param mdd.*.changelog*
mdd.myth-MDT0000.changelog_deniednext=60
mdd.myth-MDT0000.changelog_gc=1
mdd.myth-MDT0000.changelog_max_idle_indexes=20800000
mdd.myth-MDT0000.changelog_max_idle_time=2500000
mdd.myth-MDT0000.changelog_min_free_cat_entries=2
mdd.myth-MDT0000.changelog_min_gc_interval=3600
mdd.myth-MDT0000.changelog_size=3857464008
mdd.myth-MDT0000.changelog_mask=
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC
mdd.myth-MDT0000.changelog_users=
current index: 98130425
ID    index (idle seconds)
cl3   77315666 (2512315)
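To make the trigger logic easier to reason about, here is a minimal stand-alone sketch of the quoted condition. It is a simplified model, not the Lustre source: `struct gc_state`, `gc_should_start()` and `records_left_at_threshold()` are hypothetical names, and the figure of 65000 records per catalog entry is taken from the arithmetic above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone model of the quoted check; not Lustre code. */
struct gc_state {
	bool     gc_enabled;           /* models changelog_gc */
	bool     gc_running;           /* models mc_gc_task != MDD_CHLG_GC_NONE */
	int64_t  last_gc_time;         /* models mc_gc_time, in seconds */
	int64_t  min_gc_interval;      /* models changelog_min_gc_interval */
	uint32_t min_free_cat_entries; /* models changelog_min_free_cat_entries */
};

/*
 * GC starts only if it is enabled, no GC task is active, the minimum
 * interval has elapsed, AND the catalog is nearly out of free entries.
 * How long a user has been idle plays no part in this decision.
 */
static bool gc_should_start(const struct gc_state *s, int64_t now,
			    uint32_t free_cat_entries)
{
	if (!s->gc_enabled || s->gc_running)
		return false;
	if (now - s->last_gc_time <= s->min_gc_interval)
		return false;
	return free_cat_entries <= s->min_free_cat_entries;
}

/* Records that can still be written once the free-entry threshold is hit. */
static uint64_t records_left_at_threshold(uint32_t min_free_cat_entries,
					  uint32_t records_per_entry)
{
	return (uint64_t)min_free_cat_entries * records_per_entry;
}
```

With the defaults, `records_left_at_threshold(2, 65000)` gives 130000, so in this model a catalog with plenty of free entries never starts GC, however stale its users are.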
It would be better to evict idle changelog users after a week or two, which is plenty of time to get a broken consumer working again, even if the log isn't totally full.
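A time-based eviction test along these lines could look like the sketch below. It is only an illustration: `struct cl_user`, `cl_user_is_stale()` and `TWO_WEEKS_SECONDS` are hypothetical names, and treating the two idle limits as alternatives (either one being exceeded marks the user stale) is an assumption, not something stated in this ticket.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one changelog_users entry; not Lustre code. */
struct cl_user {
	uint32_t id;           /* e.g. 3 for "cl3" */
	uint64_t index;        /* last record index the user has cleared */
	int64_t  idle_seconds; /* seconds since the user was last active */
};

/* Two weeks in seconds, matching the "week or two" suggested above. */
#define TWO_WEEKS_SECONDS (14 * 24 * 3600)

/*
 * A user is considered stale when it has been idle longer than
 * max_idle_time OR has fallen more than max_idle_indexes records behind.
 * ASSUMPTION: combining the two limits with OR is a choice made for
 * this sketch.
 */
static bool cl_user_is_stale(const struct cl_user *u, uint64_t cur_index,
			     uint64_t max_idle_indexes, int64_t max_idle_time)
{
	uint64_t behind = cur_index > u->index ? cur_index - u->index : 0;

	return u->idle_seconds > max_idle_time || behind > max_idle_indexes;
}
```

Applied to the output above, cl3 is 98130425 - 77315666 = 20814759 records behind and has been idle for 2512315 seconds, so it exceeds both the configured limits (20800000 and 2500000) and a two-week limit, independent of how full the catalog is.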
Issue Links
- is related to
  - LU-14626 Idle Changelog user not deregistered (Resolved)
  - LU-17290 Don't deregister idle changelog consumers (Open)
  - LU-13772 mdt: changelog_deregister takes too long (Resolved)
  - LU-15524 initiate changelog GC by lack of free space (Resolved)
- is related to
  - LU-12871 enable changelog garbage collection by default (Resolved)
  - LU-14688 Changelog cancel improvement (Resolved)
  - LU-13055 add ability for named Changelog consumers (Closed)
I realize this ticket is closed, but this seems an appropriate place to ask:
Deregistration of an idle consumer is a heavy penalty: the user has to re-register and restart the consumer process with a new ID. Wouldn't it make more sense to mark the consumer internally as "idle" and simply ignore it during the lowest-unconsumed-record check? Then, if the consumer does come back to life, it still has access to the (remaining) changelog records, and we remove the "idle" flag. That is less impact on users with an intermittent consumer. Idle consumers can be reported in changelog_users. This would also address @Mikhail_Pershin's concern about evicting an apparently idle consumer on an idle system.
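A rough sketch of what this suggestion might look like, purely for discussion: `struct cl_consumer`, `lowest_needed_index()` and `consumer_resume()` are hypothetical names, not existing Lustre structures or functions, and the second consumer in the usage note is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer record carrying an "idle" flag; not Lustre code. */
struct cl_consumer {
	uint32_t id;    /* e.g. 3 for "cl3" */
	uint64_t index; /* last record index the consumer has cleared */
	bool     idle;  /* set instead of deregistering the consumer */
};

/*
 * Lowest index still needed by any non-idle consumer. Idle consumers
 * are skipped, so they no longer hold back purging. Returns cur_index
 * when no active consumer remains.
 */
static uint64_t lowest_needed_index(const struct cl_consumer *c, size_t n,
				    uint64_t cur_index)
{
	uint64_t low = cur_index;

	for (size_t i = 0; i < n; i++) {
		if (c[i].idle)
			continue;
		if (c[i].index < low)
			low = c[i].index;
	}
	return low;
}

/*
 * A returning consumer keeps its ID: clear the idle flag and move its
 * index forward if records it had not yet read were purged meanwhile.
 */
static uint64_t consumer_resume(struct cl_consumer *c, uint64_t purged_up_to)
{
	c->idle = false;
	if (c->index < purged_up_to)
		c->index = purged_up_to;
	return c->index;
}
```

For example, with cl3 at index 77315666 flagged idle and a second, active consumer at 98130000, `lowest_needed_index()` returns 98130000, so records can be purged up to that point while cl3 stays registered.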