Details
-
Bug
-
Resolution: Fixed
-
Minor
Description
The changelog garbage collection enabled by LU-12871 is too lazy. It will only purge an idle changelog user and its records if the changelog itself is nearly full:
    if (likely(mdd->mdd_changelog_gc &&
               mdd->mdd_cl.mc_gc_task == MDD_CHLG_GC_NONE &&
               ktime_get_real_seconds() - mdd->mdd_cl.mc_gc_time >
               mdd->mdd_changelog_min_gc_interval)) {
            if (unlikely(llog_cat_free_space(ctxt->loc_handle) <=
                         mdd->mdd_changelog_min_free_cat_entries ||
                         OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD))) {
                    CWARN("%s:%s low on changelog_catalog free entries, "
                          "starting ChangeLog garbage collection thread\n",
                          obd->obd_name,
                          OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD) ?
                          " simulate" : "");
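The defaults are mdd_changelog_min_free_cat_entries=2 and mdd_changelog_min_gc_interval=3600, so the check runs at most once an hour and only starts GC if the changelog is within 2 x 65000 = 130000 entries of overflowing (out of ~4B entries). A changelog user can therefore sit idle for weeks without being evicted. In the output below, changelog_max_idle_indexes and changelog_max_idle_time were reduced to just under the idle user's current values, purely to verify that it is still not evicted: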
    # lctl get_param mdd.*.changelog*
    mdd.myth-MDT0000.changelog_deniednext=60
    mdd.myth-MDT0000.changelog_gc=1
    mdd.myth-MDT0000.changelog_max_idle_indexes=20800000
    mdd.myth-MDT0000.changelog_max_idle_time=2500000
    mdd.myth-MDT0000.changelog_min_free_cat_entries=2
    mdd.myth-MDT0000.changelog_min_gc_interval=3600
    mdd.myth-MDT0000.changelog_size=3857464008
    mdd.myth-MDT0000.changelog_mask=
    MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC
    mdd.myth-MDT0000.changelog_users=
    current index: 98130425
    ID    index (idle seconds)
    cl3   77315666 (2512315)
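As a rough check of the numbers in this output, the sketch below recomputes how far user cl3 is behind and how that compares with the configured idle limits. The type and helper names are hypothetical, and the figure of 65000 records per catalog entry is the round number used in this description, not an exact Lustre constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400ULL
/* Round figure taken from the description; an assumption, not a Lustre constant. */
#define ASSUMED_RECORDS_PER_CAT_ENTRY 65000ULL

/* Hypothetical holder for one line of changelog_users output. */
struct changelog_user_sample {
	uint64_t current_index;  /* "current index" of the changelog */
	uint64_t user_index;     /* last index consumed by the user */
	uint64_t idle_seconds;   /* value shown in parentheses */
};

/* Number of records the user has not yet consumed. */
uint64_t index_gap(const struct changelog_user_sample *u)
{
	assert(u->current_index >= u->user_index);
	return u->current_index - u->user_index;
}

/* Whole days the user has been idle. */
uint64_t idle_days(const struct changelog_user_sample *u)
{
	return u->idle_seconds / SECONDS_PER_DAY;
}

bool over_idle_time(const struct changelog_user_sample *u, uint64_t max_idle_time)
{
	return u->idle_seconds > max_idle_time;
}

bool over_idle_indexes(const struct changelog_user_sample *u, uint64_t max_idle_indexes)
{
	return index_gap(u) > max_idle_indexes;
}

/* Records still free when the free-space trigger finally fires. */
uint64_t records_left_at_trigger(uint64_t min_free_cat_entries)
{
	return min_free_cat_entries * ASSUMED_RECORDS_PER_CAT_ENTRY;
}
```

For cl3 this gives a gap of 20814759 records and 29 whole idle days, both above the configured changelog_max_idle_indexes and changelog_max_idle_time, while the free-space trigger would only fire with about 130000 records left.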
It would be better to evict idle changelog users after a week or two, even if the log is not nearly full. That is plenty of time to get a broken consumer working again.
Issue Links
- is related to
-
LU-14626 Idle Changelog user not deregistered
- Resolved
-
LU-17290 Don't deregister idle changelog consumers
- Open
-
LU-13772 mdt: changelog_deregister takes too long
- Resolved
-
LU-15524 initiate changelog GC by lack of free space
- Resolved
- is related to
-
LU-12871 enable changelog garbage collection by default
- Resolved
-
LU-14688 Changelog cancel improvement
- Resolved
-
LU-13055 add ability for named Changelog consumers
- Closed