  1. Lustre
  2. LU-14699

changelog garbage collection is too lax

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • None
    • None
    • 3

    Description

      The changelog garbage collection enabled by LU-12871 is too lazy. It only purges an idle changelog user and its records when the changelog catalog itself is nearly full:

              if (likely(mdd->mdd_changelog_gc &&
                           mdd->mdd_cl.mc_gc_task == MDD_CHLG_GC_NONE &&
                           ktime_get_real_seconds() - mdd->mdd_cl.mc_gc_time >
                              mdd->mdd_changelog_min_gc_interval)) {
                      if (unlikely(llog_cat_free_space(ctxt->loc_handle) <=
                                   mdd->mdd_changelog_min_free_cat_entries ||
                                   OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD))) {
                              CWARN("%s:%s low on changelog_catalog free entries, "
                                    "starting ChangeLog garbage collection thread\n",
                                    obd->obd_name,
                                    OBD_FAIL_CHECK(OBD_FAIL_FORCE_GC_THREAD) ?
                                      " simulate" : "");
      
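      The two nested checks in the excerpt can be modelled in isolation. The following is a minimal sketch, not the Lustre code: struct gc_state and gc_would_start() are hypothetical names introduced here for illustration, and the OBD_FAIL_CHECK() fault-injection path is left out. It shows that once the interval check passes, only the catalog's free space decides whether the GC thread is started.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields read in the excerpt. */
struct gc_state {
	bool     gc_enabled;       /* models mdd_changelog_gc */
	bool     gc_task_running;  /* models mc_gc_task != MDD_CHLG_GC_NONE */
	int64_t  last_gc_time;     /* models mc_gc_time, in seconds */
	int64_t  min_gc_interval;  /* models mdd_changelog_min_gc_interval */
	uint32_t min_free_entries; /* models mdd_changelog_min_free_cat_entries */
};

/* Returns true when both nested conditions of the excerpt would hold. */
static bool gc_would_start(const struct gc_state *s, int64_t now,
			   uint32_t free_cat_entries)
{
	/* Outer check: GC enabled, no GC task pending, interval elapsed. */
	if (!s->gc_enabled || s->gc_task_running ||
	    now - s->last_gc_time <= s->min_gc_interval)
		return false;

	/* Inner check: only the catalog's free space is consulted.
	 * How long any changelog user has been idle plays no part here. */
	return free_cat_entries <= s->min_free_entries;
}
```

      With the defaults (interval 3600, minimum free entries 2), a call such as gc_would_start(&s, now, 1000) returns false no matter how long a user has been idle.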

      The defaults are mdd_changelog_min_free_cat_entries=2 and mdd_changelog_min_gc_interval=3600, so it only checks once an hour whether the changelog is within 2 x 65000 = 130000 entries of overflowing (out of ~4B entries), even if a changelog user has been idle for weeks. For example (with reduced settings, just to verify that the idle user is not evicted):

      # lctl get_param mdd.*.changelog*
      mdd.myth-MDT0000.changelog_deniednext=60
      mdd.myth-MDT0000.changelog_gc=1
      mdd.myth-MDT0000.changelog_max_idle_indexes=20800000
      mdd.myth-MDT0000.changelog_max_idle_time=2500000
      mdd.myth-MDT0000.changelog_min_free_cat_entries=2
      mdd.myth-MDT0000.changelog_min_gc_interval=3600
      mdd.myth-MDT0000.changelog_size=3857464008
      mdd.myth-MDT0000.changelog_mask=
      MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC
      mdd.myth-MDT0000.changelog_users=
      current index: 98130425
      ID    index (idle seconds)
      cl3   77315666 (2512315)
      
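      The numbers in this output can be checked by hand. The sketch below only redoes the arithmetic with values copied from the output. The helper names are hypothetical, and 65000 is the approximate records-per-catalog-entry figure used in the description, not an exact constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximate records per catalog entry, as used in the description. */
#define APPROX_RECS_PER_CAT_ENTRY 65000ULL

/* Number of records the user is behind the current changelog index. */
static uint64_t index_lag(uint64_t current_index, uint64_t user_index)
{
	return current_index - user_index;
}

/* True when a measured value is strictly above its configured limit. */
static bool exceeds(uint64_t value, uint64_t limit)
{
	return value > limit;
}

/* Approximate distance from overflow at which the free-space check fires. */
static uint64_t free_record_threshold(uint64_t min_free_cat_entries)
{
	return min_free_cat_entries * APPROX_RECS_PER_CAT_ENTRY;
}
```

      For user cl3, index_lag(98130425, 77315666) is 20814759, which is above changelog_max_idle_indexes=20800000, and the idle time of 2512315 seconds (about 29 days) is above changelog_max_idle_time=2500000. The user is over both idle limits and is still registered, because the free-space check, with a threshold of roughly free_record_threshold(2) = 130000 records, has not fired.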

      It would be better to evict idle changelog users after a week or two, even if the log is not nearly full. That is plenty of time to get a broken consumer working again.
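
      One way to picture the suggested behaviour is a trigger that also looks at idle time. This is a sketch of the idea only, not the change that resolved this ticket: gc_should_start_proposed() and PROPOSED_IDLE_LIMIT are hypothetical names, and 14 days is simply the upper end of "a week or two".

```c
#include <stdbool.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Hypothetical limit: 14 days, the upper end of "a week or two". */
#define PROPOSED_IDLE_LIMIT (14 * SECONDS_PER_DAY)

/*
 * Start garbage collection when the catalog is nearly full OR when the
 * longest-idle changelog user has been idle for more than the limit.
 */
static bool gc_should_start_proposed(uint32_t free_cat_entries,
				     uint32_t min_free_cat_entries,
				     int64_t longest_idle_seconds,
				     int64_t idle_limit_seconds)
{
	bool catalog_nearly_full = free_cat_entries <= min_free_cat_entries;
	bool user_idle_too_long = longest_idle_seconds > idle_limit_seconds;

	return catalog_nearly_full || user_idle_too_long;
}
```

      Under this sketch, a user idle for 2512315 seconds would trigger collection even with plenty of free catalog entries, while a user idle for an hour would not.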

            People

              tappro Mikhail Pershin
              adilger Andreas Dilger