Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11426

2/2 Olafs agree: changelog entries are emitted out of order

Details

    • 3
    • 9223372036854775807

    Description

      When storing changelogs we do not strictly order the entries with respect to index.

      int mdd_changelog_store(const struct lu_env *env, struct mdd_device *mdd,
                              struct llog_changelog_rec *rec, struct thandle *th)
      {
              ...
      
              /* llog_lvfs_write_rec sets the llog tail len */
              rec->cr_hdr.lrh_type = CHANGELOG_REC;
              rec->cr.cr_time = cl_time();
      
              spin_lock(&mdd->mdd_cl.mc_lock);
              /* NB: I suppose it's possible llog_add adds out of order wrt cr_index,
               * but as long as the MDD transactions are ordered correctly for e.g.
               * rename conflicts, I don't think this should matter. */
              rec->cr.cr_index = ++mdd->mdd_cl.mc_index;
              spin_unlock(&mdd->mdd_cl.mc_lock);
      
              ctxt = llog_get_context(obd, LLOG_CHANGELOG_ORIG_CTXT);
              if (ctxt == NULL)
                      return -ENXIO;
      
              llog_th = thandle_get_sub(env, th, ctxt->loc_handle->lgh_obj);
              if (IS_ERR(llog_th))
                      GOTO(out_put, rc = PTR_ERR(llog_th));
      
              /* nested journal transaction */
              rc = llog_add(env, ctxt->loc_handle, &rec->cr_hdr, NULL, llog_th);
      

      This may not be a bug in itself but it means that a changelog consumer must account for gaps in the sequence when clearing changelog entries.

      Attachments

        Issue Links

          Activity

            [LU-11426] 2/2 Olafs agree: changelog entries are emitted out of order

            We have also

            Lustre commit d7bb6647cd4dd26949bceb6a099cd606623aff2b ("LU-11626 mdc: hold obd while processing changelog")

            The other patch is LU-12533 but I don't know if that is more a feature or a fix so its not clear in that case.

             

            simmonsja James A Simmons added a comment - We have also Lustre commit d7bb6647cd4dd26949bceb6a099cd606623aff2b (" LU-11626 mdc: hold obd while processing changelog") The other patch is LU-12533 but I don't know if that is more a feature or a fix so its not clear in that case.  
            pjones Peter Jones added a comment -

            simmonsja could you please be more specific as to what is missing from b2_12 before the benefits of this patch could be realized?

            pjones Peter Jones added a comment - simmonsja could you please be more specific as to what is missing from b2_12 before the benefits of this patch could be realized?

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36316
            Subject: LU-11426 llog: changelog records reordering
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 2b7ea4e3cf43f18b1e1354044d024e085701946e

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36316 Subject: LU-11426 llog: changelog records reordering Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 2b7ea4e3cf43f18b1e1354044d024e085701946e

            To back port this fix will require a bunch of other change log fixes to land first. One of them is the add polling patch. Is that patch needed for 2.12 LTS?

            simmonsja James A Simmons added a comment - To back port this fix will require a bunch of other change log fixes to land first. One of them is the add polling patch. Is that patch needed for 2.12 LTS?
            pjones Peter Jones added a comment -

            Yes absolutely - a number of sites have hit this

            pjones Peter Jones added a comment - Yes absolutely - a number of sites have hit this

            Hi Peter,
            Can we consider a backport to LTS12? (2.12.4?)

            bruno.travouillon Bruno Travouillon (Inactive) added a comment - Hi Peter, Can we consider a backport to LTS12? (2.12.4?)
            pjones Peter Jones added a comment -

            Landed for 2.13

            pjones Peter Jones added a comment - Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36187/
            Subject: LU-11426 llog: changelog records reordering
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1fa0a984c5c3863d8f40b3b0d63c3d08cfa1a9f0

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36187/ Subject: LU-11426 llog: changelog records reordering Project: fs/lustre-release Branch: master Current Patch Set: Commit: 1fa0a984c5c3863d8f40b3b0d63c3d08cfa1a9f0

            Andrew Perepechko (c17827@cray.com) uploaded a new patch: https://review.whamcloud.com/36187
            Subject: LU-11426 llog: changelog records reordering
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a4611acc55049002eb6736d8fc504aab23e4a71c

            gerrit Gerrit Updater added a comment - Andrew Perepechko (c17827@cray.com) uploaded a new patch: https://review.whamcloud.com/36187 Subject: LU-11426 llog: changelog records reordering Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: a4611acc55049002eb6736d8fc504aab23e4a71c

            HPE is also working on this issue with Cray, and we've been testing code that appears to improve matters. (Cray bug id may be LUS-7691)

            olaf Olaf Weber (Inactive) added a comment - HPE is also working on this issue with Cray, and we've been testing code that appears to improve matters. (Cray bug id may be LUS-7691)
            olaf Olaf Weber (Inactive) added a comment - - edited

            The DMF7 changelog processor tracks the indices of records it has read but not yet cleared. If there appears to be a gap in that sequence it will not clear beyond that hap until at least 1000 newer records have been seen. At that point it assumes that the record in the gap will never turn up, reports it as "lost", and allows clearing the log beyond the gap.

            If, according to the patch, that window should be extended from 1000 to 1024*2 = 2048, we can certainly do that, as it is a simple tunable of the program.

            However, when I disable clearing the changelog, and then read it using lfs, I can easily verify that most lost records do not show up within the next million records. In a handful of cases records show up inconsistently. When that happens, they always appear to be part of a small out-of-order sequence.

            For what it is worth, the dmf changelog processor has never, to my knowledge, encountered LU-11205.

            olaf Olaf Weber (Inactive) added a comment - - edited The DMF7 changelog processor tracks the indices of records it has read but not yet cleared. If there appears to be a gap in that sequence it will not clear beyond that hap until at least 1000 newer records have been seen. At that point it assumes that the record in the gap will never turn up, reports it as "lost", and allows clearing the log beyond the gap. If, according to the patch, that window should be extended from 1000 to 1024*2 = 2048, we can certainly do that, as it is a simple tunable of the program. However, when I disable clearing the changelog, and then read it using lfs, I can easily verify that most lost records do not show up within the next million records. In a handful of cases records show up inconsistently. When that happens, they always appear to be part of a small out-of-order sequence. For what it is worth, the dmf changelog processor has never, to my knowledge, encountered  LU-11205 .

            People

              qian_wc Qian Yingjin
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: