  1. Lustre
  2. LU-7427

DNE3: multiple entries for BATCHID

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0

    Description

      In the current DNE implementation (2.8.0), the update records of an operation are cancelled only if:

      1. all updates of the operation have been committed to disk, and
      2. all operations with a smaller batchid have been committed, and BATCHID has been updated.

      If one operation fails or gets stuck somewhere, the update logs of all following operations are not cancelled, even if all of their updates have been committed. This causes a very long recovery time, because recovery has to retrieve all of those update logs. This was observed in the DNE failover soak test.
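
      The rule above, and the stall it produces, can be illustrated with a small model. This is only a sketch: the function name `cancellable_single` and the dict-of-flags representation are hypothetical and are not taken from the Lustre code. It treats BATCHID as a single watermark that can only advance over a contiguous run of committed operations.

```python
from typing import Dict, List


def cancellable_single(committed: Dict[int, bool]) -> List[int]:
    """Return the batchids whose update records may be cancelled.

    `committed` maps batchid -> True if all updates of that operation
    are on disk. Hypothetical model of the single-watermark rule: the
    scan stops at the first uncommitted operation, so nothing after it
    can be cancelled, even if it is committed.
    """
    result = []
    for batchid in sorted(committed):
        if not committed[batchid]:
            break  # every later operation is blocked behind this one
        result.append(batchid)
    return result


# Operation 2 is stuck; 3 and 4 are committed but still cannot be cancelled.
ops = {1: True, 2: False, 3: True, 4: True}
stalled = cancellable_single(ops)
```

      In this model only batchid 1 is cancellable, so the logs for 3 and 4 keep accumulating and must be read again during recovery.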

      So we can keep multiple entries for batchid, i.e. record multiple batchids in the BATCHID file, so that even if one operation is stuck, the batchid can still be updated until all of the entries are used (similar to the multiple last_rcvd entries).
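
      One possible reading of this proposal is sketched below. It is an assumption, not the design from the ticket: the name `cancellable_multi`, the `spare_entries` parameter, and the idea that each extra entry "parks" the batchid of one stuck operation are all hypothetical. With `spare_entries=0` the model reduces to the current single-entry behaviour.

```python
from typing import Dict, List


def cancellable_multi(committed: Dict[int, bool], spare_entries: int) -> List[int]:
    """Return the batchids whose update records may be cancelled.

    Hypothetical model: each spare BATCHID entry can hold the batchid
    of one stuck (uncommitted) operation, which lets the scan continue
    past it. Once every spare entry is in use, the next stuck operation
    blocks progress again.
    """
    parked = []  # batchids of stuck operations occupying an entry
    result = []
    for batchid in sorted(committed):
        if committed[batchid]:
            result.append(batchid)
        elif len(parked) < spare_entries:
            parked.append(batchid)  # record the stuck batchid, keep going
        else:
            break  # all entries are used: later operations are blocked
    return result


# Operations 2 and 4 are stuck; everything else is committed.
ops = {1: True, 2: False, 3: True, 4: False, 5: True}
no_spare = cancellable_multi(ops, spare_entries=0)
one_spare = cancellable_multi(ops, spare_entries=1)
two_spare = cancellable_multi(ops, spare_entries=2)
```

      Under these assumptions, each additional entry allows one more stuck operation to be skipped before cancellation stalls, which bounds how many update logs are left for recovery to read.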

            People

              wc-triage WC Triage
              di.wang Di Wang