Lustre / LU-14678

ldiskfs fast commit feature


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.15.0
    • Rank: 9223372036854775807

    Description

      The ext4 "fast commit" feature allows some kinds of inode operations to be committed to a per-inode journal area to increase the concurrency of inode updates, similar to the ZIL/SLOG in ZFS.

      This has implications for the Lustre recovery protocol, because transactions may commit out of order for some inode updates. This could be problematic if, for example, an inode version (which stores the Lustre transno) is updated while the main transaction is lost, or if updates from two distinct transactions are fast-journaled on the inodes in a different order. The fast journal commits for inodes are dropped once the main transaction has been committed, so this is only a transient state. However, because recovery has a very large number of potential states and happens rarely, data-corrupting bugs could exist in this code without being found by simple testing.
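      A toy model of the first hazard may make it concrete. This is a deliberately simplified sketch, not ext4's actual jbd2 or fast-commit replay logic: the names `toy_state`, `crash_recover()` and `recovery_consistent()` are hypothetical, and it assumes the recovery invariant is simply that no inode may carry a version (transno) newer than the last transaction known to be committed. Enumerating the crash outcomes shows that the only inconsistent one is where the per-inode fast commit survived but the main transaction did not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: one Lustre transaction updates an inode's
 * version (the Lustre transno) via the per-inode fast commit path, while
 * the record of the committed transno only goes through the main journal. */
struct toy_state {
	uint64_t last_committed_transno; /* durable via the main journal */
	uint64_t inode_version;          /* may be durable via fast commit */
};

/* State after a crash during transaction 'transno'.
 * 'main_committed': the main transaction reached disk.
 * 'fc_committed':   the fast commit record for the inode reached disk. */
static struct toy_state crash_recover(struct toy_state before, uint64_t transno,
				      bool main_committed, bool fc_committed)
{
	struct toy_state after = before;

	if (main_committed) {
		/* Full commit: both the transno and the inode update survive. */
		after.last_committed_transno = transno;
		after.inode_version = transno;
	} else if (fc_committed) {
		/* Only the inode update survived: the hazardous case. */
		after.inode_version = transno;
	}
	return after;
}

/* Assumed recovery invariant: an inode must not carry a version newer
 * than the last transaction known to be committed. */
static bool recovery_consistent(struct toy_state s)
{
	return s.inode_version <= s.last_committed_transno;
}
```

For example, starting from `{ .last_committed_transno = 100, .inode_version = 100 }` and crashing during transno 101, only `crash_recover(s, 101, false, true)` violates the invariant.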

      Since the fast commit feature has the potential to increase concurrency, and (in theory) updates to a single inode should be serialized by DLM locking, we should be able to use this feature to Lustre's advantage. However, given the complexity seen previously with the ZFS ZIL (LU-4009, still open), mkfs.lustre and the ldiskfs mount should refuse this feature until it is known to be safe, possibly with a "force-fast-commit" mount option, analogous to "force-over-XXX", to limit exposure to bugs on huge filesystems.
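      The proposed refusal could look roughly like the following mount-time check, sketched under assumptions. `EXT4_FEATURE_COMPAT_FAST_COMMIT` (0x0400 in the superblock's `s_feature_compat`) is the ext4 on-disk feature flag; `ldiskfs_check_fast_commit()`, `has_mount_opt()` and the `force-fast-commit` option are hypothetical (the option is only the name suggested above), and `-EOPNOTSUPP` is an arbitrary choice of error code. A similar test on the feature flags would be needed in mkfs.lustre.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ext4 on-disk compat feature bit for fast commit (s_feature_compat). */
#define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400

/* Hypothetical helper: true if the comma-separated mount option string
 * 'opts' contains exactly the option 'opt'. */
static bool has_mount_opt(const char *opts, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = opts;

	while (p && *p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n == len && strncmp(p, opt, len) == 0)
			return true;
		p = end ? end + 1 : NULL;
	}
	return false;
}

/* Hypothetical mount-time policy: refuse a filesystem with fast_commit
 * enabled unless "force-fast-commit" was given explicitly.
 * Returns 0 if the mount may proceed, -EOPNOTSUPP otherwise. */
static int ldiskfs_check_fast_commit(uint32_t s_feature_compat,
				     const char *mount_opts)
{
	if (!(s_feature_compat & EXT4_FEATURE_COMPAT_FAST_COMMIT))
		return 0;
	if (mount_opts && has_mount_opt(mount_opts, "force-fast-commit"))
		return 0;
	return -EOPNOTSUPP;
}
```

Matching whole comma-separated tokens, rather than using a substring search, keeps an unrelated option that merely contains "force-fast-commit" from silently enabling the override.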


            People

              Assignee: WC Triage (wc-triage)
              Reporter: Andreas Dilger (adilger)
              Votes: 0
              Watchers: 1
