  1. Lustre
  2. LU-19237

NFS v4 reexport support


Details

    • Bug
    • Resolution: Fixed
    • Medium
    • Lustre 2.17.0

    Description

      There have been several reports of stale metadata when Lustre is re-exported via NFSv4. The problem was initially attributed to the lack of nanosecond timestamp support, but further research suggests that, starting from about Linux kernel v5.5, the new i_version inode field is supposed to help even with coarse timestamp resolution.

       

      Here's what the comment in linux/iversion.h says:

       * The inode->i_version field:
       * ---------------------------
       * The change attribute (i_version) is mandated by NFSv4 and is mostly for
       * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
       * appear larger to observers if there was an explicit change to the inode's
       * data or metadata since it was last queried.
       *
       * An explicit change is one that would ordinarily result in a change to the
       * inode status change time (aka ctime). i_version must appear to change, even
       * if the ctime does not (since the whole point is to avoid missing updates due
       * to timestamp granularity). If POSIX or other relevant spec mandates that the
       * ctime must change due to an operation, then the i_version counter must be
       * incremented as well.
      
       * Not all filesystems properly implement the i_version counter. Subsystems that
       * want to use i_version field on an inode should first check whether the
       * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
       *
       * Those that set SB_I_VERSION will automatically have their i_version counter
       * incremented on writes to normal files. If the SB_I_VERSION is not set, then
       * the VFS will not touch it on writes, and the filesystem can use it how it
       * wishes. Note that the filesystem is always responsible for updating the
       * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
       * We consider these sorts of filesystems to have a kernel-managed i_version.
       *
       * It may be impractical for filesystems to keep i_version updates atomic with
       * respect to the changes that cause them.  They should, however, guarantee
       * that i_version updates are never visible before the changes that caused
       * them.  Also, i_version updates should never be delayed longer than it takes
       * the original change to reach disk.
       * 
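
The point of the quoted comment, that a change counter catches updates a coarse ctime misses, can be illustrated with a small userspace model. This is only a sketch: `demo_inode`, `demo_change` and the two comparison helpers are hypothetical names invented for illustration, not kernel or Lustre APIs, and the model assumes a ctime truncated to whole seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an inode; not the kernel's struct inode. */
struct demo_inode {
    int64_t  ctime_sec;  /* change time, truncated to whole seconds */
    uint64_t i_version;  /* change counter, bumped on every explicit change */
};

/* Apply one explicit change that happens at wall-clock second now_sec. */
static void demo_change(struct demo_inode *inode, int64_t now_sec)
{
    assert(inode != NULL);
    inode->ctime_sec = now_sec;
    inode->i_version++;
}

/* What an observer that only compares ctime concludes. */
static bool demo_changed_by_ctime(const struct demo_inode *seen,
                                  const struct demo_inode *now)
{
    return seen->ctime_sec != now->ctime_sec;
}

/* What an observer that compares the change counter concludes. */
static bool demo_changed_by_version(const struct demo_inode *seen,
                                    const struct demo_inode *now)
{
    return seen->i_version != now->i_version;
}
```

If an observer snapshots the inode and a second change lands within the same second, the ctime comparison reports no change while the counter comparison reports one. That is the stale-metadata case described above.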

       

      With this in mind, the simplest approach is probably to set SB_I_VERSION for Lustre and then update the blocking AST handler in llite to increase i_version for directories when an update lock (or update plus some other bits) is lost. Since the inode normally does not go away, a re-lookup that re-obtains the lock would find the inode and reuse it with the now-increased i_version.
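
The proposed flow can be sketched as a userspace model. Everything here is an assumption made for illustration: the `MODEL_LOCK_*` bits, `model_inode`, `model_blocking_ast` and `model_relookup` are hypothetical stand-ins, not the actual llite handler or kernel helpers, and the choice to bump only when the update bit is among the bits lost is one possible reading of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock bits; stand-ins for the lock bits a client may hold. */
#define MODEL_LOCK_LOOKUP 0x1u
#define MODEL_LOCK_UPDATE 0x2u

/* Hypothetical cached inode on the client side. */
struct model_inode {
    bool     is_dir;     /* only directories get the bump in this sketch */
    uint32_t held_bits;  /* lock bits currently held by the client */
    uint64_t i_version;  /* change counter exposed to the NFS server */
};

/* Model of the blocking callback: the server revokes lost_bits. */
static void model_blocking_ast(struct model_inode *inode, uint32_t lost_bits)
{
    assert(inode != NULL);
    uint32_t dropped = inode->held_bits & lost_bits;

    inode->held_bits &= ~lost_bits;
    /* Losing the update bit means the directory may change unseen. */
    if (inode->is_dir && (dropped & MODEL_LOCK_UPDATE))
        inode->i_version++;
}

/* Model of a re-lookup: the cached inode is reused and re-locked. */
static uint64_t model_relookup(struct model_inode *inode)
{
    assert(inode != NULL);
    inode->held_bits |= MODEL_LOCK_LOOKUP | MODEL_LOCK_UPDATE;
    return inode->i_version;
}
```

Because the cached inode survives the lock loss, the value returned by the next re-lookup is larger than the one seen before the lock was revoked, even though the client never saw the remote change itself. The counter is purely local to the client in this model, so it says nothing about how many changes actually happened on the server.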

      It's unclear how important a truly monotonically increasing i_version is in practice.

       

      Alternatively, we could store i_version on disk (we already do for the ldiskfs backend) and then transfer it to the client, but that constitutes a protocol change for a not very clear benefit.

          People

            green Oleg Drokin