Details
-
Bug
-
Resolution: Fixed
-
Medium
Description
There were several reports of a stale metadata problem with Lustre re-exported via NFSv4. It was initially attributed to the lack of nanosecond timestamp support, but further research suggests that, starting from about Linux kernel v5.5, the new i_version inode field is supposed to help even with coarse timestamp resolution.
Here's what the comment in linux/iversion.h says:
 * The inode->i_version field:
 * ---------------------------
 * The change attribute (i_version) is mandated by NFSv4 and is mostly for
 * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
 * appear larger to observers if there was an explicit change to the inode's
 * data or metadata since it was last queried.
 *
 * An explicit change is one that would ordinarily result in a change to the
 * inode status change time (aka ctime). i_version must appear to change, even
 * if the ctime does not (since the whole point is to avoid missing updates due
 * to timestamp granularity). If POSIX or other relevant spec mandates that the
 * ctime must change due to an operation, then the i_version counter must be
 * incremented as well.
 * Not all filesystems properly implement the i_version counter. Subsystems that
 * want to use i_version field on an inode should first check whether the
 * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
 *
 * Those that set SB_I_VERSION will automatically have their i_version counter
 * incremented on writes to normal files. If the SB_I_VERSION is not set, then
 * the VFS will not touch it on writes, and the filesystem can use it how it
 * wishes. Note that the filesystem is always responsible for updating the
 * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
 * We consider these sorts of filesystems to have a kernel-managed i_version.
 *
 * It may be impractical for filesystems to keep i_version updates atomic with
 * respect to the changes that cause them. They should, however, guarantee
 * that i_version updates are never visible before the changes that caused
 * them. Also, i_version updates should never be delayed longer than it takes
 * the original change to reach disk.
 *
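To make the point about timestamp granularity concrete, here is a small userspace model in C. It is only a sketch: struct model_inode, struct model_snapshot and the model_* and changed_by_* helpers are hypothetical names invented for illustration, not the kernel's struct inode or its iversion API. It shows two changes landing in the same one-second tick, which a ctime comparison misses but a change-attribute comparison catches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only. These types and helpers are hypothetical and are
 * not the kernel's struct inode or its iversion helpers. */
struct model_inode {
	int64_t  ctime_sec;  /* coarse, one-second change time */
	uint64_t i_version;  /* change attribute */
};

/* What an observer (for example an NFS server) remembers from its last query. */
struct model_snapshot {
	int64_t  ctime_sec;
	uint64_t i_version;
};

/* Apply one explicit change at wall-clock second "now". */
static void model_change(struct model_inode *inode, int64_t now)
{
	assert(now >= inode->ctime_sec);  /* time does not go backwards here */
	inode->ctime_sec = now;
	inode->i_version++;               /* bumped even if ctime_sec is unchanged */
}

/* Record the attributes the observer will compare against later. */
static struct model_snapshot model_query(const struct model_inode *inode)
{
	struct model_snapshot snap = { inode->ctime_sec, inode->i_version };
	return snap;
}

/* Change detection based on the coarse timestamp alone. */
static bool changed_by_ctime(const struct model_inode *inode,
			     const struct model_snapshot *snap)
{
	return inode->ctime_sec != snap->ctime_sec;
}

/* Change detection based on the change attribute. */
static bool changed_by_version(const struct model_inode *inode,
			       const struct model_snapshot *snap)
{
	return inode->i_version != snap->i_version;
}
```

If an observer takes a snapshot and a second change then arrives within the same second, changed_by_ctime() reports no change while changed_by_version() reports one.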
With this in mind, the simplest approach is probably to set SB_I_VERSION for Lustre and then update the blocking AST handler in llite to increase i_version for directories when an update lock (or update plus some other bits) is lost. Since the inode normally does not go away, a re-lookup that re-obtains the lock would find the same inode and reuse it with the now-increased i_version.
It's unclear how important an actually monotonically increasing i_version is in practice.
Alternatively, we can store i_version on disk (we already do for the ldiskfs backend) and then transfer it to the client, but that constitutes a protocol change for a not very clear benefit.
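The proposed behaviour can be sketched as a userspace model in C. This is not llite code: the MODEL_LOCK_* bits, struct model_dir_inode, model_blocking_ast() and model_relookup() are hypothetical stand-ins, and the real handler would work on the kernel inode and Lustre's own lock bits. The sketch only assumes what the description states: the cached inode survives lock loss, and the counter is bumped for directories when the update bit is lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock bits for this model, not Lustre's real definitions. */
enum model_lock_bits {
	MODEL_LOCK_LOOKUP = 1u << 0,
	MODEL_LOCK_UPDATE = 1u << 1,
};

/* Hypothetical stand-in for a cached client inode. */
struct model_dir_inode {
	bool     is_dir;
	uint64_t i_version;  /* client-local change attribute */
	unsigned held_bits;  /* lock bits currently held by the client */
};

/* Model of the blocking AST: the server revokes some lock bits.
 * For directories, losing the update bit means the contents may change
 * behind our back, so the change attribute is bumped. */
static void model_blocking_ast(struct model_dir_inode *inode, unsigned revoked)
{
	assert(inode != NULL);
	unsigned lost = inode->held_bits & revoked;

	inode->held_bits &= ~lost;
	if (inode->is_dir && (lost & MODEL_LOCK_UPDATE))
		inode->i_version++;
}

/* Model of a re-lookup: the lock is re-obtained and the same cached inode
 * is reused, so the caller sees the already increased i_version. */
static uint64_t model_relookup(struct model_dir_inode *inode)
{
	assert(inode != NULL);
	inode->held_bits |= MODEL_LOCK_LOOKUP | MODEL_LOCK_UPDATE;
	return inode->i_version;
}
```

In this model the counter is purely client-local, so it only appears to increase for as long as the cached inode stays in memory, which is the caveat about monotonicity raised in the description.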