Details

Type: Bug
Resolution: Fixed
Priority: Medium
Fix Version/s: Lustre 2.17.0
Affects Version/s: None
Labels: None
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

There have been several reports of stale metadata when Lustre is re-exported via NFSv4. The problem was initially attributed to the lack of nanosecond timestamp support, but further research suggests that, starting from about Linux kernel v5.5, the new i_version inode field is supposed to help even with coarse timestamp resolution.

Here's what the comment in linux/iversion.h says:

 * The inode->i_version field:
 * ---------------------------
 * The change attribute (i_version) is mandated by NFSv4 and is mostly for
 * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
 * appear larger to observers if there was an explicit change to the inode's
 * data or metadata since it was last queried.
 *
 * An explicit change is one that would ordinarily result in a change to the
 * inode status change time (aka ctime). i_version must appear to change, even
 * if the ctime does not (since the whole point is to avoid missing updates due
 * to timestamp granularity). If POSIX or other relevant spec mandates that the
 * ctime must change due to an operation, then the i_version counter must be
 * incremented as well.
 *
 * Not all filesystems properly implement the i_version counter. Subsystems that
 * want to use i_version field on an inode should first check whether the
 * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
 *
 * Those that set SB_I_VERSION will automatically have their i_version counter
 * incremented on writes to normal files. If the SB_I_VERSION is not set, then
 * the VFS will not touch it on writes, and the filesystem can use it how it
 * wishes. Note that the filesystem is always responsible for updating the
 * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
 * We consider these sorts of filesystems to have a kernel-managed i_version.
 *
 * It may be impractical for filesystems to keep i_version updates atomic with
 * respect to the changes that cause them.  They should, however, guarantee
 * that i_version updates are never visible before the changes that caused
 * them.  Also, i_version updates should never be delayed longer than it takes
 * the original change to reach disk.
 *
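
To make the granularity point in the comment concrete, here is a minimal userspace sketch (not kernel code). It models an inode with a one-second ctime and a change counter. All names (ver_inode, ver_query, ver_change and so on) are hypothetical, and the counter is bumped unconditionally, which is simpler than what the kernel helpers actually do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an inode; not the kernel's struct inode. */
struct ver_inode {
	int64_t  ctime_sec;  /* change time with one-second granularity */
	uint64_t i_version;  /* change counter, bumped on every change */
};

/* What an observer (for example an NFS server) remembers after a query. */
struct ver_snapshot {
	int64_t  ctime_sec;
	uint64_t i_version;
};

static struct ver_snapshot ver_query(const struct ver_inode *inode)
{
	struct ver_snapshot snap = { inode->ctime_sec, inode->i_version };

	return snap;
}

/* Apply an explicit change that happens at wall-clock second now_sec. */
static void ver_change(struct ver_inode *inode, int64_t now_sec)
{
	inode->ctime_sec = now_sec;
	inode->i_version++;
}

/* Change detection based only on the coarse timestamp. */
static bool ver_changed_by_ctime(const struct ver_inode *inode,
				 struct ver_snapshot snap)
{
	return inode->ctime_sec != snap.ctime_sec;
}

/* Change detection based on the counter. */
static bool ver_changed_by_version(const struct ver_inode *inode,
				   struct ver_snapshot snap)
{
	return inode->i_version != snap.i_version;
}
```

If a directory is queried and then modified again within the same second, ver_changed_by_ctime() reports no change while ver_changed_by_version() does, which is the stale-metadata case described above.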

With this in mind, the simplest way here is probably to set SB_I_VERSION for Lustre and then update the blocking AST handler in llite to increase i_version for directories when an update lock (or update and some other bits) is lost. Since the inode normally does not go away, a re-lookup and re-obtaining the lock would find the inode and reuse it with the now increased i_version.
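
The proposal can be sketched as a small userspace model. This is not the llite code and not the actual patch: the struct, the lock-bit names and values, and sketch_blocking_ast() are all hypothetical stand-ins. In the kernel the bump would presumably go through a helper such as inode_inc_iversion() from linux/iversion.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder lock bits for the sketch; the values are illustrative only. */
#define SKETCH_LOCK_LOOKUP 0x1u
#define SKETCH_LOCK_UPDATE 0x2u

/* Hypothetical stand-in for the client-side inode. */
struct sketch_inode {
	uint64_t i_version;  /* change attribute reported to NFSv4 clients */
	bool     is_dir;
};

/*
 * Model of a blocking AST handler: when a directory loses its update lock,
 * another client may have changed it, so bump i_version. The inode itself
 * stays cached, so a later re-lookup sees the increased value.
 */
static void sketch_blocking_ast(struct sketch_inode *inode,
				unsigned int lost_bits)
{
	if (inode->is_dir && (lost_bits & SKETCH_LOCK_UPDATE))
		inode->i_version++;
}
```

In this model the counter only says that the directory might have changed. It can also increase when nothing changed on the server, which costs the NFS client an unneeded revalidation but never hides an update.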

It's unclear how important a truly monotonically increasing i_version is in practice.

Alternatively, we can store i_version on disk (we already do for the ldiskfs backend) and then transfer it to the client, but that constitutes a protocol change for a not very clear benefit.

Attachments

test.c
3 kB
05/Aug/25 9:56 PM

People

Assignee: Oleg Drokin

Reporter: Oleg Drokin

Votes: 0

Watchers: 4

Dates

Created: 05/Aug/25 9:55 PM

Updated: 04/Sep/25 1:11 PM

Resolved: 28/Aug/25 12:53 PM