Details
Type: Bug
Resolution: Unresolved
Priority: Medium
Description
Due to a complex kernel and Lustre interaction, NFS exports of Lustre using the kernel NFS server will sometimes transiently report a size of 0 for files which have data in them. This can confuse applications, which see an unexpected EOF. It occurs only under heavy memory pressure on the Lustre client/NFS server.
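For illustration, here is a minimal userspace sketch of what an affected application might observe (the mount path and file name are assumptions for the example): a read of a file known to contain data returns an immediate EOF.

```c
/* Hypothetical illustration of the symptom: a file on an NFS export
 * of Lustre that is known to contain data transiently reads as empty.
 * The path below is an assumption for the example. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd = open("/mnt/nfs-of-lustre/datafile", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	n = read(fd, buf, sizeof(buf));
	if (n == 0)
		/* Unexpected EOF at offset 0: the file transiently
		 * appears zero length even though it has data. */
		fprintf(stderr, "unexpected EOF: file reads as empty\n");
	close(fd);
	return 0;
}
```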
Details:
There is a conflict between how Lustre clears pages and how splice(), as used by the kernel NFS server, handles read operations. When a page is flushed from Lustre, either by lock cancellation or due to memory pressure, Lustre must mark the page as not uptodate, or stale data could be read. (LU-14541, LU-16160)
This means Lustre will sometimes return pages marked not uptodate from read() and filemap_fault(). In these two cases we detect the condition and retry the read inside Lustre, so the read completes normally.
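As a rough illustration of that retry, here is a minimal sketch using hypothetical simplified types; it is not the actual Lustre implementation, just the shape of the logic:

```c
/* Simplified, hypothetical stand-ins for kernel structures. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct page_stub {
	bool uptodate;		/* cleared when the page is dropped */
	char data[16];
};

static struct page_stub cache_page;	/* a one-page "page cache" */

/* Hypothetical helper: look up the cached page, re-reading it from
 * storage (and marking it uptodate) if it was invalidated. */
static struct page_stub *find_or_read_page(void)
{
	if (!cache_page.uptodate) {
		strcpy(cache_page.data, "file contents");
		cache_page.uptodate = true;
	}
	return &cache_page;
}

/* read()/filemap_fault() style handling: if the page went !uptodate
 * between lookup and copy-out (lock cancellation or memory pressure),
 * retry rather than return stale or missing data. */
static long read_page(char *out, size_t len)
{
	for (;;) {
		struct page_stub *page = find_or_read_page();

		if (!page->uptodate)
			continue;	/* invalidated under us: retry */
		memcpy(out, page->data, len);
		return (long)len;
	}
}

int main(void)
{
	char buf[16];

	cache_page.uptodate = false;	/* simulate an invalidation */
	read_page(buf, sizeof(buf));
	printf("read: %s\n", buf);
	return 0;
}
```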
However, splice() is unusual: it calls into Lustre, but then performs a separate page validity check (confirm()) after the call. read() and filemap_fault() perform no such separate check, which lets the file system control the implementation and handle a page that is not uptodate. splice()'s separate check is outside the file system's control, and this is the root of the problem.
When splice's confirm() sees a page which is !uptodate, it concludes the page has been truncated from the file. If this is the first page, NFS decides the file is zero length.
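A sketch of that check, loosely modeled on page_cache_pipe_buf_confirm() in fs/splice.c, with hypothetical simplified types (the real code also locks the page, and details vary by kernel version):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_stub {
	bool uptodate;
	void *mapping;	/* NULL once the page is truncated/unhashed */
};

/* Sketch of splice's confirm() step. It runs after the file system
 * has already filled the pipe buffer, and the file system cannot
 * intercept it. A !uptodate page with no mapping is assumed to have
 * been truncated; if that happens on the first page, the whole
 * splice is 0 bytes and the kernel NFS server reports a zero-length
 * file. */
int pipe_buf_confirm(struct page_stub *page)
{
	if (page->uptodate)
		return 0;		/* page contents are valid */
	if (page->mapping == NULL)
		return -ENODATA;	/* treated as truncated */
	return -EIO;			/* otherwise an I/O error */
}
```

Because this check lives in generic splice code, Lustre has no hook to retry the read the way it does for read() and filemap_fault().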
This can result in files transiently showing as zero length.
This can occur when Lustre is exported using the kernel NFS server and there is significant memory pressure on the NFS server (which is the Lustre client) and a mixed read/write workload from the NFS client.
There are no other major users of splice(), so NFS is likely the only case vulnerable to this. In theory, splice() to a pipe could hit it as well, but that would produce an error rather than a file that appears empty.
We're currently working on a fix for this.