[LU-9512] Implement O_TMPFILE for Lustre Created: 16/May/17 Updated: 05/May/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Arshad Hussain |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | lug23dd, medium | ||
| Description |
|
We should implement O_TMPFILE for Lustre. We already have a similar interface for creating volatile files for lfs_migrate(). This should be hooked into the VFS O_TMPFILE mechanism for applications to use.
From the open(2) man page in RHEL 8.5:
O_TMPFILE (since Linux 3.11)
Create an unnamed temporary regular file. The pathname argument
specifies a directory; an unnamed inode will be created in that
directory's filesystem. Anything written to the resulting file
will be lost when the last file descriptor is closed, unless the
file is given a name.
O_TMPFILE must be specified with one of O_RDWR or O_WRONLY and,
optionally, O_EXCL. If O_EXCL is not specified, then linkat(2)
can be used to link the temporary file into the filesystem,
making it permanent, using code like the following:
    char path[PATH_MAX];
    fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
              S_IRUSR | S_IWUSR);
    /* File I/O on 'fd'... */
    snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
    linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
           AT_SYMLINK_FOLLOW);
In this case, the open() mode argument determines the file
permission mode, as with O_CREAT.
Specifying O_EXCL in conjunction with O_TMPFILE prevents a temporary
file from being linked into the filesystem in the above manner.
(Note that the meaning of O_EXCL in this case is different from
the meaning of O_EXCL otherwise.)
There are two main use cases for O_TMPFILE:
* Improved tmpfile(3) functionality: race-free creation of
temporary files that (1) are automatically deleted when closed;
(2) can never be reached via any pathname; (3) are not subject
to symlink attacks; and (4) do not require the caller to devise
unique names.
* Creating a file that is initially invisible, which is then
populated with data and adjusted to have appropriate filesystem
attributes (fchown(2), fchmod(2), fsetxattr(2), etc.) before
being atomically linked into the filesystem in a fully formed
state (using linkat(2) as described above).
O_TMPFILE requires support by the underlying filesystem; only a
subset of Linux filesystems provide that support. In the initial
implementation, support was provided in the ext2, ext3, ext4,
UDF, Minix, and shmem filesystems. Support for other filesystems
has subsequently been added as follows: XFS (Linux 3.15); Btrfs
(Linux 3.16); F2FS (Linux 3.16); and ubifs (Linux 4.9).
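To make the second use case concrete, and to show what an application has to do today on a filesystem that lacks O_TMPFILE support, here is a minimal C sketch. It is not taken from Lustre: open_scratch() and publish_file() are hypothetical helper names, and the ".scratch.XXXXXX" fallback template is an assumption. The sketch tries O_TMPFILE first and falls back to a named mkstemp() file when open() fails with EOPNOTSUPP or EISDIR, the two errors open(2) documents for missing filesystem or kernel support.

```c
#define _GNU_SOURCE             /* exposes O_TMPFILE in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a scratch file in 'dir'.
 * On success with O_TMPFILE, tmp_name is left empty (the file has no name).
 * On fallback, tmp_name holds the mkstemp() path the caller must unlink.
 */
static int open_scratch(const char *dir, char *tmp_name, size_t size)
{
    int fd = -1;
    int n;

    tmp_name[0] = '\0';
#ifdef O_TMPFILE
    fd = open(dir, O_TMPFILE | O_WRONLY, S_IRUSR | S_IWUSR);
    /* EOPNOTSUPP: filesystem lacks support; EISDIR: kernel lacks support */
    if (fd >= 0 || (errno != EOPNOTSUPP && errno != EISDIR))
        return fd;
#endif
    n = snprintf(tmp_name, size, "%s/.scratch.XXXXXX", dir);
    if (n < 0 || (size_t)n >= size) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = mkstemp(tmp_name);     /* created with mode 0600 */
    return fd;
}

/*
 * Hypothetical helper: write 'len' bytes to a scratch file in 'dir', then
 * link it at 'final_path'. Returns 0 or a negative errno value; an existing
 * 'final_path' is never overwritten (-EEXIST).
 */
static int publish_file(const char *dir, const char *final_path,
                        const void *data, size_t len)
{
    char tmp_name[PATH_MAX];
    char proc_path[64];
    ssize_t written;
    int rc = 0;
    int fd = open_scratch(dir, tmp_name, sizeof(tmp_name));

    if (fd < 0)
        return -errno;

    /* A short write is treated as an error to keep the sketch small. */
    written = write(fd, data, len);
    if (written < 0)
        rc = -errno;
    else if ((size_t)written != len)
        rc = -EIO;

    if (rc == 0 && tmp_name[0] == '\0') {
        /* Unnamed file: give it a name via /proc, as in the man page. */
        snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
                   AT_SYMLINK_FOLLOW) < 0)
            rc = -errno;
    } else if (rc == 0 && link(tmp_name, final_path) < 0) {
        rc = -errno;
    }

    if (tmp_name[0] != '\0')
        unlink(tmp_name);       /* drop the temporary name on fallback */
    close(fd);
    return rc;
}
```

A caller would invoke publish_file("/some/dir", "/some/dir/result", buf, len). The fallback branch is the part O_TMPFILE support would make unnecessary: it needs a unique visible name and leaves a window in which the scratch file can be reached by path.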
|
| Comments |
| Comment by Nathan Rutman [ 14/Mar/19 ] |
|
+1 The MDS could give each client some number of FIDs to use for these temp files, along with algorithmic layouts, to move the MDS entirely out of the create path |
| Comment by Andreas Dilger [ 15/Mar/19 ] |
|
Each client already has a preallocated range of FIDs (two SEQ values, each of which is typically used for 128k creates). The real problem is that the client can't "open" a file without contacting the MDS, since it won't have an inode allocated, and if a client allocated lots of O_TMPFILE files and then tried to write them later, it might run out of space. Doing something like Oleg's Write Back Cache would allow the inode to be opened in the client RAM. |
| Comment by Nathan Rutman [ 08/Sep/22 ] |
|
Just found this again. This sure is an ugly hack:
    rc = snprintf(volatile_file, sizeof(volatile_file),
                  "%s/%s:%.4X:%.4X:fd=%.2d", parent,
                  LUSTRE_VOLATILE_HDR, mdt_index,
                  random_value, fd);
|
| Comment by Andreas Dilger [ 08/Sep/22 ] |
|
Definitely a hack:
    /* lustre volatile file support
     * file name header: .^L^S^T^R:volatile"
     */
    #define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"
but still much less bad than "silly rename" for NFS. I don't think it would be hard to internally map files created with the VFS O_TMPFILE onto LUSTRE_VOLATILE_HDR. It would lose the ability of directly selecting which MDT the file was created on (so quite sub-optimal for striped directories if the temp file is linked into the directory later), but it should be OK for normal usage since a "pathname" (parent directory) is specified and this could be used to determine the parent MDT. |
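For reference, the volatile name that such a mapping would have to produce can be sketched in standalone C. This only reuses the format string and the LUSTRE_VOLATILE_HDR definition quoted in this ticket. format_volatile_name() is a hypothetical helper, not a Lustre function, and the argument types are assumptions, since the quoted snippet does not show the declarations.

```c
#include <stdio.h>

/* Header bytes as quoted above: '.', Ctrl-L, Ctrl-S, Ctrl-T, Ctrl-R. */
#define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"

/*
 * Hypothetical helper: build "<parent>/<hdr>:<mdt>:<random>:fd=<fd>".
 * Returns 0 on success, -1 if the name did not fit in 'buf'.
 */
static int format_volatile_name(char *buf, size_t size, const char *parent,
                                int mdt_index, unsigned int random_value,
                                int fd)
{
    /* %X expects an unsigned argument, so mdt_index is cast explicitly. */
    int rc = snprintf(buf, size, "%s/%s:%.4X:%.4X:fd=%.2d", parent,
                      LUSTRE_VOLATILE_HDR, (unsigned int)mdt_index,
                      random_value, fd);

    return (rc < 0 || (size_t)rc >= size) ? -1 : 0;
}
```

The MDT index is encoded in the name itself, which is the capability that would be lost if the name were derived only from the parent directory passed to open().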
| Comment by Andreas Dilger [ 01/May/23 ] |
|
Assign to Arshad after discussion at LUG'23 Developer Day. |
| Comment by Gerrit Updater [ 05/May/23 ] |
|
"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50871 |