[LU-9512] Implement O_TMPFILE for Lustre Created: 16/May/17  Updated: 05/May/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Upstream, Lustre 2.11.0
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Andreas Dilger Assignee: Arshad Hussain
Resolution: Unresolved Votes: 0
Labels: lug23dd, medium


 Description   

We should implement O_TMPFILE for Lustre. We already have a similar interface for creating volatile files for lfs_migrate. This should be hooked into the VFS O_TMPFILE mechanism so that applications can use it.
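Until that is done, an application that wants an anonymous temporary file on Lustre has to detect the missing support and fall back. The following is a minimal sketch under stated assumptions. It tries O_TMPFILE first. If that fails with EOPNOTSUPP (the filesystem lacks support) or EISDIR (the kernel predates O_TMPFILE), both documented in open(2), it falls back to the classic mkstemp() + unlink() pattern. The function name open_anon_tmpfile() and the ".tmpXXXXXX" template are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE /* exposes O_TMPFILE in glibc's <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an fd for an anonymous temporary file in
 * 'dir', or -1 with errno set. *used_o_tmpfile reports which path was taken.
 */
int open_anon_tmpfile(const char *dir, int *used_o_tmpfile)
{
	char path[PATH_MAX];
	int fd, saved;

	/* Preferred: an unnamed inode that never appears in the namespace. */
	fd = open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
	if (fd >= 0) {
		*used_o_tmpfile = 1;
		return fd;
	}
	/* EOPNOTSUPP: filesystem lacks support; EISDIR: kernel lacks support. */
	if (errno != EOPNOTSUPP && errno != EISDIR)
		return -1;

	/* Fallback: create a uniquely named file, then unlink it at once. */
	*used_o_tmpfile = 0;
	if (snprintf(path, sizeof(path), "%s/.tmpXXXXXX", dir) >=
	    (int)sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	fd = mkstemp(path); /* created with mode 0600 */
	if (fd < 0)
		return -1;
	if (unlink(path) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The fallback is weaker than real O_TMPFILE: the file is briefly reachable by name between mkstemp() and unlink(), and once unlinked it cannot be linked back into the namespace with linkat().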

From the open(2) man page in RHEL 8.5:

       O_TMPFILE (since Linux 3.11)
              Create an unnamed temporary regular file.  The pathname argument
              specifies a directory; an unnamed inode will be created in that
              directory's filesystem.  Anything written to the resulting file
              will be lost when the last file descriptor is closed, unless the
              file is given a name.

              O_TMPFILE must be specified with one of O_RDWR or O_WRONLY  and,
              optionally, O_EXCL.  If O_EXCL is not specified, then linkat(2)
              can be used to link the temporary file into the filesystem,
              making it permanent, using code like the following:

                  char path[PATH_MAX];
                  fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                                            S_IRUSR | S_IWUSR);

                  /* File I/O on 'fd'... */

                  snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
                  linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                         AT_SYMLINK_FOLLOW);

              In this case, the open() mode argument determines the file
              permission mode, as with O_CREAT.

              Specifying O_EXCL in conjunction with O_TMPFILE prevents a temporary
              file from being linked into the filesystem in the above manner.
              (Note that the meaning of O_EXCL in this case is different from
              the meaning of O_EXCL otherwise.)

              There are two main use cases for O_TMPFILE:

              *  Improved tmpfile(3) functionality: race-free creation of
                 temporary files that (1) are automatically deleted when closed;
                 (2) can never be reached via any pathname; (3) are not subject
                 to symlink attacks; and (4) do not require the caller to devise
                 unique names.

              *  Creating a file that is initially invisible, which is then
                 populated with data and adjusted to have appropriate filesystem
                 attributes (fchown(2), fchmod(2), fsetxattr(2), etc.) before
                 being atomically linked into the filesystem in a fully formed
                 state (using linkat(2) as described above).

              O_TMPFILE requires support by the underlying filesystem; only a
              subset of Linux filesystems provide that support.  In the initial
              implementation, support was provided in the ext2, ext3, ext4,
              UDF, Minix, and shmem filesystems.  Support for other filesystems
              has subsequently been added as follows: XFS (Linux 3.15); Btrfs
              (Linux 3.16); F2FS (Linux 3.16); and ubifs (Linux 4.9)
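The man page fragment above is not a complete program. A self-contained sketch of the second use case might look like the following, assuming /proc is mounted (the /proc/self/fd path requires it). The file is written and given its final mode while still unnamed, then published with linkat(). Passing exclusive != 0 adds O_EXCL, in which case linkat() is expected to fail. The function publish_file() and its parameters are hypothetical.

```c
#define _GNU_SOURCE /* exposes O_TMPFILE in glibc's <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create 'final_path' fully formed, via an unnamed
 * O_TMPFILE inode in 'dir'. Returns 0, or -1 with errno set.
 */
int publish_file(const char *dir, const char *final_path, const void *data,
		 size_t len, mode_t mode, int exclusive)
{
	char proc_path[64];
	int flags = O_TMPFILE | O_WRONLY | (exclusive ? O_EXCL : 0);
	int fd, rc = -1, saved;
	ssize_t n;

	fd = open(dir, flags, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1; /* e.g. EOPNOTSUPP if the filesystem lacks support */

	n = write(fd, data, len); /* populate the still-invisible file */
	if (n < 0)
		goto out;
	if ((size_t)n != len) {
		errno = EIO; /* treat a short write as an error */
		goto out;
	}
	if (fchmod(fd, mode) < 0) /* fix attributes before it has a name */
		goto out;

	/* Give the inode a name by following the /proc link to the open file. */
	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
		   AT_SYMLINK_FOLLOW) < 0)
		goto out; /* expected to fail when O_EXCL was used */
	rc = 0;
out:
	saved = errno;
	close(fd);
	errno = saved;
	return rc;
}
```

On a filesystem without O_TMPFILE support, such as Lustre before this ticket is resolved, the initial open() should fail and publish_file() returns -1 before anything is written.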


 Comments   
Comment by Nathan Rutman [ 14/Mar/19 ]

+1

The MDS could give each client some number of FIDs to use for these temp files, along with algorithmic layouts, to move the MDS entirely out of the create path.

Comment by Andreas Dilger [ 15/Mar/19 ]

Each client already has a preallocated range of FIDs (two SEQ values, each of which typically covers 128k creates). The real problem is that the client can't "open" a file without contacting the MDS, since it won't have an inode allocated. Also, if a client allocates lots of O_TMPFILE files and only tries to write to them later, it may run out of space.
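To illustrate why the FID itself is not the bottleneck: handing out FIDs from a preallocated sequence is purely local arithmetic. The sketch below mirrors the field names of Lustre's struct lu_fid (f_seq, f_oid, f_ver), but the fid_pool structure, the "current plus one spare sequence" refill policy, and the use of object IDs 1..0x20000 (128k) per sequence are simplifying assumptions, not the real client code. What it cannot provide is the MDS-side inode and open state, which is the actual problem described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same fields as Lustre's FID: sequence, object id in sequence, version. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed width: 128k object IDs per sequence, numbered from 1. */
#define SEQ_WIDTH 0x20000u

/* Hypothetical per-client pool: the sequence in use plus one spare. */
struct fid_pool {
	uint64_t cur_seq;
	uint64_t spare_seq; /* 0 means no spare left */
	uint32_t next_oid;  /* next object ID to hand out in cur_seq */
};

/*
 * Allocate one FID without any network traffic. Returns false once both
 * sequences are exhausted; a real client would then ask the server for more.
 */
bool fid_pool_alloc(struct fid_pool *p, struct lu_fid *fid)
{
	if (p->next_oid > SEQ_WIDTH) {
		if (p->spare_seq == 0)
			return false;
		p->cur_seq = p->spare_seq; /* switch to the spare sequence */
		p->spare_seq = 0;
		p->next_oid = 1;
	}
	fid->f_seq = p->cur_seq;
	fid->f_oid = p->next_oid++;
	fid->f_ver = 0;
	return true;
}
```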

Something like Oleg's Write Back Cache would allow the inode to be opened in client RAM.

Comment by Nathan Rutman [ 08/Sep/22 ]

Just found this again. This sure is an ugly hack:

		rc = snprintf(volatile_file, sizeof(volatile_file),
			      "%s/%s:%.4X:%.4X:fd=%.2d", parent,
			      LUSTRE_VOLATILE_HDR, mdt_index,
			      random_value, fd);
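For reference, the name that snippet produces can be reproduced with a small standalone function. This is a sketch: build_volatile_name() is a hypothetical wrapper, the index and random value are typed unsigned here so they match %X, and the header string is copied from the LUSTRE_VOLATILE_HDR definition quoted in the next comment. It returns the name length, or -1 if snprintf() would truncate.

```c
#include <stddef.h>
#include <stdio.h>

/* Copied from the LUSTRE_VOLATILE_HDR definition quoted in this ticket. */
#define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"

/*
 * Hypothetical helper producing the same format as the snippet:
 *   <parent>/<hdr>:<mdt_index, >=4 hex>:<random, >=4 hex>:fd=<fd, >=2 digits>
 * Returns the name length, or -1 if it does not fit in 'size' bytes.
 */
int build_volatile_name(char *buf, size_t size, const char *parent,
			unsigned int mdt_index, unsigned int random_value,
			int fd)
{
	int rc = snprintf(buf, size, "%s/%s:%.4X:%.4X:fd=%.2d", parent,
			  LUSTRE_VOLATILE_HDR, mdt_index, random_value, fd);

	/* snprintf() returns the untruncated length; >= size means truncated */
	if (rc < 0 || (size_t)rc >= size)
		return -1;
	return rc;
}
```

For example, parent "/mnt/lustre/dir", MDT index 1, random value 0xBEEF and fd 7 yield "/mnt/lustre/dir/.^L^S^T^R:VOLATILE:0001:BEEF:fd=07", with the four control characters shown in caret notation.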
Comment by Andreas Dilger [ 08/Sep/22 ]

Definitely a hack:

/* lustre volatile file support
 * file name header: .^L^S^T^R:volatile"
 */
#define LUSTRE_VOLATILE_HDR    ".\x0c\x13\x14\x12:VOLATILE"

but still much less bad than "silly rename" for NFS.
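The header is a dot followed by four control characters, presumably chosen so that a collision with a name a user would pick is very unlikely. Recognizing such a file then reduces to a fixed-length prefix comparison on the last path component. A minimal sketch, assuming only the header definition above; is_volatile_name() is hypothetical and is not necessarily how Lustre itself performs the check.

```c
#include <stdbool.h>
#include <string.h>

#define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"
/* Header length without the terminating NUL (14 bytes). */
#define LUSTRE_VOLATILE_HDR_LEN (sizeof(LUSTRE_VOLATILE_HDR) - 1)

/* Hypothetical check: does the last component of 'path' carry the header? */
bool is_volatile_name(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path; /* basename without modifying 'path' */
	return strncmp(base, LUSTRE_VOLATILE_HDR, LUSTRE_VOLATILE_HDR_LEN) == 0;
}
```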

I don't think it would be hard to internally map files created with the VFS O_TMPFILE onto LUSTRE_VOLATILE_HDR. It would lose the ability to directly select which MDT the file is created on (which is quite sub-optimal for striped directories if the temp file is linked into the directory later), but it should be OK for normal usage, since a "pathname" (the parent directory) is specified and could be used to determine the parent MDT.

Comment by Andreas Dilger [ 01/May/23 ]

Assigning to Arshad after discussion at LUG'23 Developer Day.

Comment by Gerrit Updater [ 05/May/23 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50871
Subject: LU-9512 utils: O_TMPFILE support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49dca55a6587d11872794c9f5b3605122b37b713

Generated at Sat Feb 10 02:26:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.