Details

    • New Feature
    • Resolution: Unresolved
    • Major
    • None
    • Upstream, Lustre 2.11.0

    Description

      We should implement O_TMPFILE for Lustre. We already have a similar interface for creating volatile files, used by lfs_migrate(). This should be hooked into the VFS O_TMPFILE mechanism so that applications can use it.

      From the open(2) man page in RHEL8.5:

             O_TMPFILE (since Linux 3.11)
                    Create an unnamed temporary regular file.  The pathname argument
                    specifies a directory; an unnamed inode will be created in that
                    directory's filesystem.  Anything written to the resulting file
                    will be lost when the last file descriptor is closed, unless the
                    file is given a name.
      
                    O_TMPFILE must be specified with one of O_RDWR or O_WRONLY  and,
                    optionally, O_EXCL.  If O_EXCL is not specified, then linkat(2)
                    can be used to link the temporary file into the filesystem,
                    making it permanent, using code like the following:
      
                        char path[PATH_MAX];
                        fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                                                  S_IRUSR | S_IWUSR);
      
                        /* File I/O on 'fd'... */
      
                        snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
                        linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                               AT_SYMLINK_FOLLOW);
      
                    In this case, the open() mode argument determines the file
                    permission mode, as with O_CREAT.
      
                    Specifying O_EXCL in conjunction with O_TMPFILE prevents a temporary
                    file from being linked into the filesystem in the above manner.
                    (Note that the meaning of O_EXCL in this case is different from
                    the meaning of O_EXCL otherwise.)
      
                    There are two main use cases for O_TMPFILE:
      
                    *  Improved tmpfile(3) functionality: race-free creation of
                       temporary files that (1) are automatically deleted when closed;
                       (2) can never be reached via any pathname; (3) are not subject
                       to symlink attacks; and (4) do not require the caller to devise
                       unique names.
      
                    *  Creating a file that is initially invisible, which is then
                       populated with data and adjusted to have appropriate filesystem
                       attributes (fchown(2), fchmod(2), fsetxattr(2), etc.) before
                       being atomically linked into the filesystem in a fully formed
                       state (using linkat(2) as described above).
      
                    O_TMPFILE requires support by the underlying filesystem; only a
                    subset of Linux filesystems provide that support.  In the initial
                    implementation, support was provided in the ext2, ext3, ext4,
                    UDF, Minix, and shmem filesystems.  Support for other filesystems
                    has subsequently been added as follows: XFS (Linux 3.15); Btrfs
                    (Linux 3.16); F2FS (Linux 3.16); and ubifs (Linux 4.9)
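      A minimal C sketch of the create-then-publish pattern quoted above, including the fallback an application needs on a filesystem or kernel where open(2) rejects O_TMPFILE (e.g. a Lustre client without this feature). The helper name tmpfile_publish() and its return convention are hypothetical. The open(2)/linkat(2) calls follow the man page example, EOPNOTSUPP and EISDIR are the errors open(2) documents for missing filesystem or kernel support, and the sketch assumes /proc is mounted.

```c
#define _GNU_SOURCE /* exposes O_TMPFILE in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'len' bytes into an unnamed inode created in
 * 'dir', then give it the name 'dir/name' in one linkat(2) call.
 * Returns 0 on success, -EOPNOTSUPP if O_TMPFILE is unavailable (caller
 * should fall back to a named temp file), or another negative errno.
 */
static int tmpfile_publish(const char *dir, const char *name,
                           const void *data, size_t len)
{
    char proc_path[64];
    char final_path[PATH_MAX];
    ssize_t n;
    int fd, rc = 0;

    assert(dir != NULL && name != NULL);

    /* No O_EXCL, so the unnamed inode stays linkable. */
    fd = open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        /* open(2): EOPNOTSUPP = fs lacks support, EISDIR = kernel lacks it */
        if (errno == EOPNOTSUPP || errno == EISDIR)
            return -EOPNOTSUPP;
        return -errno;
    }

    /* Populate the file while it is still invisible in the namespace. */
    n = write(fd, data, len);
    if (n < 0) {
        rc = -errno;
        goto out;
    }
    if ((size_t)n != len) {
        rc = -EIO;
        goto out;
    }

    if (snprintf(final_path, sizeof(final_path), "%s/%s", dir, name) >=
        (int)sizeof(final_path)) {
        rc = -ENAMETOOLONG;
        goto out;
    }

    /* Link the fully written inode into the namespace via /proc. */
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
               AT_SYMLINK_FOLLOW) < 0)
        rc = -errno;
out:
    close(fd);
    return rc;
}
```

      A caller that gets -EOPNOTSUPP would fall back to the traditional approach, e.g. mkstemp(3) followed by rename(2), which leaves a visible name in the directory while the file is being written.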
      

          Activity

            [LU-9512] Implement O_TMPFILE for Lustre

            adilger Andreas Dilger added a comment - I filed LU-18844 to track adding the I_LINKABLE flag on Lustre volatile files. That may be an easier (and more useful) way to integrate Lustre volatile files with the ability to link the files into the namespace after use. O_TMPFILE is definitely still useful, but for Lustre-specific tools that want to optimize MDT allocation it is still better to create a volatile file on a specific MDT.

            gerrit Gerrit Updater added a comment - "Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50871
            Subject: LU-9512 utils: O_TMPFILE support
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 49dca55a6587d11872794c9f5b3605122b37b713

            adilger Andreas Dilger added a comment - Assign to Arshad after discussion at LUG'23 Developer Day.
            adilger Andreas Dilger added a comment - edited

            Definitely a hack:

            /* lustre volatile file support
             * file name header: ".^L^S^T^R:volatile"
             */
            #define LUSTRE_VOLATILE_HDR    ".\x0c\x13\x14\x12:VOLATILE"

            but still much less bad than "silly rename" for NFS.

            I don't think it would be hard to internally map files created with the VFS O_TMPFILE onto LUSTRE_VOLATILE_HDR. It would lose the ability to directly select which MDT the file is created on (quite sub-optimal for striped directories if the temp file is later linked into the directory), but it should be OK for normal usage, since a "pathname" (the parent directory) is specified and could be used to determine the parent MDT.
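            A sketch of the mapping described above, assuming the client already knows the MDT index of the parent directory (parent_mdt is passed in here, not looked up). wants_tmpfile() and tmpfile_to_volatile_name() are hypothetical names; the name format is modeled on the lfs snippet quoted in this issue, minus the fd= field because no descriptor exists yet at create time. Note that glibc defines O_TMPFILE to include the O_DIRECTORY bit, so the flag test has to compare against the full mask.

```c
#define _GNU_SOURCE /* exposes O_TMPFILE in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Header as quoted above from the Lustre sources. */
#define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"

/* Hypothetical: does this open(2) flag word request an unnamed temp file?
 * O_TMPFILE shares the O_DIRECTORY bit, so both bits must be set. */
static int wants_tmpfile(int flags)
{
    return (flags & O_TMPFILE) == O_TMPFILE;
}

/* Hypothetical mapping: synthesize a volatile name under 'parent' that
 * pins the create to the parent's MDT. Returns 0, or -1 on truncation. */
static int tmpfile_to_volatile_name(char *buf, size_t size,
                                    const char *parent,
                                    unsigned int parent_mdt,
                                    unsigned int rnd)
{
    int rc;

    assert(buf != NULL && parent != NULL);
    rc = snprintf(buf, size, "%s/%s:%.4X:%.4X", parent,
                  LUSTRE_VOLATILE_HDR, parent_mdt, rnd & 0xFFFFu);
    return (rc < 0 || (size_t)rc >= size) ? -1 : 0;
}
```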

            nrutman Nathan Rutman added a comment - Just found this again. This sure is an ugly hack:

            		rc = snprintf(volatile_file, sizeof(volatile_file),
            			      "%s/%s:%.4X:%.4X:fd=%.2d", parent,
            			      LUSTRE_VOLATILE_HDR, mdt_index,
            			      random_value, fd);
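            To illustrate why carrying metadata in the file name is awkward, here is a hypothetical parser that recovers the MDT index, random value, and fd from a name produced by the snprintf() format above. parse_volatile_name() and struct volatile_fields are illustrative only, not Lustre code; they assume just that the header is followed by ':' and that the fields appear in the order of that format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Header as quoted above from the Lustre sources. */
#define LUSTRE_VOLATILE_HDR ".\x0c\x13\x14\x12:VOLATILE"
#define VOLATILE_HDR_LEN (sizeof(LUSTRE_VOLATILE_HDR) - 1)

/* Hypothetical container for the fields encoded in the name. */
struct volatile_fields {
    unsigned int mdt_index;
    unsigned int random_value;
    int fd;
};

/* Parse the last path component written with "%s/%s:%.4X:%.4X:fd=%.2d".
 * Returns 0 on success, -1 if it is not a volatile name in that format. */
static int parse_volatile_name(const char *path, struct volatile_fields *out)
{
    const char *base = strrchr(path, '/');

    base = base ? base + 1 : path;
    if (strncmp(base, LUSTRE_VOLATILE_HDR, VOLATILE_HDR_LEN) != 0)
        return -1;
    /* Hex fields stop at the next ':'; fd is decimal after "fd=". */
    if (sscanf(base + VOLATILE_HDR_LEN, ":%X:%X:fd=%d",
               &out->mdt_index, &out->random_value, &out->fd) != 3)
        return -1;
    return 0;
}
```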

            adilger Andreas Dilger added a comment - Each client already has a preallocated range of FIDs (two SEQ values, which they typically each use for 128k creates). The real problem is that the client can't "open" a file without contacting the MDS, since it won't have an inode allocated, and if a client allocates many O_TMPFILE files and then tries to write them later, it may run out of space.

            Doing something like Oleg's Write Back Cache would allow the inode to be opened in client RAM.
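            A toy model of the client-side FID pool described above, assuming two preallocated SEQ values of 128k (0x20000) OIDs each, with OIDs starting at 1. struct lu_fid mirrors the field names of Lustre's FID; struct fid_pool and its functions are hypothetical. It only shows that the FID itself can be handed out without an MDS round trip; the harder parts named in the comment (an inode to open and space to write into) are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same fields as Lustre's struct lu_fid. */
struct lu_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/* Assumption: 128k creates per SEQ, as stated in the comment above. */
#define SEQ_WIDTH 0x20000u

/* Hypothetical client-side pool holding two preallocated sequences. */
struct fid_pool {
    uint64_t seq[2];
    uint32_t next_oid; /* next OID to hand out in the current sequence */
    int cur;           /* index into seq[] currently in use */
};

static void fid_pool_init(struct fid_pool *p, uint64_t seq0, uint64_t seq1)
{
    p->seq[0] = seq0;
    p->seq[1] = seq1;
    p->next_oid = 1; /* assumption: OID 0 is never handed out */
    p->cur = 0;
}

/* Allocate a FID locally. Returns -ENOSPC once both sequences are
 * exhausted, at which point the client must ask the MDS for more. */
static int fid_pool_alloc(struct fid_pool *p, struct lu_fid *out)
{
    if (p->next_oid > SEQ_WIDTH) {
        if (p->cur == 1)
            return -ENOSPC;
        p->cur = 1;
        p->next_oid = 1;
    }
    out->f_seq = p->seq[p->cur];
    out->f_oid = p->next_oid++;
    out->f_ver = 0;
    return 0;
}
```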

            nrutman Nathan Rutman added a comment - +1

            The MDS could give each client some number of FIDs to use for these temp files, along with algorithmic layouts, to move the MDS entirely out of the create path.
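            One possible reading of "algorithmic layouts", sketched under the assumption of a simple round-robin rule: the starting OST is derived from the FID, so a client and the MDS that agree on the FID and the OST count compute the same layout without exchanging it. algorithmic_layout() is hypothetical and is not Lustre's object allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as Lustre's struct lu_fid. */
struct lu_fid {
    uint64_t f_seq;
    uint32_t f_oid;
    uint32_t f_ver;
};

/*
 * Hypothetical: fill osts[] with the OST indices of each stripe, using
 * only the FID and the OST count. Consecutive OIDs start on consecutive
 * OSTs, spreading new files round-robin. Returns the stripe count used
 * (clamped to ost_count so no OST appears twice).
 */
static uint32_t algorithmic_layout(const struct lu_fid *fid,
                                   uint32_t ost_count,
                                   uint32_t stripe_count, uint32_t *osts)
{
    uint32_t start, i;

    if (ost_count == 0)
        return 0;
    if (stripe_count > ost_count)
        stripe_count = ost_count;

    start = (uint32_t)((fid->f_seq + fid->f_oid) % ost_count);
    for (i = 0; i < stripe_count; i++)
        osts[i] = (start + i) % ost_count;
    return stripe_count;
}
```

            A real scheme would presumably also need to handle inactive or full OSTs, which a fixed formula cannot see without some shared state from the MDS.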

            People

              arshad512 Arshad Hussain
              adilger Andreas Dilger
              Votes: 1
              Watchers: 10
