
fiemap FIEMAP_FLAG_SYNC flag expects filemap_write_and_wait() or similar

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      fiemap FIEMAP_FLAG_SYNC can race while client is writing data to disk

      In such a case the fiemap() call reports that the data is not on disk (no data for the range), and cp then truncates (or sparse-fills) the destination based on the size of the file/extent.

      strace shows that FIEMAP_FLAG_SYNC was sent. Further, the user reports that a 'sync; cp <blah>' does not fail, and a newer cp that uses copy_file_range() also does not fail.

      Looking further, FIEMAP_FLAG_SYNC expects the data to be on disk, i.e. filemap_write_and_wait(), not just filemap_fdatawrite() (which only starts writeback without waiting for it to complete).
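
For reference, the userspace side of such a request can be sketched as below. This is a minimal illustration, not the code path cp actually uses: the helper name count_extents_synced() is hypothetical, and only the ioctl, the struct and the flag come from the Linux UAPI headers.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/*
 * Hypothetical helper: ask the kernel how many extents back the whole
 * file, requesting a sync of the file data before the mapping is done.
 * Returns the extent count, or -1 with errno set by ioctl().
 */
long count_extents_synced(int fd)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* map through to the end of the file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;    /* sync the file before mapping */
    fm.fm_extent_count = 0;            /* 0 = report only the extent count */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -1;

    return (long)fm.fm_mapped_extents;
}
```

If the filesystem does not wait for writeback before answering, a caller that has just written data could see fewer extents here than the file really needs, which matches the symptom described above.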

          Activity

            [LU-17124] fiemap FIEMAP_FLAG_SYNC flag expects filemap_write_and_wait() or similar

            simmonsja James A Simmons added a comment -

            A ticket exists for copy_file_range(). I just never got the cycles to implement it for Lustre. Also, RHEL7 doesn't support a proper hook for copy_file_range.

            lflis Lukasz Flis added a comment -

            Thank you for the patch. We tested the changes and unfortunately we still see corrupted destination files.

            stancheff Shaun Tancheff added a comment -

            Please try the patch and see whether the sync() occurs early enough to resolve the corruption you are seeing.

            If not, we may need to implement a file-system-specific copy_file_range().

            Thanks!

            gerrit Gerrit Updater added a comment -

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53140
            Subject: LU-17124 llite: sync on splice write
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c6c5e7c6664fcaeb4b4df51f6e0562f35a91d985

            lflis Lukasz Flis added a comment - - edited

            Quick update: forcing fsync(fd_out) before the copy_file_range() call fixes the problem with truncated output files.

             

            /*
             * LD_PRELOAD shim: fsync() the output fd before every copy_file_range().
             *
             *   gcc -c -Wall -Werror -fpic ./this.c -o cfr.o
             *   gcc -shared -o ./cfr.so ./cfr.o -ldl
             *   LD_PRELOAD=/full_path_to/cfr.so cp A B
             */

            #define _GNU_SOURCE 1
            #include <dlfcn.h>
            #include <unistd.h>
            #include <stdio.h>
            #include <sys/types.h>
            #include <sys/stat.h>
            #include <fcntl.h>

            /* Pointer to the real copy_file_range(); it returns ssize_t, not ssize_t *. */
            typedef ssize_t (*cfr_t)(int, loff_t *, int, loff_t *, size_t, unsigned int);
            static cfr_t p = NULL;

            ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags)
            {
                /* Flush the output fd first; this is the workaround under test. */
                fsync(fd_out);
                if (p == NULL) {
                    /* Resolve the next definition (normally libc's) on first use. */
                    p = (cfr_t) dlsym(RTLD_NEXT, "copy_file_range");
                    if (p == NULL) {
                        /* Error handling */
                        return -1;
                    }
                }
                return p(fd_in, off_in, fd_out, off_out, len, flags);
            }

            lflis Lukasz Flis added a comment -

            Two traces in the attachment, both captured for the same file

            #good
            ...
            uname({sysname="Linux", nodename="t0006", ...}) = 0
            copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 62924467
            copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 0
            ...
            #bad
            ...
            uname({sysname="Linux", nodename="t0006", ...}) = 0
            copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 19005440
            copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 1441792
            copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 0            <= short read?
            ...
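
In the bad trace the two non-zero returns add up to 19005440 + 1441792 = 20447232 bytes, against 62924467 bytes in the good trace for the same file. A caller could detect that kind of short copy by comparing the copied total with the source size, as in the sketch below. The helper names copy_is_complete() and copy_and_verify() are hypothetical, and the check assumes the source file is not being modified during the copy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for copy_file_range() in <unistd.h> */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: did we copy as many bytes as the source holds? */
int copy_is_complete(off_t copied, off_t expected)
{
    return copied == expected;
}

/*
 * Copy fd_in to fd_out with copy_file_range() and compare the byte
 * count with the source size reported by fstat().
 * Returns 0 on a full copy, 1 on a short copy, -1 on error (errno is
 * set by the failing call).
 */
int copy_and_verify(int fd_in, int fd_out)
{
    struct stat st;
    off_t total = 0;
    ssize_t n;

    if (fstat(fd_in, &st) < 0)
        return -1;

    /* As in the traces, a return of 0 is treated as end of input. */
    while ((n = copy_file_range(fd_in, NULL, fd_out, NULL,
                                (size_t)1 << 30, 0)) > 0)
        total += n;
    if (n < 0)
        return -1;

    return copy_is_complete(total, st.st_size) ? 0 : 1;
}
```

With the numbers from the traces, copy_is_complete(20447232, 62924467) is false, so the bad run would be reported as a short copy rather than silently producing a truncated file.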
            
            

             


            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff