Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16025

Read past file size after truncate from another client

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0, Lustre 2.15.3
    • Lustre 2.12.8
    • None
    • RHEL 8.5 kernel 4.18.0-348.23.1.el8
      Whamcloud release 2.12.8
    • 3
    • 9223372036854775807

    Description

      It was found that if a file which was already accessed on node1 is truncated by node2 and then opened and read again by node1, calls to read() can get data past the end of the file. The extra bytes are filled with zeroes.

      This is usually not seen as a call to (f)stat() on the file before being opened or read will actually trigger a glimpse lock, refreshing the actual file size on node1, which is the case for most usual unix tools.

      Note that reading the same file on node2 a second time will actually get it right.

       

      Here is a reproducer:{}

       

      [seb@node1 ~]$ cat mycat.c
      #include <stdio.h>
      #include <unistd.h>
      #include <fcntl.h>
      int main(int argc, char **argv)
      {
              int i;
              int s;
              int fd;
              char buffer[4096];
              for (i = 1; i < argc; i++) {
                      fd = open(argv[i], O_RDONLY);
                      if (fd < 0) {
                              perror("Could not open file");
                              return (1);
                      }
                      while ( (s = read(fd, buffer, sizeof(buffer))) > 0) {
                              write(1, buffer, s);
                      }
                      close(fd);
              }
      } 
      [seb@node1 ~]$ gcc -Wall mycat.c -o mycat 

       

       

      [seb@node1 ~]$ cp somefile.txt /lustre/seb/somefile.txt
      # Trigger a read of the file on node2 to fill up the inode informations
      # Notice the file at this time is several megabytes
      [seb@node1 ~]$ ssh node2 'cat /lustre/seb/somefile.txt > /dev/null; ls -l /lustre/seb/somefile.txt'
      -rw-r----- 1 seb seb 114052401 Jun 16 13:51 /lustre/seb/somefile.txt
      [seb@node1 ~]$ truncate -s 100000 /lustre/seb/somefile.txt
      # Now read the file from a remote node making sure no stat() call occurs before
      [seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
      00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20| 
      000186a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| 
      * 
      00019000 
      # bytes above 100000 up to 102400 (completing the last page) are showing zeroes
      [seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
      00018670  31 31 33 37 34 38 20 20  36 33 38 37 39 38 31 37  |113748  63879817| 
      00018680  31 36 20 20 34 36 25 20  2f 6c 75 73 2f 68 31 74  |16  46% /lus/h1t| 
      00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20| 
      000186a0 

      This was done with a simple ftruncate() call here, but the same problem occurs with an open( "/lustre/seb/somefile.txt", O_WRONLY|O_TRUNC) + writes to a lower size than original.

       

      Attachments

        Issue Links

          Activity

            [LU-16025] Read past file size after truncate from another client
            bobijam Zhenyu Xu added a comment -

            short read reset ki_pos issue is tracked at LU-17482

            bobijam Zhenyu Xu added a comment - short read reset ki_pos issue is tracked at LU-17482

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53827
            Subject: LU-16025 llite: short read could mess up next read offset
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: be710a787d4af182d51866dd5dc09b71b880688b

            gerrit Gerrit Updater added a comment - "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53827 Subject: LU-16025 llite: short read could mess up next read offset Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: be710a787d4af182d51866dd5dc09b71b880688b
            bobijam Zhenyu Xu added a comment -

            iocb->ki_pos could be wrong so that next read could start from wrong offset.

            bobijam Zhenyu Xu added a comment - iocb->ki_pos could be wrong so that next read could start from wrong offset.

            Bobi commented in Gerrit - He tried it and my suggested approach doesn't work (for anyone following along here).

            https://review.whamcloud.com/c/fs/lustre-release/+/50689

            Basically the fix adds slight overhead to each read, so it's more significant for smaller reads and probably can't even be measured for large reads.  But we need the fix and the cost isn't too high, so that's where we're at for now.

            paf0186 Patrick Farrell added a comment - Bobi commented in Gerrit - He tried it and my suggested approach doesn't work (for anyone following along here). https://review.whamcloud.com/c/fs/lustre-release/+/50689 Basically the fix adds slight overhead to each read, so it's more significant for smaller reads and probably can't even be measured for large reads.  But we need the fix and the cost isn't too high, so that's where we're at for now.

            OK, that's I thought. Thanks for confirmation, Patrick. It's a trade off to fix an critical problem first and will imporve the codes later if it's possible.

            sihara Shuichi Ihara added a comment - OK, that's I thought. Thanks for confirmation, Patrick. It's a trade off to fix an critical problem first and will imporve the codes later if it's possible.

            bobijam , I posted a comment in https://review.whamcloud.com/c/fs/lustre-release/+/50689 - I think I know why we have a regression in 4K reads, and maybe have a suggestion?  But Ihara, fyi, this patch is an important correctness fix so definitely no revert.

            paf0186 Patrick Farrell added a comment - bobijam , I posted a comment in https://review.whamcloud.com/c/fs/lustre-release/+/50689 - I think I know why we have a regression in 4K reads, and maybe have a suggestion?  But Ihara, fyi, this patch is an important correctness fix so definitely no revert.
            sihara Shuichi Ihara added a comment - - edited

            an constant small performance regression (-10%) detected after patch https://review.whamcloud.com/#/c/fs/lustre-release/+/50689 landed in b2_15.

            a workload is single process and 4k buffered IO

            # salloc -p 40n -N 1 --ntasks-per-node=1 /usr/mpi/gcc/openmpi-4.1.5a1/bin/mpirun --allow-run-as-root /work/tools/bin/ior -i 1 -w -r -b 256g -t 4k -C -Q 2 -e -vv -F -o /exafs/d0/d1/d2/ior.single.out/file
            

            here is list of b2_15 branch

            - snip -
            a4e89bffe6 LU-16369 ldiskfs: do not check enc context at lookup 
            Max Write: 539.39 MiB/sec (565.60 MB/sec)
            Max Read:  2503.16 MiB/sec (2624.75 MB/sec)
            
            48d247fcc9 LU-16025 llite: adjust read count as file got truncated
            Max Write: 538.69 MiB/sec (564.85 MB/sec)
            Max Read:  2416.74 MiB/sec (2534.13 MB/sec)
            
            b74560d74a LU-16286 ldiskfs: reimplement nodelalloc optimization
            Max Write: 533.40 MiB/sec (559.31 MB/sec)
            Max Read:  2761.09 MiB/sec (2895.21 MB/sec)
            

            Is this expected performance degradation to solve original defect, or is it possible to fix without performance impacts?

            NOTE:
            the performance was back after reverted patch https://review.whamcloud.com/#/c/fs/lustre-release/+/50689 from b2_15

            Max Write: 529.53 MiB/sec (555.25 MB/sec)
            Max Read:  2704.63 MiB/sec (2836.01 MB/sec)
            
            sihara Shuichi Ihara added a comment - - edited an constant small performance regression (-10%) detected after patch https://review.whamcloud.com/#/c/fs/lustre-release/+/50689 landed in b2_15. a workload is single process and 4k buffered IO # salloc -p 40n -N 1 --ntasks-per-node=1 /usr/mpi/gcc/openmpi-4.1.5a1/bin/mpirun --allow-run-as-root /work/tools/bin/ior -i 1 -w -r -b 256g -t 4k -C -Q 2 -e -vv -F -o /exafs/d0/d1/d2/ior.single.out/file here is list of b2_15 branch - snip - a4e89bffe6 LU-16369 ldiskfs: do not check enc context at lookup Max Write: 539.39 MiB/sec (565.60 MB/sec) Max Read:  2503.16 MiB/sec (2624.75 MB/sec) 48d247fcc9 LU-16025 llite: adjust read count as file got truncated Max Write: 538.69 MiB/sec (564.85 MB/sec) Max Read: 2416.74 MiB/sec (2534.13 MB/sec) b74560d74a LU-16286 ldiskfs: reimplement nodelalloc optimization Max Write: 533.40 MiB/sec (559.31 MB/sec) Max Read: 2761.09 MiB/sec (2895.21 MB/sec) Is this expected performance degradation to solve original defect, or is it possible to fix without performance impacts? NOTE: the performance was back after reverted patch https://review.whamcloud.com/#/c/fs/lustre-release/+/50689 from b2_15 Max Write: 529.53 MiB/sec (555.25 MB/sec) Max Read: 2704.63 MiB/sec (2836.01 MB/sec)

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50689/
            Subject: LU-16025 llite: adjust read count as file got truncated
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 48d247fcc97533ae10b204dd059c908344ab1d06

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50689/ Subject: LU-16025 llite: adjust read count as file got truncated Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: 48d247fcc97533ae10b204dd059c908344ab1d06

            "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50689
            Subject: LU-16025 llite: adjust read count as file got truncated
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 8acf9a3ebd3f45b193436b18a4d115d3327c7a56

            gerrit Gerrit Updater added a comment - "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50689 Subject: LU-16025 llite: adjust read count as file got truncated Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 8acf9a3ebd3f45b193436b18a4d115d3327c7a56
            pjones Peter Jones added a comment -

            Landed for 2.16

            pjones Peter Jones added a comment - Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47896/
            Subject: LU-16025 llite: adjust read count as file got truncated
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47896/ Subject: LU-16025 llite: adjust read count as file got truncated Project: fs/lustre-release Branch: master Current Patch Set: Commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

            People

              bobijam Zhenyu Xu
              hxing Xing Huang
              Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: