Lustre / LU-16025

Read past file size after truncate from another client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.3
    • Affects Version/s: Lustre 2.12.8
    • Labels: None
    • Environment: RHEL 8.5, kernel 4.18.0-348.23.1.el8; Whamcloud release 2.12.8
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      If a file that has already been accessed on node1 is truncated by node2 and then opened and read again on node1, calls to read() can return data past the new end of the file. The extra bytes are filled with zeroes.

      This usually goes unnoticed because a call to (f)stat() on the file before it is opened or read triggers a glimpse lock, which refreshes the file size on node1. Most common Unix tools make such a call.

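      Note that reading the same file a second time on the same node returns the correct data. In the reproducer below the roles are swapped: node2 is the reading node and node1 performs the truncate.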
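Since the description says an (f)stat() call before the read refreshes the file size, a reader could defensively call fstat() itself and clamp each read to st_size. The following is only a sketch of such a client-side workaround, not the fix that was applied in Lustre, and it has not been verified against an affected release. The helper names read_clamped() and cat_clamped() are hypothetical, and the sketch assumes a regular, seekable file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: like read(), but calls fstat() first and never
 * returns bytes beyond the st_size that fstat() reports.
 * Assumes fd refers to a regular, seekable file.
 */
ssize_t read_clamped(int fd, void *buf, size_t count)
{
        struct stat st;
        off_t pos, remaining;

        assert(buf != NULL);

        /* Per the report, (f)stat() makes the client refresh the file size. */
        if (fstat(fd, &st) < 0)
                return -1;

        /* Current file offset; fails on pipes and sockets. */
        pos = lseek(fd, 0, SEEK_CUR);
        if (pos < 0)
                return -1;

        if (pos >= st.st_size)
                return 0;       /* already at or past end of file */

        remaining = st.st_size - pos;
        if ((uintmax_t)count > (uintmax_t)remaining)
                count = (size_t)remaining;

        return read(fd, buf, count);
}

/* Hypothetical variant of the mycat read loop built on read_clamped(). */
int cat_clamped(const char *path, int out_fd)
{
        char buffer[4096];
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return -1;
        while ((n = read_clamped(fd, buffer, sizeof(buffer))) > 0) {
                if (write(out_fd, buffer, (size_t)n) != n) {
                        close(fd);
                        return -1;
                }
        }
        close(fd);
        return n < 0 ? -1 : 0;
}
```

Calling cat_clamped(path, 1) copies the file to standard output. The extra fstat() and lseek() per read add system call overhead, so this only makes sense as a stopgap on clients that cannot be upgraded.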

       

      Here is a reproducer:

       

      [seb@node1 ~]$ cat mycat.c
      #include <stdio.h>
      #include <unistd.h>
      #include <fcntl.h>
      int main(int argc, char **argv)
      {
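              int i;
              ssize_t s;      /* read() returns ssize_t, not int */
              int fd;
              char buffer[4096];
              for (i = 1; i < argc; i++) {
                      /* Deliberately no stat()/fstat() before open() or read() */
                      fd = open(argv[i], O_RDONLY);
                      if (fd < 0) {
                              perror("Could not open file");
                              return (1);
                      }
                      while ( (s = read(fd, buffer, sizeof(buffer))) > 0) {
                              if (write(1, buffer, s) != s) {
                                      perror("Could not write to stdout");
                                      return (1);
                              }
                      }
                      close(fd);
              }
              return (0);
      }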
      [seb@node1 ~]$ gcc -Wall mycat.c -o mycat 

       

       

      [seb@node1 ~]$ cp somefile.txt /lustre/seb/somefile.txt
      # Read the file on node2 so that node2 caches the inode information
      # Note that the file is over 100 MB at this point
      [seb@node1 ~]$ ssh node2 'cat /lustre/seb/somefile.txt > /dev/null; ls -l /lustre/seb/somefile.txt'
      -rw-r----- 1 seb seb 114052401 Jun 16 13:51 /lustre/seb/somefile.txt
      [seb@node1 ~]$ truncate -s 100000 /lustre/seb/somefile.txt
      # Now read the file from the remote node, making sure no stat() call occurs first
      [seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
      00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20| 
      000186a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| 
      * 
      00019000 
      # Bytes from offset 100000 (0x186a0) up to 102400 (0x19000, the end of the last page) are returned as zeroes
      [seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
      00018670  31 31 33 37 34 38 20 20  36 33 38 37 39 38 31 37  |113748  63879817| 
      00018680  31 36 20 20 34 36 25 20  2f 6c 75 73 2f 68 31 74  |16  46% /lus/h1t| 
      00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20| 
      000186a0 
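The offsets in the two hexdump outputs can be checked with a little arithmetic: the truncated size is 100000 bytes (0x186a0), and the first read ends at 0x19000 = 102400 bytes, which is 100000 rounded up to a multiple of 4096. The sketch below reproduces that calculation. The 4096-byte page size is an assumption taken from the "last page" remark in the transcript, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of page_size (page_size must be > 0). */
size_t round_up_to_page(size_t size, size_t page_size)
{
        assert(page_size > 0);
        return (size + page_size - 1) / page_size * page_size;
}

/* Number of bytes between size and the end of its last page. */
size_t zero_tail_len(size_t size, size_t page_size)
{
        return round_up_to_page(size, page_size) - size;
}
```

With these definitions, round_up_to_page(100000, 4096) is 102400 and zero_tail_len(100000, 4096) is 2400, matching the 2400 zero bytes between offsets 0x186a0 and 0x19000 in the first read.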

      The truncation here was a simple ftruncate() call, but the same problem occurs when the file is opened with open("/lustre/seb/somefile.txt", O_WRONLY|O_TRUNC) and then written with less data than its original size.
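The O_TRUNC variant mentioned above could look like the following on the truncating node. This is a minimal sketch of that writer side only. The helper name rewrite_shorter() is hypothetical, and the caller supplies the path and the replacement data.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: truncate an existing file with O_TRUNC and write
 * len bytes of data, which is expected to be shorter than the old content.
 * Returns 0 on success, -1 on error.
 */
int rewrite_shorter(const char *path, const char *data, size_t len)
{
        int fd;

        assert(path != NULL && data != NULL);

        /* O_TRUNC drops the old content; the file must already exist. */
        fd = open(path, O_WRONLY | O_TRUNC);
        if (fd < 0)
                return -1;

        /* write() may be partial, so loop until everything is written. */
        while (len > 0) {
                ssize_t n = write(fd, data, len);

                if (n < 0) {
                        close(fd);
                        return -1;
                }
                data += n;
                len -= (size_t)n;
        }
        return close(fd);
}
```

Running this on one node in place of the truncate command, then reading the file with mycat on the other node, would exercise the second scenario described in the report.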

       

            People

              bobijam Zhenyu Xu
              hxing Xing Huang