Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.12.8
-
None
-
RHEL 8.5 kernel 4.18.0-348.23.1.el8
Whamcloud release 2.12.8
-
3
Description
If a file that node1 has already accessed is truncated by node2 and then reopened and read on node1, read() can return data past the new end of the file. The extra bytes are filled with zeroes.
This usually goes unnoticed because most common Unix tools call (f)stat() on the file before opening or reading it, which triggers a glimpse lock and refreshes the file size on node1.
Note that reading the same file a second time on the node that returned the stale data (node2 in the reproducer below, where the node roles are swapped relative to this description) returns the correct data.
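As an illustration of the observation above, here is a minimal sketch of a reader that calls fstat() right after open() and never emits more than st_size bytes. The helper name copy_after_fstat is hypothetical, and whether this avoids the stale read on an affected Lustre client is an assumption drawn from the report's remark about glimpse locks, not a verified workaround.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Lustre or of the reproducer):
 * copy `path` to `out_fd`, calling fstat() before the first read() and
 * capping the output at st_size bytes.
 * Returns the number of bytes copied, or -1 on error.
 */
static long long copy_after_fstat(const char *path, int out_fd)
{
    char buffer[4096];
    struct stat st;
    long long total = 0;
    ssize_t n = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Assumption from the report: this call refreshes the size on the client. */
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    while (total < (long long)st.st_size &&
           (n = read(fd, buffer, sizeof(buffer))) > 0) {
        long long left = (long long)st.st_size - total;
        size_t want = ((long long)n > left) ? (size_t)left : (size_t)n;

        /* A short write is treated as an error to keep the sketch small. */
        if (write(out_fd, buffer, want) != (ssize_t)want) {
            close(fd);
            return -1;
        }
        total += (long long)want;
    }

    close(fd);
    return (n < 0) ? -1 : total;
}
```

Calling copy_after_fstat("/lustre/seb/somefile.txt", 1) would write the file to standard output, much like mycat but with the extra fstat() call.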
Here is a reproducer:
[seb@node1 ~]$ cat mycat.c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int i;
    int s;
    int fd;
    char buffer[4096];

    for (i = 1; i < argc; i++) {
        fd = open(argv[i], O_RDONLY);
        if (fd < 0) {
            perror("Could not open file");
            return (1);
        }
        while ( (s = read(fd, buffer, sizeof(buffer))) > 0) {
            write(1, buffer, s);
        }
        close(fd);
    }
}
[seb@node1 ~]$ gcc -Wall mycat.c -o mycat
[seb@node1 ~]$ cp somefile.txt /lustre/seb/somefile.txt
# Trigger a read of the file on node2 to fill up the inode informations
# Notice the file at this time is several megabytes
[seb@node1 ~]$ ssh node2 'cat /lustre/seb/somefile.txt > /dev/null; ls -l /lustre/seb/somefile.txt'
-rw-r----- 1 seb seb 114052401 Jun 16 13:51 /lustre/seb/somefile.txt
[seb@node1 ~]$ truncate -s 100000 /lustre/seb/somefile.txt
# Now read the file from a remote node making sure no stat() call occurs before
[seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20|
000186a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019000
# bytes above 100000 up to 102400 (completing the last page) are showing zeroes
[seb@node1 ~]$ ssh node2 '~/mycat /lustre/seb/somefile.txt | hexdump -C | tail -n 4'
00018670  31 31 33 37 34 38 20 20  36 33 38 37 39 38 31 37  |113748  63879817|
00018680  31 36 20 20 34 36 25 20  2f 6c 75 73 2f 68 31 74  |16  46% /lus/h1t|
00018690  63 73 74 30 31 5b 4f 53  54 3a 33 36 5d 0a 32 30  |cst01[OST:36].20|
000186a0
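The transcript shows the first read returning 102400 bytes and the second returning 100000. A small sketch of how that comparison could be automated follows: read the file to end-of-file twice, with no stat() call of its own, and report both byte counts. The names count_bytes_no_stat and read_twice are hypothetical and are not taken from the ticket.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` and read it to EOF, like mycat but
 * without writing anything. Makes no stat() call itself.
 * Returns the number of bytes read, or -1 on error.
 */
static long long count_bytes_no_stat(const char *path)
{
    char buffer[4096];
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buffer, sizeof(buffer))) > 0)
        total += n;

    close(fd);
    return (n < 0) ? -1 : total;
}

/*
 * Hypothetical helper: read `path` twice and store both byte counts.
 * Returns 0 on success, -1 if either pass failed.
 * In the transcript above, the two counts would be 102400 and 100000.
 */
static int read_twice(const char *path, long long *first, long long *second)
{
    assert(first != NULL && second != NULL);

    *first = count_bytes_no_stat(path);
    if (*first < 0)
        return -1;

    *second = count_bytes_no_stat(path);
    if (*second < 0)
        return -1;

    return 0;
}
```

On a client that is not affected, both counts should be equal. A first count larger than the second would match the behaviour described in this report.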
The truncation in the reproducer was done with a simple ftruncate() call, but the same problem occurs when the file is opened with open("/lustre/seb/somefile.txt", O_WRONLY|O_TRUNC) and then written to a size smaller than the original.
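For reference, the O_TRUNC variant of the truncation step could look like the sketch below, run on the truncating node in place of the truncate command. The helper name rewrite_shorter is hypothetical. The code only illustrates the open() plus write() sequence mentioned above and has not been checked against an affected Lustre client.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of an existing file with
 * `len` bytes from `data`, truncating it through O_TRUNC at open time.
 * Returns 0 on success, -1 on error.
 */
static int rewrite_shorter(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd;

    assert(path != NULL);
    assert(data != NULL || len == 0);

    /* O_TRUNC drops the old contents; the writes below set the new size. */
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    return close(fd);
}
```

Passing a buffer shorter than the file's previous size leaves the file smaller than before, which is the condition the report describes.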
Issue Links
- is related to
  - LU-17993 "ASSERTION( i == pv->ldp_count ) failed" is ll_direct_rw_pages (Resolved)
  - LU-18468 Md5 chksum fails on the mirror file created by dd with bs=4k after doing mirror resync and delete (Resolved)
  - LU-17469 GPF in ll_writepages()-> lov_io_init+0x25b/0x510 (Resolved)
- is related to
  - LU-17482 short read does not set ki_pos correctly (Resolved)