Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11053

problem with loop device associated with lustre file

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      lustre's file open does not bring the file size info in-core inode. losetup avoids stat for the backing lustre file before LOOP_SET_FD:

      open("/mnt/lustre/ext4image", O_RDWR)    = 3
      open("/dev/loop0", O_RDWR)              = 4
      ioctl(4, LOOP_SET_FD, 0x3)              = 0
      stat("/mnt/lustre/ext4image"...
      

      So losetup creates block device inode with zero size which eventually leads to failure on umounting the loop device at BUG_ON(!buffer_mapped(bh)) in submit_bh():

      <2>kernel BUG at fs/buffer.c:3157!
      <4>invalid opcode: 0000 [#1] SMP
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 37
      ...
      <4>Pid: 6751, comm: umount Not tainted 2.6.32-696.18.7.el6.x86_64 #1 BULL bullx blade/CHPD
      <4>RIP: 0010:[<ffffffff811d0962>]  [<ffffffff811d0962>] submit_bh+0x152/0x1f0
      <4>RSP: 0018:ffff88107483bd68  EFLAGS: 00010246
      <4>RAX: 0000000000000005 RBX: ffff88087c7fbd60 RCX: 0000000000000017
      ...
      <4>Call Trace:
      <4> [<ffffffff811d2973>] __sync_dirty_buffer+0x53/0xf0
      <4> [<ffffffff811d2a23>] sync_dirty_buffer+0x13/0x20
      <4> [<ffffffffa0d1877b>] ext2_sync_super+0x5b/0x70 [ext2]
      <4> [<ffffffffa0d19733>] ext2_put_super+0x133/0x150 [ext2]
      <4> [<ffffffff8119cc4b>] generic_shutdown_super+0x5b/0xe0
      <4> [<ffffffff8119cd01>] kill_block_super+0x31/0x50
      <4> [<ffffffff8119d4d7>] deactivate_super+0x57/0x80
      <4> [<ffffffff811bd50f>] mntput_no_expire+0xbf/0x110
      <4> [<ffffffff811be05b>] sys_umount+0x7b/0x3a0
      

      The modification to sanity.sh:test_54c (from Andrew Perepechko) illustrates the problem.

      diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
      index c6b292b029..d2961dbf0e 100755
      --- a/lustre/tests/sanity.sh
      +++ b/lustre/tests/sanity.sh
      @@ -4656,10 +4656,15 @@ test_54c() {
              mknod $loopdev b 7 $LOOPNUM
              echo "make a loop file system with $DIR/$tfile on $loopdev ($LOOPNUM)."
              dd if=/dev/zero of=$DIR/$tfile bs=$(get_page_size client) seek=1024 count=1 > /dev/null
      +       mkfs.ext2 $DIR/$tfile  || error "mke2fs on $DIR/$tfile "
      +       test_mkdir $DIR/$tdir
      +
      +       cancel_lru_locks mdc
      +       cancel_lru_locks osc
      +
              losetup $loopdev $DIR/$tfile ||
                      error "can't set up $loopdev for $DIR/$tfile"
      -       mkfs.ext2 $loopdev || error "mke2fs on $loopdev"
      -       test_mkdir $DIR/$tdir
      +
              mount -t ext2 $loopdev $DIR/$tdir ||
                      error "error mounting $loopdev on $DIR/$tdir"
              dd if=/dev/zero of=$DIR/$tdir/tmp bs=$(get_page_size client) count=30 ||
      

      Attachments

        Activity

          [LU-11053] problem with loop device associated with lustre file
          rajeevm Rajeev Mishra added a comment - - edited

          This issue is related to caching and may not be visible on all the filesystems. However, it is observed on Lustre.
          I have identified two possible fixes:
          1. Perform a sync within the loop device to ensure the actual file size is obtained. This can be done by calling vfs_fsync() before reading the file size, as shown below (in block/loop.c):

          static loff_t get_size(loff_t offset, loff_t sizelimit, struct file *file)
          {
                loff_t loopsize;       /* Sync to ensure accurate file size */   
                vfs_fsync(file, 0); <<<< this is added  
                loopsize = i_size_read(file->f_mapping->host);
          

          2. Ensure the application performs a sync after writing to guarantee that all data is properly accounted for. This can be done using fsync(backing_fd), as I did in the provided test case.

          rajeevm Rajeev Mishra added a comment - - edited This issue is related to caching and may not be visible on all the filesystems. However, it is observed on Lustre. I have identified two possible fixes: 1. Perform a sync within the loop device to ensure the actual file size is obtained. This can be done by calling vfs_fsync() before reading the file size, as shown below (in block/loop.c): static loff_t get_size(loff_t offset, loff_t sizelimit, struct file *file) { loff_t loopsize; /* Sync to ensure accurate file size */ vfs_fsync(file, 0); <<<< this is added loopsize = i_size_read(file->f_mapping->host); 2. Ensure the application performs a sync after writing to guarantee that all data is properly accounted for. This can be done using fsync(backing_fd), as I did in the provided test case.

          The "loopsize=1" could potentially happen because the inode i_blocks count has not been updated due to the "dd" write not being synchronous and no blocks are allocated to the file yet. IIRC, the Lustre client will return the blocks count =1 while the file is only in the client cache, so that userspace applications (e.g. "cp") do not incorrectly assume the whole file is sparse and not try to read it at all...

          Does this problem reproduce if you use "dd ... oflag=sync" or "dd ... oflag=direct" so that the backing OST blocks are immediately allocated and the block count is synchronously returned to the client? Have you tried it with a patched kernel as suggested above to force losetup to fetch the correct file size and blocks from the OSTs?

          adilger Andreas Dilger added a comment - The "loopsize=1" could potentially happen because the inode i_blocks count has not been updated due to the "dd" write not being synchronous and no blocks are allocated to the file yet. IIRC, the Lustre client will return the blocks count =1 while the file is only in the client cache, so that userspace applications (e.g. "cp") do not incorrectly assume the whole file is sparse and not try to read it at all... Does this problem reproduce if you use " dd ... oflag=sync " or " dd ... oflag=direct " so that the backing OST blocks are immediately allocated and the block count is synchronously returned to the client? Have you tried it with a patched kernel as suggested above to force losetup to fetch the correct file size and blocks from the OSTs?

          The  i_size_read on the file reports following 
          [ 1395.572922] Rajeev loop offset =0 loopsize=512 <<< in case of NFS 
          [ 1395.573708] loop3: detected capacity change from 0 to 512
          [ 1427.038818] Rajeev loop offset =0 loopsize=1 <<< in case of Lustre

          checking further as why Lustre is reporting wrong i_size

          rajeevm Rajeev Mishra added a comment - The  i_size_read on the file reports following  [ 1395.572922] Rajeev loop offset =0 loopsize=512 <<< in case of NFS  [ 1395.573708] loop3: detected capacity change from 0 to 512 [ 1427.038818] Rajeev loop offset =0 loopsize=1 <<< in case of Lustre checking further as why Lustre is reporting wrong i_size

          The problem on NFS was different. I further debuded it and realized the block size of file was the issue on NFS
          sudo dd if=/dev/zero bs=1 count=512 of=./backing_file.img
          512+0 records in
          512+0 records out
          512 bytes copied, 1.25623 s, 0.4 kB/s
          [rocky@test2-fcntl test]$ sudo ./a.out_nfs
          data len =13, back file_size = 512 loop_file sz = 0
          debug=super warning error emerg console
          Wrote 13 bytes to /dev/loop3
          lustre case
          [rocky@test2-fcntl test]$ sudo dd if=/dev/zero bs=1 count=512 of=/mnt/lustre/backing_file.img
          512+0 records in
          512+0 records out
          512 bytes copied, 0.00515372 s, 99.3 kB/s
          [rocky@test2-fcntl test]$ sudo ./a.out_lustr
          data len =13, back file_size = 512 loop_file sz = 0
          debug=super warning error emerg console
          write to loop device: No space left on device

          rajeevm Rajeev Mishra added a comment - The problem on NFS was different. I further debuded it and realized the block size of file was the issue on NFS sudo dd if=/dev/zero bs=1 count=512 of=./backing_file.img 512+0 records in 512+0 records out 512 bytes copied, 1.25623 s, 0.4 kB/s [rocky@test2-fcntl test] $ sudo ./a.out_nfs data len =13, back file_size = 512 loop_file sz = 0 debug=super warning error emerg console Wrote 13 bytes to /dev/loop3 lustre case [rocky@test2-fcntl test] $ sudo dd if=/dev/zero bs=1 count=512 of=/mnt/lustre/backing_file.img 512+0 records in 512+0 records out 512 bytes copied, 0.00515372 s, 99.3 kB/s [rocky@test2-fcntl test] $ sudo ./a.out_lustr data len =13, back file_size = 512 loop_file sz = 0 debug=super warning error emerg console write to loop device: No space left on device
          rajeevm Rajeev Mishra added a comment - - edited

          Thanks, @Andreas! I really appreciate your swift response. The issue also occurs on Lustre also. I am running Rocky 8.7 version:

          Linux test-build-fcntl 4.18.0-425.3.1.el8.x86_64 #1 SMP Wed Nov 9 20:13:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

          It is a bit old. Thanks again for your help!. I will try the code changes, before that I will get some real new build like RHEL 9.6 to check if the problem still exit.

          rajeevm Rajeev Mishra added a comment - - edited Thanks, @Andreas! I really appreciate your swift response. The issue also occurs on Lustre also. I am running Rocky 8.7 version: Linux test-build-fcntl 4.18.0-425.3.1.el8.x86_64 #1 SMP Wed Nov 9 20:13:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux It is a bit old. Thanks again for your help!. I will try the code changes, before that I will get some real new build like RHEL 9.6 to check if the problem still exit.
          adilger Andreas Dilger added a comment - - edited

          rajeevm what distro are you testing with, and what version of losetup, or are you only able to reproduce this with the custom test program? In my Rocky 8.10 client, the losetup utility from util-linux-2.32.1-46.el8.x86_64 is calling lstat() to validate the whole pathname to the image file, so it is avoiding this issue for new releases.

          I think it would still be best to submit a patch upstream for drivers/block/loop.c::get_size() to ensure the file's inode->i_size is updated before the attach, so that it is correct regardless of what userspace is doing. This will be called for LOOP_SET_FD, LOOP_CHANGE_FD, LOOP_CONFIGURE, and LOOP_SET_STATUS. Something like the following (I haven't even compiled this):

          diff --git a/drivers/block/loop.c b/drivers/block/loop.c
          index 8f6761c27c..41f9e05873 100644
          --- a/drivers/block/loop.c
          +++ b/drivers/block/loop.c
          @@ -141,10 +141,14 @@ static int part_shift;
           
           static loff_t get_size(loff_t offset, loff_t sizelimit, struct file *file)
           {
          +       struct kstat stat;
                  loff_t loopsize;
           
          -       /* Compute loopsize in bytes */
          -       loopsize = i_size_read(file->f_mapping->host);
          +       /* Revalidate and get uptodate loopsize in bytes */
          +       if (vfs_getattr_nosec(file->f_path, &stat, STATX_SIZE,
          +                             AT_STATX_FORCE_SYNC))
          +               return 0;
          +       loopsize = stat.size;
                  if (offset > 0)
                          loopsize -= offset;
                  /* offset is beyond i_size, weird but possible */
          

          As I wrote previously, only mention NFS in your patch (not Lustre), or you may get some pushback about accepting the patch.

          adilger Andreas Dilger added a comment - - edited rajeevm what distro are you testing with, and what version of losetup , or are you only able to reproduce this with the custom test program? In my Rocky 8.10 client, the losetup utility from util-linux-2.32.1-46.el8.x86_64 is calling lstat() to validate the whole pathname to the image file, so it is avoiding this issue for new releases. I think it would still be best to submit a patch upstream for drivers/block/loop.c::get_size() to ensure the file's inode->i_size is updated before the attach, so that it is correct regardless of what userspace is doing. This will be called for LOOP_SET_FD , LOOP_CHANGE_FD , LOOP_CONFIGURE , and LOOP_SET_STATUS . Something like the following (I haven't even compiled this): diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 8f6761c27c..41f9e05873 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -141,10 +141,14 @@ static int part_shift; static loff_t get_size(loff_t offset, loff_t sizelimit, struct file *file) { + struct kstat stat; loff_t loopsize; - /* Compute loopsize in bytes */ - loopsize = i_size_read(file->f_mapping->host); + /* Revalidate and get uptodate loopsize in bytes */ + if (vfs_getattr_nosec(file->f_path, &stat, STATX_SIZE, + AT_STATX_FORCE_SYNC)) + return 0; + loopsize = stat.size; if (offset > 0) loopsize -= offset; /* offset is beyond i_size, weird but possible */ As I wrote previously, only mention NFS in your patch (not Lustre), or you may get some pushback about accepting the patch.

          Same issue on NFS also

          10.103.18.214:/home/rocky/build 49G 22G 28G 45% /home/rocky/test
          [rocky@test2-fcntl build]$ sudo ./a.out
          Error: Data too large for backing file.
          I changed the program to write to NFS.

          rajeevm Rajeev Mishra added a comment - Same issue on NFS also 10.103.18.214:/home/rocky/build 49G 22G 28G 45% /home/rocky/test [rocky@test2-fcntl build] $ sudo ./a.out Error: Data too large for backing file. I changed the program to write to NFS.
          rajeevm Rajeev Mishra added a comment - - edited

          I was able to re-create this problem with following test

          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <linux/loop.h>
          #include <errno.h>
          #include <string.h>
          #include <sys/stat.h>
          
          #define BACKING_FILE "/mnt/lustre/backing_file.img" 
          //#define BACKING_FILE "backing_file.img" 
          #define LOOP_DEVICE "/dev/loop3"       
          #define TEST_DATA "Hello, world!"
          
          int main() {
              int loop_fd, backing_fd, loop_control_fd;
              struct loop_info loopinfo;
              int loop_num;
              int loop_ctl_fd;
              char loop_dev[64];
          
              backing_fd = open(BACKING_FILE, O_RDWR | O_CREAT, 0644);
              if (backing_fd == -1) {
                  perror("open backing file");
                  return 1;
              }
          
              loop_control_fd = open("/dev/loop-control", O_RDWR);
              if (loop_control_fd == -1) {
                  perror("open /dev/loop-control");
                  close(backing_fd);
                  return 1;
              }
          
              loop_num = ioctl(loop_control_fd, LOOP_CTL_GET_FREE);
              if (loop_num < 0) {
                  perror("Failed to get a free loop device");
                  close(loop_ctl_fd);
                  return EXIT_FAILURE;
              }
              snprintf(loop_dev, sizeof(loop_dev), "/dev/loop%d", loop_num);
          
              loop_fd = open(loop_dev, O_RDWR);
                  perror("open loop device");
                  close(backing_fd);
                  close(loop_control_fd);
                  return 1;
              }
          
              if (ioctl(loop_fd, LOOP_SET_FD, backing_fd) == -1) {
                  perror("ioctl LOOP_SET_FD");
                  close(backing_fd);
                  close(loop_fd);
                  close(loop_control_fd);
                  return 1;
              }
          
              if (ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo) == -1) {  // Corrected!
                  perror("ioctl LOOP_GET_STATUS");
                  close(backing_fd);
                  close(loop_control_fd);
                  return 1;
              }
          
              struct stat st;
              if (fstat(backing_fd, &st) == -1) {
                  perror("fstat backing file");
                  close(backing_fd);
                  close(loop_control_fd);
                  return 1;
          
              }
          
              off_t file_size = st.st_size;
              size_t data_len = strlen(TEST_DATA);
          
              if (data_len > file_size) {
                  fprintf(stderr, "Error: Data too large for backing file.\n");
                  close(backing_fd);
                  close(loop_control_fd);
                  return 1;
          
              }
          
              ssize_t bytes_written = write(loop_fd, TEST_DATA, strlen(TEST_DATA));
              if (bytes_written == -1) {
                  perror("write to loop device");
                  close(backing_fd);
                  close(loop_fd);
                  close(loop_control_fd);
                  return 1;
              }
          
              printf("Wrote %zd bytes to %s\n", bytes_written, LOOP_DEVICE);
          
              if (ioctl(loop_fd, LOOP_CLR_FD, 0) == -1) {
                  perror("ioctl LOOP_CLR_FD");
                  close(backing_fd);
                  close(loop_fd);
                  close(loop_control_fd);
                  return 1;
              }
          
              close(backing_fd);
              close(loop_fd);
              close(loop_control_fd);
          
              return 0;
          }
          

          The problem happens as:

          ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
          {       
                  struct file *file = iocb->ki_filp;
                  struct inode *bd_inode = bdev_file_inode(file);
                  loff_t size = i_size_read(bd_inode);
                  struct blk_plug plug;
                  size_t shorted = 0; 
                  ssize_t ret;
                  
                  if (bdev_read_only(I_BDEV(bd_inode)))
                          return -EPERM;
                  
                  /* uswsusp needs write permission to the swap */
                  if (IS_SWAPFILE(bd_inode) && !hibernation_available())
                          return -ETXTBSY;
                  
                  if (!iov_iter_count(from))
                          return 0;
           
                  if (iocb->ki_pos >= size)
                          return -ENOSPC; <<<<
          

          That means the file offset is not good which seems to suggest that some thing is wrong with loop device or kernel loop device. The same test will work fine on the ext2, it fails only first time not next time

          Steps to re-create:

          $ sudo dd if=/dev/zero bs=1 count=512 of=/mnt/lustre/backing_file.img
          $ sudo ./a.out 
          write to loop device: No space left on device
          $ sudo ./a.out 
          Wrote 13 bytes to /dev/loop3
          $ 
          
          rajeevm Rajeev Mishra added a comment - - edited I was able to re-create this problem with following test #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/loop.h> #include <errno.h> #include <string.h> #include <sys/stat.h> #define BACKING_FILE "/mnt/lustre/backing_file.img" //#define BACKING_FILE "backing_file.img" #define LOOP_DEVICE "/dev/loop3" #define TEST_DATA "Hello, world!" int main() { int loop_fd, backing_fd, loop_control_fd; struct loop_info loopinfo; int loop_num; int loop_ctl_fd; char loop_dev[64]; backing_fd = open(BACKING_FILE, O_RDWR | O_CREAT, 0644); if (backing_fd == -1) { perror( "open backing file" ); return 1; } loop_control_fd = open( "/dev/loop-control" , O_RDWR); if (loop_control_fd == -1) { perror( "open /dev/loop-control" ); close(backing_fd); return 1; } loop_num = ioctl(loop_control_fd, LOOP_CTL_GET_FREE); if (loop_num < 0) { perror( "Failed to get a free loop device" ); close(loop_ctl_fd); return EXIT_FAILURE; } snprintf(loop_dev, sizeof(loop_dev), "/dev/loop%d" , loop_num); loop_fd = open(loop_dev, O_RDWR); perror( "open loop device" ); close(backing_fd); close(loop_control_fd); return 1; } if (ioctl(loop_fd, LOOP_SET_FD, backing_fd) == -1) { perror( "ioctl LOOP_SET_FD" ); close(backing_fd); close(loop_fd); close(loop_control_fd); return 1; } if (ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo) == -1) { // Corrected! perror( "ioctl LOOP_GET_STATUS" ); close(backing_fd); close(loop_control_fd); return 1; } struct stat st; if (fstat(backing_fd, &st) == -1) { perror( "fstat backing file" ); close(backing_fd); close(loop_control_fd); return 1; } off_t file_size = st.st_size; size_t data_len = strlen(TEST_DATA); if (data_len > file_size) { fprintf(stderr, "Error: Data too large for backing file.\n" ); close(backing_fd); close(loop_control_fd); return 1; } ssize_t bytes_written = write(loop_fd, TEST_DATA, strlen(TEST_DATA)); if (bytes_written == -1) { perror( "write to loop device" ); close(backing_fd); close(loop_fd); close(loop_control_fd); return 1; } printf( "Wrote %zd bytes to %s\n" , bytes_written, LOOP_DEVICE); if (ioctl(loop_fd, LOOP_CLR_FD, 0) == -1) { perror( "ioctl LOOP_CLR_FD" ); close(backing_fd); close(loop_fd); close(loop_control_fd); return 1; } close(backing_fd); close(loop_fd); close(loop_control_fd); return 0; } The problem happens as: ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *bd_inode = bdev_file_inode(file); loff_t size = i_size_read(bd_inode); struct blk_plug plug; size_t shorted = 0; ssize_t ret; if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; /* uswsusp needs write permission to the swap */ if (IS_SWAPFILE(bd_inode) && !hibernation_available()) return -ETXTBSY; if (!iov_iter_count(from)) return 0; if (iocb->ki_pos >= size) return -ENOSPC; <<<< That means the file offset is not good which seems to suggest that some thing is wrong with loop device or kernel loop device. The same test will work fine on the ext2, it fails only first time not next time Steps to re-create: $ sudo dd if=/dev/zero bs=1 count=512 of=/mnt/lustre/backing_file.img $ sudo ./a.out write to loop device: No space left on device $ sudo ./a.out Wrote 13 bytes to /dev/loop3 $

          If problem can be reproduced with NFS, then the best solution is to submit a patch upstream (don't mention Lustre, just NFS) to add a call to ->d_revalidate before accessing the size. This is the correct thing to do anyway - it only adds overhead when loopback files are being used.

          I tested this locally, but was unable to reproduce it on my Ubuntu client because losetup is calling lstat() on the filename before accessing it (using util-linux-ng-2.29.2:

          stat64("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0
          lstat64("/myth", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
          lstat64("/myth/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
          lstat64("/myth/tmp/MythDora-12.23-X64-DVD", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
          lstat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0
          open("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 3
          open("/dev/loop0", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 4ioctl(4, LOOP_SET_FD, 3)                = 0
          ioctl(4, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=0, lo_file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0
          close(3)                                = 0
          stat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0
          

          Running on an old RHEL6 client losetup from util-linux-ng-2.17.2 uses readlink() instead of stat():

          readlink("/myth", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
          readlink("/myth/tmp", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
          readlink("/myth/tmp/MythDora-12.23-X64-DVD", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
          readlink("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
          ioctl(4, LOOP_SET_FD, 0x3)              = 0
          close(3)                                = 0
          ioctl(4, LOOP_SET_STATUS64, {offset=0, number=0, flags=0, file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0
          close(4)                                = 0
          stat("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0
          

          It may be that updating util-linux-ng will fix this for you, but it would still be good to fix it in the upstream kernel. As for fixing it in Lustre, I think it would be best to limit this workaround to the "losetup" binary so that it doesn't hurt all open operations.

          adilger Andreas Dilger added a comment - If problem can be reproduced with NFS, then the best solution is to submit a patch upstream (don't mention Lustre, just NFS) to add a call to ->d_revalidate before accessing the size. This is the correct thing to do anyway - it only adds overhead when loopback files are being used. I tested this locally, but was unable to reproduce it on my Ubuntu client because losetup is calling lstat() on the filename before accessing it (using util-linux-ng-2.29.2 : stat64("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0 lstat64("/myth", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 lstat64("/myth/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0 lstat64("/myth/tmp/MythDora-12.23-X64-DVD", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 lstat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0 open("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 3 open("/dev/loop0", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 4ioctl(4, LOOP_SET_FD, 3) = 0 ioctl(4, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=0, lo_file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0 close(3) = 0 stat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0 Running on an old RHEL6 client losetup from util-linux-ng-2.17.2 uses readlink() instead of stat() : readlink("/myth", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument) readlink("/myth/tmp", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument) readlink("/myth/tmp/MythDora-12.23-X64-DVD", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument) readlink("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument) ioctl(4, LOOP_SET_FD, 0x3) = 0 close(3) = 0 ioctl(4, LOOP_SET_STATUS64, {offset=0, number=0, flags=0, file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0 close(4) = 0 stat("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0 It may be that updating util-linux-ng will fix this for you, but it would still be good to fix it in the upstream kernel. As for fixing it in Lustre, I think it would be best to limit this workaround to the "losetup" binary so that it doesn't hurt all open operations.

          Andreas,

          I tried something like

          --- linux-3.10.0-862.14.4.el7.x86_64.orig/drivers/block/loop.c
          +++ linux-3.10.0-862.14.4.el7.x86_64/drivers/block/loop.c
          @@ -853,6 +853,7 @@ static int loop_set_fd(struct loop_devic
          +       struct kstat stat;
          ...
          +       if (vfs_fstat(arg, &stat))
          +               goto out_putf;
          

          That helps for lustre file, but does not in case of NFS which seems to retain outdated file size, although it has been changed on server right after NFS open.

          It seems that there is no suitable inode operation for the invalidation. Do you have a clue on how that could be done?
          Thanks

          PS: In case of NFS, the BUG_ON(!buffer_mapped(bh)) does not happen on umount.
          mount fails with "EXT4-fs (loop0): VFS: Can't find ext4 filesystem" when it tries to mount loop device associated with zero length nfs file.

          vsaveliev Vladimir Saveliev added a comment - Andreas, I tried something like --- linux-3.10.0-862.14.4.el7.x86_64.orig/drivers/block/loop.c +++ linux-3.10.0-862.14.4.el7.x86_64/drivers/block/loop.c @@ -853,6 +853,7 @@ static int loop_set_fd(struct loop_devic + struct kstat stat; ... + if (vfs_fstat(arg, &stat)) + goto out_putf; That helps for lustre file, but does not in case of NFS which seems to retain outdated file size, although it has been changed on server right after NFS open. It seems that there is no suitable inode operation for the invalidation. Do you have a clue on how that could be done? Thanks PS: In case of NFS, the BUG_ON(!buffer_mapped(bh)) does not happen on umount. mount fails with "EXT4-fs (loop0): VFS: Can't find ext4 filesystem" when it tries to mount loop device associated with zero length nfs file.

          This could be considered a bug in the kernel rather than in Lustre, that the losetup code does not revalidate the inode size before it is attaching the device to the inode. It seems likely that you could hit this same bug with NFS, which gives you a legitimate reason to submit a patch upstream and to the distro kernels. Getting this fixed in the upstream kernels would at least put an upper limit on how long we need to keep a workaround in the Lustre code.

          adilger Andreas Dilger added a comment - This could be considered a bug in the kernel rather than in Lustre, that the losetup code does not revalidate the inode size before it is attaching the device to the inode. It seems likely that you could hit this same bug with NFS, which gives you a legitimate reason to submit a patch upstream and to the distro kernels. Getting this fixed in the upstream kernels would at least put an upper limit on how long we need to keep a workaround in the Lustre code.

          People

            vsaveliev Vladimir Saveliev
            vsaveliev Vladimir Saveliev
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: