[LU-11053] problem with loop device associated with lustre file Created: 24/May/18  Updated: 27/May/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Vladimir Saveliev Assignee: Vladimir Saveliev
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Lustre's file open does not bring the file size into the in-core inode, and losetup does not stat the backing Lustre file before LOOP_SET_FD:

open("/mnt/lustre/ext4image", O_RDWR)    = 3
open("/dev/loop0", O_RDWR)              = 4
ioctl(4, LOOP_SET_FD, 0x3)              = 0
stat("/mnt/lustre/ext4image"...

As a result, losetup creates a block device inode with zero size, which eventually leads to a failure when unmounting the loop device, at BUG_ON(!buffer_mapped(bh)) in submit_bh():

<2>kernel BUG at fs/buffer.c:3157!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 37
...
<4>Pid: 6751, comm: umount Not tainted 2.6.32-696.18.7.el6.x86_64 #1 BULL bullx blade/CHPD
<4>RIP: 0010:[<ffffffff811d0962>]  [<ffffffff811d0962>] submit_bh+0x152/0x1f0
<4>RSP: 0018:ffff88107483bd68  EFLAGS: 00010246
<4>RAX: 0000000000000005 RBX: ffff88087c7fbd60 RCX: 0000000000000017
...
<4>Call Trace:
<4> [<ffffffff811d2973>] __sync_dirty_buffer+0x53/0xf0
<4> [<ffffffff811d2a23>] sync_dirty_buffer+0x13/0x20
<4> [<ffffffffa0d1877b>] ext2_sync_super+0x5b/0x70 [ext2]
<4> [<ffffffffa0d19733>] ext2_put_super+0x133/0x150 [ext2]
<4> [<ffffffff8119cc4b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff8119cd01>] kill_block_super+0x31/0x50
<4> [<ffffffff8119d4d7>] deactivate_super+0x57/0x80
<4> [<ffffffff811bd50f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811be05b>] sys_umount+0x7b/0x3a0
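
For illustration only, a user-space sketch of the ordering that avoids the zero-size device: fstat the backing file first, then issue LOOP_SET_FD. It assumes that fstat on the open descriptor makes the client fetch the current size (the vfs_fstat() experiment in the comments suggests this holds for Lustre, but not for NFS). The helper names attach_loop and backing_file_size are hypothetical and are not part of losetup or Lustre; LOOP_SET_FD is the value from <linux/loop.h>.

```python
import fcntl
import os

LOOP_SET_FD = 0x4C00  # ioctl request number from <linux/loop.h>


def backing_file_size(fd):
    """Return st_size from fstat(); assumed to refresh the in-core size."""
    return os.fstat(fd).st_size


def attach_loop(loop_path, backing_path):
    """Attach backing_path to loop_path, refusing a zero-size backing file.

    Hypothetical helper: the size check runs before the loop device is
    opened, so an empty backing file never reaches LOOP_SET_FD.
    """
    backing_fd = os.open(backing_path, os.O_RDWR)
    try:
        size = backing_file_size(backing_fd)
        if size == 0:
            raise ValueError("backing file %s has zero size" % backing_path)
        loop_fd = os.open(loop_path, os.O_RDWR)
        try:
            # The kernel takes its own reference on the backing file here.
            fcntl.ioctl(loop_fd, LOOP_SET_FD, backing_fd)
        finally:
            os.close(loop_fd)
    finally:
        os.close(backing_fd)
    return size
```

Calling attach_loop("/dev/loop0", "/mnt/lustre/ext4image") needs root and a free loop device. It is a stand-in for what a fixed losetup or kernel would do, not a replacement for either.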

The modification to sanity.sh:test_54c (from Andrew Perepechko) illustrates the problem.

diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
index c6b292b029..d2961dbf0e 100755
--- a/lustre/tests/sanity.sh
+++ b/lustre/tests/sanity.sh
@@ -4656,10 +4656,15 @@ test_54c() {
        mknod $loopdev b 7 $LOOPNUM
        echo "make a loop file system with $DIR/$tfile on $loopdev ($LOOPNUM)."
        dd if=/dev/zero of=$DIR/$tfile bs=$(get_page_size client) seek=1024 count=1 > /dev/null
+       mkfs.ext2 $DIR/$tfile  || error "mke2fs on $DIR/$tfile "
+       test_mkdir $DIR/$tdir
+
+       cancel_lru_locks mdc
+       cancel_lru_locks osc
+
        losetup $loopdev $DIR/$tfile ||
                error "can't set up $loopdev for $DIR/$tfile"
-       mkfs.ext2 $loopdev || error "mke2fs on $loopdev"
-       test_mkdir $DIR/$tdir
+
        mount -t ext2 $loopdev $DIR/$tdir ||
                error "error mounting $loopdev on $DIR/$tdir"
        dd if=/dev/zero of=$DIR/$tdir/tmp bs=$(get_page_size client) count=30 ||


 Comments   
Comment by Gerrit Updater [ 27/May/18 ]

Vladimir Saveliev (c17830@cray.com) uploaded a new patch: https://review.whamcloud.com/32565
Subject: LU-11053 llite: get file size in ll_file_open()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0078cef4a222051b88fd240f961bf4acafda1a6c

Comment by Gerrit Updater [ 29/May/18 ]

Vladimir Saveliev (c17830@cray.com) uploaded a new patch: https://review.whamcloud.com/32573
Subject: LU-11053 llite: check loop device file size before read
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: efb697a45b56388fc4aba38f176996b054072ff0

Comment by Andreas Dilger [ 08/Aug/18 ]

This could be considered a bug in the kernel rather than in Lustre, in that the losetup code does not revalidate the inode size before attaching the device to the inode. It seems likely that you could hit this same bug with NFS, which gives you a legitimate reason to submit a patch upstream and to the distro kernels. Getting this fixed in the upstream kernels would at least put an upper limit on how long we need to keep a workaround in the Lustre code.

Comment by Vladimir Saveliev [ 27/May/19 ]

Andreas,

I tried something like

--- linux-3.10.0-862.14.4.el7.x86_64.orig/drivers/block/loop.c
+++ linux-3.10.0-862.14.4.el7.x86_64/drivers/block/loop.c
@@ -853,6 +853,7 @@ static int loop_set_fd(struct loop_devic
+       struct kstat stat;
...
+       if (vfs_fstat(arg, &stat))
+               goto out_putf;

That helps for a Lustre file, but not in the case of NFS, which seems to retain an outdated file size even though the size was changed on the server right after the NFS open.

It seems that there is no suitable inode operation for the invalidation. Do you have a clue on how that could be done?
Thanks

PS: In the case of NFS, the BUG_ON(!buffer_mapped(bh)) does not happen on umount.
Instead, mount fails with "EXT4-fs (loop0): VFS: Can't find ext4 filesystem" when it tries to mount a loop device associated with a zero-length NFS file.
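
A small sketch of how the stale size could be detected from user space before mounting, assuming that seeking to the end of a block device returns the size the kernel recorded for it. The function names size_by_seek and loop_size_is_stale are hypothetical. The comparison is deliberately only "zero versus non-zero", because the loop device size is expressed in whole sectors and need not equal st_size exactly.

```python
import os


def size_by_seek(path):
    """Return the size found by seeking to the end of a file or block device."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def loop_size_is_stale(loop_path, backing_path):
    """True if the device reports zero size while the backing file is non-empty."""
    dev_size = size_by_seek(loop_path)
    file_size = os.stat(backing_path).st_size
    return dev_size == 0 and file_size > 0
```

For example, loop_size_is_stale("/dev/loop0", "/mnt/lustre/ext4image") returning True right after losetup would point at the problem described in this ticket.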

Comment by Andreas Dilger [ 27/May/19 ]

If the problem can be reproduced with NFS, then the best solution is to submit a patch upstream (don't mention Lustre, just NFS) to add a call to ->d_revalidate before accessing the size. This is the correct thing to do anyway, and it only adds overhead when loopback files are being used.

I tested this locally, but was unable to reproduce it on my Ubuntu client because losetup calls lstat() on the filename before accessing it (using util-linux-ng-2.29.2):

stat64("/dev/loop0", {st_mode=S_IFBLK|0660, st_rdev=makedev(7, 0), ...}) = 0
lstat64("/myth", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
lstat64("/myth/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
lstat64("/myth/tmp/MythDora-12.23-X64-DVD", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0
open("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 3
open("/dev/loop0", O_RDWR|O_LARGEFILE|O_CLOEXEC) = 4
ioctl(4, LOOP_SET_FD, 3)                = 0
ioctl(4, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=0, lo_file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0
close(3)                                = 0
stat64("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0

On an old RHEL6 client, losetup from util-linux-ng-2.17.2 uses readlink() instead of stat():

readlink("/myth", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
readlink("/myth/tmp", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
readlink("/myth/tmp/MythDora-12.23-X64-DVD", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
readlink("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", 0x7ffd359c6a50, 4096) = -1 EINVAL (Invalid argument)
ioctl(4, LOOP_SET_FD, 0x3)              = 0
close(3)                                = 0
ioctl(4, LOOP_SET_STATUS64, {offset=0, number=0, flags=0, file_name="/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", ...}) = 0
close(4)                                = 0
stat("/myth/tmp/MythDora-12.23-X64-DVD/MythDora-12-x86_64-DVD.iso", {st_mode=S_IFREG|0644, st_size=1243049984, ...}) = 0

It may be that updating util-linux-ng will fix this for you, but it would still be good to fix it in the upstream kernel. As for fixing it in Lustre, I think it would be best to limit this workaround to the "losetup" binary so that it doesn't hurt all open operations.
