[LU-8469] Sanity test 54c: Unable to unmount loop device Created: 02/Aug/16  Updated: 14/Dec/21  Resolved: 14/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Doug Oucharek (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

RHEL 7, master, VMs


Attachments: Text File sanity.test_54c.debug_log.MRtest01.1469766696.log     Text File sanity.test_54c.debug_log.MRtest02.1469766696.log     Text File sanity.test_54c.debug_log.MRtest03.1469766696.log     Text File sanity.test_54c.dmesg.MRtest01.1469766696.log     Text File sanity.test_54c.dmesg.MRtest02.1469766696.log     Text File sanity.test_54c.dmesg.MRtest03.1469766696.log     Text File sanity.test_54c.test_log.MRtest01.log    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Ran into a lock-up of sanity test 133g when it tried to unmount the Lustre file system. It kept printing this to the console:

/mnt/client is still busy, wait one second
/mnt/client is still busy, wait one second
/mnt/client is still busy, wait one second

I stopped the test and tried to unmount manually. That failed with a "file system busy" error.

I found that a loop device was still mounted against the Lustre file system. This is a leftover from test 54c.
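
For reference, a minimal sketch of commands that can confirm this kind of leftover (the mountpoint path is taken from the 54c logs below; exact output depends on the node):

# List configured loop devices and their backing files; a stale /dev/loop3
# backed by a file under /mnt/client points at test 54c.
losetup -a

# Show any loop-backed mounts still sitting on top of the Lustre client.
grep loop /proc/mounts

# Show which processes, if any, keep the 54c mountpoint busy.
fuser -vm /mnt/client/d54c.sanity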

Looking back at the logs, I can see that 54c is unable to unmount the loop device:

[ 1529.373768] Lustre: DEBUG MARKER: == sanity test 54c: block device works in lustre ===================================================== 21:31:35 (1469766695)
[ 1529.543207] EXT4-fs (loop3): mounting ext2 file system using the ext4 subsystem
[ 1529.550865] EXT4-fs (loop3): mounted filesystem without journal. Opts: (null)
[ 1529.810716] Lustre: DEBUG MARKER: sanity test_54c: @@@@@@ FAIL: test_54c failed with 32

I looked at other sanity runs which were successful, and found this:

== sanity test 54c: block device works in lustre ===================================================== 17:27:05 (1469838425)
make a loop file system with /mnt/client/f54c.sanity on /mnt/client/loop54c (3).
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00140237 s, 2.9 MB/s
mke2fs 1.42.12.wc1 (15-Sep-2014)
Creating filesystem with 4100 1k blocks and 1032 inodes

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

30+0 records in
30+0 records out
122880 bytes (123 kB) copied, 0.000805954 s, 152 MB/s
Filesystem 1K-blocks Used Available Use% Mounted on
/mnt/client/loop54c 3966 151 3610 5% /mnt/client/d54c.sanity
30+0 records in
30+0 records out
122880 bytes (123 kB) copied, 0.00015479 s, 794 MB/s
losetup: /mnt/client/loop54c: detach failed: No such device or address
losetup: /dev/loop3: detach failed: No such device or address
Resetting fail_loc on all nodes...done.
PASS 54c (1s)

So, 54c is passing when it cannot use the loop device?!?

This all seems very broken to me: 54c passes when we can't use the loop device (why can't we use it?) and fails when we do mount the loop device, because the test then cannot unmount it.



 Comments   
Comment by Doug Oucharek (Inactive) [ 02/Aug/16 ]

I'm attaching all of the test 54c logs for the failing case.

Comment by Oleg Drokin [ 02/Aug/16 ]
122880 bytes (123 kB) copied, 0.00015479 s, 794 MB/s
losetup: /mnt/client/loop54c: detach failed: No such device or address
losetup: /dev/loop3: detach failed: No such device or address

this is understandable:

cleanup_54c() {
        loopdev="$DIR/loop54c"

        trap 0
        $UMOUNT $DIR/$tdir || rc=$?
        losetup -d $loopdev || true
        losetup -d $LOOPDEV || true

So normally, a successful unmount detaches the loop device, and the following losetup calls then error out, which is what you quoted in the ticket description; this is normal.
It would be good if the cleanup script checked that the loop device is still set up before detaching it, to avoid the confusing error messages, but otherwise it is pretty harmless.
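
A minimal sketch of what such a guard could look like, purely as an illustration and not an actual sanity.sh patch; it assumes that losetup invoked with only a device argument queries its status and returns non-zero when the device is not configured:

cleanup_54c() {
        loopdev="$DIR/loop54c"

        trap 0
        $UMOUNT $DIR/$tdir || rc=$?
        # Only detach when the device is still configured, so a successful
        # unmount (which already released it) does not produce the
        # confusing "detach failed" messages.
        losetup $loopdev > /dev/null 2>&1 && losetup -d $loopdev
        losetup $LOOPDEV > /dev/null 2>&1 && losetup -d $LOOPDEV
}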

Now, the failure to unmount as in the attached log, that's a problem: something is holding the mountpoint. When this happens it would be great to see what that was. Did some background process get to it?
I assume that by the time you get to the test 133g failure you can just unmount /mnt/client/d54c.sanity, and after that everything else unmounts fine?
So we really need to add some debug to 54c so that, if the unmount fails, we can see what is holding the mountpoint busy in there.
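
A rough sketch of what that debug could look like in cleanup_54c, as an illustration only rather than a tested patch; it assumes lsof and fuser are available on the client node:

cleanup_54c() {
        loopdev="$DIR/loop54c"

        trap 0
        $UMOUNT $DIR/$tdir || {
                rc=$?
                # The unmount failed, so record what is pinning the
                # mountpoint before the evidence goes away.
                echo "umount of $DIR/$tdir failed with rc=$rc"
                fuser -vm $DIR/$tdir   # PIDs holding the mount busy
                lsof $DIR/$tdir        # open files on that mountpoint
        }
        losetup -d $loopdev || true
        losetup -d $LOOPDEV || true
}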
