[LU-1689] Test failure on test suite mmp, subtest test_8 Created: 29/Jul/12  Updated: 04/Jan/13  Resolved: 16/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.1.3
Fix Version/s: Lustre 2.3.0, Lustre 2.1.3

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-1435 failure on mmp.sh, test_8 "Device or ... Closed
Severity: 3
Rank (Obsolete): 4494

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/33a228d8-d877-11e1-ba66-52540035b04c.

The sub-test test_8 failed with the following error:

test_8 failed with 2

== mmp test 8: mount during e2fsck =================================================================== 22:27:31 (1343453251)
Running e2fsck on the device /dev/lvm-MDS/P1 on mds1...
Mounting /dev/lvm-MDS/P1 on mds1...
CMD: client-27vm3 e2fsck -fy /dev/lvm-MDS/P1
client-27vm3: e2fsck 1.42.3.wc1 (28-May-2012)
CMD: client-27vm3 mkdir -p /mnt/mds1
CMD: client-27vm3 test -b /dev/lvm-MDS/P1
Starting mds1: -o user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
CMD: client-27vm3 mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
client-27vm3: mount.lustre: mount /dev/dm-0 at /mnt/mds1 failed: Device or resource busy
Start of /dev/lvm-MDS/P1 on mds1 failed 16
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Setting filetype for entry 'last_rcvd' in / (2) to 1.
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

lustre-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
lustre-MDT0000: 122/122650624 files (11.5% non-contiguous), 15460665/61316096 blocks

Running e2fsck on the device /dev/lvm-OSS/P1 on ost1...
CMD: client-27vm4 e2fsck -fy /dev/lvm-OSS/P1
client-27vm4: e2fsck 1.42.3.wc1 (28-May-2012)
Mounting /dev/lvm-OSS/P1 on ost1...
CMD: client-27vm4 mkdir -p /mnt/ost1
CMD: client-27vm4 test -b /dev/lvm-OSS/P1
Starting ost1:   /dev/lvm-OSS/P1 /mnt/ost1
CMD: client-27vm4 mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-OSS/P1 /mnt/ost1
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
lustre-OST0000: 100/1981440 files (1.0% non-contiguous), 243716/33790976 blocks
CMD: client-27vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin::/sbin NAME=autotest_config sh rpc.sh set_default_debug \"0x33f0404\" \" 0xffb7e3ff\" 32 
CMD: client-27vm4 e2label /dev/lvm-OSS/P1 2>/dev/null
Started lustre-OST0000
 mmp test_8: @@@@@@ FAIL: mount /dev/lvm-OSS/P1 on ost1 should fail 
Dumping lctl log to /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.*.1343453321.log
CMD: client-27vm3,client-27vm4,client-27vm5,client-27vm6.lab.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.debug_log.\$(hostname -s).1343453321.log;
         dmesg > /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.dmesg.\$(hostname -s).1343453321.log
CMD: client-27vm4 grep -c /mnt/ost1' ' /proc/mounts
Stopping /mnt/ost1 (opts:) on client-27vm4
CMD: client-27vm4 umount -d /mnt/ost1
CMD: client-27vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
 mmp test_8: @@@@@@ FAIL: test_8 failed with 2 
Dumping lctl log to /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.*.1343453348.log
CMD: client-27vm3,client-27vm4,client-27vm5,client-27vm6.lab.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.debug_log.\$(hostname -s).1343453348.log;
         dmesg > /logdir/test_logs/2012-07-27/lustre-reviews-el6-x86_64-el5-x86_64__7972__-7f9bca970cd0/mmp.test_8.dmesg.\$(hostname -s).1343453348.log

Info required for matching: mmp 8



 Comments   
Comment by Keith Mannthey (Inactive) [ 30/Jul/12 ]

I hit the mmp test_8 failure as well. My code base is master plus a small unrelated test change.
https://maloo.whamcloud.com/sub_tests/bed3ee2a-d87f-11e1-ba66-52540035b04c

CMD: client-26vm3 e2fsck -fy /dev/lvm-MDS/P1
CMD: client-26vm3 e2label /dev/lvm-MDS/P1 2>/dev/null
client-26vm3: e2fsck 1.42.3.wc1 (28-May-2012)
client-26vm3: e2fsck: Cannot continue, aborting.
client-26vm3: 
client-26vm3: 
/dev/lvm-MDS/P1 is in use.
Started lustre-MDT0000
 mmp test_8: @@@@@@ FAIL: mount /dev/lvm-MDS/P1 on mds1 should fail 

This looks like a similar failure, just on the MDS rather than the OST.

Comment by Peter Jones [ 02/Aug/12 ]

Minh is going to look into this one

Comment by nasf (Inactive) [ 03/Aug/12 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/40afbf52-dd33-11e1-85a8-52540035b04c

Comment by Minh Diep [ 03/Aug/12 ]

I haven't been able to reproduce this manually. Perhaps this is related to the VMs we are using in the lab.

Comment by Sarah Liu [ 03/Aug/12 ]

Hit this error while doing a manual test on the tag-2.2.92 OFED build with physical nodes:
https://maloo.whamcloud.com/test_sets/25042774-dda7-11e1-85a8-52540035b04c

Comment by nasf (Inactive) [ 03/Aug/12 ]

Another failure:

https://maloo.whamcloud.com/test_sets/7be7c896-ddae-11e1-85a8-52540035b04c

Comment by nasf (Inactive) [ 03/Aug/12 ]

Another similar failure:

https://maloo.whamcloud.com/test_sets/2a7e6362-dd8f-11e1-85a8-52540035b04c

Comment by Minh Diep [ 05/Aug/12 ]

This is a timing issue:

CMD: fat-intel-3vm3 e2fsck -fy /dev/lvm-MDS/P1
fat-intel-3vm3: lnet.debug=0x33f0404
fat-intel-3vm3: lnet.subsystem_debug=0xffb7e3ff
fat-intel-3vm3: lnet.debug_mb=32
CMD: fat-intel-3vm3 e2label /dev/lvm-MDS/P1
fat-intel-3vm3: e2fsck 1.42.3.wc1 (28-May-2012)
fat-intel-3vm3: device /dev/mapper/lvm--MDS-P1 mounted by lustre per /proc/fs/lustre/mds/lustre-MDT0000/mntdev
fat-intel-3vm3: e2fsck: Cannot continue, aborting.
fat-intel-3vm3:
fat-intel-3vm3:
/dev/lvm-MDS/P1 is mounted.
Started lustre-MDT0000
mmp test_8: @@@@@@ FAIL: mount /dev/lvm-MDS/P1 on mds should fail

The mount command was executed before e2fsck had started on the device. Hence, e2fsck failed instead of the mount command failing as expected.
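
The race, roughly, looks like the following (a sketch with placeholder names only, not the actual mmp.sh code):

DEV=/dev/lvm-MDS/P1; MNT=/mnt/mds1       # placeholder device and mount point
e2fsck -fy "$DEV" &                      # background fsck for the "mount during e2fsck" check
FSCK_PID=$!
# Nothing here guarantees e2fsck has opened $DEV yet, so this mount can win
# the race and succeed; e2fsck then aborts with "device is in use", and
# test_8 reports "mount ... should fail".
mount -t lustre -o user_xattr,acl "$DEV" "$MNT"
wait "$FSCK_PID"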

Comment by Jian Yu [ 07/Aug/12 ]

Yes, it's a timing issue. We have to update the script to make sure that the mount operation is actually performed while e2fsck is running.
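
One way to express that ordering in the script, sketched under assumptions (placeholder variables; the fuser-based wait is just one option, not necessarily what the landed patch does):

DEV=/dev/lvm-MDS/P1; MNT=/mnt/mds1       # placeholder device and mount point
e2fsck -fy "$DEV" &                      # background fsck, as in the sketch above
FSCK_PID=$!
# Wait (with a timeout) until something actually holds the device open,
# i.e. e2fsck has really started, before attempting the mount.
for i in $(seq 1 30); do
    fuser "$DEV" >/dev/null 2>&1 && break
    sleep 1
done
# The mount now overlaps with e2fsck and should fail with EBUSY,
# which is the outcome test_8 expects.
mount -t lustre -o user_xattr,acl "$DEV" "$MNT" && echo "unexpected: mount succeeded"
wait "$FSCK_PID"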

Comment by Jinshan Xiong (Inactive) [ 08/Aug/12 ]

This ticket is blocking Maloo testing quite often. I've hit it three times in my tests.

Comment by Minh Diep [ 11/Aug/12 ]

Patch: http://review.whamcloud.com/#change,3569

Comment by Jian Yu [ 13/Aug/12 ]

Another instance on Lustre 2.1.3 RC1:
https://maloo.whamcloud.com/test_sets/46ad1cae-e4bf-11e1-af05-52540035b04c

Comment by Peter Jones [ 13/Aug/12 ]

Yujian

Minh is on vacation this week, so could you please take care of revising the latest fix for this issue?

Thanks

Peter

Comment by Jian Yu [ 14/Aug/12 ]

The patch in http://review.whamcloud.com/#change,3569 was updated.

Comment by Zhenyu Xu [ 14/Aug/12 ]

When the patch passes testing, please port it to b2_1 as well; the b2_1 branch autotest suffers from the same issue.

Comment by Jian Yu [ 14/Aug/12 ]

Patch for b2_1 branch: http://review.whamcloud.com/#change,3643

Comment by Peter Jones [ 16/Aug/12 ]

Landed for 2.3

Comment by Jian Yu [ 21/Aug/12 ]

The issue still occurred:
https://maloo.whamcloud.com/test_sets/ea813ece-eb26-11e1-b137-52540035b04c
https://maloo.whamcloud.com/test_sets/b8435700-eb29-11e1-ba73-52540035b04c

The new patch is in LU-1762.
