[LU-1931] Test failure on test suite conf-sanity, subtest test_5d Created: 13/Sep/12  Updated: 02/Apr/13  Resolved: 02/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: zfs

Issue Links:
Duplicate
duplicates LU-2059 mgc to backup configuration on osd-ba... Resolved
Severity: 3
Rank (Obsolete): 4171

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0be5975e-f9b4-11e1-b8d8-52540035b04c.

The sub-test test_5d failed with the following error:

test failed to respond and timed out

From the test output:

== conf-sanity test 5d: mount with ost down ========================================================== 00:57:50 (1347004670)
Loading modules from /usr/lib64/lustre
detected 1 online CPUs by getconf
libcfs will create CPU partition based on online CPUs
../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
debug=-1
subsystem_debug=0xffb7e3ff
../lnet/lnet/lnet options: 'accept=all'
gss/krb5 is not supported
loading modules on client-19vm2,client-19vm3,client-19vm4
CMD: client-19vm2,client-19vm3,client-19vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh load_modules_local 
client-19vm3: Loading modules from /usr/lib64/lustre
client-19vm3: detected 1 online CPUs by sysfs
client-19vm3: libcfs will create CPU partition based on online CPUs
client-19vm3: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
client-19vm2: Loading modules from /usr/lib64/lustre
client-19vm2: detected 1 online CPUs by getconf
client-19vm2: libcfs will create CPU partition based on online CPUs
client-19vm2: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
client-19vm4: Loading modules from /usr/lib64/lustre
client-19vm4: detected 1 online CPUs by sysfs
client-19vm4: libcfs will create CPU partition based on online CPUs
client-19vm4: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
client-19vm2: debug=0x33f0404
client-19vm3: debug=0x33f0404
client-19vm2: subsystem_debug=0xffb7e3ff
client-19vm3: subsystem_debug=0xffb7e3ff
client-19vm2: ../lnet/lnet/lnet options: 'accept=all'
client-19vm4: debug=0x33f0404
client-19vm4: subsystem_debug=0xffb7e3ff
client-19vm2: gss/krb5 is not supported
client-19vm3: gss/krb5 is not supported
client-19vm4: gss/krb5 is not supported
start ost1 service on client-19vm4
CMD: client-19vm4 mkdir -p /mnt/ost1
Starting ost1:   lustre-ost1/ost1 /mnt/ost1
CMD: client-19vm4 mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
client-19vm4: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: Input/output error
client-19vm4: Is the MGS running?
Start of lustre-ost1/ost1 on ost1 failed 5
start mds service on client-19vm3
CMD: client-19vm3 mkdir -p /mnt/mds1
CMD: client-19vm3 test -b /dev/lvm-MDS/P1
Starting mds1: -o user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
CMD: client-19vm3 mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
CMD: client-19vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \" 0xffb7e3ff\" 32 
CMD: client-19vm3 e2label /dev/lvm-MDS/P1 2>/dev/null
Started lustre-MDT0000
stop ost1 service on client-19vm4
CMD: client-19vm4 grep -c /mnt/ost1' ' /proc/mounts
CMD: client-19vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
mount lustre on /mnt/lustre.....
Starting client: client-19vm1.lab.whamcloud.com: -o user_xattr,acl,flock client-19vm3@tcp:/lustre /mnt/lustre
CMD: client-19vm1.lab.whamcloud.com mkdir -p /mnt/lustre
CMD: client-19vm1.lab.whamcloud.com mount -t lustre -o user_xattr,acl,flock client-19vm3@tcp:/lustre /mnt/lustre

Info required for matching: conf-sanity 5d



 Comments   
Comment by Keith Mannthey (Inactive) [ 13/Sep/12 ]

client-19vm2 (Client 2) rebooted at the start of this test for some unknown reason.

We need the dmesg and /var/log/messages logs from that system.

Comment by Minh Diep [ 29/Sep/12 ]

Keith, I can reproduce this in my VMs (client-12vm[1-4]); it is hung in the same state right now. Please take a look, or let me know when you want to look and I will reproduce it again. To reproduce, set USE_OFD=yes, OSTFSTYPE=zfs, and LOAD_REMOTE_MODULE=true.
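The reproduction settings above can be sketched as a small shell snippet. Note this is a hypothetical sketch: the actual run requires a Lustre test rig, so the conf-sanity.sh invocation (path taken from the log output) is left commented out.

```shell
# Environment per the comment above; names are standard Lustre
# test-framework variables.
export USE_OFD=yes             # use the OFD-based OST stack
export OSTFSTYPE=zfs           # back the OSTs with ZFS
export LOAD_REMOTE_MODULE=true # load modules on remote test nodes too
echo "env: USE_OFD=$USE_OFD OSTFSTYPE=$OSTFSTYPE LOAD_REMOTE_MODULE=$LOAD_REMOTE_MODULE"
# On an actual test cluster, one would then run only the failing subtest:
# ONLY=5d sh /usr/lib64/lustre/tests/conf-sanity.sh
```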

Comment by Minh Diep [ 01/Oct/12 ]

This seems to be ORI-740, patch http://review.whamcloud.com/#change,3678. It needs to land on 2.3.

Comment by Minh Diep [ 01/Oct/12 ]

== conf-sanity test 19b: start/stop OSTs without MDS == 09:45:26 (1349109926)
start ost1 service on client-12vm4
CMD: client-12vm4 mkdir -p /mnt/ost1
Starting ost1: lustre-ost1/ost1 /mnt/ost1
CMD: client-12vm4 mkdir -p /mnt/ost1; mount -t lustre lustre-ost1/ost1 /mnt/ost1
client-12vm4: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: Input/output error
client-12vm4: Is the MGS running?
Start of lustre-ost1/ost1 on ost1 failed 5
conf-sanity test_19b: @@@@@@ FAIL: test_19b failed with 1
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:3640:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:3662:error()
= /usr/lib64/lustre/tests/test-framework.sh:3898:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:3928:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:3799:run_test()
= /usr/lib64/lustre/tests/conf-sanity.sh:655:main()
Dumping lctl log to /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.*.1349109952.log
CMD: client-12vm1.lab.whamcloud.com,client-12vm2,client-12vm3,client-12vm4 /usr/sbin/lctl dk > /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.debug_log.\$(hostname -s).1349109952.log;
dmesg > /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.dmesg.\$(hostname -s).1349109952.log
CMD: client-12vm1.lab.whamcloud.com,client-12vm2,client-12vm3,client-12vm4 rsync -az /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.*.1349109952.log client-12vm1.lab.whamcloud.com:/tmp/test_logs/2012-10-01/092312
client-12vm1: Host key verification failed.
client-12vm1: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-12vm1: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
client-12vm3: Host key verification failed.
client-12vm3: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-12vm3: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
client-12vm2: Host key verification failed.
client-12vm2: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-12vm2: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
client-12vm4: Host key verification failed.
client-12vm4: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
client-12vm4: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
test_19b returned 1
FAIL 19b (28s)

Comment by Alex Zhuravlev [ 01/Oct/12 ]

This is a duplicate of http://jira.whamcloud.com/browse/LU-2059.

Comment by Alex Zhuravlev [ 01/Oct/12 ]

Basically, all tests where the OST starts without the MGS running should be disabled with ZFS until LU-2059 is done.
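In test-framework terms, disabling those subtests for ZFS would mean extending the skip list in conf-sanity.sh. A minimal sketch, assuming the ALWAYS_EXCEPT convention from test-framework.sh; the exact subtest list (5d, 19b are the two seen failing in this ticket) is an assumption:

```shell
OSTFSTYPE=zfs      # assumed setting for a ZFS-backed OST run
ALWAYS_EXCEPT=""   # base exception list (normally set in conf-sanity.sh)
# Hypothetical: skip the OST-without-MGS subtests when OSTs use ZFS,
# until LU-2059 lands.
if [ "$OSTFSTYPE" = "zfs" ]; then
    ALWAYS_EXCEPT="$ALWAYS_EXCEPT 5d 19b"
fi
echo "ALWAYS_EXCEPT=$ALWAYS_EXCEPT"
```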

Comment by Nathaniel Clark [ 02/Apr/13 ]

Per Alex's comment, closing this as a duplicate of LU-2059.
This test was excepted (added to the skip list) by patch http://review.whamcloud.com/4234

Generated at Sat Feb 10 01:20:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.