Lustre / LU-1931

Test failure on test suite conf-sanity, subtest test_5d

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.3.0
    • 3
    • 4171

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0be5975e-f9b4-11e1-b8d8-52540035b04c.

      The sub-test test_5d failed with the following error:

      test failed to respond and timed out

      From the test output:

      == conf-sanity test 5d: mount with ost down ========================================================== 00:57:50 (1347004670)
      Loading modules from /usr/lib64/lustre
      detected 1 online CPUs by getconf
      libcfs will create CPU partition based on online CPUs
      ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
      debug=-1
      subsystem_debug=0xffb7e3ff
      ../lnet/lnet/lnet options: 'accept=all'
      gss/krb5 is not supported
      loading modules on client-19vm2,client-19vm3,client-19vm4
      CMD: client-19vm2,client-19vm3,client-19vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh load_modules_local 
      client-19vm3: Loading modules from /usr/lib64/lustre
      client-19vm3: detected 1 online CPUs by sysfs
      client-19vm3: libcfs will create CPU partition based on online CPUs
      client-19vm3: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
      client-19vm2: Loading modules from /usr/lib64/lustre
      client-19vm2: detected 1 online CPUs by getconf
      client-19vm2: libcfs will create CPU partition based on online CPUs
      client-19vm2: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
      client-19vm4: Loading modules from /usr/lib64/lustre
      client-19vm4: detected 1 online CPUs by sysfs
      client-19vm4: libcfs will create CPU partition based on online CPUs
      client-19vm4: ../libcfs/libcfs/libcfs options: 'cpu_npartitions=0'
      client-19vm2: debug=0x33f0404
      client-19vm3: debug=0x33f0404
      client-19vm2: subsystem_debug=0xffb7e3ff
      client-19vm3: subsystem_debug=0xffb7e3ff
      client-19vm2: ../lnet/lnet/lnet options: 'accept=all'
      client-19vm4: debug=0x33f0404
      client-19vm4: subsystem_debug=0xffb7e3ff
      client-19vm2: gss/krb5 is not supported
      client-19vm3: gss/krb5 is not supported
      client-19vm4: gss/krb5 is not supported
      start ost1 service on client-19vm4
      CMD: client-19vm4 mkdir -p /mnt/ost1
      Starting ost1:   lustre-ost1/ost1 /mnt/ost1
      CMD: client-19vm4 mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
      client-19vm4: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: Input/output error
      client-19vm4: Is the MGS running?
      Start of lustre-ost1/ost1 on ost1 failed 5
      start mds service on client-19vm3
      CMD: client-19vm3 mkdir -p /mnt/mds1
      CMD: client-19vm3 test -b /dev/lvm-MDS/P1
      Starting mds1: -o user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
      CMD: client-19vm3 mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
      CMD: client-19vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \" 0xffb7e3ff\" 32 
      CMD: client-19vm3 e2label /dev/lvm-MDS/P1 2>/dev/null
      Started lustre-MDT0000
      stop ost1 service on client-19vm4
      CMD: client-19vm4 grep -c /mnt/ost1' ' /proc/mounts
      CMD: client-19vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      mount lustre on /mnt/lustre.....
      Starting client: client-19vm1.lab.whamcloud.com: -o user_xattr,acl,flock client-19vm3@tcp:/lustre /mnt/lustre
      CMD: client-19vm1.lab.whamcloud.com mkdir -p /mnt/lustre
      CMD: client-19vm1.lab.whamcloud.com mount -t lustre -o user_xattr,acl,flock client-19vm3@tcp:/lustre /mnt/lustre
      

      Info required for matching: conf-sanity 5d

Attachments

Issue Links

Activity


            utopiabound Nathaniel Clark added a comment:

            Per Alex's comment, closing this as a duplicate of LU-2059.
            This test was excepted by patch http://review.whamcloud.com/4234

            bzzz Alex Zhuravlev added a comment:

            Basically, all the tests where the OST starts without the MGS running should be disabled with ZFS until LU-2059 is done.

            bzzz Alex Zhuravlev added a comment:

            This is a dup of http://jira.whamcloud.com/browse/LU-2059
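            Lustre's test scripts conventionally skip known-bad tests via the ALWAYS_EXCEPT list consumed by test-framework.sh. A minimal standalone sketch of that idea is below; this is hypothetical illustration, not the content of the actual patch, and in the real conf-sanity.sh the backing fstype comes from helpers such as facet_fstype() rather than an environment variable:

            ```shell
            # Sketch (assumed, simplified): skip MGS-less OST-start tests on ZFS
            # until LU-2059 lands. In the real script, ALWAYS_EXCEPT is read by
            # test-framework.sh's run_test() to decide which subtests to skip.
            OSTFSTYPE=${OSTFSTYPE:-zfs}   # stand-in for facet_fstype ost1
            ALWAYS_EXCEPT=""
            if [ "$OSTFSTYPE" = "zfs" ]; then
                # subtests that start an OST without a running MGS
                ALWAYS_EXCEPT="$ALWAYS_EXCEPT 5d 19b"
            fi
            echo "excepted subtests:$ALWAYS_EXCEPT"
            ```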
            mdiep Minh Diep added a comment:

            == conf-sanity test 19b: start/stop OSTs without MDS == 09:45:26 (1349109926)
            start ost1 service on client-12vm4
            CMD: client-12vm4 mkdir -p /mnt/ost1
            Starting ost1: lustre-ost1/ost1 /mnt/ost1
            CMD: client-12vm4 mkdir -p /mnt/ost1; mount -t lustre lustre-ost1/ost1 /mnt/ost1
            client-12vm4: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: Input/output error
            client-12vm4: Is the MGS running?
            Start of lustre-ost1/ost1 on ost1 failed 5
            conf-sanity test_19b: @@@@@@ FAIL: test_19b failed with 1
            Trace dump:
            = /usr/lib64/lustre/tests/test-framework.sh:3640:error_noexit()
            = /usr/lib64/lustre/tests/test-framework.sh:3662:error()
            = /usr/lib64/lustre/tests/test-framework.sh:3898:run_one()
            = /usr/lib64/lustre/tests/test-framework.sh:3928:run_one_logged()
            = /usr/lib64/lustre/tests/test-framework.sh:3799:run_test()
            = /usr/lib64/lustre/tests/conf-sanity.sh:655:main()
            Dumping lctl log to /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.*.1349109952.log
            CMD: client-12vm1.lab.whamcloud.com,client-12vm2,client-12vm3,client-12vm4 /usr/sbin/lctl dk > /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.debug_log.\$(hostname -s).1349109952.log;
            dmesg > /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.dmesg.\$(hostname -s).1349109952.log
            CMD: client-12vm1.lab.whamcloud.com,client-12vm2,client-12vm3,client-12vm4 rsync -az /tmp/test_logs/2012-10-01/092312/conf-sanity.test_19b.*.1349109952.log client-12vm1.lab.whamcloud.com:/tmp/test_logs/2012-10-01/092312
            client-12vm1: Host key verification failed.
            client-12vm1: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
            client-12vm1: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
            client-12vm3: Host key verification failed.
            client-12vm3: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
            client-12vm3: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
            client-12vm2: Host key verification failed.
            client-12vm2: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
            client-12vm2: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
            client-12vm4: Host key verification failed.
            client-12vm4: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
            client-12vm4: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
            test_19b returned 1
            FAIL 19b (28s)

            mdiep Minh Diep added a comment:

            This seems to be ORI-740, patch http://review.whamcloud.com/#change,3678. It needs to land on 2.3.
            mdiep Minh Diep added a comment:

            Keith, I can reproduce this in my VMs (client-12vm[1-4]); it is hung in the same state right now. Please take a look, or let me know when you want to look and I will reproduce it again. To reproduce, set USE_OFD=yes, OSTFSTYPE=zfs, and LOAD_REMOTE_MODULE=true.
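            The reproduction settings above can be collected into a short shell fragment (a sketch under the comment's assumptions; the conf-sanity.sh path is taken from the trace dump earlier in this ticket, and ONLY= is the test-framework variable for restricting a run to one subtest):

            ```shell
            # Reproduction environment per Minh's comment (requires a multi-node
            # Lustre test cluster such as client-12vm[1-4]; values are the ones
            # quoted in the comment, not defaults).
            export USE_OFD=yes              # use the OFD-based OST stack
            export OSTFSTYPE=zfs            # back the OSTs with ZFS
            export LOAD_REMOTE_MODULE=true  # load modules on remote test nodes

            # Then run just the failing subtest (invocation commented out here,
            # since it needs a full test cluster):
            # ONLY=5d sh /usr/lib64/lustre/tests/conf-sanity.sh
            ```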

            keith Keith Mannthey (Inactive) added a comment:

            client-19vm2 (Client 2) rebooted at the start of this test for some unknown reason.

            We need the dmesg and /var/log/messages logs from the system.

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 4
