
Test zfs timeout failure on test suite conf-sanity test_56

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • None
    • Lustre 2.4.0
    • 3
    • 6950

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/a043a2ea-807f-11e2-b777-52540035b04c.

      The sub-test test_56 failed with the following error:

      test failed to respond and timed out

      Info required for matching: conf-sanity 56

      Attachments

        Activity


          jlevi Jodi Levi (Inactive) added a comment - Duplicate of LU-2828. The actual crash is visible in the console log for test 53.
          ys Yang Sheng added a comment -

          From the logs:

          Feb 26 11:08:15 wtm-19vm7 mrshd[7748]: root@wtm-19vm2.rosso.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "umount -d -f /mnt/mds1");echo XXRETCODE:$?'
          Feb 26 11:08:17 wtm-19vm7 kernel: LustreError: 137-5: lustre-MDT0000: Not available for connect from 10.10.16.202@tcp (stopping)
          Feb 26 11:08:18 wtm-19vm7 kernel: LustreError: 7388:0:(client.c:1048:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8800439a3400 x1428061600874724/t0(0) o13->lustre-OST03e8-osc-MDT0000@10.10.16.202@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
          Feb 26 11:08:19 wtm-19vm7 /usr/sbin/gmond[1866]: Error 1 sending the modular data for cpu_user#012
          Feb 26 11:10:31 wtm-19vm7 kernel: imklog 5.8.10, log source = /proc/kmsg started.
          Feb 26 11:10:31 wtm-19vm7 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1638" x-info="http://www.rsyslog.com"] start
          Feb 26 11:10:31 wtm-19vm7 kernel: Initializing cgroup subsys cpuset
          Feb 26 11:10:31 wtm-19vm7 kernel: Initializing cgroup subsys cpu
          Feb 26 11:10:31 wtm-19vm7 kernel: Linux version 2.6.32-279.19.1.el6_lustre.ge7838ff.x86_64 (jenkins@builder-1-sde1-el6-x8664.lab.whamcloud.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Mon Feb 11 00:35:42 PST 2013
          Feb 26 11:10:31 wtm-19vm7 kernel: Command line: ro root=UUID=05eefdd8-3ae0-4f76-9518-7a22325a78c6 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 crashkernel=auto SYSFONT=latarcyrheb-sun16  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
          Feb 26 11:10:31 wtm-19vm7 kernel: KERNEL supported cpus:
          Feb 26 11:10:31 wtm-19vm7 kernel:  Intel GenuineIntel
          Feb 26 11:10:31 wtm-19vm7 kernel:  AMD AuthenticAMD
          Feb 26 11:10:31 wtm-19vm7 kernel:  Centaur CentaurHauls
          Feb 26 11:10:31 wtm-19vm7 kernel: Disabled fast string operations
          Feb 26 11:10:31 wtm-19vm7 kernel: BIOS-provided physical RAM map:
          
          

          The MDS host was rebooted after the umount was issued. The client just waited on pdsh until it finally timed out and triggered the SysRq. From the Maloo data alone I don't know why it rebooted. Maybe a kernel crash? But I remember we always hold crashed nodes instead of rebooting them, so I think this issue is not related to Lustre itself.
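          The mrshd log line above shows how the test harness recovers the remote exit status: the command string sent to the server node ends with `echo XXRETCODE:$?`, so the caller can parse the remote return code out of captured stdout even when the transport itself succeeds. A minimal local sketch of that pattern (the pdsh/mrsh transport is omitted, and `run_remote` is a hypothetical stand-in, not the framework's real helper):

          ```shell
          # Sketch of the XXRETCODE pattern seen in the mrshd log line.
          # In the real run the command is dispatched over pdsh/mrsh to the
          # MDS host; here we execute it locally for illustration.
          run_remote() {
              # \$? is escaped so it expands inside the inner shell,
              # yielding the exit status of the remote command itself.
              sh -c "$1; echo XXRETCODE:\$?"
          }

          out=$(run_remote "umount --help > /dev/null")
          rc=${out##*XXRETCODE:}
          echo "remote exit code: $rc"
          ```

          If the server reboots mid-command, as happened here, no `XXRETCODE:` line ever comes back, and the caller blocks on pdsh until the suite-level timeout fires.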


          People

            Assignee: ys Yang Sheng
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: