[LU-2876] Test zfs timeout failure on test suite conf-sanity test_56 Created: 26/Feb/13  Updated: 19/Mar/13  Resolved: 19/Mar/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: LB, zfs

Severity: 3
Rank (Obsolete): 6950

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/a043a2ea-807f-11e2-b777-52540035b04c.

The sub-test test_56 failed with the following error:

test failed to respond and timed out

Info required for matching: conf-sanity 56



 Comments   
Comment by Yang Sheng [ 15/Mar/13 ]

From the logs:

Feb 26 11:08:15 wtm-19vm7 mrshd[7748]: root@wtm-19vm2.rosso.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "umount -d -f /mnt/mds1");echo XXRETCODE:$?'
Feb 26 11:08:17 wtm-19vm7 kernel: LustreError: 137-5: lustre-MDT0000: Not available for connect from 10.10.16.202@tcp (stopping)
Feb 26 11:08:18 wtm-19vm7 kernel: LustreError: 7388:0:(client.c:1048:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8800439a3400 x1428061600874724/t0(0) o13->lustre-OST03e8-osc-MDT0000@10.10.16.202@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Feb 26 11:08:19 wtm-19vm7 /usr/sbin/gmond[1866]: Error 1 sending the modular data for cpu_user#012
Feb 26 11:10:31 wtm-19vm7 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Feb 26 11:10:31 wtm-19vm7 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1638" x-info="http://www.rsyslog.com"] start
Feb 26 11:10:31 wtm-19vm7 kernel: Initializing cgroup subsys cpuset
Feb 26 11:10:31 wtm-19vm7 kernel: Initializing cgroup subsys cpu
Feb 26 11:10:31 wtm-19vm7 kernel: Linux version 2.6.32-279.19.1.el6_lustre.ge7838ff.x86_64 (jenkins@builder-1-sde1-el6-x8664.lab.whamcloud.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Mon Feb 11 00:35:42 PST 2013
Feb 26 11:10:31 wtm-19vm7 kernel: Command line: ro root=UUID=05eefdd8-3ae0-4f76-9518-7a22325a78c6 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 crashkernel=auto SYSFONT=latarcyrheb-sun16  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
Feb 26 11:10:31 wtm-19vm7 kernel: KERNEL supported cpus:
Feb 26 11:10:31 wtm-19vm7 kernel:  Intel GenuineIntel
Feb 26 11:10:31 wtm-19vm7 kernel:  AMD AuthenticAMD
Feb 26 11:10:31 wtm-19vm7 kernel:  Centaur CentaurHauls
Feb 26 11:10:31 wtm-19vm7 kernel: Disabled fast string operations
Feb 26 11:10:31 wtm-19vm7 kernel: BIOS-provided physical RAM map:

The MDS host was rebooted after the umount was issued. The client just kept waiting on pdsh until it finally timed out and triggered the Sysrq. From the Maloo data alone I can't tell why it rebooted. Maybe a kernel crash? But as I recall we always hold crashed nodes rather than rebooting them, so I think this issue is not related to Lustre itself.
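The failure mode described above is the remote shell never returning: the `mrshd` log line shows the test framework running `umount` remotely and appending an `XXRETCODE:$?` marker, so if the node reboots mid-command the marker never arrives and the caller blocks until the suite-level timeout. A minimal sketch of that pattern (the helper name, timeout value, and simulated commands are hypothetical illustrations, not the actual test-suite code):

```shell
#!/bin/sh
# Sketch: run a command with a bounded wait and append the XXRETCODE
# marker that the framework greps for. If the remote side dies (e.g.
# the node reboots during umount), no marker is produced and the
# caller can detect the hang instead of blocking forever.
run_with_timeout() {
    # $1 = timeout in seconds; remaining args = command string
    t="$1"; shift
    timeout "$t" sh -c "$*; echo XXRETCODE:\$?"
}

# Simulate an unresponsive node: the command never completes, so the
# XXRETCODE marker is never emitted and the output is empty.
out=$(run_with_timeout 2 "sleep 10")
if [ -z "$out" ]; then
    echo "remote command timed out (no XXRETCODE marker)"
fi

# A command that completes normally reports its return code marker.
run_with_timeout 2 "true"
```

In the failed run there is no such bound on the pdsh wait itself, which is why the hang only surfaces as the whole sub-test timing out.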

Comment by Jodi Levi (Inactive) [ 19/Mar/13 ]

Duplicate of LU-2828. Actual crash is visible in console-log for test 53.

Generated at Sat Feb 10 01:28:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.