Details
Bug
Resolution: Won't Fix
Minor
Lustre 2.1.0
None
Lustre Tag: v2_0_65_0
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/204/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/
Distro/Arch: RHEL6/x86_64(in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD
MGS/MDS Nodes: client-7-ib(active), client-8-ib(passive)
\ /
1 combined MGS/MDT
OSS Nodes: fat-amd-1-ib(active), fat-amd-2-ib(active)
\ /
OST1 (active in fat-amd-1-ib)
OST2 (active in fat-amd-2-ib)
OST3 (active in fat-amd-1-ib)
OST4 (active in fat-amd-2-ib)
OST5 (active in fat-amd-1-ib)
OST6 (active in fat-amd-2-ib)
Client Nodes: fat-amd-3-ib, client-[9,11,12,13]-ib
3
9003
Description
The recovery-mds-scale test failed as follows after the MDS had failed over 12 times:
==== Checking the clients loads AFTER failover -- failure NOT OK
Client load failed on node client-13-ib, rc=1
Client load failed during failover. Exiting
Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
client-13-ib
Client load failed on node client-13-ib
client client-13-ib load stdout and debug files :
/tmp/recovery-mds-scale.log_run_dbench.sh-client-13-ib
/tmp/recovery-mds-scale.log_run_dbench.sh-client-13-ib.debug
2011-07-25 02:58:07 Terminating clients loads ...
Duration: 43200
Server failover period: 600 seconds
Exited after: 6723 seconds
Number of failovers before exit:
mds1: 12 times
/tmp/recovery-mds-scale.log_run_dbench.sh-client-13-ib:
copying /usr/share/dbench/client.txt to /mnt/lustre/d0.dbench-client-13-ib/client.txt
running 'dbench 2' on /mnt/lustre/d0.dbench-client-13-ib at Mon Jul 25 02:56:38 PDT 2011
dbench PID=8460
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004
Running for 600 seconds with load 'client.txt' and minimum warmup 120 secs
0 of 2 processes prepared for launch 0 sec
2 of 2 processes prepared for launch 0 sec
releasing clients
[678] open ./clients/client1/~dmtmp/PWRPNT/NEWPCB.PPT failed for handle 10013 (No such file or directory)
(679) ERROR: handle 10013 was not found
Child failed with status 1
/tmp/recovery-mds-scale.log_run_dbench.sh-client-13-ib.debug:
<~snip~>
2011-07-25 02:56:38: dbench run starting
+ mkdir -p /mnt/lustre/d0.dbench-client-13-ib
+ load_pid=8452
+ wait 8452
+ rundbench -D /mnt/lustre/d0.dbench-client-13-ib 2
touch: missing file operand
Try `touch --help' for more information.
+ '[' 1 -eq 0 ']'
++ date '+%F %H:%M:%S'
+ echoerr '2011-07-25 02:56:39: dbench failed'
+ echo '2011-07-25 02:56:39: dbench failed'
2011-07-25 02:56:39: dbench failed
The syslog on client node client-13-ib showed:
Jul 25 02:56:39 client-13 kernel: LustreError: 8461:0:(llite_lib.c:1142:ll_md_setattr()) md_setattr fails: rc = -30
Jul 25 02:56:39 client-13 kernel: LustreError: 8461:0:(llite_lib.c:1142:ll_md_setattr()) Skipped 1 previous similar message
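For readers decoding the return code: assuming rc follows the usual Linux kernel convention of a negated errno value, rc = -30 corresponds to errno 30, which is EROFS (read-only file system) on Linux. The sketch below is illustrative only and not part of the test suite. The helper name decode_rc is hypothetical. It pulls the first "rc = N" out of a log line and looks up the errno symbol and message on the machine where it runs.

```python
import errno
import os
import re

# Syslog line copied from the report above.
SYSLOG_LINE = (
    "Jul 25 02:56:39 client-13 kernel: LustreError: 8461:0:"
    "(llite_lib.c:1142:ll_md_setattr()) md_setattr fails: rc = -30"
)


def decode_rc(line):
    """Hypothetical helper: return (rc, errno symbol, message) for the
    first 'rc = N' found in line, or None if the line has no rc field.

    Assumes rc is a negated errno value, as is conventional for kernel code.
    """
    match = re.search(r"rc = (-?\d+)", line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = abs(rc)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    symbol = errno.errorcode.get(code, "UNKNOWN")
    return rc, symbol, os.strerror(code)


if __name__ == "__main__":
    print(decode_rc(SYSLOG_LINE))
```

If that assumption holds, the md_setattr call failed because the MDT was treated as read-only at that moment, which would be consistent with the setattr and open failures that dbench reported at the same timestamp.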
Maloo report: https://maloo.whamcloud.com/test_sets/c18c51da-b750-11e0-8bdf-52540025f9af
Please find more logs in the attached recovery-mds-scale-1311587892.tar.bz2.