-----============= acceptance-small: recovery-mds-scale ============----- Fri Jul 24 16:44:06 PDT 2015
Exit config file opensfs.sh
c24: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
oss01: Exit config file opensfs.sh
mds04: Exit config file opensfs.sh
mds01: Exit config file opensfs.sh
oss02: Exit config file opensfs.sh
mds02: Exit config file opensfs.sh
mds03: Exit config file opensfs.sh
excepting tests:
c02: Exit config file opensfs.sh
c02: Checking config lustre mounted on /lustre/lustre
c01: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Checking config lustre mounted on /lustre/lustre
c06: Checking config lustre mounted on /lustre/lustre
c07: Exit config file opensfs.sh
c05: Checking config lustre mounted on /lustre/lustre
c03: Exit config file opensfs.sh
c08: Checking config lustre mounted on /lustre/lustre
c04: Exit config file opensfs.sh
c07: Checking config lustre mounted on /lustre/lustre
c24: Exit config file opensfs.sh
c03: Checking config lustre mounted on /lustre/lustre
c04: Checking config lustre mounted on /lustre/lustre
c24: Checking config lustre mounted on /lustre/lustre
Checking servers environments
Checking clients c01,c02,c03,c04,c05,c06,c07,c08,c24 environments
Using TIMEOUT=100
enable quota as required
[HOST:c24] [old_mdt_qtype:none] [old_ost_qtype:none] [new_qtype:]
mds01: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
mds01: warning: 'lctl conf_param' is deprecated, use 'lctl set_param -P' instead
Total disk size: 83624752 block-softlimit: 83625776 block-hardlimit: 87807064 inode-softlimit: 1235598 inode-hardlimit: 1297377
c01: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
mds02: Exit config file opensfs.sh
mds04: Exit config file opensfs.sh
mds01: Exit config file opensfs.sh
oss01: Exit config file opensfs.sh
mds03: Exit config file opensfs.sh
oss02: Exit config file opensfs.sh
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
Stopping client c24 /lustre/lustre (opts:)
c01: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
== recovery-mds-scale test failover_mds: failover MDS == 16:44:16 (1437781456)
Started client load: tar on c01
Started client load: dbench on c02
Started client load: dd on c03
Started client load: tar on c04
Started client load: dbench on c05
Started client load: dd on c06
Started client load: tar on c07
Started client load: dbench on c08
client loads pids:
c03: 3542
c06: 3626
c05: 3588
c04: 3569
c02: 3526
c01: 3530
c08: 3677
c07: 3636
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=0 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: INACTIVE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: Exit config file opensfs.sh
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
16:44:54 (1437781494) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
16:47:51 (1437781671) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 1 times, and counting...
sleeping 375 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=225 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: INACTIVE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds4
Failing mds4 on mds04
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds04]
Command completed successfully
reboot facets: mds4
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds04]
Command completed successfully
Failover mds4 to mds04
16:54:35 (1437782075) waiting for mds04 network 900 secs ...
waiting ping -c 1 -w 3 mds04, 895 secs left ...
waiting ping -c 1 -w 3 mds04, 890 secs left ...
waiting ping -c 1 -w 3 mds04, 885 secs left ...
waiting ping -c 1 -w 3 mds04, 880 secs left ...
waiting ping -c 1 -w 3 mds04, 875 secs left ...
waiting ping -c 1 -w 3 mds04, 870 secs left ...
waiting ping -c 1 -w 3 mds04, 865 secs left ...
waiting ping -c 1 -w 3 mds04, 860 secs left ...
waiting ping -c 1 -w 3 mds04, 855 secs left ...
waiting ping -c 1 -w 3 mds04, 850 secs left ...
waiting ping -c 1 -w 3 mds04, 845 secs left ...
waiting ping -c 1 -w 3 mds04, 840 secs left ...
waiting ping -c 1 -w 3 mds04, 835 secs left ...
waiting ping -c 1 -w 3 mds04, 830 secs left ...
waiting ping -c 1 -w 3 mds04, 825 secs left ...
waiting ping -c 1 -w 3 mds04, 820 secs left ...
waiting ping -c 1 -w 3 mds04, 815 secs left ...
waiting ping -c 1 -w 3 mds04, 810 secs left ...
waiting ping -c 1 -w 3 mds04, 805 secs left ...
waiting ping -c 1 -w 3 mds04, 800 secs left ...
waiting ping -c 1 -w 3 mds04, 795 secs left ...
waiting ping -c 1 -w 3 mds04, 790 secs left ...
16:57:34 (1437782254) network interface is UP
mount facets: mds4
Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4
mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds04: Exit config file opensfs.sh
Started lustre-MDT0003
==== Checking the clients loads AFTER failover -- failure NOT OK
mds4 has failed over 1 times, and counting...
sleeping 389 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=812 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: Exit config file opensfs.sh
c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds4
Failing mds4 on mds04
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds04]
Command completed successfully
reboot facets: mds4
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds04]
Command completed successfully
Failover mds4 to mds04
17:04:37 (1437782677) waiting for mds04 network 900 secs ...
waiting ping -c 1 -w 3 mds04, 895 secs left ...
waiting ping -c 1 -w 3 mds04, 890 secs left ...
waiting ping -c 1 -w 3 mds04, 885 secs left ...
waiting ping -c 1 -w 3 mds04, 880 secs left ...
waiting ping -c 1 -w 3 mds04, 875 secs left ...
waiting ping -c 1 -w 3 mds04, 870 secs left ...
waiting ping -c 1 -w 3 mds04, 865 secs left ...
waiting ping -c 1 -w 3 mds04, 860 secs left ...
waiting ping -c 1 -w 3 mds04, 855 secs left ...
waiting ping -c 1 -w 3 mds04, 850 secs left ...
waiting ping -c 1 -w 3 mds04, 845 secs left ...
waiting ping -c 1 -w 3 mds04, 840 secs left ...
waiting ping -c 1 -w 3 mds04, 835 secs left ...
waiting ping -c 1 -w 3 mds04, 830 secs left ...
waiting ping -c 1 -w 3 mds04, 825 secs left ...
waiting ping -c 1 -w 3 mds04, 820 secs left ...
waiting ping -c 1 -w 3 mds04, 815 secs left ...
waiting ping -c 1 -w 3 mds04, 810 secs left ...
waiting ping -c 1 -w 3 mds04, 805 secs left ...
waiting ping -c 1 -w 3 mds04, 800 secs left ...
waiting ping -c 1 -w 3 mds04, 795 secs left ...
waiting ping -c 1 -w 3 mds04, 790 secs left ...
waiting ping -c 1 -w 3 mds04, 785 secs left ...
17:07:41 (1437782861) network interface is UP
mount facets: mds4
Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4
mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds04: Exit config file opensfs.sh
Started lustre-MDT0003
==== Checking the clients loads AFTER failover -- failure NOT OK
mds4 has failed over 2 times, and counting...
sleeping 382 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=1419 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c05: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: Exit config file opensfs.sh
c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds4
Failing mds4 on mds04
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds04]
Command completed successfully
reboot facets: mds4
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds04
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds04]
Command completed successfully
Failover mds4 to mds04
17:14:37 (1437783277) waiting for mds04 network 900 secs ...
waiting ping -c 1 -w 3 mds04, 895 secs left ...
waiting ping -c 1 -w 3 mds04, 890 secs left ...
waiting ping -c 1 -w 3 mds04, 885 secs left ...
waiting ping -c 1 -w 3 mds04, 880 secs left ...
waiting ping -c 1 -w 3 mds04, 875 secs left ...
waiting ping -c 1 -w 3 mds04, 870 secs left ...
waiting ping -c 1 -w 3 mds04, 865 secs left ...
waiting ping -c 1 -w 3 mds04, 860 secs left ...
waiting ping -c 1 -w 3 mds04, 855 secs left ...
waiting ping -c 1 -w 3 mds04, 850 secs left ...
waiting ping -c 1 -w 3 mds04, 845 secs left ...
waiting ping -c 1 -w 3 mds04, 840 secs left ...
waiting ping -c 1 -w 3 mds04, 835 secs left ...
waiting ping -c 1 -w 3 mds04, 830 secs left ...
waiting ping -c 1 -w 3 mds04, 825 secs left ...
waiting ping -c 1 -w 3 mds04, 820 secs left ...
waiting ping -c 1 -w 3 mds04, 815 secs left ...
waiting ping -c 1 -w 3 mds04, 810 secs left ...
waiting ping -c 1 -w 3 mds04, 805 secs left ...
waiting ping -c 1 -w 3 mds04, 800 secs left ...
waiting ping -c 1 -w 3 mds04, 795 secs left ...
waiting ping -c 1 -w 3 mds04, 790 secs left ...
17:17:35 (1437783455) network interface is UP
mount facets: mds4
Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4
mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds04: Exit config file opensfs.sh
Started lustre-MDT0003
==== Checking the clients loads AFTER failover -- failure NOT OK
mds4 has failed over 3 times, and counting...
sleeping 389 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=2013 DURATION=86400 PERIOD=600
Wait mds1 recovery complete before doing next failover...
affected facets: mds1
mds01: Exit config file opensfs.sh
mds01: *.lustre-MDT0000.recovery_status status: INACTIVE
Checking clients are in FULL state before doing next failover...
c06: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds1
Failing mds1 on mds01
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds01
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds01]
Command completed successfully
reboot facets: mds1
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds01
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds01]
Command completed successfully
Failover mds1 to mds01
17:24:39 (1437783879) waiting for mds01 network 900 secs ...
waiting ping -c 1 -w 3 mds01, 895 secs left ...
waiting ping -c 1 -w 3 mds01, 890 secs left ...
waiting ping -c 1 -w 3 mds01, 885 secs left ...
waiting ping -c 1 -w 3 mds01, 880 secs left ...
waiting ping -c 1 -w 3 mds01, 875 secs left ...
waiting ping -c 1 -w 3 mds01, 870 secs left ...
waiting ping -c 1 -w 3 mds01, 865 secs left ...
waiting ping -c 1 -w 3 mds01, 860 secs left ...
waiting ping -c 1 -w 3 mds01, 855 secs left ...
waiting ping -c 1 -w 3 mds01, 850 secs left ...
waiting ping -c 1 -w 3 mds01, 845 secs left ...
waiting ping -c 1 -w 3 mds01, 840 secs left ...
waiting ping -c 1 -w 3 mds01, 835 secs left ...
waiting ping -c 1 -w 3 mds01, 830 secs left ...
waiting ping -c 1 -w 3 mds01, 825 secs left ...
waiting ping -c 1 -w 3 mds01, 820 secs left ...
waiting ping -c 1 -w 3 mds01, 815 secs left ...
waiting ping -c 1 -w 3 mds01, 810 secs left ...
waiting ping -c 1 -w 3 mds01, 805 secs left ...
waiting ping -c 1 -w 3 mds01, 800 secs left ...
waiting ping -c 1 -w 3 mds01, 795 secs left ...
waiting ping -c 1 -w 3 mds01, 790 secs left ...
waiting ping -c 1 -w 3 mds01, 785 secs left ...
17:27:43 (1437784063) network interface is UP
mount facets: mds1
Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1
mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383
mds01: Exit config file opensfs.sh
Started lustre-MDT0000
==== Checking the clients loads AFTER failover -- failure NOT OK
mds1 has failed over 1 times, and counting...
sleeping 382 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=2620 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
 [ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
17:34:42 (1437784482) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ... 17:37:39 (1437784659) network interface is UP mount facets: mds2 Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2 mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds02: Exit config file opensfs.sh Started lustre-MDT0001 ==== Checking the clients loads AFTER failover -- failure NOT OK mds2 has failed over 2 times, and counting... sleeping 386 seconds... striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 
-c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 ststriped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 ststriped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 st
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=3216 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: Exit config file opensfs.sh
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
17:44:55 (1437785095) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
17:47:52 (1437785272) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 3 times, and counting...
sleeping 371 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=3831 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c06: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
17:54:55 (1437785695) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
17:57:51 (1437785871) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 4 times, and counting...
sleeping 372 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=4430 DURATION=86400 PERIOD=600
Wait mds3 recovery complete before doing next failover...
affected facets: mds3
mds03: Exit config file opensfs.sh
mds03: *.lustre-MDT0002.recovery_status status: INACTIVE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: Exit config file opensfs.sh
c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds3
Failing mds3 on mds03
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds03
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03]
Command completed successfully
reboot facets: mds3
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds03
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03]
Command completed successfully
Failover mds3 to mds03
18:04:38 (1437786278) waiting for mds03 network 900 secs ...
waiting ping -c 1 -w 3 mds03, 895 secs left ...
waiting ping -c 1 -w 3 mds03, 890 secs left ...
waiting ping -c 1 -w 3 mds03, 885 secs left ...
waiting ping -c 1 -w 3 mds03, 880 secs left ...
waiting ping -c 1 -w 3 mds03, 875 secs left ...
waiting ping -c 1 -w 3 mds03, 870 secs left ...
waiting ping -c 1 -w 3 mds03, 865 secs left ...
waiting ping -c 1 -w 3 mds03, 860 secs left ...
waiting ping -c 1 -w 3 mds03, 855 secs left ...
waiting ping -c 1 -w 3 mds03, 850 secs left ...
waiting ping -c 1 -w 3 mds03, 845 secs left ...
waiting ping -c 1 -w 3 mds03, 840 secs left ...
waiting ping -c 1 -w 3 mds03, 835 secs left ...
waiting ping -c 1 -w 3 mds03, 830 secs left ...
waiting ping -c 1 -w 3 mds03, 825 secs left ...
waiting ping -c 1 -w 3 mds03, 820 secs left ...
waiting ping -c 1 -w 3 mds03, 815 secs left ...
waiting ping -c 1 -w 3 mds03, 810 secs left ...
waiting ping -c 1 -w 3 mds03, 805 secs left ...
waiting ping -c 1 -w 3 mds03, 800 secs left ...
waiting ping -c 1 -w 3 mds03, 795 secs left ...
waiting ping -c 1 -w 3 mds03, 790 secs left ...
18:07:36 (1437786456) network interface is UP
mount facets: mds3
Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3
mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383
mds03: Exit config file opensfs.sh
Started lustre-MDT0002
==== Checking the clients loads AFTER failover -- failure NOT OK
mds3 has failed over 1 times, and counting...
sleeping 391 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=5011 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c06: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: Exit config file opensfs.sh
c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds4
Failing mds4 on mds04
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds04
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds04]
Command completed successfully
reboot facets: mds4
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds04
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds04]
Command completed successfully
Failover mds4 to mds04
18:14:38 (1437786878) waiting for mds04 network 900 secs ...
waiting ping -c 1 -w 3 mds04, 895 secs left ...
waiting ping -c 1 -w 3 mds04, 890 secs left ...
waiting ping -c 1 -w 3 mds04, 885 secs left ...
waiting ping -c 1 -w 3 mds04, 880 secs left ...
waiting ping -c 1 -w 3 mds04, 875 secs left ...
waiting ping -c 1 -w 3 mds04, 870 secs left ...
waiting ping -c 1 -w 3 mds04, 865 secs left ...
waiting ping -c 1 -w 3 mds04, 860 secs left ...
waiting ping -c 1 -w 3 mds04, 855 secs left ...
waiting ping -c 1 -w 3 mds04, 850 secs left ...
waiting ping -c 1 -w 3 mds04, 845 secs left ...
waiting ping -c 1 -w 3 mds04, 840 secs left ...
waiting ping -c 1 -w 3 mds04, 835 secs left ...
waiting ping -c 1 -w 3 mds04, 830 secs left ...
waiting ping -c 1 -w 3 mds04, 825 secs left ...
waiting ping -c 1 -w 3 mds04, 820 secs left ...
waiting ping -c 1 -w 3 mds04, 815 secs left ...
waiting ping -c 1 -w 3 mds04, 810 secs left ...
waiting ping -c 1 -w 3 mds04, 805 secs left ...
waiting ping -c 1 -w 3 mds04, 800 secs left ...
waiting ping -c 1 -w 3 mds04, 795 secs left ...
waiting ping -c 1 -w 3 mds04, 790 secs left ...
18:17:36 (1437787056) network interface is UP
mount facets: mds4
Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4
mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds04: Exit config file opensfs.sh
Started lustre-MDT0003
==== Checking the clients loads AFTER failover -- failure NOT OK
mds4 has failed over 4 times, and counting...
sleeping 389 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=5613 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c06: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds4
Failing mds4 on mds04
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds04
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds04]
Command completed successfully
reboot facets: mds4
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds04
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds04]
Command completed successfully
Failover mds4 to mds04
18:24:38 (1437787478) waiting for mds04 network 900 secs ...
waiting ping -c 1 -w 3 mds04, 895 secs left ...
waiting ping -c 1 -w 3 mds04, 890 secs left ...
waiting ping -c 1 -w 3 mds04, 885 secs left ...
waiting ping -c 1 -w 3 mds04, 880 secs left ...
waiting ping -c 1 -w 3 mds04, 875 secs left ...
waiting ping -c 1 -w 3 mds04, 870 secs left ...
waiting ping -c 1 -w 3 mds04, 865 secs left ...
waiting ping -c 1 -w 3 mds04, 860 secs left ...
waiting ping -c 1 -w 3 mds04, 855 secs left ...
waiting ping -c 1 -w 3 mds04, 850 secs left ...
waiting ping -c 1 -w 3 mds04, 845 secs left ...
waiting ping -c 1 -w 3 mds04, 840 secs left ...
waiting ping -c 1 -w 3 mds04, 835 secs left ...
waiting ping -c 1 -w 3 mds04, 830 secs left ...
waiting ping -c 1 -w 3 mds04, 825 secs left ...
waiting ping -c 1 -w 3 mds04, 820 secs left ...
waiting ping -c 1 -w 3 mds04, 815 secs left ...
waiting ping -c 1 -w 3 mds04, 810 secs left ...
waiting ping -c 1 -w 3 mds04, 805 secs left ...
waiting ping -c 1 -w 3 mds04, 800 secs left ...
waiting ping -c 1 -w 3 mds04, 795 secs left ...
waiting ping -c 1 -w 3 mds04, 790 secs left ...
18:27:36 (1437787656) network interface is UP
mount facets: mds4
Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4
mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds04: Exit config file opensfs.sh
Started lustre-MDT0003
==== Checking the clients loads AFTER failover -- failure NOT OK
mds4 has failed over 5 times, and counting...
sleeping 388 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=6214 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c05: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
18:34:56 (1437788096) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
18:37:53 (1437788273) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 5 times, and counting...
sleeping 372 seconds...
riped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4
/lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 stririped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 stririped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 stri
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=6831 DURATION=86400 PERIOD=600
Wait mds4 recovery complete before doing next failover...
affected facets: mds4
mds04: Exit config file opensfs.sh
mds04: *.lustre-MDT0003.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c05: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: Exit config file opensfs.sh c04: Exit config file opensfs.sh c07: Exit config file opensfs.sh c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds4 Failing mds4 on mds04 + runas -u di.wang ssh 192.168.0.1 pm -0 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds04] Command completed successfully reboot facets: mds4 + runas -u di.wang ssh 192.168.0.1 pm -1 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds04] Command completed successfully Failover mds4 to mds04 18:44:40 (1437788680) waiting for mds04 network 900 secs ... waiting ping -c 1 -w 3 mds04, 895 secs left ... waiting ping -c 1 -w 3 mds04, 890 secs left ... waiting ping -c 1 -w 3 mds04, 885 secs left ... waiting ping -c 1 -w 3 mds04, 880 secs left ... waiting ping -c 1 -w 3 mds04, 875 secs left ... waiting ping -c 1 -w 3 mds04, 870 secs left ... waiting ping -c 1 -w 3 mds04, 865 secs left ... waiting ping -c 1 -w 3 mds04, 860 secs left ... waiting ping -c 1 -w 3 mds04, 855 secs left ... waiting ping -c 1 -w 3 mds04, 850 secs left ... waiting ping -c 1 -w 3 mds04, 845 secs left ... waiting ping -c 1 -w 3 mds04, 840 secs left ... waiting ping -c 1 -w 3 mds04, 835 secs left ... 
waiting ping -c 1 -w 3 mds04, 830 secs left ... waiting ping -c 1 -w 3 mds04, 825 secs left ... waiting ping -c 1 -w 3 mds04, 820 secs left ... waiting ping -c 1 -w 3 mds04, 815 secs left ... waiting ping -c 1 -w 3 mds04, 810 secs left ... waiting ping -c 1 -w 3 mds04, 805 secs left ... waiting ping -c 1 -w 3 mds04, 800 secs left ... waiting ping -c 1 -w 3 mds04, 795 secs left ... waiting ping -c 1 -w 3 mds04, 790 secs left ... 18:47:38 (1437788858) network interface is UP mount facets: mds4 Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4 mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds04: Exit config file opensfs.sh Started lustre-MDT0003 ==== Checking the clients loads AFTER failover -- failure NOT OK mds4 has failed over 6 times, and counting... sleeping 392 seconds... striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 
/lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 
/lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 s==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=7412 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c08: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c07: Exit config file opensfs.sh c01: Exit config file opensfs.sh c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 18:54:40 (1437789280) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 18:57:37 (1437789457) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 2 times, and counting... sleeping 390 seconds... striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 
/lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 
/lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 s==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=8015 DURATION=86400 PERIOD=600 Wait mds2 recovery complete before doing next failover... affected facets: mds2 mds02: Exit config file opensfs.sh mds02: *.lustre-MDT0001.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c06: Exit config file opensfs.sh c03: Exit config file opensfs.sh c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: Exit config file opensfs.sh c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: Exit config file opensfs.sh c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds2 Failing mds2 on mds02 + runas -u di.wang ssh 192.168.0.1 pm -0 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02] Command completed successfully reboot facets: mds2 + runas -u di.wang ssh 192.168.0.1 pm -1 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02] Command completed successfully Failover mds2 to mds02 19:04:58 (1437789898) waiting for mds02 network 900 secs ... waiting ping -c 1 -w 3 mds02, 895 secs left ... waiting ping -c 1 -w 3 mds02, 890 secs left ... waiting ping -c 1 -w 3 mds02, 885 secs left ... waiting ping -c 1 -w 3 mds02, 880 secs left ... waiting ping -c 1 -w 3 mds02, 875 secs left ... waiting ping -c 1 -w 3 mds02, 870 secs left ... waiting ping -c 1 -w 3 mds02, 865 secs left ... waiting ping -c 1 -w 3 mds02, 860 secs left ... waiting ping -c 1 -w 3 mds02, 855 secs left ... waiting ping -c 1 -w 3 mds02, 850 secs left ... waiting ping -c 1 -w 3 mds02, 845 secs left ... waiting ping -c 1 -w 3 mds02, 840 secs left ... waiting ping -c 1 -w 3 mds02, 835 secs left ... 
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ... 19:07:55 (1437790075) network interface is UP mount facets: mds2 Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2 mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds02: Exit config file opensfs.sh Started lustre-MDT0001 ==== Checking the clients loads AFTER failover -- failure NOT OK mds2 has failed over 6 times, and counting... sleeping 372 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=8633 DURATION=86400 PERIOD=600 Wait mds1 recovery complete before doing next failover... affected facets: mds1 mds01: Exit config file opensfs.sh mds01: *.lustre-MDT0000.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c08: Exit config file opensfs.sh c02: Exit config file opensfs.sh c05: Exit config file opensfs.sh c07: Exit config file opensfs.sh c04: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds1 Failing mds1 on mds01 + runas -u di.wang ssh 192.168.0.1 pm -0 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds01] Command completed successfully reboot facets: mds1 + runas -u di.wang ssh 192.168.0.1 pm -1 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds01] Command completed successfully Failover mds1 to mds01 19:14:41 (1437790481) waiting for mds01 network 900 secs ... waiting ping -c 1 -w 3 mds01, 895 secs left ... waiting ping -c 1 -w 3 mds01, 890 secs left ... waiting ping -c 1 -w 3 mds01, 885 secs left ... waiting ping -c 1 -w 3 mds01, 880 secs left ... waiting ping -c 1 -w 3 mds01, 875 secs left ... waiting ping -c 1 -w 3 mds01, 870 secs left ... waiting ping -c 1 -w 3 mds01, 865 secs left ... waiting ping -c 1 -w 3 mds01, 860 secs left ... waiting ping -c 1 -w 3 mds01, 855 secs left ... waiting ping -c 1 -w 3 mds01, 850 secs left ... waiting ping -c 1 -w 3 mds01, 845 secs left ... waiting ping -c 1 -w 3 mds01, 840 secs left ... waiting ping -c 1 -w 3 mds01, 835 secs left ... 
waiting ping -c 1 -w 3 mds01, 830 secs left ... waiting ping -c 1 -w 3 mds01, 825 secs left ... waiting ping -c 1 -w 3 mds01, 820 secs left ... waiting ping -c 1 -w 3 mds01, 815 secs left ... waiting ping -c 1 -w 3 mds01, 810 secs left ... waiting ping -c 1 -w 3 mds01, 805 secs left ... waiting ping -c 1 -w 3 mds01, 800 secs left ... waiting ping -c 1 -w 3 mds01, 795 secs left ... waiting ping -c 1 -w 3 mds01, 790 secs left ... 19:17:39 (1437790659) network interface is UP mount facets: mds1 Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1 mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383 mds01: Exit config file opensfs.sh Started lustre-MDT0000 ==== Checking the clients loads AFTER failover -- failure NOT OK mds1 has failed over 2 times, and counting... sleeping 389 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=9216 DURATION=86400 PERIOD=600 Wait mds4 recovery complete before doing next failover... affected facets: mds4 mds04: Exit config file opensfs.sh mds04: *.lustre-MDT0003.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c05: Exit config file opensfs.sh c02: Exit config file opensfs.sh c08: Exit config file opensfs.sh c04: Exit config file opensfs.sh c01: Exit config file opensfs.sh c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c07: Exit config file opensfs.sh c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds4 Failing mds4 on mds04 + runas -u di.wang ssh 192.168.0.1 pm -0 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds04] Command completed successfully reboot facets: mds4 + runas -u di.wang ssh 192.168.0.1 pm -1 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds04] Command completed successfully Failover mds4 to mds04 19:24:41 (1437791081) waiting for mds04 network 900 secs ... waiting ping -c 1 -w 3 mds04, 895 secs left ... waiting ping -c 1 -w 3 mds04, 890 secs left ... waiting ping -c 1 -w 3 mds04, 885 secs left ... waiting ping -c 1 -w 3 mds04, 880 secs left ... waiting ping -c 1 -w 3 mds04, 875 secs left ... waiting ping -c 1 -w 3 mds04, 870 secs left ... waiting ping -c 1 -w 3 mds04, 865 secs left ... waiting ping -c 1 -w 3 mds04, 860 secs left ... waiting ping -c 1 -w 3 mds04, 855 secs left ... waiting ping -c 1 -w 3 mds04, 850 secs left ... waiting ping -c 1 -w 3 mds04, 845 secs left ... waiting ping -c 1 -w 3 mds04, 840 secs left ... waiting ping -c 1 -w 3 mds04, 835 secs left ... 
waiting ping -c 1 -w 3 mds04, 830 secs left ... waiting ping -c 1 -w 3 mds04, 825 secs left ... waiting ping -c 1 -w 3 mds04, 820 secs left ... waiting ping -c 1 -w 3 mds04, 815 secs left ... waiting ping -c 1 -w 3 mds04, 810 secs left ... waiting ping -c 1 -w 3 mds04, 805 secs left ... waiting ping -c 1 -w 3 mds04, 800 secs left ... waiting ping -c 1 -w 3 mds04, 795 secs left ... waiting ping -c 1 -w 3 mds04, 790 secs left ... 19:27:38 (1437791258) network interface is UP mount facets: mds4 Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4 mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds04: Exit config file opensfs.sh Started lustre-MDT0003 ==== Checking the clients loads AFTER failover -- failure NOT OK mds4 has failed over 7 times, and counting... sleeping 390 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=9816 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: Exit config file opensfs.sh c02: Exit config file opensfs.sh c07: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 19:34:42 (1437791682) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 19:37:39 (1437791859) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 3 times, and counting... sleeping 390 seconds... ped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 stripeped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 stripeped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 stripe
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=10416 DURATION=86400 PERIOD=600
Wait mds1 recovery complete before doing next failover...
affected facets: mds1
mds01: Exit config file opensfs.sh
mds01: *.lustre-MDT0000.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: Exit config file opensfs.sh
c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds1
Failing mds1 on mds01
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds01
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-0] [mds01]
Command completed successfully
reboot facets: mds1
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds01
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-1] [mds01]
Command completed successfully
Failover mds1 to mds01
19:44:42 (1437792282) waiting for mds01 network 900 secs ...
waiting ping -c 1 -w 3 mds01, 895 secs left ...
waiting ping -c 1 -w 3 mds01, 890 secs left ...
waiting ping -c 1 -w 3 mds01, 885 secs left ...
waiting ping -c 1 -w 3 mds01, 880 secs left ...
waiting ping -c 1 -w 3 mds01, 875 secs left ...
waiting ping -c 1 -w 3 mds01, 870 secs left ...
waiting ping -c 1 -w 3 mds01, 865 secs left ...
waiting ping -c 1 -w 3 mds01, 860 secs left ...
waiting ping -c 1 -w 3 mds01, 855 secs left ...
waiting ping -c 1 -w 3 mds01, 850 secs left ...
waiting ping -c 1 -w 3 mds01, 845 secs left ...
waiting ping -c 1 -w 3 mds01, 840 secs left ...
waiting ping -c 1 -w 3 mds01, 835 secs left ...
waiting ping -c 1 -w 3 mds01, 830 secs left ...
waiting ping -c 1 -w 3 mds01, 825 secs left ...
waiting ping -c 1 -w 3 mds01, 820 secs left ...
waiting ping -c 1 -w 3 mds01, 815 secs left ...
waiting ping -c 1 -w 3 mds01, 810 secs left ...
waiting ping -c 1 -w 3 mds01, 805 secs left ...
waiting ping -c 1 -w 3 mds01, 800 secs left ...
waiting ping -c 1 -w 3 mds01, 795 secs left ...
waiting ping -c 1 -w 3 mds01, 790 secs left ...
waiting ping -c 1 -w 3 mds01, 785 secs left ...
19:47:47 (1437792467) network interface is UP
mount facets: mds1
Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1
mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383
mds01: Exit config file opensfs.sh
Started lustre-MDT0000
==== Checking the clients loads AFTER failover -- failure NOT OK
mds1 has failed over 3 times, and counting...
sleeping 382 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=11024 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c06: Exit config file opensfs.sh
c03: Exit config file opensfs.sh
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
19:55:00 (1437792900) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
19:57:56 (1437793076) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 7 times, and counting...
sleeping 374 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=11633 DURATION=86400 PERIOD=600
Wait mds3 recovery complete before doing next failover...
affected facets: mds3
mds03: Exit config file opensfs.sh
mds03: *.lustre-MDT0002.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds3
Failing mds3 on mds03
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds03
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-0] [mds03]
Command completed successfully
reboot facets: mds3
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds03
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-1] [mds03]
Command completed successfully
Failover mds3 to mds03
20:04:44 (1437793484) waiting for mds03 network 900 secs ...
waiting ping -c 1 -w 3 mds03, 895 secs left ...
waiting ping -c 1 -w 3 mds03, 890 secs left ...
waiting ping -c 1 -w 3 mds03, 885 secs left ...
waiting ping -c 1 -w 3 mds03, 880 secs left ...
waiting ping -c 1 -w 3 mds03, 875 secs left ...
waiting ping -c 1 -w 3 mds03, 870 secs left ...
waiting ping -c 1 -w 3 mds03, 865 secs left ...
waiting ping -c 1 -w 3 mds03, 860 secs left ...
waiting ping -c 1 -w 3 mds03, 855 secs left ...
waiting ping -c 1 -w 3 mds03, 850 secs left ...
waiting ping -c 1 -w 3 mds03, 845 secs left ...
waiting ping -c 1 -w 3 mds03, 840 secs left ...
waiting ping -c 1 -w 3 mds03, 835 secs left ...
waiting ping -c 1 -w 3 mds03, 830 secs left ...
waiting ping -c 1 -w 3 mds03, 825 secs left ...
waiting ping -c 1 -w 3 mds03, 820 secs left ...
waiting ping -c 1 -w 3 mds03, 815 secs left ...
waiting ping -c 1 -w 3 mds03, 810 secs left ...
waiting ping -c 1 -w 3 mds03, 805 secs left ...
waiting ping -c 1 -w 3 mds03, 800 secs left ...
waiting ping -c 1 -w 3 mds03, 795 secs left ...
waiting ping -c 1 -w 3 mds03, 790 secs left ...
20:07:42 (1437793662) network interface is UP
mount facets: mds3
Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3
mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383
mds03: Exit config file opensfs.sh
Started lustre-MDT0002
==== Checking the clients loads AFTER failover -- failure NOT OK
mds3 has failed over 4 times, and counting...
sleeping 391 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=12216 DURATION=86400 PERIOD=600
Wait mds3 recovery complete before doing next failover...
affected facets: mds3
mds03: Exit config file opensfs.sh
mds03: *.lustre-MDT0002.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c02: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds3
Failing mds3 on mds03
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds03
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-0] [mds03]
Command completed successfully
reboot facets: mds3
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds03
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-1] [mds03]
Command completed successfully
Failover mds3 to mds03
20:14:44 (1437794084) waiting for mds03 network 900 secs ...
waiting ping -c 1 -w 3 mds03, 895 secs left ...
waiting ping -c 1 -w 3 mds03, 890 secs left ...
waiting ping -c 1 -w 3 mds03, 885 secs left ...
waiting ping -c 1 -w 3 mds03, 880 secs left ...
waiting ping -c 1 -w 3 mds03, 875 secs left ...
waiting ping -c 1 -w 3 mds03, 870 secs left ...
waiting ping -c 1 -w 3 mds03, 865 secs left ...
waiting ping -c 1 -w 3 mds03, 860 secs left ...
waiting ping -c 1 -w 3 mds03, 855 secs left ...
waiting ping -c 1 -w 3 mds03, 850 secs left ...
waiting ping -c 1 -w 3 mds03, 845 secs left ...
waiting ping -c 1 -w 3 mds03, 840 secs left ...
waiting ping -c 1 -w 3 mds03, 835 secs left ...
waiting ping -c 1 -w 3 mds03, 830 secs left ...
waiting ping -c 1 -w 3 mds03, 825 secs left ...
waiting ping -c 1 -w 3 mds03, 820 secs left ...
waiting ping -c 1 -w 3 mds03, 815 secs left ...
waiting ping -c 1 -w 3 mds03, 810 secs left ...
waiting ping -c 1 -w 3 mds03, 805 secs left ...
waiting ping -c 1 -w 3 mds03, 800 secs left ...
waiting ping -c 1 -w 3 mds03, 795 secs left ...
waiting ping -c 1 -w 3 mds03, 790 secs left ...
20:17:42 (1437794262) network interface is UP
mount facets: mds3
Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3
mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383
mds03: Exit config file opensfs.sh
Started lustre-MDT0002
==== Checking the clients loads AFTER failover -- failure NOT OK
mds3 has failed over 5 times, and counting...
sleeping 387 seconds...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=12820 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh
c06: Exit config file opensfs.sh
c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c02: Exit config file opensfs.sh
c04: Exit config file opensfs.sh
c05: Exit config file opensfs.sh
c08: Exit config file opensfs.sh
c01: Exit config file opensfs.sh
c07: Exit config file opensfs.sh
c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
Starting failover on mds2
Failing mds2 on mds02
+ runas -u di.wang ssh 192.168.0.1 pm -0 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-0] [mds02]
Command completed successfully
reboot facets: mds2
+ runas -u di.wang ssh 192.168.0.1 pm -1 mds02
running as uid/gid/euid/egid 503/503/503/503, groups:
[ssh] [192.168.0.1] [pm] [-1] [mds02]
Command completed successfully
Failover mds2 to mds02
20:25:00 (1437794700) waiting for mds02 network 900 secs ...
waiting ping -c 1 -w 3 mds02, 895 secs left ...
waiting ping -c 1 -w 3 mds02, 890 secs left ...
waiting ping -c 1 -w 3 mds02, 885 secs left ...
waiting ping -c 1 -w 3 mds02, 880 secs left ...
waiting ping -c 1 -w 3 mds02, 875 secs left ...
waiting ping -c 1 -w 3 mds02, 870 secs left ...
waiting ping -c 1 -w 3 mds02, 865 secs left ...
waiting ping -c 1 -w 3 mds02, 860 secs left ...
waiting ping -c 1 -w 3 mds02, 855 secs left ...
waiting ping -c 1 -w 3 mds02, 850 secs left ...
waiting ping -c 1 -w 3 mds02, 845 secs left ...
waiting ping -c 1 -w 3 mds02, 840 secs left ...
waiting ping -c 1 -w 3 mds02, 835 secs left ...
waiting ping -c 1 -w 3 mds02, 830 secs left ...
waiting ping -c 1 -w 3 mds02, 825 secs left ...
waiting ping -c 1 -w 3 mds02, 820 secs left ...
waiting ping -c 1 -w 3 mds02, 815 secs left ...
waiting ping -c 1 -w 3 mds02, 810 secs left ...
waiting ping -c 1 -w 3 mds02, 805 secs left ...
waiting ping -c 1 -w 3 mds02, 800 secs left ...
waiting ping -c 1 -w 3 mds02, 795 secs left ...
waiting ping -c 1 -w 3 mds02, 790 secs left ...
20:27:56 (1437794876) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
mds2 has failed over 8 times, and counting...
sleeping 375 seconds...
d dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4
/lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped d dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped d dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=13432 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover...
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c08: Exit config file opensfs.sh c02: Exit config file opensfs.sh c01: Exit config file opensfs.sh c04: Exit config file opensfs.sh c05: Exit config file opensfs.sh c07: Exit config file opensfs.sh c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds2 Failing mds2 on mds02 + runas -u di.wang ssh 192.168.0.1 pm -0 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02] Command completed successfully reboot facets: mds2 + runas -u di.wang ssh 192.168.0.1 pm -1 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02] Command completed successfully Failover mds2 to mds02 20:35:00 (1437795300) waiting for mds02 network 900 secs ... waiting ping -c 1 -w 3 mds02, 895 secs left ... waiting ping -c 1 -w 3 mds02, 890 secs left ... waiting ping -c 1 -w 3 mds02, 885 secs left ... waiting ping -c 1 -w 3 mds02, 880 secs left ... waiting ping -c 1 -w 3 mds02, 875 secs left ... waiting ping -c 1 -w 3 mds02, 870 secs left ... waiting ping -c 1 -w 3 mds02, 865 secs left ... waiting ping -c 1 -w 3 mds02, 860 secs left ... waiting ping -c 1 -w 3 mds02, 855 secs left ... waiting ping -c 1 -w 3 mds02, 850 secs left ... waiting ping -c 1 -w 3 mds02, 845 secs left ... waiting ping -c 1 -w 3 mds02, 840 secs left ... waiting ping -c 1 -w 3 mds02, 835 secs left ... 
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ... 20:37:58 (1437795478) network interface is UP mount facets: mds2 Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2 mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds02: Exit config file opensfs.sh Started lustre-MDT0001 ==== Checking the clients loads AFTER failover -- failure NOT OK mds2 has failed over 9 times, and counting... sleeping 371 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=14036 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c06: Exit config file opensfs.sh c03: Exit config file opensfs.sh c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c05: Exit config file opensfs.sh c08: Exit config file opensfs.sh c01: Exit config file opensfs.sh c07: Exit config file opensfs.sh c04: Exit config file opensfs.sh c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 20:44:43 (1437795883) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 20:47:40 (1437796060) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 6 times, and counting... sleeping 393 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=14615 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c06: Exit config file opensfs.sh c03: Exit config file opensfs.sh c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c07: Exit config file opensfs.sh c08: Exit config file opensfs.sh c01: Exit config file opensfs.sh c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 20:54:44 (1437796484) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 20:57:40 (1437796660) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 7 times, and counting... sleeping 390 seconds... triped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 
/lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 
/lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 striped dir -i2 -c4 /lustre/lustre/d0.dd-c03 st==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=15218 DURATION=86400 PERIOD=600 Wait mds1 recovery complete before doing next failover... affected facets: mds1 mds01: Exit config file opensfs.sh mds01: *.lustre-MDT0000.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c06: Exit config file opensfs.sh c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c07: Exit config file opensfs.sh c05: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: Exit config file opensfs.sh c02: Exit config file opensfs.sh c04: Exit config file opensfs.sh c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds1 Failing mds1 on mds01 + runas -u di.wang ssh 192.168.0.1 pm -0 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds01] Command completed successfully reboot facets: mds1 + runas -u di.wang ssh 192.168.0.1 pm -1 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds01] Command completed successfully Failover mds1 to mds01 21:04:45 (1437797085) waiting for mds01 network 900 secs ... waiting ping -c 1 -w 3 mds01, 895 secs left ... waiting ping -c 1 -w 3 mds01, 890 secs left ... waiting ping -c 1 -w 3 mds01, 885 secs left ... waiting ping -c 1 -w 3 mds01, 880 secs left ... waiting ping -c 1 -w 3 mds01, 875 secs left ... waiting ping -c 1 -w 3 mds01, 870 secs left ... waiting ping -c 1 -w 3 mds01, 865 secs left ... waiting ping -c 1 -w 3 mds01, 860 secs left ... waiting ping -c 1 -w 3 mds01, 855 secs left ... waiting ping -c 1 -w 3 mds01, 850 secs left ... waiting ping -c 1 -w 3 mds01, 845 secs left ... waiting ping -c 1 -w 3 mds01, 840 secs left ... waiting ping -c 1 -w 3 mds01, 835 secs left ... 
waiting ping -c 1 -w 3 mds01, 830 secs left ... waiting ping -c 1 -w 3 mds01, 825 secs left ... waiting ping -c 1 -w 3 mds01, 820 secs left ... waiting ping -c 1 -w 3 mds01, 815 secs left ... waiting ping -c 1 -w 3 mds01, 810 secs left ... waiting ping -c 1 -w 3 mds01, 805 secs left ... waiting ping -c 1 -w 3 mds01, 800 secs left ... waiting ping -c 1 -w 3 mds01, 795 secs left ... waiting ping -c 1 -w 3 mds01, 790 secs left ... waiting ping -c 1 -w 3 mds01, 785 secs left ... 21:07:49 (1437797269) network interface is UP mount facets: mds1 Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1 mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383 mds01: Exit config file opensfs.sh Started lustre-MDT0000 ==== Checking the clients loads AFTER failover -- failure NOT OK mds1 has failed over 4 times, and counting... sleeping 383 seconds... triped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 
/lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 
/lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 striped dir -i1 -c4 /lustre/lustre/d0.dd-c06 st==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=15826 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c04: Exit config file opensfs.sh c08: Exit config file opensfs.sh c07: Exit config file opensfs.sh c05: Exit config file opensfs.sh c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: Exit config file opensfs.sh c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 21:14:46 (1437797686) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 21:17:44 (1437797864) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 8 times, and counting... sleeping 388 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=16422 DURATION=86400 PERIOD=600 Wait mds2 recovery complete before doing next failover... affected facets: mds2 mds02: Exit config file opensfs.sh mds02: *.lustre-MDT0001.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c01: Exit config file opensfs.sh c07: Exit config file opensfs.sh c08: Exit config file opensfs.sh c04: Exit config file opensfs.sh c05: Exit config file opensfs.sh c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds2 Failing mds2 on mds02 + runas -u di.wang ssh 192.168.0.1 pm -0 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02] Command completed successfully reboot facets: mds2 + runas -u di.wang ssh 192.168.0.1 pm -1 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02] Command completed successfully Failover mds2 to mds02 21:24:50 (1437798290) waiting for mds02 network 900 secs ... waiting ping -c 1 -w 3 mds02, 895 secs left ... waiting ping -c 1 -w 3 mds02, 890 secs left ... waiting ping -c 1 -w 3 mds02, 885 secs left ... waiting ping -c 1 -w 3 mds02, 880 secs left ... waiting ping -c 1 -w 3 mds02, 875 secs left ... waiting ping -c 1 -w 3 mds02, 870 secs left ... waiting ping -c 1 -w 3 mds02, 865 secs left ... waiting ping -c 1 -w 3 mds02, 860 secs left ... waiting ping -c 1 -w 3 mds02, 855 secs left ... waiting ping -c 1 -w 3 mds02, 850 secs left ... waiting ping -c 1 -w 3 mds02, 845 secs left ... waiting ping -c 1 -w 3 mds02, 840 secs left ... waiting ping -c 1 -w 3 mds02, 835 secs left ... 
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ... 21:27:48 (1437798468) network interface is UP mount facets: mds2 Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2 mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds02: Exit config file opensfs.sh Started lustre-MDT0001 ==== Checking the clients loads AFTER failover -- failure NOT OK mds2 has failed over 10 times, and counting... sleeping 384 seconds... dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i2 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i1 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped dir -i3 -c4 /lustre/lustre/d0.tar-c01 striped dir -i0 -c4 /lustre/lustre/d0.tar-c01 striped didir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 
/lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i0 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i1 -c4 /lustre/lustre/d0.tar-c04 striped dir -i3 -c4 /lustre/lustre/d0.tar-c04 striped dir -i2 -c4 /lustre/lustre/d0.tar-c04 striped didir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 
/lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i1 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i0 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped dir -i2 -c4 /lustre/lustre/d0.tar-c07 striped dir -i3 -c4 /lustre/lustre/d0.tar-c07 striped di==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=17026 DURATION=86400 PERIOD=600 Wait mds3 recovery complete before doing next failover... affected facets: mds3 mds03: Exit config file opensfs.sh mds03: *.lustre-MDT0002.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c08: Exit config file opensfs.sh c05: Exit config file opensfs.sh c07: Exit config file opensfs.sh c01: Exit config file opensfs.sh c04: Exit config file opensfs.sh c02: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds3 Failing mds3 on mds03 + runas -u di.wang ssh 192.168.0.1 pm -0 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds03] Command completed successfully reboot facets: mds3 + runas -u di.wang ssh 192.168.0.1 pm -1 mds03 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds03] Command completed successfully Failover mds3 to mds03 21:34:46 (1437798886) waiting for mds03 network 900 secs ... waiting ping -c 1 -w 3 mds03, 895 secs left ... waiting ping -c 1 -w 3 mds03, 890 secs left ... waiting ping -c 1 -w 3 mds03, 885 secs left ... waiting ping -c 1 -w 3 mds03, 880 secs left ... waiting ping -c 1 -w 3 mds03, 875 secs left ... waiting ping -c 1 -w 3 mds03, 870 secs left ... waiting ping -c 1 -w 3 mds03, 865 secs left ... waiting ping -c 1 -w 3 mds03, 860 secs left ... waiting ping -c 1 -w 3 mds03, 855 secs left ... waiting ping -c 1 -w 3 mds03, 850 secs left ... waiting ping -c 1 -w 3 mds03, 845 secs left ... waiting ping -c 1 -w 3 mds03, 840 secs left ... waiting ping -c 1 -w 3 mds03, 835 secs left ... 
waiting ping -c 1 -w 3 mds03, 830 secs left ... waiting ping -c 1 -w 3 mds03, 825 secs left ... waiting ping -c 1 -w 3 mds03, 820 secs left ... waiting ping -c 1 -w 3 mds03, 815 secs left ... waiting ping -c 1 -w 3 mds03, 810 secs left ... waiting ping -c 1 -w 3 mds03, 805 secs left ... waiting ping -c 1 -w 3 mds03, 800 secs left ... waiting ping -c 1 -w 3 mds03, 795 secs left ... waiting ping -c 1 -w 3 mds03, 790 secs left ... 21:37:44 (1437799064) network interface is UP mount facets: mds3 Starting mds3: -o user_xattr,acl /dev/sda1 /lustre/mds3 mds03: mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 1024 to 16383 mds03: Exit config file opensfs.sh Started lustre-MDT0002 ==== Checking the clients loads AFTER failover -- failure NOT OK mds3 has failed over 9 times, and counting... sleeping 392 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=17618 DURATION=86400 PERIOD=600 Wait mds1 recovery complete before doing next failover... affected facets: mds1 mds01: Exit config file opensfs.sh mds01: *.lustre-MDT0000.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c08: Exit config file opensfs.sh c05: Exit config file opensfs.sh c07: Exit config file opensfs.sh c04: Exit config file opensfs.sh c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c01: Exit config file opensfs.sh c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds1 Failing mds1 on mds01 + runas -u di.wang ssh 192.168.0.1 pm -0 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds01] Command completed successfully reboot facets: mds1 + runas -u di.wang ssh 192.168.0.1 pm -1 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds01] Command completed successfully Failover mds1 to mds01 21:44:46 (1437799486) waiting for mds01 network 900 secs ... waiting ping -c 1 -w 3 mds01, 895 secs left ... waiting ping -c 1 -w 3 mds01, 890 secs left ... waiting ping -c 1 -w 3 mds01, 885 secs left ... waiting ping -c 1 -w 3 mds01, 880 secs left ... waiting ping -c 1 -w 3 mds01, 875 secs left ... waiting ping -c 1 -w 3 mds01, 870 secs left ... waiting ping -c 1 -w 3 mds01, 865 secs left ... waiting ping -c 1 -w 3 mds01, 860 secs left ... waiting ping -c 1 -w 3 mds01, 855 secs left ... waiting ping -c 1 -w 3 mds01, 850 secs left ... waiting ping -c 1 -w 3 mds01, 845 secs left ... waiting ping -c 1 -w 3 mds01, 840 secs left ... waiting ping -c 1 -w 3 mds01, 835 secs left ... 
waiting ping -c 1 -w 3 mds01, 830 secs left ... waiting ping -c 1 -w 3 mds01, 825 secs left ... waiting ping -c 1 -w 3 mds01, 820 secs left ... waiting ping -c 1 -w 3 mds01, 815 secs left ... waiting ping -c 1 -w 3 mds01, 810 secs left ... waiting ping -c 1 -w 3 mds01, 805 secs left ... waiting ping -c 1 -w 3 mds01, 800 secs left ... waiting ping -c 1 -w 3 mds01, 795 secs left ... waiting ping -c 1 -w 3 mds01, 790 secs left ... waiting ping -c 1 -w 3 mds01, 785 secs left ... 21:47:50 (1437799670) network interface is UP mount facets: mds1 Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1 mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383 mds01: Exit config file opensfs.sh Started lustre-MDT0000 ==== Checking the clients loads AFTER failover -- failure NOT OK mds1 has failed over 5 times, and counting... sleeping 382 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=18229 DURATION=86400 PERIOD=600 Wait mds1 recovery complete before doing next failover... affected facets: mds1 mds01: Exit config file opensfs.sh mds01: *.lustre-MDT0000.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c06: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c08: Exit config file opensfs.sh c05: Exit config file opensfs.sh c07: Exit config file opensfs.sh c04: Exit config file opensfs.sh c02: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds1 Failing mds1 on mds01 + runas -u di.wang ssh 192.168.0.1 pm -0 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds01] Command completed successfully reboot facets: mds1 + runas -u di.wang ssh 192.168.0.1 pm -1 mds01 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds01] Command completed successfully Failover mds1 to mds01 21:54:48 (1437800088) waiting for mds01 network 900 secs ... waiting ping -c 1 -w 3 mds01, 895 secs left ... waiting ping -c 1 -w 3 mds01, 890 secs left ... waiting ping -c 1 -w 3 mds01, 885 secs left ... waiting ping -c 1 -w 3 mds01, 880 secs left ... waiting ping -c 1 -w 3 mds01, 875 secs left ... waiting ping -c 1 -w 3 mds01, 870 secs left ... waiting ping -c 1 -w 3 mds01, 865 secs left ... waiting ping -c 1 -w 3 mds01, 860 secs left ... waiting ping -c 1 -w 3 mds01, 855 secs left ... waiting ping -c 1 -w 3 mds01, 850 secs left ... waiting ping -c 1 -w 3 mds01, 845 secs left ... waiting ping -c 1 -w 3 mds01, 840 secs left ... waiting ping -c 1 -w 3 mds01, 835 secs left ... 
waiting ping -c 1 -w 3 mds01, 830 secs left ... waiting ping -c 1 -w 3 mds01, 825 secs left ... waiting ping -c 1 -w 3 mds01, 820 secs left ... waiting ping -c 1 -w 3 mds01, 815 secs left ... waiting ping -c 1 -w 3 mds01, 810 secs left ... waiting ping -c 1 -w 3 mds01, 805 secs left ... waiting ping -c 1 -w 3 mds01, 800 secs left ... waiting ping -c 1 -w 3 mds01, 795 secs left ... waiting ping -c 1 -w 3 mds01, 790 secs left ... waiting ping -c 1 -w 3 mds01, 785 secs left ... 21:57:52 (1437800272) network interface is UP mount facets: mds1 Starting mds1: -o user_xattr,acl /dev/sdf1 /lustre/mds1 mds01: mount.lustre: increased /sys/block/sdf/queue/max_sectors_kb from 1024 to 16383 mds01: Exit config file opensfs.sh Started lustre-MDT0000 ==== Checking the clients loads AFTER failover -- failure NOT OK mds1 has failed over 6 times, and counting... sleeping 381 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=18831 DURATION=86400 PERIOD=600 Wait mds4 recovery complete before doing next failover... affected facets: mds4 mds04: Exit config file opensfs.sh mds04: *.lustre-MDT0003.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c08: Exit config file opensfs.sh c07: Exit config file opensfs.sh c02: Exit config file opensfs.sh c01: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c08: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds4 Failing mds4 on mds04 + runas -u di.wang ssh 192.168.0.1 pm -0 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds04] Command completed successfully reboot facets: mds4 + runas -u di.wang ssh 192.168.0.1 pm -1 mds04 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds04] Command completed successfully Failover mds4 to mds04 22:04:48 (1437800688) waiting for mds04 network 900 secs ... waiting ping -c 1 -w 3 mds04, 895 secs left ... waiting ping -c 1 -w 3 mds04, 890 secs left ... waiting ping -c 1 -w 3 mds04, 885 secs left ... waiting ping -c 1 -w 3 mds04, 880 secs left ... waiting ping -c 1 -w 3 mds04, 875 secs left ... waiting ping -c 1 -w 3 mds04, 870 secs left ... waiting ping -c 1 -w 3 mds04, 865 secs left ... waiting ping -c 1 -w 3 mds04, 860 secs left ... waiting ping -c 1 -w 3 mds04, 855 secs left ... waiting ping -c 1 -w 3 mds04, 850 secs left ... waiting ping -c 1 -w 3 mds04, 845 secs left ... waiting ping -c 1 -w 3 mds04, 840 secs left ... waiting ping -c 1 -w 3 mds04, 835 secs left ... 
waiting ping -c 1 -w 3 mds04, 830 secs left ... waiting ping -c 1 -w 3 mds04, 825 secs left ... waiting ping -c 1 -w 3 mds04, 820 secs left ... waiting ping -c 1 -w 3 mds04, 815 secs left ... waiting ping -c 1 -w 3 mds04, 810 secs left ... waiting ping -c 1 -w 3 mds04, 805 secs left ... waiting ping -c 1 -w 3 mds04, 800 secs left ... waiting ping -c 1 -w 3 mds04, 795 secs left ... waiting ping -c 1 -w 3 mds04, 790 secs left ... 22:07:46 (1437800866) network interface is UP mount facets: mds4 Starting mds4: -o user_xattr,acl /dev/sdb1 /lustre/mds4 mds04: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds04: Exit config file opensfs.sh Started lustre-MDT0003 ==== Checking the clients loads AFTER failover -- failure NOT OK mds4 has failed over 8 times, and counting... sleeping 388 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=19424 DURATION=86400 PERIOD=600 Wait mds2 recovery complete before doing next failover... affected facets: mds2 mds02: Exit config file opensfs.sh mds02: *.lustre-MDT0001.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c06: Exit config file opensfs.sh c05: Exit config file opensfs.sh c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: Exit config file opensfs.sh c07: Exit config file opensfs.sh c08: Exit config file opensfs.sh c01: Exit config file opensfs.sh c02: Exit config file opensfs.sh c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds2 Failing mds2 on mds02 + runas -u di.wang ssh 192.168.0.1 pm -0 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02] Command completed successfully reboot facets: mds2 + runas -u di.wang ssh 192.168.0.1 pm -1 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02] Command completed successfully Failover mds2 to mds02 22:15:06 (1437801306) waiting for mds02 network 900 secs ... waiting ping -c 1 -w 3 mds02, 895 secs left ... waiting ping -c 1 -w 3 mds02, 890 secs left ... waiting ping -c 1 -w 3 mds02, 885 secs left ... waiting ping -c 1 -w 3 mds02, 880 secs left ... waiting ping -c 1 -w 3 mds02, 875 secs left ... waiting ping -c 1 -w 3 mds02, 870 secs left ... waiting ping -c 1 -w 3 mds02, 865 secs left ... waiting ping -c 1 -w 3 mds02, 860 secs left ... waiting ping -c 1 -w 3 mds02, 855 secs left ... waiting ping -c 1 -w 3 mds02, 850 secs left ... waiting ping -c 1 -w 3 mds02, 845 secs left ... waiting ping -c 1 -w 3 mds02, 840 secs left ... waiting ping -c 1 -w 3 mds02, 835 secs left ... 
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ... 22:18:02 (1437801482) network interface is UP mount facets: mds2 Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2 mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383 mds02: Exit config file opensfs.sh Started lustre-MDT0001 ==== Checking the clients loads AFTER failover -- failure NOT OK mds2 has failed over 11 times, and counting... sleeping 373 seconds... ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=20039 DURATION=86400 PERIOD=600 Wait mds2 recovery complete before doing next failover... affected facets: mds2 mds02: Exit config file opensfs.sh mds02: *.lustre-MDT0001.recovery_status status: COMPLETE Checking clients are in FULL state before doing next failover... 
c03: Exit config file opensfs.sh c06: Exit config file opensfs.sh c03: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c06: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c02: Exit config file opensfs.sh c01: Exit config file opensfs.sh c08: Exit config file opensfs.sh c05: Exit config file opensfs.sh c04: Exit config file opensfs.sh c07: Exit config file opensfs.sh c02: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c01: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c08: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c05: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c04: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec c07: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec Starting failover on mds2 Failing mds2 on mds02 + runas -u di.wang ssh 192.168.0.1 pm -0 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-0] [mds02] Command completed successfully reboot facets: mds2 + runas -u di.wang ssh 192.168.0.1 pm -1 mds02 running as uid/gid/euid/egid 503/503/503/503, groups: [ssh] [192.168.0.1] [pm] [-1] [mds02] Command completed successfully Failover mds2 to mds02 22:25:05 (1437801905) waiting for mds02 network 900 secs ... waiting ping -c 1 -w 3 mds02, 895 secs left ... waiting ping -c 1 -w 3 mds02, 890 secs left ... waiting ping -c 1 -w 3 mds02, 885 secs left ... waiting ping -c 1 -w 3 mds02, 880 secs left ... waiting ping -c 1 -w 3 mds02, 875 secs left ... waiting ping -c 1 -w 3 mds02, 870 secs left ... waiting ping -c 1 -w 3 mds02, 865 secs left ... waiting ping -c 1 -w 3 mds02, 860 secs left ... waiting ping -c 1 -w 3 mds02, 855 secs left ... waiting ping -c 1 -w 3 mds02, 850 secs left ... waiting ping -c 1 -w 3 mds02, 845 secs left ... waiting ping -c 1 -w 3 mds02, 840 secs left ... waiting ping -c 1 -w 3 mds02, 835 secs left ... 
waiting ping -c 1 -w 3 mds02, 830 secs left ... waiting ping -c 1 -w 3 mds02, 825 secs left ... waiting ping -c 1 -w 3 mds02, 820 secs left ... waiting ping -c 1 -w 3 mds02, 815 secs left ... waiting ping -c 1 -w 3 mds02, 810 secs left ... waiting ping -c 1 -w 3 mds02, 805 secs left ... waiting ping -c 1 -w 3 mds02, 800 secs left ... waiting ping -c 1 -w 3 mds02, 795 secs left ... waiting ping -c 1 -w 3 mds02, 790 secs left ...
22:28:02 (1437802082) network interface is UP
mount facets: mds2
Starting mds2: -o user_xattr,acl /dev/sdb1 /lustre/mds2
mds02: mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 1024 to 16383
mds02: Exit config file opensfs.sh
Started lustre-MDT0001
==== Checking the clients loads AFTER failover -- failure NOT OK
WARNING: failover and two check_client_loads time exceeded SERVER_FAILOVER_PERIOD - MINSLEEP!
Failed to load the filesystem with I/O for a minimum period of 120 1 times ( REQFAIL=4 ).
This iteration, the load was only applied for sleep=-1237 seconds.
Estimated max recovery time: 1475
Probably the hardware is taking excessively long time to boot.
Try to increase SERVER_FAILOVER_PERIOD (current is 600), bug 20918
mds2 has failed over 12 times, and counting...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=22249 DURATION=86400 PERIOD=600
Wait mds2 recovery complete before doing next failover...
affected facets: mds2
mds02: Exit config file opensfs.sh
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 1470 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 1465 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 1460 secs for *.lustre-MDT0001.recovery_status recovery done.
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1455 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1450 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1445 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1440 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1435 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1430 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1425 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1420 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1415 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1410 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1405 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1400 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1395 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1390 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1385 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1380 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1375 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1370 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1365 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1360 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1355 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1350 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1345 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1340 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1335 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1330 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1325 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1320 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1315 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1310 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1305 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1300 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1295 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1290 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1285 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1280 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1275 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1270 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1265 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1260 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1255 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1250 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1245 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1240 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1235 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1230 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1225 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1220 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1215 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1210 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1205 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1200 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1195 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1190 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1185 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1180 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1175 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1170 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1165 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1160 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1155 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1150 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1145 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1140 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1135 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1130 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1125 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1120 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1115 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1110 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1105 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1100 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1095 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1090 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1085 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1080 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1075 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1070 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1065 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1060 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1055 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1050 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1045 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1040 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1035 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1030 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1025 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1020 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1015 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1010 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1005 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 1000 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 995 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 990 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 985 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 980 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 975 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 970 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 965 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 960 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 955 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 950 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 945 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 940 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 935 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 930 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 925 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 920 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 915 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 910 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 905 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 900 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 895 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 890 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 885 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 880 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 875 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 870 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 865 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 860 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 855 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 850 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 845 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 840 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 835 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 830 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 825 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 820 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 815 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 810 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 805 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 800 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 795 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 790 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 785 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 780 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 775 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 770 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 765 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 760 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 755 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 750 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 745 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 740 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 735 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 730 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 725 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 720 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 715 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 710 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 705 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 700 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 695 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 690 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 685 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 680 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 675 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 670 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 665 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 660 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 655 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 650 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 645 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 640 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 635 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 630 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 625 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 620 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 615 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 610 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 605 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 600 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 595 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 590 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 585 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 580 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 575 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 570 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 565 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 560 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 555 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 550 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 545 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 540 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 535 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 530 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 525 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 520 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 515 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 510 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 505 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 500 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 495 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 490 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 485 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 480 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 475 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 470 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 465 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 460 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 455 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 450 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 445 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 440 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 435 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 430 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 425 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 420 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 415 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 410 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 405 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 400 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 395 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 390 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 385 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 380 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 375 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 370 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 365 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 360 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 355 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 350 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 345 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 340 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 335 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 330 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 325 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 320 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 315 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 310 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 305 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 300 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 295 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 290 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 285 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 280 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 275 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 270 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 265 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING mds02: *.lustre-MDT0001.recovery_status status: RECOVERING mds02: Waiting 260 secs for *.lustre-MDT0001.recovery_status recovery done. 
status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 255 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 250 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 245 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 240 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 235 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 230 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 225 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 220 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 215 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 210 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 205 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 200 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 195 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 190 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 185 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 180 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 175 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 170 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 165 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 160 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 155 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 150 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 145 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 140 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 135 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 130 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 125 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 120 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 115 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 110 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 105 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 100 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 95 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 90 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 85 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 80 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 75 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 70 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 65 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 60 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 55 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 50 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 45 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 40 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 35 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 30 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 25 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 20 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 15 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 10 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 5 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status status: RECOVERING
mds02: Waiting 0 secs for *.lustre-MDT0001.recovery_status recovery done. status: RECOVERING
mds02: *.lustre-MDT0001.recovery_status recovery not done in 1475 sec. status: RECOVERING
mds2 recovery is not completed!
2015-07-24 23:19:51 Terminating clients loads ...
Duration: 86400
Server failover period: 600 seconds
Exited after: 22249 seconds
Number of failovers before exit:
mds1: 6 times
mds2: 12 times
mds3: 9 times
mds4: 8 times
ost1: 0 times
ost2: 0 times
ost3: 0 times
ost4: 0 times
Status: FAIL: rc=7
r -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i1 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i3 -c4 /lustre/lustre/d0.tar-c04
striped dir -i2 -c4 /lustre/lustre/d0.tar-c04
striped dir -i0 -c4 /lustre/lustre/d0.tar-c04
status script Total(sec) E(xcluded) S(low)
------------------------------------------
r -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i0 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
striped dir -i2 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i3 -c4 /lustre/lustre/d0.tar-c07
striped dir -i1 -c4 /lustre/lustre/d0.tar-c07
status script Total(sec) E(xcluded) S(low)
------------------------------------------
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10055 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10202 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10349 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10496 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10643 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10790 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 10937 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
/usr/lib64/lustre/tests/recovery-mds-scale.sh: line 103: 11084 Killed do_node $client "PATH=$PATH MOUNT=$MOUNT ERRORS_OK=$ERRORS_OK BREAK_ON_ERROR=$BREAK_ON_ERROR END_RUN_FILE=$END_RUN_FILE LOAD_PID_FILE=$LOAD_PID_FILE TESTLOG_PREFIX=$TESTLOG_PREFIX TESTNAME=$TESTNAME DBENCH_LIB=$DBENCH_LIB DBENCH_SRC=$DBENCH_SRC CLIENT_COUNT=$((CLIENTCOUNT - 1)) LFS=$LFS MDSCOUNT=$MDSCOUNT NODENUM=$nodenum run_${load}.sh"
Dumping lctl log to /tmp/test_logs/2015-07-24/163748/recovery-mds-scale.test_failover_mds.*.1437805191.log
mds02: Host key verification failed.
mds02: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds02: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
oss02: Host key verification failed.
oss02: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
oss02: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
mds03: Host key verification failed.
mds03: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds03: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
mds01: Host key verification failed.
mds01: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds01: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
mds04: Host key verification failed.
mds04: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
mds04: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
oss01: Host key verification failed.
oss01: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
oss01: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
test_failover_mds returned 255
FAIL failover_mds (23739s)