Details
- Bug
- Resolution: Cannot Reproduce
- Minor
- None
- Lustre 2.3.0
- None
- lustre-master build #353 RHEL6-x86_64 for both server and client
- 3
- 10215
Description
After running recovery-mds-scale with FLAVOR=mds for about 2 hours (the MDS had failed over 14 times), the network on the standby MDS server became unavailable, and the node could not be accessed afterwards, even after a power cycle. I have hit this issue twice.
==== Checking the clients loads AFTER failover -- failure NOT OK
mds1 has failed over 14 times, and counting...
sleeping 501 seconds ...
==== Checking the clients loads BEFORE failover -- failure NOT OK
ELAPSED=7904 DURATION=86400 PERIOD=600
Wait mds1 recovery complete before doing next failover ....
affected facets: mds1
client-6: *.lustre-MDT0000.recovery_status status: COMPLETE
Checking clients are in FULL state before doing next failover
client-18: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-12: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-18: cannot run remote command on client-12,client-13,client-17,client-18 with
client-12: cannot run remote command on client-12,client-13,client-17,client-18 with
client-17: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-17: cannot run remote command on client-12,client-13,client-17,client-18 with
client-13: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
client-13: cannot run remote command on client-12,client-13,client-17,client-18 with
Starting failover on mds1
Failing mds1 on node client-6
+ pm -h powerman --off client-6
Command completed successfully
affected facets: mds1
+ pm -h powerman --on client-6
Command completed successfully
Failover mds1 to client-2
15:35:30 (1322609730) waiting for client-2 network 900 secs ...
waiting ping -c 1 -w 3 client-2, 895 secs left ...
waiting ping -c 1 -w 3 client-2, 890 secs left ...
waiting ping -c 1 -w 3 client-2, 885 secs left ...
waiting ping -c 1 -w 3 client-2, 880 secs left ...
waiting ping -c 1 -w 3 client-2, 875 secs left ...
waiting ping -c 1 -w 3 client-2, 870 secs left ...
waiting ping -c 1 -w 3 client-2, 865 secs left ...
waiting ping -c 1 -w 3 client-2, 860 secs left ...
waiting ping -c 1 -w 3 client-2, 855 secs left ...
waiting ping -c 1 -w 3 client-2, 850 secs left ...
waiting ping -c 1 -w 3 client-2, 845 secs left ...
waiting ping -c 1 -w 3 client-2, 840 secs left ...
waiting ping -c 1 -w 3 client-2, 835 secs left ...
waiting ping -c 1 -w 3 client-2, 830 secs left ...
waiting ping -c 1 -w 3 client-2, 825 secs left ...
waiting ping -c 1 -w 3 client-2, 820 secs left ...
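For readers unfamiliar with the wait phase at the end of the log, the following is a minimal Python sketch of a ping-with-countdown loop that produces messages of the same shape. It is not the Lustre test framework's implementation: the function names `ping_once` and `wait_for_network`, the 5-second interval (inferred from the countdown above), and the injectable `probe`, `sleep`, and `log` parameters are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str) -> bool:
    """Send one ICMP echo with a 3-second deadline (ping -c 1 -w 3)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-w", "3", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_network(
    host: str,
    timeout: int = 900,       # total budget in seconds, as in the log
    interval: int = 5,        # assumed step between probes
    probe: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Return True once `host` answers a probe, False if the budget runs out."""
    remaining = timeout
    log(f"waiting for {host} network {timeout} secs ...")
    while remaining > 0:
        if probe(host):
            return True
        sleep(interval)
        remaining -= interval
        log(f"waiting ping -c 1 -w 3 {host}, {remaining} secs left ...")
    return False
```

Calling `wait_for_network("client-2")` against a node whose network never comes back would count down from 900 and return `False`, which corresponds to the behaviour reported here for the standby MDS.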
Issue Links
- is related to: LU-893 system hang when running recovery-mds-scale FLAVOR=OSS (Resolved)
Trackbacks
- Lustre 2.2.0 mini release testing tracker: Lustre 2.2.0 Mini Release Tag: 2.1.52.0 Build: https://newbuild.whamcloud....
- Lustre 2.2.0 release testing tracker: Lustre 2.2.0 RC1 Tag: 2.2.0RC1 Build: https://build.whamcloud.com/job/lustreb22/11/ Google doc: https://docs.google.com/a/whamcloud.com/spreadsheet/ccc?key=0AkK5hBTd2cvHdDFsSWt2RlBocE5kdi03OUYtX21ZYkE#gid=3 Lustre 2.2....