Lustre / LU-885

recovery-mds-scale (FLAVOR=mds) fails, network is not available


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.3.0
    • None
    • lustre-master build #353, RHEL6-x86_64, for both server and client
    • 3
    • 10215

    Description

      After running recovery-mds-scale with FLAVOR=mds for about 2 hours (the MDS failed over 14 times), the network on the standby MDS server became unavailable, and the node could not be accessed afterwards, even after a power cycle. I have hit this same issue twice.

       
      ==== Checking the clients loads AFTER  failover -- failure NOT OK
      mds1 has failed over 14 times, and counting...
      sleeping 501 seconds ... 
      ==== Checking the clients loads BEFORE failover -- failure NOT OK     ELAPSED=7904 DURATION=86400 PERIOD=600
      Wait mds1 recovery complete before doing next failover ....
      affected facets: mds1
      client-6: *.lustre-MDT0000.recovery_status status: COMPLETE
      Checking clients are in FULL state before doing next failover
      client-18: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      client-12: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      client-18: cannot run remote command on client-12,client-13,client-17,client-18 with 
      client-12: cannot run remote command on client-12,client-13,client-17,client-18 with 
      client-17: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      client-17: cannot run remote command on client-12,client-13,client-17,client-18 with 
      client-13: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
      client-13: cannot run remote command on client-12,client-13,client-17,client-18 with 
      Starting failover on mds1
      Failing mds1 on node client-6
      + pm -h powerman --off client-6
      Command completed successfully
      affected facets: mds1
      + pm -h powerman --on client-6
      Command completed successfully
      Failover mds1 to client-2
      15:35:30 (1322609730) waiting for client-2 network 900 secs ...
      waiting ping -c 1 -w 3 client-2, 895 secs left ...
      waiting ping -c 1 -w 3 client-2, 890 secs left ...
      waiting ping -c 1 -w 3 client-2, 885 secs left ...
      waiting ping -c 1 -w 3 client-2, 880 secs left ...
      waiting ping -c 1 -w 3 client-2, 875 secs left ...
      waiting ping -c 1 -w 3 client-2, 870 secs left ...
      waiting ping -c 1 -w 3 client-2, 865 secs left ...
      waiting ping -c 1 -w 3 client-2, 860 secs left ...
      waiting ping -c 1 -w 3 client-2, 855 secs left ...
      waiting ping -c 1 -w 3 client-2, 850 secs left ...
      waiting ping -c 1 -w 3 client-2, 845 secs left ...
      waiting ping -c 1 -w 3 client-2, 840 secs left ...
      waiting ping -c 1 -w 3 client-2, 835 secs left ...
      waiting ping -c 1 -w 3 client-2, 830 secs left ...
      waiting ping -c 1 -w 3 client-2, 825 secs left ...
      waiting ping -c 1 -w 3 client-2, 820 secs left ...
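The countdown at the end of the log comes from the test framework polling the standby MDS (client-2) with ping until it answers or a 900-second budget runs out. The sketch below is a hypothetical reconstruction inferred only from those log lines: the function name `wait_for_network_sketch`, the 5-second step and the message formats are assumptions, not the actual Lustre test-framework code.

```shell
#!/bin/bash
# Hypothetical sketch of a ping-based "wait for network" loop, reconstructed
# from the log above. NOT the real test-framework function; the name, the
# default step and the messages are assumptions.
wait_for_network_sketch() {
    local host=$1
    local timeout=${2:-900}   # nominal budget in seconds ("900 secs" in the log)
    local step=${3:-5}        # the log counts down in 5-second steps
    local left=$timeout

    echo "$(date +%H:%M:%S) ($(date +%s)) waiting for $host network $timeout secs ..."
    while (( left > 0 )); do
        # One probe: a single echo request, giving up after at most 3 seconds.
        if ping -c 1 -w 3 "$host" >/dev/null 2>&1; then
            echo "network on $host is up"
            return 0
        fi
        # The budget is decremented by the nominal step only, so real elapsed
        # time per iteration can be up to step + 3 seconds.
        left=$(( left - step ))
        echo "waiting ping -c 1 -w 3 $host, $left secs left ..."
        sleep "$step"
    done
    echo "$host still unreachable after $timeout secs" >&2
    return 1
}
```

For example, `wait_for_network_sketch client-2 900 5` would poll client-2 for a nominal 900 seconds. In the failure reported here, client-2 did not come back on the network, so a loop like this could only end by exhausting its budget.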
      

            People

              wc-triage WC Triage
              sarah Sarah Liu