  Lustre / LU-9931

recovery-*-scale REQFAIL calculation defect


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.11.0
    • Lustre 2.10.0
    • 3
    • 9223372036854775807

    Description

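      REQFAIL is the number of times a sleep is allowed to be shorter
      than $MINSLEEP before the test is considered failed. The result of

        "DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100"

      may not be an integer (165.6 in the run below, with DURATION=49680
      and SERVER_FAILOVER_PERIOD=300). The fractional part is dropped, so
      REQFAIL=165, while mds1 fails over 166 times during the run, and the
      test fails with:

        "Failed to load the filesystem with I/O for a minimum
        period of 120 166 times ( REQFAIL=165 )."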
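      A minimal Python sketch of the arithmetic, assuming the scripts
      evaluate the formula with shell integer arithmetic (truncating
      division, which matches Python's // for positive values). It
      contrasts the truncated result with the same ratio computed in one
      division and rounded up. The function names are hypothetical, and
      rounding up is shown as one possible way to avoid the defect, not
      necessarily the fix that was merged.

```python
# Sketch of the REQFAIL calculation, assuming recovery-*-scale evaluates it
# with shell integer arithmetic (truncating division).
# Function names are hypothetical; only the formula comes from LU-9931.


def reqfail_truncated(duration, failover_period, reqfail_percent):
    """Evaluate DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100
    left to right with integer division, dropping every fractional part."""
    return duration // failover_period * reqfail_percent // 100


def reqfail_rounded_up(duration, failover_period, reqfail_percent):
    """Same ratio computed with a single division and rounded up,
    using ceil(A / B) == (A + B - 1) // B for positive integers."""
    numerator = duration * reqfail_percent
    denominator = failover_period * 100
    return (numerator + denominator - 1) // denominator


if __name__ == "__main__":
    # Values from the failed run in this ticket; REQFAIL_PERCENT=100 is
    # inferred from the reported 165.6, not taken from the test config.
    print(reqfail_truncated(49680, 300, 100))   # 165
    print(reqfail_rounded_up(49680, 300, 100))  # 166
```

      With REQFAIL_PERCENT=3 the same run gives 4 from the truncated
      formula and 5 when rounded up (the exact value is 4.968).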
      

      Example of the test failure:

      ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
      WARNING: failover, client reintegration and check_client_loads time exceeded SERVER_FAILOVER_PERIOD - MINSLEEP!
      Failed to load the filesystem with I/O for a minimum period of 120 166 times ( REQFAIL=165 ).
      This iteration, the load was only applied for sleep=63 seconds.
      Estimated max recovery time : 1475
      Probably the hardware is taking excessively long time to boot.
      Try to increase SERVER_FAILOVER_PERIOD (current is 300), bug 20918
      2017-06-06 20:08:31 Terminating clients loads ...
      Duration:               49680
      Server failover period: 300 seconds
      Exited after:           49810 seconds
      Number of failovers before exit:
      mds1 failed over 166 times
      Status: FAIL: rc=6
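      The log lets the threshold be checked by hand: mds1 failed over 166
      times and the message reports 166 short sleeps against REQFAIL=165.
      A small Python sketch of that comparison, assuming the rule stated
      in the description (the test fails once the count of sleeps below
      MINSLEEP exceeds REQFAIL) and assuming the 120 in the message is
      MINSLEEP. The function names are hypothetical, and the per-iteration
      sleep values are illustrative, since only the last one (63 s)
      appears in the log.

```python
# Sketch of the pass/fail rule as LU-9931 describes it: a sleep shorter
# than MINSLEEP may happen at most REQFAIL times. Names are hypothetical.

MINSLEEP = 120  # assumed from "a minimum period of 120 ..." in the log


def count_short_sleeps(sleeps, min_sleep=MINSLEEP):
    """Count failover iterations whose load ran less than min_sleep seconds."""
    return sum(1 for s in sleeps if s < min_sleep)


def run_fails(sleeps, reqfail, min_sleep=MINSLEEP):
    """The test fails once short sleeps exceed the allowed REQFAIL count."""
    return count_short_sleeps(sleeps, min_sleep) > reqfail


if __name__ == "__main__":
    # Illustrative data: 166 short sleeps, as the failure message reports.
    # Every value is set to 63 s (the last iteration's sleep) for simplicity.
    sleeps = [63] * 166
    print(run_fails(sleeps, reqfail=165))  # True: 166 > 165
    print(run_fails(sleeps, reqfail=166))  # False: 166 is within the limit
```

      The off-by-one is visible at the boundary: a REQFAIL that rounds
      165.6 up to 166 would tolerate the 166th short sleep.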
      


          People

            James Nunez (Inactive)
            Elena Gryaznova