[LU-9931] recovery-*-scale REQFAIL calculation defect Created: 30/Aug/17  Updated: 05/Oct/17  Resolved: 30/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Elena Gryaznova Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

REQFAIL is the number of times that a sleep is allowed to be
shorter than $MINSLEEP before the test is considered failed.
The result of

  "DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100"

may not be an integer (165.6 in the run below, where REQFAIL is
reported as 165), and the test fails with:

  "Failed to load the filesystem with I/O for a minimum
  period of 120 166 times ( REQFAIL=165 )".
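
The arithmetic can be reproduced with a minimal shell sketch. DURATION and
SERVER_FAILOVER_PERIOD are taken from the failure log in this ticket.
REQFAIL_PERCENT=100 is an assumption (the log does not show it), chosen
because it reproduces the 165.6 figure. The ceiling variant is only a
hypothetical illustration of rounding up and is not necessarily what the
patch for this ticket does.

```shell
#!/bin/sh
# Values from the failure log in this ticket.
DURATION=49680
SERVER_FAILOVER_PERIOD=300
# Assumption: not shown in the log; 100 reproduces 165.6.
REQFAIL_PERCENT=100

# Shell arithmetic is integer-only, so each division drops the fraction:
# 49680 / 300 = 165 (not 165.6), then 165 * 100 / 100 = 165.
reqfail_truncated=$((DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100))

# Hypothetical alternative: multiply first, then round up (ceiling division).
numerator=$((DURATION * REQFAIL_PERCENT))
denominator=$((SERVER_FAILOVER_PERIOD * 100))
reqfail_ceiling=$(( (numerator + denominator - 1) / denominator ))

echo "truncated: $reqfail_truncated"
echo "ceiling:   $reqfail_ceiling"
```

With these inputs the truncated result is 165, which matches the REQFAIL=165
reported in the log, while rounding up gives 166.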

An example of the test failure:

==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
WARNING: failover, client reintegration and check_client_loads time exceeded SERVER_FAILOVER_PERIOD - MINSLEEP!
Failed to load the filesystem with I/O for a minimum period of 120 166 times ( REQFAIL=165 ).
This iteration, the load was only applied for sleep=63 seconds.
Estimated max recovery time : 1475
Probably the hardware is taking excessively long time to boot.
Try to increase SERVER_FAILOVER_PERIOD (current is 300), bug 20918
2017-06-06 20:08:31 Terminating clients loads ...
Duration:               49680
Server failover period: 300 seconds
Exited after:           49810 seconds
Number of failovers before exit:
mds1 failed over 166 times
Status: FAIL: rc=6


 Comments   
Comment by Gerrit Updater [ 30/Aug/17 ]

Elena Gryaznova (elena.gryaznova@seagate.com) uploaded a new patch: https://review.whamcloud.com/28797
Subject: LU-9931 tests: : fix REQFAIL calculation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5e570b1f3689f110ae31116cfe7233be85a858ff

Comment by Gerrit Updater [ 30/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28797/
Subject: LU-9931 tests: : fix REQFAIL calculation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c2664dcff4ae3ee41cc8a1e542465d12c5e764c2

Comment by Peter Jones [ 30/Sep/17 ]

Landed for 2.11
