[LU-9931] recovery-*-scale REQFAIL calculation defect Created: 30/Aug/17 Updated: 05/Oct/17 Resolved: 30/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Elena Gryaznova | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
REQFAIL is the number of times that a sleep is allowed to be shorter than the minimum load period. Its value, calculated as "DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100",
may not be an integer (165.6), and the test then fails with: "Failed to load the filesystem with I/O for a minimum
period of 120 166 times ( REQFAIL=165 )".
An example of the test failure:
==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
WARNING: failover, client reintegration and check_client_loads time exceeded SERVER_FAILOVER_PERIOD - MINSLEEP!
Failed to load the filesystem with I/O for a minimum period of 120 166 times ( REQFAIL=165 ).
This iteration, the load was only applied for sleep=63 seconds.
Estimated max recovery time : 1475
Probably the hardware is taking excessively long time to boot.
Try to increase SERVER_FAILOVER_PERIOD (current is 300), bug 20918
2017-06-06 20:08:31 Terminating clients loads ...
Duration: 49680
Server failover period: 300 seconds
Exited after: 49810 seconds
Number of failovers before exit:
mds1 failed over 166 times
Status: FAIL: rc=6 |
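The arithmetic behind the failure can be reproduced with a short bash sketch. DURATION and SERVER_FAILOVER_PERIOD are taken from the log above. REQFAIL_PERCENT=100 is an assumption, inferred from the reported 165.6 and REQFAIL=165. The round-up variant is only one hypothetical way to avoid the truncation and is not necessarily what the merged patch does.

```shell
#!/bin/bash
# Values taken from the failure log in this ticket.
DURATION=49680
SERVER_FAILOVER_PERIOD=300
# Assumption: not stated in the ticket, inferred from REQFAIL=165 in the log.
REQFAIL_PERCENT=100

# Formula as quoted in the description. Bash arithmetic is integer-only,
# so 49680 / 300 = 165.6 is truncated to 165 before the percentage is applied.
reqfail_truncated=$((DURATION / SERVER_FAILOVER_PERIOD * REQFAIL_PERCENT / 100))

# Hypothetical alternative: multiply first, then divide once, rounding up.
denom=$((SERVER_FAILOVER_PERIOD * 100))
reqfail_ceil=$(( (DURATION * REQFAIL_PERCENT + denom - 1) / denom ))

echo "truncated:  $reqfail_truncated"
echo "rounded up: $reqfail_ceil"
```

With these inputs the quoted formula yields 165, while the run in the log performed 166 failovers, so the 166th short sleep is reported against REQFAIL=165.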
| Comments |
| Comment by Gerrit Updater [ 30/Aug/17 ] |
|
Elena Gryaznova (elena.gryaznova@seagate.com) uploaded a new patch: https://review.whamcloud.com/28797 |
| Comment by Gerrit Updater [ 30/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28797/ |
| Comment by Peter Jones [ 30/Sep/17 ] |
|
Landed for 2.11 |