Reduce the run time of the failover test group (LU-11841)

[LU-9798] split recovery-mds-scale into two test sets: recovery-mds-scale and recovery-ost-scale Created: 25/Jul/17  Updated: 14/Jun/23

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1, Lustre 2.11.0
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: James Casper Assignee: Alex Deiter
Resolution: Unresolved Votes: 0
Labels: None
Environment:

any config that is tested with recovery-mds-scale


Issue Links:
Related
is related to LU-14772 split conf-sanity into 2 or 3 parts In Progress
Project: Test Infrastructure
Rank (Obsolete): 9223372036854775807

 Description   

We would like the MDS and OST failover tests, which are currently both run by recovery-mds-scale, to be run by two separate test sets:

recovery-mds-scale
recovery-ost-scale

This was successfully tested with patches 28155 and 28195.

With 28155, the following lines in recovery-mds-scale were removed:

    test_failover_ost() {
        # failover a random OST
        failover_target OST
    }
    run_test failover_ost "failover OST"

With 28195, the following lines in recovery-mds-scale were removed:

    test_failover_mds() {
        # failover a random MDS
        failover_target MDS
    }
    run_test failover_mds "failover MDS"
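
For illustration only, the intended split can be sketched as below. This is a minimal standalone sketch, not the content of any of the patches: failover_target and run_test are replaced with hypothetical stubs so that it runs outside the Lustre test framework, and the wrapper functions recovery_mds_scale and recovery_ost_scale are assumed names that stand in for the two separate test scripts.

```shell
#!/bin/bash
# Hypothetical stubs standing in for the Lustre test-framework helpers.
# The real failover_target and run_test do far more than this.
failover_target() {
    echo "failing over a random $1"
}

run_test() {
    local name=$1 desc=$2
    echo "== test $name: $desc"
    "test_$name"
}

# The test that would stay in recovery-mds-scale.
test_failover_mds() {
    # failover a random MDS
    failover_target MDS
}

# The test that would live in recovery-ost-scale.
test_failover_ost() {
    # failover a random OST
    failover_target OST
}

# Assumed wrappers: each one models a test set that registers only its own test.
recovery_mds_scale() { run_test failover_mds "failover MDS"; }
recovery_ost_scale() { run_test failover_ost "failover OST"; }

recovery_mds_scale
recovery_ost_scale
```

With the tests separated this way, each test set can be scheduled, timed, and retried on its own, which is the point of the split.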



 Comments   
Comment by Gerrit Updater [ 25/Jul/17 ]

James Casper (jamesx.casper@intel.com) uploaded a new patch: https://review.whamcloud.com/28208
Subject: LU-9798 test: Separate server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a946b5357b92164e259d1b40e56bad98722a92a2

Comment by Gerrit Updater [ 25/Jul/17 ]

James Casper (jamesx.casper@intel.com) uploaded a new patch: https://review.whamcloud.com/28209
Subject: LU-9798 test: Separate server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3076d395562427b637af5f360e2b65fd09f19eb8

Comment by James Casper [ 25/Jul/17 ]

Patches 28208 and 28209 can be ignored. They are each missing a file. Another patch will be submitted that has both test sets (recovery-mds-scale (modified) AND recovery-ost-scale (added)).

Comment by Gerrit Updater [ 25/Jul/17 ]

James Casper (jamesx.casper@intel.com) uploaded a new patch: https://review.whamcloud.com/28211
Subject: LU-9798 test: Separate server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 67f5b9097687049b8f7c4579f134c7de904abbe7

Comment by Gerrit Updater [ 27/Jul/17 ]

James Casper (jamesx.casper@intel.com) uploaded a new patch: https://review.whamcloud.com/28260
Subject: LU-9798 test: Separate server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 23823b2b130d777489effbeae05c6cdf43f9eb64

Comment by Gerrit Updater [ 12/Oct/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/29586
Subject: LU-9798 tests: separate server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c98d3ac44798ae77507b13634b4c636ccaae02fe

Comment by Gerrit Updater [ 09/Jan/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33998
Subject: LU-9798 tests: create OSS recovery script
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 008484e1804a92f6fcdb9ba73fbfe93b1d8f92a9

Comment by James Nunez (Inactive) [ 09/Jan/19 ]

Patch https://review.whamcloud.com/33998 creates recovery-oss-scale.sh. When this lands, we can make use of https://review.whamcloud.com/29586 to create a library to remove all the duplicate code.

Comment by Gerrit Updater [ 26/Apr/21 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43461
Subject: LU-9798 tests: split recovery-mds-scale
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9b2f533eca1a204cb69a5f4d9fdbcead3526e4aa

Comment by Minh Diep [ 26/Jan/23 ]

Deiter could you please continue the work to rebase, test and finish?

Comment by Andreas Dilger [ 10/May/23 ]

The other test that should be split is conf-sanity to reduce the 10-12h test time.

Comment by Alex Deiter [ 07/Jun/23 ]

Hello mdiep,

Done - please review https://review.whamcloud.com/c/fs/lustre-release/+/43461

Thank you!

Comment by Gerrit Updater [ 14/Jun/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/43461/
Subject: LU-9798 tests: split server recovery tests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2c12b93ccb5768af99e943eac9e923d39146408f
