[LU-14772] split conf-sanity into 2 or 3 parts Created: 18/Jun/21  Updated: 12/Sep/23

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: tests

Issue Links:
Related
is related to LU-9201 reduce llmount.sh startup time Resolved
is related to LU-11820 reduce conf-sanity test duration Resolved
is related to LU-15898 Move sanity/802a and sanity/115 to co... Resolved
is related to LU-16280 Make conf-sanity/117 call setup() onl... Resolved
is related to LU-11643 create disk images for Lustre 2.10 an... Resolved
is related to LU-14773 reduce run_one() overhead Open
is related to LU-9798 split recovery-mds-scale into two tes... In Progress
is related to LU-14853 conf-sanity: create ldiskfs and zfs f... Open

 Description   

The current conf-sanity test (the entire review-dne-part-3 test session) completes in about 10h45m, compared to the other test sessions that finish in 4h or less according to Maloo statistics.

It would be useful to split conf-sanity into two or three separate test scripts, or at least separate test runs, so that they can be tested in parallel, and retested independently to reduce the impact of a subtest failure. I think it would be prudent to keep the existing subtest numbers, to facilitate mapping test results/failures between the old and new scripts.

A large fraction of the test time of conf-sanity is spent reformatting and remounting the filesystem. This is often done unnecessarily: once at the start of a test to change configuration parameters for that test, and then again at the end to "restore" the configuration to the default. It makes sense to split subtests into (at least) two categories: those that expect the filesystem to be unmounted at the start and do their own custom formatting (leaving the filesystem unmounted at the end and in an unknown state), and those that expect the filesystem to be mounted and can work with the existing filesystem configuration.

Avoiding spurious reformat/remount could by itself reduce the test duration significantly, since the average conf-sanity subtest time is 220 seconds, while the average sanity subtest time is 24 seconds (many only 6 seconds long).
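
As a rough sanity check on these figures, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it uses the 184 subtest count quoted later in this ticket, assumes every subtest runs at the average duration, and assumes a perfectly even split with no extra per-session setup cost. The helper names (fmt, estimate, per_part) are made up for illustration.

```python
# Figures quoted in this ticket.
SESSION_SECONDS = 10 * 3600 + 45 * 60  # whole session: about 10h45m
AVG_CONF_SANITY = 220                  # average conf-sanity subtest, seconds
AVG_SANITY = 24                        # average sanity subtest, seconds
SUBTESTS = 184                         # conf-sanity subtest count

def fmt(seconds):
    """Format a duration in seconds as e.g. '3h35m' (seconds truncated)."""
    hours, rem = divmod(int(round(seconds)), 3600)
    return f"{hours}h{rem // 60:02d}m"

def estimate(avg, count=SUBTESTS):
    """Total time if every subtest took exactly the average (an assumption)."""
    return avg * count

def per_part(total, parts):
    """Wall-clock time per part, assuming a perfectly even split."""
    return total / parts

if __name__ == "__main__":
    print("220s x 184 subtests:", fmt(estimate(AVG_CONF_SANITY)))
    print("observed session:   ", fmt(SESSION_SECONDS))
    print("even 3-way split:   ", fmt(per_part(SESSION_SECONDS, 3)))
    print("conf-sanity/sanity average ratio: %.1f" % (AVG_CONF_SANITY / AVG_SANITY))
```

Under these assumptions, 184 subtests at 220s each come to roughly 11h14m, which is in the same range as the observed 10h45m, and an even three-way split of the observed session would be about 3h35m per part, in line with the 4h or less of the other sessions.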

A third category of subtests would be the "old version upgrade" test_32[abcde], which themselves take 2700s, and will continue to grow as upgrade images for new releases are added (LU-11643, LU-14853).

A fourth category of subtests could be those that "need a separate MGS", since these are always skipped in our regular testing. While this is only 8 of 184 subtests, having them in a separate test session would allow them to run with a different configuration and improve test coverage.
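
The four categories above could be expressed as a simple grouping that leaves the existing subtest numbers untouched. The following is a minimal sketch, not a proposal for the actual implementation: the trait names ("custom_format", "separate_mgs") and the example trait assignments are hypothetical, and real values would have to come from reviewing each subtest.

```python
import re
from collections import defaultdict

def categorize(name, traits):
    """Pick one of the four categories for a subtest.

    name   -- existing subtest number, kept unchanged (e.g. "32a")
    traits -- set of hypothetical tags describing what the subtest needs
    """
    if re.fullmatch(r"32[a-e]", name):
        return "upgrade"            # test_32[abcde] old-version upgrade tests
    if "separate_mgs" in traits:
        return "separate_mgs"       # currently always skipped
    if "custom_format" in traits:
        return "custom_format"      # starts unmounted, formats on its own
    return "mounted"                # works with the existing configuration

def split_subtests(subtests):
    """Group subtest numbers by category, preserving the input order."""
    groups = defaultdict(list)
    for name, traits in subtests.items():
        groups[categorize(name, traits)].append(name)
    return dict(groups)

# Illustrative input only: these trait assignments are invented, not the
# real properties of the conf-sanity subtests with these numbers.
EXAMPLE = {
    "1": set(),
    "2": {"custom_format"},
    "3": {"separate_mgs"},
    "32a": set(),
    "32c": {"custom_format"},
}

if __name__ == "__main__":
    for category, names in split_subtests(EXAMPLE).items():
        print(category, names)
```

Because the subtest numbers are used as-is for the group members, results from the old single script and the new split scripts could still be matched by number.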



 Comments   
Comment by Andreas Dilger [ 19/Jun/21 ]

James, we previously discussed this, but I couldn't find a ticket describing what needed to be done.

Generated at Sat Feb 10 03:12:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.