[LU-14773] reduce run_one() overhead - Whamcloud Community JIRA

Details

Type: Task
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- easy
- test_script_improvements

Description

Some simple changes could reduce individual subtest time, as well as unmount/mount/format times, which would help speed up every test session.

Individual sanity subtests that are not doing more than "touch file; check if file exists" currently take 6-7 seconds because they are doing a lot of different things in the background in run_one() with multiple "do_nodes" commands:

- reset fail_loc
- check if the network is working on every node (kind of pointless given that other commands are being run on the nodes before and after this check)
- check grant correctness
- check dmesg for VFS inodes busy
- check for LBUG
- check for multiop still running

When sanity was first written, these subtests took a fraction of a second each (i.e. they would scroll quickly up the screen). While I think the above checks are useful, the overhead could be reduced.

I think a large part of this slowness is that each of these checks runs as a separate ssh/mcmd command, to each remote VM in series, and each ssh invocation is relatively slow.

Speeding up the ssh invocation itself (via do_facet()/do_node()) would of course be desirable, but is not something I can control directly.

Running the per-node checks in parallel would be a win (e.g. use real "pdsh" or "clush"), as would combining all of the checks into a single command that is run with a single ssh invocation to each node. The latter is something that can be done directly in test-framework, and is the main target of this ticket.
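
As a rough illustration of the "single command per node" idea, here is a minimal bash sketch. It is not the test-framework.sh implementation: the function names (run_on_node, combined_checks_cmd, run_checks_all_nodes) are hypothetical, each check is an echo placeholder standing in for the real check, and run_on_node executes locally so the sketch can be run anywhere. In a real setup run_on_node would presumably wrap one ssh/pdsh call per node.

```shell
# Hypothetical sketch only; names and placeholder checks are assumptions,
# not the actual test-framework.sh code.

# Transport to a node. Runs the command locally here so the sketch is
# runnable anywhere; a real version would do ONE remote invocation.
run_on_node() {
	local node=$1; shift
	local out rc
	out=$(bash -c "$*" 2>&1)
	rc=$?
	# Prefix each output line with the node name, pdsh-style.
	if [ -n "$out" ]; then
		printf '%s\n' "$out" | sed "s/^/$node: /"
	fi
	return $rc
}

# Build one command string holding all per-node post-test checks.
# The network check is left out, since the ticket questions its value.
combined_checks_cmd() {
	local checks=(
		"echo reset_fail_loc"
		"echo check_grant"
		"echo check_dmesg_busy_inodes"
		"echo check_lbug"
		"echo check_multiop"
	)
	local IFS=';'
	echo "${checks[*]}"
}

# Run the combined command once per node, with all nodes in parallel.
# Returns non-zero if the command failed on any node.
run_checks_all_nodes() {
	local cmd node pid rc=0
	local pids=()
	cmd=$(combined_checks_cmd)
	for node in "$@"; do
		run_on_node "$node" "$cmd" &
		pids+=($!)
	done
	for pid in "${pids[@]}"; do
		wait "$pid" || rc=1
	done
	return $rc
}
```

With N nodes and five checks, this makes N invocations that run concurrently, instead of 5 x N invocations in series. Usage would look like "run_checks_all_nodes mds1 oss1 client2", where the node names are again only examples.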

Issue Links

is related to
- LU-14936 sanity test_140 returned 1 (Open)
- LU-14772 split conf-sanity into 2 or 3 parts (In Progress)

Activity

Gerrit Updater added a comment - 25/Aug/21 6:23 AM

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44034/
Subject: LU-14773 tests: quiet down some verbose messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 86f16910645d9d9cad17c0f53ca1a375121e3f4c

Gerrit Updater added a comment - 18/Aug/21 1:59 AM

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44033/
Subject: LU-14773 tests: skip check_network() on working node
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 67752f6db2c1a7062a73bd6674ee53ad670b392e

Gerrit Updater added a comment - 18/Jun/21 9:37 PM

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44034
Subject: LU-14773 tests: quiet down some verbose messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f23adfd71ee46dfbbf18b8b544ad311c96468fd3

Andreas Dilger added a comment - 18/Jun/21 9:02 PM

Note that the 44033 patch is NOT the only thing that should be fixed, but is a simple patch that may produce immediate benefits (at a minimum it will avoid a lot of useless visual clutter in the subtest logs from the check_network() output).

Gerrit Updater added a comment - 18/Jun/21 9:00 PM

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44033
Subject: LU-14773 tests: skip check_network() on working node
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d085468b6bc06c4bbdfcfbc26afa23d4b752aa64
