Details
-
Task
-
Resolution: Unresolved
-
Minor
Description
Some simple changes could reduce individual subtest time and unmount/mount/format times, which would help speed up every test session.
Individual sanity subtests that do no more than "touch file; check if file exists" currently take 6-7 seconds, because run_one() does a lot of different things in the background with multiple "do_nodes" commands:
- reset fail_loc
- check if the network is working on every node (kind of pointless given that other commands are being run on the nodes before and after this check)
- check grant correctness
- check dmesg for VFS inodes busy
- check for LBUG
- check for multiop still running
When sanity was first written, these subtests took a fraction of a second each (i.e. they would scroll quickly up the screen). While I think the above checks are useful, the overhead could be reduced.
I think a large part of this slowness is that each of these checks runs as a separate ssh/mcmd command, to each remote VM in series, and each ssh invocation is relatively slow.
Speeding up the ssh invocation itself (via do_facet()/do_node()) would of course be desirable, but is not something I can control directly.
Running the per-node checks in parallel would be a win (e.g. use real "pdsh" or "clush"), as would combining all of the checks into a single command that is run with a single ssh invocation to each node. The latter is something that can be done directly in test-framework, and is the main target of this ticket.
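As a rough sketch of the "single command per node" idea, the checks could be joined into one script string and sent once to each node, with the nodes handled in parallel. The function names combine_checks and run_checks_parallel and the RUNNER variable are hypothetical and are not part of test-framework.sh; the real change would presumably go through do_nodes()/do_node() instead of calling ssh directly.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of test-framework.sh.

# Join individual check commands into one script string, so each node
# needs only a single remote invocation. Every check still runs even if
# an earlier one fails; the exit status is non-zero if any check failed.
combine_checks() {
	local script="rc=0"
	local check
	for check in "$@"; do
		script="$script; { $check; } || rc=1"
	done
	printf '%s\n' "$script; exit \$rc"
}

# Run the combined script on all nodes in parallel, one invocation per
# node. RUNNER is an assumption: any command taking "<node> <script>",
# defaulting to plain ssh.
run_checks_parallel() {
	local nodes="$1"
	shift
	local script pids="" node pid rc=0
	script=$(combine_checks "$@")
	for node in $nodes; do
		${RUNNER:-ssh} "$node" "$script" &
		pids="$pids $!"
	done
	# Collect every node's status; fail if any node reported a failure.
	for pid in $pids; do
		wait "$pid" || rc=1
	done
	return $rc
}
```

With this shape, the six checks listed above would cost one ssh round trip per node instead of six, and the per-node round trips would overlap rather than run in series. Each argument after the node list is one check command written so that it exits non-zero on failure.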
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44034/
Subject: LU-14773 tests: quiet down some verbose messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 86f16910645d9d9cad17c0f53ca1a375121e3f4c