Details
-
Task
-
Resolution: Unresolved
-
Minor
Description
A few simple changes could reduce individual subtest time and unmount/mount/format times, which would speed up every test session.
Individual sanity subtests that do no more than "touch file; check if file exists" currently take 6-7 seconds, because run_one() does a lot of extra work in the background using multiple "do_nodes" commands:
- reset fail_loc
- check if the network is working on every node (kind of pointless given that other commands are being run on the nodes before and after this check)
- check grant correctness
- check dmesg for VFS inodes busy
- check for LBUG
- check for multiop still running
When sanity was first written, these subtests took a fraction of a second each (i.e. they would scroll quickly up the screen). While I think the above checks are useful, the overhead could be reduced.
I think a large part of this slowness is that each of these checks runs as a separate ssh/mcmd command, to each remote VM in series, and each ssh invocation is relatively slow.
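To make the scaling concrete, here is a rough back-of-the-envelope sketch. The helper names and all of the numbers (6 checks, 3 nodes, 200 ms per invocation) are placeholders chosen for illustration, not measurements from a real test cluster.

```shell
#!/bin/bash
# Hypothetical helpers, not part of test-framework.

# Number of serial remote invocations per subtest when every check
# is sent to every node as its own command.
remote_calls() { # usage: remote_calls CHECKS NODES
    echo $(( $1 * $2 ))
}

# Total serial overhead in milliseconds for a given number of
# invocations and an assumed per-invocation latency.
overhead_ms() { # usage: overhead_ms CALLS MS_PER_CALL
    echo $(( $1 * $2 ))
}
```

With the placeholder figures, `overhead_ms "$(remote_calls 6 3)" 200` prints 3600, while one combined command per node, `overhead_ms "$(remote_calls 1 3)" 200`, prints 600. The point is only that the overhead grows with checks times nodes, so cutting either factor helps.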
Speeding up the ssh invocation itself (via do_facet()/do_node()) would of course be desirable, but is not something I can control directly.
Running the per-node checks in parallel would be a win (e.g. use real "pdsh" or "clush"), as would combining all of the checks into a single command that is run with a single ssh invocation to each node. The latter is something that can be done directly in test-framework, and is the main target of this ticket.
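A minimal sketch of the "single command per node" idea follows. All function names here (build_combined_check, run_combined_checks, run_local) and the REMOTE_RUNNER variable are hypothetical and are not existing test-framework functions. The checks are passed in as plain shell fragments, and run_local stands in for whatever remote execution mechanism is actually used.

```shell
#!/bin/bash
# Hypothetical sketch; none of these names exist in test-framework.

# Join all check fragments into one script, so each node needs only a
# single remote invocation. A failing check sets rc=1 but does not stop
# the remaining checks from running.
build_combined_check() {
    local script check
    script=""
    for check in "$@"; do
        script="${script}{ $check; } || rc=1; "
    done
    echo "rc=0; ${script}exit \$rc"
}

# Stand-in for a remote runner such as: ssh "$1" "$2"
# It ignores the node name and runs the script locally.
run_local() {
    sh -c "$2"
}

# Run the combined script once per node, all nodes in parallel, and
# return non-zero if the script failed on any node.
run_combined_checks() { # usage: run_combined_checks "NODE1 NODE2" CHECK...
    local nodes script node pid pids rc
    nodes="$1"; shift
    script=$(build_combined_check "$@")
    pids=""
    rc=0
    for node in $nodes; do
        ${REMOTE_RUNNER:-run_local} "$node" "$script" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return $rc
}
```

For example, `run_combined_checks "mds1 oss1" "true" "dmesg | grep -q LBUG && exit 1"` would issue one invocation per node instead of one per check per node. The node names and the LBUG check shown here are illustrative only; the real check commands would be taken from run_one().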